HOW IS MACHINE LEARNING UTILIZED FOR TIME SERIES FORECASTING?

Summary: Time series forecasting is one of the key topics in machine learning. It matters because so many prediction problems have a temporal component. Time series problems are also harder than many other prediction tasks, because the time component adds information, such as the order of and dependence between observations, that a model must take into account.

images/20221126_1.jpg

▲Image caption (Source: dataconomy.com)

WHAT IS TIME SERIES FORECASTING?

Time series forecasting is employed in various sectors, including finance, supply chain management, production, and inventory planning, making it one of the most widely used data science approaches. Time series forecasting has many applications, including resource allocation, business planning, weather forecasts, and stock price prediction.

Machine learning-based predictive models are widely used in enterprise time series projects to help predict how time and resources should be allocated. This post shares our perspective on working on deep learning projects for time series forecasting.

MACHINE LEARNING TIME SERIES FORECASTING APPLICATIONS

Time series forecasting can be used by any business or organization that deals with continuously generated data and needs to adjust to operational shifts and changes. Here, machine learning acts as a major enabler in areas such as:

Web traffic forecasting: In order to forecast online traffic rates during certain periods, common data on typical traffic rates among competing websites is combined with input data on traffic-related trends.

Sales and demand forecasting: Customer behavior pattern data, in combination with inputs from purchase history, demand history, seasonal influence, etc., enables machine learning models to identify the most demanded items and pinpoint their placement in the dynamic market.

Weather prediction: Time-based data is routinely collected from a variety of globally networked weather stations, and machine learning approaches enable in-depth analysis and interpretation of the data for future forecasts based on statistical dynamics.

Stock price forecasting: In order to make accurate forecasts of the most likely impending stock price movements, one can integrate historical stock price data with information on regular and atypical spikes and decreases in the stock market.

Economic and demographic forecasting: Demographics and economics offer a wealth of statistical data that can be used to forecast time series effectively. As a result, the ideal target market can be determined, and the most effective strategies for communicating with that specific target audience can be developed.

Academics: Machine learning and deep learning can greatly speed up the process of refining and launching scientific ideas. For instance, scientific data that must go through many analytical cycles may be analyzed considerably more quickly with machine learning.

TIME SERIES FORECASTING IN MACHINE LEARNING

Before moving on, it is worth reviewing what the terms time series, time series analysis, and time series forecasting mean.

A time series is a collection of observations made over time, whether daily, weekly, monthly, or annually. Time series analysis entails building models that characterize the observed series and explain the “why” behind the data, which includes making interpretations and predictions based on the available facts. Time series forecasting then uses the best-fitting model to anticipate future observations based on carefully processed current and historical data.

images/20221126_2.jpg

▲In order to use an appropriate deep learning model for time series forecasting, it is crucial to comprehend the elements of the time series data (Source: dataconomy.com)

Time series analysis and forecasting with machine learning has proven effective at identifying patterns in both structured and unstructured data.

In order to use an appropriate deep learning model for time series forecasting, it is crucial to comprehend the elements of the time series data:

Cyclicity: Recurring rises and falls in a time series that do not follow a fixed calendar period.

Trends: The long-term rising or falling pattern of a time series, often approximated as linear.

Seasonality: Behavior cycles that repeat over a fixed, known period, such as a week or a year.

Noise: The non-systematic part of a time series, which deviates from the values the model expects.
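
As a rough illustration of these elements, the sketch below splits a toy series into trend, seasonal, and noise parts with a classical additive decomposition. The toy data, the period of 4, and the helper names (centered_moving_average, decompose_additive) are assumptions made for this example and do not come from the original article. Cyclicity is not modeled separately here.

```python
from statistics import mean

def centered_moving_average(series, period):
    """Estimate the trend with a centered moving average over one full cycle."""
    n = len(series)
    half = period // 2
    trend = [None] * n  # the edges stay None because the window does not fit there
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: the two end points get half weight to keep the window centered.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
            trend[t] = total / period
        else:
            trend[t] = mean(window)
    return trend

def decompose_additive(series, period):
    """Split a series into trend + seasonal + noise (additive model)."""
    trend = centered_moving_average(series, period)
    detrended = [y - m if m is not None else None for y, m in zip(series, trend)]
    # Average the detrended values at each position within the cycle.
    seasonal_means = []
    for k in range(period):
        values = [d for i, d in enumerate(detrended) if d is not None and i % period == k]
        seasonal_means.append(mean(values))
    # Shift the seasonal effects so that they average to zero over one cycle.
    offset = mean(seasonal_means)
    seasonal = [seasonal_means[i % period] - offset for i in range(len(series))]
    noise = [y - m - s if m is not None else None
             for y, m, s in zip(series, trend, seasonal)]
    return trend, seasonal, noise

# Toy data (assumed): a linear trend plus a repeating pattern of length 4, no noise.
pattern = [2, -1, -2, 1]
series = [10 + 0.5 * t + pattern[t % 4] for t in range(16)]
trend, seasonal, noise = decompose_additive(series, period=4)
print(trend[2:6])
print(seasonal[:4])
```

On real data the noise part will not be zero, and a tested library routine would usually be preferable to a hand-written decomposition like this one.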

TIME SERIES FORECASTING MODELING

Many techniques are used in time series forecasting, all aiming to improve precision and reduce errors and losses. Several classical and contemporary machine learning techniques have demonstrated their efficacy and computational practicality. We discuss a number of them below.

BEST MACHINE LEARNING MODELS FOR TIME SERIES FORECASTING

For time series forecasting, a variety of models can be used. For instance, the LSTM network is a special type of recurrent neural network that makes predictions based on historical data. It is widely used for many tasks, including time series analysis and speech and language tasks. Models such as random forests, gradient boosting regressors, and time delay neural networks (TDNNs) can incorporate temporal information by adding a series of lagged (delayed) values to the input, so that the data is represented at several points in time. Because the delays already encode the sequential nature of the data, TDNNs are built as feedforward rather than recurrent neural networks.

NAÏVE MODEL

Naïve models are often implemented as a random walk or a seasonal random walk. In a random walk, the most recently observed value serves as the forecast for the following period. In a seasonal random walk, the forecast is the value observed in the same period of the previous season.
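
Both variants fit in a few lines. The following is a minimal sketch; the function names and the toy history are assumptions made for illustration.

```python
def naive_forecast(history, horizon):
    """Random walk: repeat the last observed value for every future step."""
    return [history[-1]] * horizon

def seasonal_naive_forecast(history, horizon, period):
    """Seasonal random walk: reuse the value from the same position in the last full cycle."""
    last_cycle = history[-period:]
    return [last_cycle[h % period] for h in range(horizon)]

# Toy history (assumed): two cycles of a period-4 pattern.
history = [10, 20, 30, 40, 12, 22, 32, 42]
flat = naive_forecast(history, horizon=3)
seasonal = seasonal_naive_forecast(history, horizon=6, period=4)
print(flat)      # [42, 42, 42]
print(seasonal)  # [12, 22, 32, 42, 12, 22]
```

Forecasts like these are mainly useful as baselines that a more elaborate model should be able to beat.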

EXPONENTIAL SMOOTHING MODEL

Exponential smoothing is a time series forecasting technique that can be extended to support data with a systematic trend or seasonal component. It is a powerful forecasting technique that can be used as an alternative to the well-known Box-Jenkins ARIMA family of methods.
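
The simplest member of this family, simple exponential smoothing without trend or seasonal terms, can be sketched as follows. The function name, the choice to initialize the level with the first observation, and the value of alpha are assumptions for this example.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    The last level is the one-step-ahead forecast. A larger alpha puts
    more weight on recent observations.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # assumption: start the level at the first observation
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels

levels = simple_exponential_smoothing([10, 12, 14], alpha=0.5)
forecast = levels[-1]
print(levels)    # [10, 11.0, 12.5]
print(forecast)  # 12.5
```

With alpha set to 1 the method reduces to the naïve forecast, since each level simply equals the latest observation.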

images/20221126_3.jpg

▲Machine learning-based predictive models are widely used in enterprise time series projects to help predict how time and resources should be allocated (Source: dataconomy.com)

ARIMA/SARIMA

ARIMA stands for Autoregressive Integrated Moving Average: a composite time series model that combines autoregressive (AR) and moving average (MA) terms with differencing (the “integrated” part). ARIMA-type models can account for trend and seasonality, for example through differencing and dummy variables for weekdays. The autoregressive and moving average terms, in turn, handle the autocorrelation that remains in the data.

Seasonal Autoregressive Integrated Moving Average, or SARIMA, extends ARIMA by adding a linear combination of past seasonal values and/or forecast errors.
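
Differencing, the step that removes trend (lag 1) or a seasonal pattern (lag equal to the season length), can be illustrated on its own. The sketch below covers only that step, not a full ARIMA or SARIMA fit; the helper names and toy data are assumptions.

```python
def difference(series, lag=1):
    """Subtract the value `lag` steps earlier from each observation."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def undifference(last_values, diffs, lag=1):
    """Invert differencing, given the last `lag` values of the original series."""
    history = list(last_values[-lag:])
    for d in diffs:
        history.append(history[-lag] + d)
    return history[lag:]

# Toy data (assumed): a pure linear trend becomes constant after one difference.
trend_series = [3, 5, 7, 9, 11]
first_diff = difference(trend_series)
print(first_diff)  # [2, 2, 2, 2]

# A repeating period-4 pattern disappears after a seasonal difference.
seasonal_series = [1, 4, 2, 8, 1, 4, 2, 8]
seasonal_diff = difference(seasonal_series, lag=4)
print(seasonal_diff)  # [0, 0, 0, 0]

# Forecasts made on the differenced scale are mapped back to the original scale.
restored = undifference(trend_series, [2, 2])
print(restored)  # [13, 15]
```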

LINEAR REGRESSION METHOD

Linear regression is a straightforward statistical method that is frequently used for predictive modeling. At its core, it provides an equation that expresses the target variable in terms of one or more independent variables.
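
For a single independent variable, such as a time index, the equation can be fitted by ordinary least squares in a few lines. This is a minimal sketch; the function name and the toy data are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Toy data (assumed): the time index is the only independent variable.
time_index = list(range(6))
values = [2 * t + 1 for t in time_index]
slope, intercept = fit_line(time_index, values)
next_value = slope * 6 + intercept  # extrapolate one step past the data
print(slope, intercept, next_value)
```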

MULTI-LAYER PERCEPTRON (MLP)

The term “MLP” is used ambiguously: sometimes it refers broadly to any feedforward artificial neural network (ANN), and other times it refers specifically to networks made up of several layers of perceptrons.

RECURRENT NEURAL NETWORK (RNN)

RNNs are essentially memory-enhanced neural networks that can predict time-dependent targets. A recurrent neural network keeps a state derived from earlier inputs and uses it when deciding on the next time step. Recurrent networks have since been modified in a number of ways for use in many fields.

LONG SHORT-TERM MEMORY (LSTM)

LSTM cells (special RNN cells) were created to address the vanishing gradient problem by providing the model with several gates. These gates let the model decide which information to keep as meaningful and which to ignore. Another kind of gated recurrent network is the GRU.

In addition to the techniques above, convolutional neural networks (CNNs), decision tree-based models like Random Forest, and gradient boosting variants (LightGBM, CatBoost, etc.) can be used for time series forecasting.
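
To make the role of the gates concrete, the sketch below runs one step of a single-unit LSTM cell on scalar values, following the standard LSTM equations. The weights are hand-picked for illustration rather than trained, and the function and parameter names are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell with scalar input.

    `params` maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre_activation(name):
        w, u, b = params[name]
        return w * x + u * h_prev + b

    forget = sigmoid(pre_activation("forget"))        # how much old memory to keep
    write = sigmoid(pre_activation("input"))          # how much new content to store
    output = sigmoid(pre_activation("output"))        # how much memory to expose
    candidate = math.tanh(pre_activation("candidate"))

    c = forget * c_prev + write * candidate  # updated cell state (the memory)
    h = output * math.tanh(c)                # updated hidden state (the output)
    return h, c

# Hand-picked weights (assumed): the forget gate is wide open and the input
# gate nearly closed, so the cell carries its memory forward almost unchanged.
keep_memory = {
    "forget": (0.0, 0.0, 20.0),
    "input": (0.0, 0.0, -20.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = lstm_step(x=5.0, h_prev=0.0, c_prev=0.8, params=keep_memory)
print(h, c)
```

In a trained network these weights are learned, so the cell itself decides when to keep, overwrite, or expose its memory.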

images/20221126_4.jpg

▲It is important to note that it is not always possible to determine visually which machine learning model is the most accurate (Source: dataconomy.com)

KAGGLE

Kaggle is an online coding and data processing environment in which web traffic time series forecasting can be done effectively. Its large community of enthusiasts has contributed notebooks, datasets, and techniques over the years. This makes it a useful resource for the problem of predicting the future values of multiple time series.

LIGHTGBM

LightGBM is a popular gradient boosting technique that is primarily concerned with identifying intricate patterns in tabular datasets. As a result, it can produce accurate estimates on data such as sales figures. For forecasts based on tabular data, LightGBM sometimes performs better than the traditional ARIMA method.

DECISION TREES

Machine learning-based decision trees are used to categorize items (products) in the database. Each generated class then receives its own multivariate time series model, which helps forecast an item’s future pricing. This approach is well suited to business analysis.

XGBOOST

XGBoost is a machine learning technique for tabular and structured data, with gradient-boosted decision trees at its heart. To be used with XGBoost, a time series dataset must first be converted into a supervised learning problem.
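
One common way to perform this conversion is a sliding window, in which the previous few observations become the input features and the next observation becomes the target. The sketch below shows the idea; the function name and the number of lags are assumptions.

```python
def make_supervised(series, n_lags):
    """Turn a series into (features, target) pairs with a sliding window.

    Each feature row holds the `n_lags` values that precede its target.
    """
    features, targets = [], []
    for t in range(n_lags, len(series)):
        features.append(series[t - n_lags:t])
        targets.append(series[t])
    return features, targets

X, y = make_supervised([1, 2, 3, 4, 5], n_lags=2)
print(X)  # [[1, 2], [2, 3], [3, 4]]
print(y)  # [3, 4, 5]
```

The resulting rows can be passed to any tabular regressor, including gradient-boosted trees. Calendar features such as the day of the week can be appended to each row in the same way.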

ADABOOST

AdaBoost combined with decision trees is considered by many to be the best out-of-the-box classifier. It is an ensemble method, which means that it works by combining other, simpler algorithms to refine the classification. For instance, when combined with decision trees, it gradually learns to focus on the data instances that are the most difficult to classify.

HOW TO EVALUATE THE MODEL’S ACCURACY?

It is important to note that it is not always possible to determine visually which machine learning model is the most accurate.

When comparing the overall forecast accuracy of several time series forecasting models, the MAPE (Mean Absolute Percentage Error) is a commonly used metric.

It expresses the average absolute forecast error as a percentage of the actual values. The general principle is simple: the lower the MAPE, the better the forecast accuracy.
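
The calculation itself is short. The sketch below is one way to write it; the function name and the sample numbers are assumptions, and the check for zero actual values reflects the fact that the percentage error is undefined there.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)

# Sample numbers (assumed): the errors are 10%, 10%, and 0%.
actual = [100, 200, 400]
forecast = [110, 180, 400]
score = mape(actual, forecast)
print(round(score, 2))  # 6.67
```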

THE PROCESS OF A TIME SERIES FORECASTING PROJECT

The following steps are taken when deploying deep learning for time series forecasting. They help prevent negative effects and support the project’s success in building the predictive model.

DEFINING THE PROJECT GOAL

Make sure you understand the objective before going into the details of the project. This means understanding the particulars of the business domain in which the forecast will operate, including terminology and important definitions, as well as the typical business models of that domain. This step therefore requires thorough research of the subject matter to define the project’s specifics.

DATA EXPLORATION

Once the fundamentals are defined, you can see clearly how much data you need to gather to support the later discovery of insights. With sufficient domain knowledge in hand, the team can explore the data strategically, using plots and visualization charts to estimate trends and analyze the amount of variation. This also helps define the forecasting task and complete the initial exploratory investigation.

images/20221126_5.jpg

▲Time series forecasting has many potential applications, including resource allocation, business planning, weather forecasts, and stock price prediction (Source: dataconomy.com)

DATA PREPARATION

In this step, the development team cleans the data to obtain key insights and to extract the important variables. It then starts the feature engineering part of data preparation. The core of feature engineering is to apply domain knowledge in the areas that are essential for creating new features in an existing dataset.

TIME SERIES FORECASTING METHOD

The team works with several models and selects one based on its relevance and expected prediction accuracy. This selection builds on the data preparation and exploratory analysis carried out in the previous stages. Fitting the chosen model then confirms that it is constructed properly and takes into account the factors that matter in the forecasting process.

COMPARING PERFORMANCES

This step covers optimizing the forecasting model’s parameters to achieve high performance. Data scientists train forecasting models with various sets of hyperparameters, using a cross-validation procedure that defines how the data is split. Completing this step requires computing performance scores on a variety of test datasets. To obtain a reliable performance evaluation for time series data, it is crucial to use an out-of-sample technique, in which the model is scored only on data it did not see during training.
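
For time series, an out-of-sample split has to respect time order, so that each test block comes after the data used for training. One common scheme is an expanding window, sketched below. The function name and the split sizes are assumptions, and this is not necessarily the procedure the authors used.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs in time order.

    Every training set ends right before its test block, so the model is
    always evaluated on observations that come later than its training data.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train = list(range(test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test

splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
for train, test in splits:
    print(len(train), test)
```

Each candidate set of hyperparameters can then be scored on every test block, for example with MAPE, and the scores averaged before choosing a model.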

DEPLOYMENT

This stage includes integrating the forecasting model into production. At this point, we strongly advise creating a pipeline that gathers fresh data for future AI features. It reduces the data preparation work needed for upcoming tasks.

images/20221126_6.jpg

▲The implementation of a time series forecasting project requires the highest caliber of development (Source: dataconomy.com)

The process is iterative: obtaining the data involves a number of exploration and visualization steps, and after visualizing it may be necessary to take a step back and gather more information. The models are modified and updated as fresh information and new insights become available.

As a result, the emphasis at this stage is on developing and improving one or more models until the required level of performance is reached.

CHALLENGES OF A TIME SERIES FORECASTING PROJECT

We would like to discuss the knowledge we have gained from working on time series forecasting projects and identify any potential difficulties the development team might encounter.

LACK OF DATA

Prediction accuracy generally increases as datasets grow, because the algorithm has access to more training data. Machine learning is at a disadvantage, however, when a target variable lacks historical data or data covering its seasonality. A lack of data can therefore lead to a general decline in forecasting accuracy.

LACK OF DOMAIN KNOWLEDGE

Without sufficient domain knowledge, the feature engineering stage, a crucial part of ML implementation, carries a high risk. In general, domain expertise improves the model quality of any project. The experience of specialists in the business niche is needed to avoid the problems caused by a lack of domain knowledge.

Our primary worries when working on the stock price forecasting project were related to the heteroscedasticity and chaotic nature of stock prices, in addition to the issues already highlighted.

CONCLUSION

Implementing a time series forecasting project requires a high caliber of development. Machine learning forecasting is undoubtedly the next stage of data-driven forecasting and prediction, and businesses and entrepreneurs have good reason to use ML’s capabilities to strengthen their data analytics. Nevertheless, this field has several potential pitfalls and unexpected challenges that only an expert can manage.

Reposted from: dataconomy.com
