How to find the autocorrelation function?
The autocorrelation function measures how strongly a time series is related to a copy of itself delayed by k time steps. It is usually denoted R(k) and is calculated as:
R(k)=(Σ(Xt-X̄)(Xt+k-X̄))/(Σ(Xt-X̄)²)
Here Xt is the value of the time series at time t, X̄ is the mean of the series, and k is the delay (lag) in time steps. The sum in the numerator runs over all t for which both Xt and Xt+k are observed; the sum in the denominator runs over all observations.
The formula has a geometric reading. Subtract the mean from every value, then treat the centered series and its copy shifted by k steps as two vectors. The numerator is their inner product and the denominator is the squared length of the centered series, so R(k) is approximately the cosine of the angle between the series and its delayed copy.
R(k) always lies in [-1,1], and R(0)=1. For many stationary series, R(k) shrinks toward 0 as the delay k increases.
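The formula for R(k) can be evaluated directly. The following is a minimal sketch in plain Python; the function name `autocorrelation` and the sample data are illustrative assumptions, not part of the original answer.

```python
def autocorrelation(x, k):
    """Sample autocorrelation R(k) of the sequence x at lag k (0 <= k < len(x))."""
    n = len(x)
    mean = sum(x) / n
    # Denominator: sum of squared deviations from the mean.
    denom = sum((v - mean) ** 2 for v in x)
    # Numerator: products of deviations that are k steps apart.
    numer = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return numer / denom


series = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up data
r0 = autocorrelation(series, 0)      # lag 0 always gives 1
r1 = autocorrelation(series, 1)
```

For real data, R(k) is usually computed for a range of lags and plotted, which gives the correlogram.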
What methods are used for time series analysis when only a few years of data are available?
Time series analysis
Time series analysis is a statistical method for processing dynamic data. It draws on the theory of stochastic processes and on mathematical statistics to study the statistical laws that random data sequences obey, with the aim of solving practical problems.
1 Introduction
Time series analysis includes general statistical analysis (e.g., autocorrelation analysis and spectral analysis), statistical modeling and inference, and optimal prediction, control, and filtering of time series. Classical statistical analysis assumes that the observations are independent, while time series analysis focuses on the dependence between them. It is in effect the statistical analysis of stochastic processes with a discrete index, so it can also be regarded as a branch of the statistics of stochastic processes. For example, if the rainfall of a certain region is recorded for the first month, the second month, …, and the Nth month, time series analysis can be used to forecast the rainfall of future months.
With the development of computer software, this mathematics is no longer purely theoretical: time series analysis rests mainly on mathematical statistics and is now routinely applied in practice.
2 References
Reference from: Dictionary of Scientific and Technological Methods
A time series is a sequence of numbers arranged in chronological order. Time series analysis applies mathematical and statistical methods to such a sequence in order to predict how things will develop. It is one of the quantitative forecasting methods and rests on two basic principles. First, it recognizes the continuity of development: from past data one can infer the trend of future development. Second, it takes the randomness of development into account: any development may be affected by chance factors, so the historical data should be processed with statistical tools such as the weighted average method. The method is simple and easy to grasp, but its accuracy is limited, and it is generally suitable only for short-term forecasting. Time series forecasts generally reflect three kinds of change: trend changes, cyclical changes, and random changes.
Time series analysis is the theory and method of building mathematical models from systematically observed time series data by curve fitting and parameter estimation (for example, nonlinear least squares). It is commonly used in macro-control of the national economy, regional development planning, business management, market potential forecasting, meteorological forecasting, hydrological forecasting, earthquake precursor forecasting, forecasting of crop diseases and pests, environmental pollution control, ecological balance, astronomy, oceanography, and other fields.
3 Components
A time series usually consists of four elements: trend, seasonal variation, cyclic fluctuations and irregular fluctuations.
Trend: the sustained upward or downward movement that a time series shows over a long period.
Seasonal variations: periodic fluctuations in the time series that repeat within a year. They result from factors such as climatic conditions, production conditions, holidays, or people's customs.
Cyclical fluctuations: fluctuations whose period has no fixed length. A cycle may last for some time, but unlike the trend it is not a sustained movement in one direction; it is an alternation of rises and falls.
Irregular fluctuations: the random fluctuations that remain in a time series after the trend, seasonal variations, and cyclical fluctuations are removed. They are almost always mixed into the series and give it a wavy or oscillating appearance. A series containing only random fluctuations is also called a stationary series.
4 Basic Steps
The basic steps of time series modeling are:
① Obtain dynamic time series data of the observed system by observation, survey, statistics, sampling, or other methods.
② Plot the data as a correlation diagram, carry out correlation analysis, and compute the autocorrelation function. The correlation diagram shows the trend and the cycles of change and reveals jump points and inflection points. A jump point is an observation that is inconsistent with the other data. If the jump point is a correct observation, it should be taken into account in modeling; if it is an anomaly, it should be adjusted to its expected value. An inflection point is a point at which the time series suddenly changes from an upward trend to a downward trend. If there is an inflection point, the parts of the series before and after it must be fitted with different models, for example with a threshold regression model.
③ Identify a suitable stochastic model and fit it, i.e., use a general stochastic model to fit the observed data of the time series. For short or simple time series, a trend model and a seasonal model plus an error term can be used. For stationary time series, the general ARMA model (autoregressive moving average model) and its special cases, the autoregressive model, the moving average model, or the mixed ARMA model, can be used. The ARMA model is generally used when there are more than 50 observations. A non-stationary time series should first be differenced into a stationary time series, and an appropriate model is then fitted to the differenced series.
5 Main Uses
System Description
Using curve-fitting methods, give an objective description of a system based on time series data obtained by observing it.
System Analysis
When observations are taken on two or more variables, changes in one time series can be used to explain changes in another, which gives insight into the mechanism that produces a given time series.
Forecasting the Future
An ARMA model is typically fitted to the time series and used to predict its future values.
Decision Making and Control
Based on the time series model, the input variables can be adjusted to keep the system on target; that is, corrective control can be applied when the process is predicted to deviate from the target.
6 Concrete Algorithms
Time series analysis uses stochastic process theory and mathematical statistics to study the statistical laws that random data sequences obey, in order to solve practical problems. It is called time series analysis because in most problems the random data are ordered in time. It includes general statistical analysis (e.g., autocorrelation analysis and spectral analysis), statistical modeling and inference, and optimal prediction, control, and filtering of random sequences. While classical statistical analysis assumes that the observations are independent, time series analysis focuses on the dependence between them. It is in effect the statistical analysis of stochastic processes with a discrete index, so it can also be regarded as a branch of the statistics of stochastic processes. For example, if x(t) denotes the rainfall of a region in month t, then {x(t), t=1, 2, …} is a time series. The rainfall data x(1), x(2), …, x(T) recorded month by month for t=1, 2, …, T are called a sample sequence of length T. Time series analysis can then be used to forecast the rainfall x(T+l) (l=1,2,…) of future months. Time series analysis was applied to economic forecasting before World War II. During and after World War II, its applications became more widespread in areas such as military science, space science, and industrial automation.
As far as mathematical methods are concerned, the statistical analysis of stationary random sequences (see stationary process) is the most mature in its theoretical development and therefore forms the basis of time series analysis.
Frequency domain analysis. A time series can be viewed as a superposition of periodic disturbances of various periods. Frequency domain analysis determines how the vibration energy is distributed among these periods. This distribution is called the "spectrum" or "power spectrum", so frequency domain analysis is also called spectral analysis. An important statistic in spectral analysis is the periodogram of the sequence, which in a standard form consistent with the estimator described next is
$$I(\omega) = \frac{1}{T}\left|\sum_{t=1}^{T} x(t)\,e^{-i\omega t}\right|^{2}, \qquad -\pi \le \omega \le \pi.$$
When a sequence contains deterministic periodic components, finding the periods of these components from the maxima of I(ω) is one of the important tasks of spectral analysis. In a rainfall sequence recorded by month, x(t) can be regarded as containing a deterministic component with period 12, so x(t) can be written as that periodic component plus a random remainder, and its periodogram I(ω) has a pronounced maximum near ω = 2π/12.
When the spectral distribution function F(λ) of a stationary sequence has a spectral density ƒ(λ) (i.e., a power spectrum), ƒ(λ) can be estimated by (2π)^{-1}I(λ), which is an asymptotically unbiased estimate of ƒ(λ). To obtain a consistent estimate of ƒ(λ) (see point estimation), I(ω) can be smoothed appropriately. A commonly used method is spectral window estimation, which takes the estimate of ƒ(λ) to be a weighted average of the periodogram, typically of the form
$$\hat f(\lambda) = \frac{1}{2\pi}\int_{-\pi}^{\pi} w_T(\omega-\lambda)\,I(\omega)\,d\omega,$$
where w_T(ω) is called the spectral window function. Spectral window estimation is one of the important methods in practical applications. A consistent estimate of the spectral distribution F(λ) itself can be obtained directly by integrating I(ω), i.e.
$$\hat F(\lambda) = \frac{1}{2\pi}\int_{-\pi}^{\lambda} I(\omega)\,d\omega.$$
Studying the statistical properties of these estimators and improving the estimation methods is an important part of spectral analysis.
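To make the role of the periodogram concrete, the sketch below evaluates it on synthetic "monthly" data with an exact period of 12 and locates the peak. The normalization 1/T, the function name `periodogram`, and the synthetic cosine data are assumptions chosen for illustration, not taken from the article.

```python
import cmath
import math


def periodogram(x, omega):
    """I(omega) = |sum_t x(t) * exp(-i*omega*t)|^2 / T for a sample x(1..T)."""
    T = len(x)
    s = sum(x[t - 1] * cmath.exp(-1j * omega * t) for t in range(1, T + 1))
    return abs(s) ** 2 / T


# Synthetic data: 10 "years" of a pure period-12 cosine (assumed example).
T = 120
x = [math.cos(2 * math.pi * t / 12) for t in range(1, T + 1)]

# Evaluate at the Fourier frequencies 2*pi*j/T, j = 1..T/2, and locate the peak.
freqs = [2 * math.pi * j / T for j in range(1, T // 2 + 1)]
values = [periodogram(x, w) for w in freqs]
peak = freqs[values.index(max(values))]
period = 2 * math.pi / peak          # period implied by the peak frequency
```

With noisy real data the peak is less sharp, which is one reason smoothed (spectral window) estimates are preferred for estimating the spectral density itself.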
Time-domain analysis. Its aim is to determine how the values of the series at different moments depend on one another, in other words, to determine the correlation structure of the series. This structure is described by the autocorrelation function ρ(k) = γ(k)/γ(0) (k = 0,1,…), where γ(k) = E[(x(t+k) - m)(x(t) - m)] is the autocovariance function of the series and m = Ex(t) is the mean of the stationary series. The following formulas are often used to estimate m, γ(k), and ρ(k):
$$\hat m = \frac{1}{T}\sum_{t=1}^{T} x(t), \qquad \hat\gamma(k) = \frac{1}{T}\sum_{t=1}^{T-k}\bigl(x(t+k)-\hat m\bigr)\bigl(x(t)-\hat m\bigr), \qquad \hat\rho(k) = \frac{\hat\gamma(k)}{\hat\gamma(0)}.$$
Studying the correlation structure of the series through ρ̂(k) is called autocorrelation analysis. The study of the strong and weak consistency of these estimators and of their asymptotic distributions is a fundamental problem in correlation analysis.
Model analysis. Since the 1970s, the most widely used time series model has been the stationary autoregressive moving average model (ARMA model for short). Written in one common sign convention, it has the form
$$x(t) - a_1 x(t-1) - \cdots - a_p x(t-p) = \varepsilon(t) + b_1 \varepsilon(t-1) + \cdots + b_q \varepsilon(t-q),$$
where ε(t) is an independent and identically distributed random sequence with zero mean and variance σ²; a_1,…,a_p, b_1,…,b_q and σ² are the parameters of the model, which satisfy
$$1 - a_1 z - \cdots - a_p z^{p} \neq 0, \qquad 1 + b_1 z + \cdots + b_q z^{q} \neq 0$$
for all complex numbers z with |z|≤1. p and q are the orders of the model and are non-negative integers. In particular, when q=0 the model is called an autoregressive model, and when p=0 it is called a moving average model. The statistical analysis of such models consists of estimating these parameters and orders from the sample values of x(t). For stationary series that satisfy an ARMA model, the problems of linear optimal prediction and control have relatively simple solutions, and the autoregressive model is especially convenient to use. G. U. Yule proposed the concept of the stationary autoregressive model between 1925 and 1930. In 1943, H. B. Mann and A. Wald published theoretical results on statistical methods for this model and their asymptotic properties. The statistical analysis of general ARMA models, on the other hand, developed only after the 1960s. In particular, the theory of estimating the values of p and q and the associated asymptotic theory appeared somewhat later. Besides the ARMA model there are other lines of model analysis; among them the study of linear models is the most mature, and all of them are closely related to ARMA model analysis.
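As an illustration of estimating model parameters from sample values, the sketch below simulates the simplest autoregressive case, x(t) = a·x(t-1) + ε(t), and recovers a from the lag-1 sample autocorrelation (the Yule-Walker estimate for order 1). The function names, the choice a = 0.6, standard normal noise, and the sample size are assumptions for the example.

```python
import random


def simulate_ar1(a, n, seed=0):
    """Simulate x(t) = a * x(t-1) + eps(t) with standard normal eps(t)."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = a * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


def estimate_ar1(x):
    """Estimate a as the lag-1 sample autocorrelation (Yule-Walker, order 1)."""
    n = len(x)
    m = sum(x) / n
    g0 = sum((v - m) ** 2 for v in x)
    g1 = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    return g1 / g0


x = simulate_ar1(a=0.6, n=5000, seed=0)
a_hat = estimate_ar1(x)   # should be close to 0.6 for a sample this long
```

Higher-order and mixed ARMA models need more elaborate estimators (for example maximum likelihood), which is what statistical packages provide.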
Regression analysis. If the time series x(t) can be expressed as the sum of a deterministic component φ(t) and a stochastic component ω(t), then estimating φ(t) and analyzing the statistical law of ω(t) from the sample values x(1), x(2), …, x(T) is the regression analysis problem of time series analysis. It differs from classical regression analysis in that ω(t) is generally not independently and identically distributed, so more knowledge of stochastic processes is involved. When φ(t) is an unknown linear combination of a finite number of known functions, i.e.
$$x(t) = \alpha_1 \varphi_1(t) + \alpha_2 \varphi_2(t) + \cdots + \alpha_s \varphi_s(t) + \omega(t),$$
where ω(t) is a stationary sequence with zero mean, α1,α2,…,αs are unknown parameters, and φ1(t), φ2(t),…,φs(t) are known functions, the equation is called a linear regression model, and its statistical analysis has been studied in depth. The rainfall example described above can be described by this type of model. Regression analysis covers two cases: when the statistical law of ω(t) is known, estimate the parameters α1, α2, …, αs and predict the value of x(T + l); when the statistical law of ω(t) is unknown, estimate these parameters and also analyze ω(t) statistically, for example by spectral analysis or model analysis. An important topic here is to show, under fairly general conditions, that the least squares estimates of α1,α2,…,αs are consistent and asymptotically normal, just like their linear minimum variance unbiased estimates. The least squares estimate α̂j (1≤j≤s) does not involve the correlation structure of ω(t) and is calculated directly from the data x(1), x(2), …, x(T). From it one obtains the residuals
$$\hat\omega(t) = x(t) - \sum_{j=1}^{s} \hat\alpha_j \varphi_j(t),$$
which are then used in place of ω(t) in the various statistical analyses of time series analysis. It has also been shown theoretically that this substitution has satisfactory asymptotic properties under appropriate conditions. Since the true value of ω(t) cannot be measured directly, these theoretical results clearly have important practical implications. Research in this area is still evolving.
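A minimal sketch of the regression step follows, assuming the simplest linear regression model with two known functions, φ1(t) = 1 and φ2(t) = t. It computes the least squares estimates and the residuals that stand in for ω(t). The function name `fit_linear_trend` and the check data are illustrative assumptions.

```python
def fit_linear_trend(x):
    """Least-squares fit of x(t) = a1 + a2 * t for t = 1..T.

    Returns (a1, a2, residuals), where residuals[t-1] = x(t) - (a1 + a2 * t).
    """
    T = len(x)
    ts = range(1, T + 1)
    t_mean = sum(ts) / T
    x_mean = sum(x) / T
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (v - x_mean) for t, v in zip(ts, x))
    a2 = sxy / sxx
    a1 = x_mean - a2 * t_mean
    residuals = [v - (a1 + a2 * t) for t, v in zip(ts, x)]
    return a1, a2, residuals


# Noise-free check data: x(t) = 2 + 0.5 * t, so the residuals should vanish.
x = [2 + 0.5 * t for t in range(1, 11)]
a1, a2, residuals = fit_linear_trend(x)
```

On real data the residuals would then be examined with the time-domain or frequency-domain tools described earlier.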
Optimal prediction, control, and filtering in time series analysis are described in the article on stationary processes. In recent years, research on multidimensional time series analysis has progressed and has been applied to industrial automation and economic analysis. In addition, the statistical analysis of nonlinear models and nonparametric statistical analysis have gradually attracted attention.
Introduction to time series modeling
A time series is a collection of observations {X_t}, where each X_t is observed at time t (t is a natural number). The series is called (weakly) stationary if, for any t and any lag k, it satisfies the following conditions:
i. E[X_t] = μ, a constant that does not depend on t;
ii. Var(X_t) = σ² < ∞, which does not depend on t;
iii. Cov(X_t, X_{t+k}) = γ_k, which depends only on the lag k and not on t.
Hereafter we refer to such a series simply as a stationary series.
In layman's terms, the expectation, variance, and covariance of a stationary series do not change over time. For example, a sequence of independent, identically distributed variables with finite variance is stationary.
Example 1 The time series in the figure below was generated. Intuitively, this series is "stationary".
Example 2 The time series in the figure below grows significantly at first and then levels off. Using the ADF test (see below for details), we find that the series is stationary (p-value<0.01).
Remark: Weak stationarity mainly means that the time series is stable globally, i.e., the series may fluctuate locally but is stable as a whole, or in other words its sample mean converges over time.
We use hypothesis testing to decide whether a sample is stationary. The Augmented Dickey-Fuller (ADF) test is commonly used [1]. Its hypotheses are:
H0: the series has a unit root (it is non-stationary);
H1: the series is stationary.
At a significance level α (for example 0.05), we compute the p-value: if p < α, we reject H0 and treat the series as stationary; otherwise we cannot reject H0.
The ADF test is implemented by the adfuller function [3] in statsmodels.tsa.stattools in Python 3.
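AR stands for Autoregression. Assuming that the time series is stationary, the AR(p) model can be expressed in the following standard form, where ε_t is white noise:
$$X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t$$
MA stands for Moving Average. Assuming that the time series is stationary, the MA(q) model can be expressed as follows:
$$X_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$
The ARMA model is a combination of AR and MA. Under the same assumptions as above, the ARMA(p,q) model can be expressed in the following form:
$$X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$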
The ARIMA model is a generalization of the ARMA model; its full name is Autoregressive Integrated Moving Average, written ARIMA(p,d,q). When the time series is not stationary, we usually difference it to make it stationary and then apply the ARMA model.
The parameter d is the order of the difference. The difference is defined as follows (∇ is the difference operator):
$$\nabla X_t = X_t - X_{t-1}, \qquad \nabla^{d} X_t = \nabla\bigl(\nabla^{d-1} X_t\bigr)$$
Example 3 The following figure shows the original time series. Its mean has a clear upward trend and does not converge, so it is not a stationary series (the p-value of the ADF test is 0.94).
After first-order differencing of this series, we get the following stationary time series (p-value is 0.00).
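Differencing itself is a one-line operation. The following sketch applies the first difference d times in plain Python; the function name `difference` and the two synthetic series are assumptions for illustration.

```python
def difference(x, d=1):
    """Apply the first-difference operator d times: y(t) = x(t) - x(t-1)."""
    for _ in range(d):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x


trend = [3 * t + 1 for t in range(8)]    # linear trend: the mean keeps rising
quadratic = [t * t for t in range(8)]    # quadratic trend

d1 = difference(trend, d=1)        # constant series, the trend is removed
d2 = difference(quadratic, d=2)    # a quadratic trend needs two differences
```

Each difference shortens the series by one observation, so the differenced series has len(x) - d values.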
The notation ARIMA(p,d,q)(P,D,Q)_s represents the seasonal (or periodic) ARIMA model; the detailed expression can be found in [4] (Section 4.1, Seasonal ARIMA models), where s is the period length and P, D, Q are the seasonal autoregressive, differencing, and moving average orders.
We can regard it as a two-stage model: the first stage uses ARIMA(p,d,q) globally; the second stage specifies the period length s and then uses an ARIMA(P,D,Q) model to capture the relationship between periods.
Example 4 Consider the following periodic stationary time series.
Periodic differencing of the series, Y_t = X_t - X_{t-s} with s the period length, gives the new time series shown below (in red).
By using periodic differencing, we can remove the periodicity from the original time series. Similarly, by using periodic autoregressive and moving average coefficients, we can take the dependence between periods into account in the model.
Example 5 Consider data with period s=18 (blue curve). The results of forecasting without and with the periodic part of the model are as follows.
The predictions of the ARIMA model that ignores periodicity (gray curve) converge gradually to the mean of the time series. Since the time series is stationary, this result is what we expect. However, the time series has a strong periodicity, and the period can be read off by observation. In this case we use only the periodic difference, and we obtain the periodic prediction shown in the figure (red curve).
ARCH stands for Autoregressive Conditional Heteroskedasticity. It can be used for time series in which the variance of the samples changes (or oscillates) over time. Assuming the time series is stationary, the ARCH(q) model can be expressed in the following standard form:
$$X_t = \sigma_t \varepsilon_t, \qquad \sigma_t^{2} = \alpha_0 + \sum_{i=1}^{q} \alpha_i X_{t-i}^{2}$$
where ε_t is independent and identically distributed with zero mean and unit variance, α_0 > 0, and α_i ≥ 0.
GARCH is Generalized ARCH, a generalization of the ARCH model [6]. Assuming the time series is stationary, the GARCH(p,q) model can be represented as follows:
$$X_t = \sigma_t \varepsilon_t, \qquad \sigma_t^{2} = \alpha_0 + \sum_{i=1}^{q} \alpha_i X_{t-i}^{2} + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^{2}$$
where, in addition, β_j ≥ 0.
Remark: What kind of data does an ARCH/GARCH stochastic process generate? As mentioned earlier, these models allow the variance of the samples to vary over time, but since stationarity must hold (a prerequisite assumption), the variance varies (oscillates) locally while the series remains stable when viewed as a whole. For example, the following figure shows a time series generated by such a process.
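To see what such data look like, one can simulate a GARCH(1,1) process, assuming the common form x_t = σ_t·ε_t with σ_t² = α0 + α1·x_{t-1}² + β1·σ_{t-1}². The parameter values, standard normal innovations, and the function name `simulate_garch11` below are assumptions for illustration.

```python
import random


def simulate_garch11(n, alpha0=0.1, alpha1=0.1, beta1=0.8, seed=0):
    """Simulate x_t = sigma_t * eps_t with
    sigma_t^2 = alpha0 + alpha1 * x_{t-1}^2 + beta1 * sigma_{t-1}^2."""
    rng = random.Random(seed)
    var = alpha0 / (1 - alpha1 - beta1)   # start at the unconditional variance
    x, variances = [], []
    for _ in range(n):
        value = var ** 0.5 * rng.gauss(0.0, 1.0)
        x.append(value)
        variances.append(var)
        # Next conditional variance depends on the latest shock and variance.
        var = alpha0 + alpha1 * value ** 2 + beta1 * var
    return x, variances


x, variances = simulate_garch11(2000)
```

Plotting x shows calm stretches alternating with bursts of large swings (volatility clustering), while the conditional variance keeps returning toward its long-run level α0/(1 - α1 - β1).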
VAR is Vector Autoregression, a multivariate autoregressive model. Similarly, VARMA (Vector Autoregressive Moving Average) is the vector version of ARMA. Note that the time series handled by VARMA models may contain a trend. We do not expand on this here; interested readers can refer to [4], Section 11.2: Vector Autoregressive models VAR(p) models.
Given a sample of observations of a time series, how do you determine the parameters of the model after selecting a forecasting model? In this section, we introduce two common methods: 1. drawing ACF/PACF plots and then observing the values; 2. automating the selection of parameters by calculating relevant statistical indicators.
The full name of ACF is Autocorrelation Function. For a lag k, the value of the ACF is the correlation between X_t and X_{t+k}.
The full name of PACF is Partial Autocorrelation Function. For a lag k, the value of the PACF is the correlation between X_t and X_{t+k} given the values in between, X_{t+1}, …, X_{t+k-1}.
Example 6 Consider the time series generated by the following three models and compute the corresponding ACF/PACF.
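Given ACF values, the PACF can be obtained with the Durbin-Levinson recursion. The sketch below is a plain-Python version; the function name `pacf_from_acf` is an assumption, and the input used for the check is the theoretical ACF of an AR(1) model with coefficient 0.5 (ρ(k) = 0.5^k), not one of the models from the example.

```python
def pacf_from_acf(rho, max_lag):
    """Durbin-Levinson recursion: partial autocorrelations for lags 1..max_lag.

    rho[k] must hold the autocorrelation at lag k, with rho[0] == 1.
    """
    pacf = []
    phi_prev = []                      # AR coefficients of the order-(k-1) fit
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_new
    return pacf


rho = [0.5 ** k for k in range(6)]     # theoretical ACF of an AR(1) with coefficient 0.5
partial = pacf_from_acf(rho, max_lag=5)
```

The result illustrates the usual reading of the plots: for an AR(p) model the PACF cuts off after lag p, while the ACF decays gradually.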
The basic idea is to compute a number of indicators and to choose the parameters that make these indicators as small as possible. Below we introduce some common indicators.
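For the sake of description, let us define some notation: L is the maximized value of the model's likelihood, k is the number of estimated parameters, and n is the number of observations. The usual definitions are:
AIC = 2k - 2 ln L
AICc = AIC + 2k(k+1)/(n - k - 1) (a modified version of AIC that addresses overfitting on small samples)
BIC = k ln n - 2 ln L (also known as the Schwarz Criterion, SBC, SBIC)
Remark: In practice, it is advisable to consider these indicators together.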
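The usual textbook definitions of these criteria are easy to compute once a model's maximized log-likelihood is known. In the sketch below, the three candidate models and their log-likelihood values are hypothetical numbers invented for illustration, and the function names are assumptions.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC."""
    return aic(log_likelihood, k) + 2 * k * (k + 1) / (n - k - 1)


def bic(log_likelihood, k, n):
    """Bayesian (Schwarz) information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical fits: (label, maximized log-likelihood, number of parameters).
candidates = [("ARMA(1,0)", -152.3, 2), ("ARMA(1,1)", -150.9, 3), ("ARMA(2,2)", -150.1, 5)]
n = 100

best_by_aic = min(candidates, key=lambda c: aic(c[1], c[2]))
best_by_bic = min(candidates, key=lambda c: bic(c[1], c[2], n))
```

With these made-up numbers the two criteria pick different models, because BIC penalizes extra parameters more heavily. This is one reason to look at several indicators together.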
Python 3 code on GitHub
Time Series Analysis – Moving Average Method
The average of the sequence values in the recent N periods is used as a forecast for future periods.
The range of values of N: 5≤N≤200
Only suitable for short-term forecasts, and only when the data show no strong trend.
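Error calculation (standard error of the forecasts):
$$S = \sqrt{\frac{\sum_{t=N+1}^{T} (\hat y_t - y_t)^{2}}{T - N}}$$
where y_t is the actual value, ŷ_t is its moving average forecast, T is the number of observations, and N is the number of terms in the moving average. Compute S for several values of N and use the N with the smaller standard error for the final prediction.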
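The selection of N by standard error can be sketched as follows. The data are made-up figures, the candidate values of N are kept small only because the toy series is short, and the function names are assumptions.

```python
import math


def moving_average_forecasts(y, N):
    """One-step forecasts: each forecast is the mean of the N values before it.

    The last element is the forecast for the first period after the data.
    """
    return [sum(y[t - N:t]) / N for t in range(N, len(y) + 1)]


def standard_error(y, N):
    """Root mean squared error of the in-sample one-step forecasts."""
    preds = moving_average_forecasts(y, N)[:-1]   # drop the out-of-sample forecast
    actual = y[N:]
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(preds, actual)) / len(actual))


y = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16]      # made-up observations
best_N = min([2, 3, 4], key=lambda N: standard_error(y, N))
next_forecast = moving_average_forecasts(y, best_N)[-1]
```

The final forecast is simply the mean of the last best_N observations.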
Weights are added according to the importance of the data: w is the weight (recent observations get large weights, older observations get small weights). The prediction formula is then:
$$\hat y_{t+1} = \frac{w_1 y_t + w_2 y_{t-1} + \cdots + w_N y_{t-N+1}}{w_1 + w_2 + \cdots + w_N}$$
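A weighted moving average forecast can be sketched in a few lines. The convention that the first weight applies to the most recent observation, the function name, and the sample numbers are assumptions for illustration.

```python
def weighted_moving_average(y, weights):
    """Forecast the next value; weights[0] applies to the most recent observation."""
    recent = y[::-1][:len(weights)]          # newest observation first
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)


y = [10, 12, 14]                             # made-up observations, oldest first
forecast = weighted_moving_average(y, weights=[3, 2, 1])
```

Here the forecast is (3·14 + 2·12 + 1·10) / 6, so the most recent value pulls the forecast above the plain average of 12.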
If the predictions are systematically too low, they can be corrected as follows:
Calculating relative error:
Calculating the total relative error:
Corrected prediction value:
Multiple simple moving averages:
Knowing the first t items, predict the value of t+1 items
Where:
N: N items moving average