Aug 22, 2022

Introduction to the statistical analysis of time series

This post introduces the use of statistical analysis of time series at Umaneo. Combining time series analysis with machine learning helps us to create models that have high accuracy for our clients. Our team has many years of experience using a wide range of statistical models: this experience has allowed us to be able to respond rapidly and efficiently to our clients’ AI needs.

Introduction

Time series data is a collection of ordered observations taken at regular intervals. One approach to gaining insight from such data is time series analysis, which involves analyzing and modeling the observations to identify patterns. Time series data is applicable to multiple AI use cases:

  • Forecasting: Time series analysis allows predicting future values by considering past observations. This has proven beneficial across many domains, such as sales prediction, the anticipation of stock market trends, energy demand forecasting, and weather prediction.
  • Pattern recognition: Time series data may possess recurring patterns or trends that can be extracted with the help of machine learning techniques. Applications for this technology include recognizing speech, gestures and activities. For instance, analyzing time series data which represent audio signals enables us to recognize spoken words or phrases in speech recognition.
  • Natural language processing: Textual data in the form of word or character sequences can undergo time series analysis in natural language processing. This proves helpful for language translation, sentiment analysis, and speech recognition tasks.
  • Anomaly detection: Time series analysis can be used to detect anomalous patterns or outliers in data. The detection of variations in time series depends on understanding its typical behavior. Identifying fraudulent activities, predicting equipment failures, and recognizing abnormal events are all valuable applications of this technology.
  • Classification and clustering: Time series data can be categorized into various groups or clustered based on their similarities. In areas such as sentiment analysis where emotional trends in textual time series data are classified as positive, negative or neutral, this can be advantageous. Grouping similar time series together using clustering can help with further analysis or anomaly detection.

The following concepts will be covered:

1. White Noise (WN)

2. The Auto-Correlation Function (ACF)

3. Random Walks (RW)

4. Auto-Regressive (AR) models

5. The Partial Auto-Correlation Function (PACF)

6. Moving Average (MA) models

7. Overview of extensions and variations to AR-MA models

8. Combining time series models with machine learning

Images are from https://smac-group.github.io/ts/ (Creative Commons Attribution - Non Commercial-Share Alike 4.0 International License) and https://atsa-es.github.io/atsa-labs/ (Public domain)



1. White Noise (WN)

White noise is a fundamental concept used in various fields, including signal processing, econometrics, and stochastic processes. In statistics, white noise refers to a random signal or time series characterized by the following properties:

1. Zero mean: The values of white noise average to zero, which means that the noise has no systematic bias or trend.

E[X(t)] = 0

where E[X(t)] represents the expected value (mean) of the white noise at time t.

2. Constant variance: The variance of white noise remains constant over time. This means that the noise's amplitude or magnitude is not changing systematically.

Var(X(t)) = σ²

where Var(X(t)) represents the variance of the white noise at time t, and σ² represents the constant variance.

3. Independence: The values or observations in white noise are statistically independent of each other. The value at each point is unrelated to and cannot provide any information about the values at other points.

Cov(X(t), X(s)) = 0, for t ≠ s

where Cov(X(t), X(s)) represents the covariance between two white noise values at times t and s. 

4. Zero correlation: No relationship exists between its values at different times. The autocorrelation function (ACF) is defined as:

ρ(h) = Corr(X(t), X(t-h))

where ρ(h) represents the autocorrelation coefficient at lag h, Corr denotes the correlation, and X(t) and X(t-h) represent the white noise values at times t and t-h, respectively. For white noise, ρ(h) is zero for all non-zero lags (h ≠ 0).

Visually, white noise appears as a random sequence of values with no discernible pattern or structure. It is often represented as a series of random numbers or a graph of random fluctuations. Using white noise as a baseline or reference is common for analyzing other signals or time series. This feature enables researchers to spot patterns, trends and anomalies that deviate from randomness within the data.
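As a minimal sketch (assuming NumPy is available), white noise can be simulated by drawing independent values from a normal distribution with zero mean and constant variance; the seed, sample size, and σ below are arbitrary choices for the example:

import numpy as np

rng = np.random.default_rng(seed=42)
sigma = 1.0                                        # constant standard deviation
wn = rng.normal(loc=0.0, scale=sigma, size=500)    # 500 i.i.d. N(0, sigma^2) draws

print(wn.mean())   # close to 0 for a long enough series
print(wn.var())    # close to sigma^2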

2. The Auto-Correlation Function (ACF)

The autocorrelation function (ACF) is defined as:

ρ(h) = Corr(X(t), X(t-h))

where ρ(h) represents the autocorrelation coefficient at lag h, Corr denotes the correlation, and X(t) and X(t-h) represent the values of the series at times t and t-h, respectively.

The ACF is primarily used for the following purposes in practice:

  • Detecting patterns: By examining the ACF plot, analysts can identify patterns in the data that repeat at specific lags. This helps in understanding the underlying structure of the time series and identifying any seasonality or cyclic behavior.

  • Model selection: Auto-correlation analysis plays a crucial role in determining the appropriate model for forecasting or analyzing time series data. By examining the decay of auto-correlation coefficients, analysts can determine the order of autoregressive (AR) or moving average (MA) models.

  • Checking randomness: The ACF can help assess whether a time series exhibits random behavior or if there are persistent correlations. If the ACF shows significant non-zero correlations at multiple lags, it suggests the presence of serial correlation, indicating that previous observations influence future observations.

  • Diagnostic checking: After fitting a time series model, analysts can inspect the residuals' auto-correlation function to check if any systematic patterns or residual correlations remain. Departures from randomness in the residual ACF plot may indicate model inadequacy or the presence of additional structure that needs to be captured.

  • Seasonality analysis: The ACF is particularly useful for identifying the presence of seasonality in a time series. Seasonal patterns are characterized by high correlations at specific lags, indicating that previous values at those lags have a significant impact on the current value.

In the example above, the true autocorrelation is equal to zero at any lag h ≠ 0, but the estimated autocorrelations are random variables and are not equal to their true values. Most peaks lie within the interval ±2/√T, where T is the number of observations, suggesting that the true data-generating process is uncorrelated. Even so, we should expect roughly 1 in 20 of the estimated ρ(h) to appear statistically different from zero by chance alone, especially for relatively small sample sizes.
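A short sketch of how the sample ACF can be computed and plotted in practice, assuming the statsmodels and matplotlib libraries are available; the white-noise input is only illustrative:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import acf
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(0)
x = rng.normal(size=200)    # white noise: the true ACF is 0 for all lags h != 0

print(acf(x, nlags=20))     # sample autocorrelations up to lag 20
plot_acf(x, lags=20)        # plot with approximate 95% confidence bounds
plt.show()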

3. Random Walks (RW)

In a random walk, each consecutive element in a sequence is derived by adding an unpredictable and incremental change to its predecessor. Financial markets and other complex systems are frequently simulated using this model. The increments in a random walk are often modeled as white noise, assuming independence and constant variance. A random walk can be seen as an integrated version of white noise, where successive values are cumulatively summed. 

The following traits characterize the random walk stochastic process:

  1. The initial condition:

X(0) = c

where c is the starting value or initial condition of the random walk.

  2. The value at time t in a random walk:

X(t) = X(t-1) + ε(t)

where X(t) represents the value of the time series at time t, X(t-1) represents the previous value at time (t-1), and ε(t) is the random increment or "shock" at time t.

  3. The random increment:

ε(t) ~ N(0, σ²)

The random increment ε(t) is assumed to be normally distributed with mean zero and variance σ². This implies that the increments are random and can take positive or negative values.

  4. Cumulative randomness: The values of a random walk are determined by cumulative random increments. At each step, the value is updated by adding a random value drawn from a distribution (often assumed to be a normal distribution) to the previous value. The random increments are independent of each other.
  5. No discernible trend: A random walk does not exhibit a systematic trend or pattern. The values fluctuate randomly around an initial starting point. This means that there is no long-term upward or downward trend in the series.
  6. Irregular fluctuations: Random walks can have irregular fluctuations in either direction. These fluctuations can sometimes appear to resemble trends, but they are entirely due to the random nature of the increments.
  7. Diffusion-like behavior: Over time, the random walk tends to spread out or diffuse. The magnitude of the fluctuations typically grows in proportion to the square root of the number of steps taken.

Random walks are commonly used in finance, economics, and other domains to model phenomena such as stock price movements, changes in exchange rates, or fluctuating asset values. They also serve as a reference or null model when testing for trends or patterns in data recorded over time. A random walk model can be made richer by including a drift term: a non-zero mean increment accounts for a systematic trend in the data.

The equation for a random walk with drift can be expressed as:

X(t) = c + X(t-1) + ε(t)

where c is the drift term, representing the deterministic component or trend.

In the plot above, two hundred simulated random walks are plotted along with theoretical 95% confidence intervals (red-dashed lines). The relationship between time and variance can clearly be observed: the variance of the process increases with time.

Var(X(t)) = Var(X(t-1)) + Var(ε(t)) = Var(X(t-2)) + Var(ε(t-1)) + Var(ε(t)) = ... = Var(X(0)) + t·Var(ε(t)) = t·σ²

assuming a fixed starting value, so that Var(X(0)) = 0.
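A minimal sketch of simulating a random walk (with an optional drift term) by cumulatively summing white-noise increments, assuming NumPy; the parameter values are arbitrary:

import numpy as np

rng = np.random.default_rng(1)
n, sigma, drift = 200, 1.0, 0.0           # set drift to a non-zero value for a RW with drift
eps = rng.normal(0.0, sigma, size=n)      # white-noise increments epsilon(t)
x = np.cumsum(drift + eps)                # X(t) = X(t-1) + drift + eps(t), starting from 0

# The variance grows linearly with t: Var(X(t)) = t * sigma^2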

4. Auto-Regressive (AR) models

Autoregressive models with an order of 1 are referred to as AR(1) or first-order autoregressive models. They describe the relationship between an observation and its lagged value(s), and are widely used time series models. AR(1) models assume that the current value of a time series depends linearly on its immediate previous value.

The general form of an AR(1) model can be written as:

X(t) = c + φ * X(t-1) + ε(t)

where:

  • X(t) represents the value of the time series at time t.
  • c is a constant term or intercept.
  • φ (phi) is the autoregressive coefficient, which determines the impact of the lagged value on the current value. It typically lies between -1 and 1.
  • X(t-1) denotes the lagged value of the time series at time (t-1).
  • ε(t) ~ N(0, σ²) is the error term or residual, representing the random component or noise in the model. The error term ε(t) is assumed to be normally distributed with mean zero and variance σ².
  • The initial condition X(0) = c + ε(0).

The AR(1) model implies that the current value is a linear combination of the lagged value and a random disturbance term. The autoregressive coefficient, φ, determines the persistence of the series. When φ approaches 1, the series typically shows pronounced autocorrelation and a slow decay in the impact of previous values. When φ approaches 0, the series shows weak autocorrelation, with the influence of prior observations diminishing rapidly.

Estimating the parameters of an AR(1) model involves techniques such as maximum likelihood estimation or ordinary least squares. AR(1) models are useful for understanding and forecasting time series data, capturing dependencies between consecutive observations, and identifying trends or patterns in the data. They serve as a building block for more complex autoregressive models, such as AR(p), which include multiple lagged values in the model equation.

The time series with the smaller AR coefficient is more “choppy” and seems to stay closer to 0 whereas the time series with the larger AR coefficient appears to wander around more. As the coefficient in an AR(1) model goes to 0, the model approaches a White Noise sequence, which has constant mean and variance. As the coefficient goes to 1, however, the model approaches a random walk, which does not have constant variance and can also have a mean that drifts over time.

In these two examples, both time series vary around the mean by about the same amount, but the model with the negative coefficient produces a much more “sawtooth” time series. 
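As an illustrative sketch, an AR(1) series can be simulated and its coefficient recovered by fitting an ARIMA(1, 0, 0) model with statsmodels; the values of φ and c below are assumptions made for the example, not estimates from real data:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
n, phi, c = 500, 0.8, 0.0
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = c + phi * x[t - 1] + eps[t]    # X(t) = c + phi * X(t-1) + eps(t)

fit = ARIMA(x, order=(1, 0, 0)).fit()     # AR(1) is ARIMA with p=1, d=0, q=0
print(fit.params)                         # estimated intercept, phi, and error variance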

5. The Partial Auto-Correlation Function (PACF)

The Partial Autocorrelation Function (PACF) is a function used to determine the partial correlation between a time series observation and its lagged values, while controlling for the effects of intervening observations. It provides a measure of the direct relationship between a specific lag and the current observation, independent of the other lags.

Let's denote the PACF of a time series X at lag h as PACF(h), where h represents the lag. The PACF at lag h measures the correlation between X(t) and X(t-h), removing the influence of the intermediate observations X(t-1), X(t-2), ..., X(t-h+1). In other words, it quantifies the direct relationship between X(t) and X(t-h) after accounting for the effects of the observations at lags 1 to h-1.

Properties:

1. PACF(0) = 1: The PACF at lag 0 is always 1 since it represents the autocorrelation of a series with itself.

2. PACF(h) = 0 for h > p: For lags beyond the order of the autoregressive model (p), the PACF is zero. This property arises from the definition of an AR(p) process, where the direct influence of a lag decreases as the lag increases beyond p.

To compute the PACF, one common approach is to estimate the autoregressive coefficients of an AR(p) model using techniques such as least squares or maximum likelihood estimation. Once the coefficients are estimated, the PACF values can be obtained from them.

The PACF is a useful tool in time series analysis and model identification. It helps determine the order of an autoregressive model, as the significant PACF values often indicate the lags to be included in the model. By examining the decay of the PACF, one can also identify any hidden structures or dependencies in the data.

An autoregressive model of order p, often denoted AR(p), is a time series model that captures the relationship between an observation and its previous p values; it is a generalization of the AR(1) model. The ACF of an AR(p) process tails off toward zero gradually, whereas the PACF cuts off (goes to zero) for lags greater than p. This is an important diagnostic tool when trying to identify the order p.
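In practice, the sample ACF and PACF are often inspected side by side to choose p. A minimal sketch with statsmodels, using an illustrative AR(2) series (the coefficients 0.6 and 0.3 are arbitrary):

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(3)
n = 500
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] + 0.3 * x[t - 2] + eps[t]   # an illustrative AR(2) series

plot_acf(x, lags=20)     # tails off gradually for an AR(p) process
plot_pacf(x, lags=20)    # roughly zero beyond lag p (here p = 2)
plt.show()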

The AR(p) model is defined by the following equation:

X(t) = c + φ₁ * X(t-1) + φ₂ * X(t-2) + ... + φₚ * X(t-p) + ε(t)

where:

- X(t) represents the value of the time series at time t.

- c is a constant term or intercept.

- φ₁, φ₂, ..., φₚ are the autoregressive coefficients, which determine the impact of the previous values on the current value. Each φ coefficient represents the weight or contribution of the respective lagged value.

- X(t-1), X(t-2), ..., X(t-p) denote the lagged values of the time series at times (t-1), (t-2), ..., (t-p), respectively.

- ε(t) is the error term or residual, representing the random component or noise in the model.

The AR(p) model captures the dependence of the current value on the p previous values. The autoregressive coefficients determine the strength and direction of this dependence. If a coefficient φ is positive and close to 1, it indicates a strong positive correlation and persistence of the series. Conversely, if φ is negative and close to -1, it suggests a strong negative correlation and persistence.

Techniques like maximum likelihood estimation or least squares are used to estimate the parameters of an AR(p) model. The appropriate order p is usually selected using model selection criteria such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC).

Time series analysis employs AR(p) models extensively to forecast future trends, model interdependencies and capture autoregressive dynamics present in the data. They provide a flexible framework to model a variety of time series phenomena, including economic variables, financial markets, and weather patterns.
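Order selection can also be automated by comparing candidate AR(p) models with an information criterion. A sketch using statsmodels' ar_select_order on the same kind of illustrative AR(2) data; the maximum lag of 10 is an arbitrary choice:

import numpy as np
from statsmodels.tsa.ar_model import ar_select_order

rng = np.random.default_rng(4)
n = 500
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] + 0.3 * x[t - 2] + eps[t]   # illustrative AR(2) data

sel = ar_select_order(x, maxlag=10, ic="aic")   # compare AR(p) candidates by AIC
print(sel.ar_lags)                              # lags retained by the criterion
res = sel.model.fit()                           # fit the selected AR(p) model
print(res.params)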

6. Moving Average (MA) models

MA(1) refers to a moving average model of order 1. It is a type of time series model that describes a series of observations as a linear combination of the current and one previous error term. MA(1) models are widely used in time series analysis to capture short-term dependencies in data. Here's the definition and formula for an MA(1) model. The value at time t in an MA(1) model can be expressed as:

X(t) = c + ε(t) + θ * ε(t-1)

where:

- X(t) represents the value of the time series at time t.

- c is the constant mean or intercept term.

- ε(t) is the error term or residual at time t. It is typically assumed to be normally distributed with mean zero and variance σ², i.e., ε(t) ~ N(0, σ²). This represents the random component or noise in the model.

- θ (theta) is the moving average coefficient, which determines the impact of the previous error term on the current value.

- ε(t-1) represents the error term at the previous time step, t-1.

The error terms are assumed to have zero mean and to be uncorrelated across time. An MA(1) model expresses the current value of the series as a weighted sum of the current error ε(t) and the previous error ε(t-1), with the weight of the past error given by the moving average coefficient θ. Past errors are thus used to help predict future values.

Estimating the parameters (c, θ) of an MA(1) model typically involves techniques such as maximum likelihood estimation or least squares.

MA(1) models are often used to capture short-term dependencies or smooth out random fluctuations in time series data. 

An MA(q) (Moving Average) time series model is a type of model used to describe a stationary time series. MA(q) stands for a moving average model of order q, where "q" represents the number of lagged error terms or residuals included in the model. An MA(q) model expresses the current value of a time series as a linear combination of the current and past error terms or residuals. The general form of an MA(q) model is:

X(t) = c + ε(t) + θ₁ * ε(t-1) + θ₂ * ε(t-2) + ... + θ_q * ε(t-q)

where:

- X(t) represents the value of the time series at time t.

- c is the constant mean or intercept term of the time series.

- ε(t) denotes the error term or residual at time t.

- θ₁, θ₂, ..., θ_q are the coefficients associated with the lagged error terms. These coefficients capture the influence of past residuals on the current value; the subscripts range from 1 to q, the order of the moving average model.

The error terms ε(t), ε(t-1), ..., ε(t-q) are assumed to be independently distributed with a constant variance σ².

The MA(q) model is a more flexible extension of the MA(1) model: it captures more complex short-term dependence or autocorrelation in a time series. The current value of the series is expressed in terms of the current and past error terms, with the strength of each influence determined by the coefficients θ₁, θ₂, ..., θ_q. Larger coefficient absolute values indicate a stronger effect from the corresponding lagged error terms.

Estimating the parameters of an MA(q) model generally involves methods such as maximum likelihood estimation or least squares. The estimated coefficients describe how current and past error terms are linked. For MA(q) models, the mean, variance, and autocorrelation structure remain constant over time.
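A hedged sketch of fitting an MA model with statsmodels, where an MA(q) model corresponds to ARIMA with p = 0 and d = 0; the order q = 1 and the value of θ below are illustrative assumptions:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(5)
n, theta = 500, 0.6
eps = rng.normal(size=n)
x = np.empty(n)
x[0] = eps[0]
x[1:] = eps[1:] + theta * eps[:-1]        # X(t) = eps(t) + theta * eps(t-1), with c = 0

fit = ARIMA(x, order=(0, 0, 1)).fit()     # MA(1) = ARIMA with p=0, d=0, q=1
print(fit.params)                         # estimated intercept, theta, and error variance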

7. Overview of extensions and variations to AR-MA models

In the next parts, we will discuss some extensions and variations to AR(p) (Autoregressive), MA(q) (Moving Average) models, as well as related models and concepts:

1. ARMA (Autoregressive Moving Average): The ARMA model combines the autoregressive and moving average components into a single model. It incorporates both the lagged values of the time series (AR component) and the lagged error terms (MA component). The ARMA(p, q) model is expressed as:

X(t) = c + φ₁ * X(t-1) + φ₂ * X(t-2) + ... + φₚ * X(t-p) + θ₁ * ε(t-1) + θ₂ * ε(t-2) + ... + θ_q * ε(t-q) + ε(t)

The ARMA model is used to capture both short-term and long-term dependencies in a stationary time series.

2. ARIMA (Autoregressive Integrated Moving Average): The ARIMA model extends the ARMA model to handle non-stationary time series by incorporating differencing. It includes three components: autoregressive (AR), differencing (I), and moving average (MA). The ARIMA(p, d, q) model is expressed as:

(1 - φ₁B - φ₂B² - ... - φₚBᵖ)(1 - B)ᵈ X(t) = c + (1 + θ₁B + θ₂B² + ... + θ_qB^q) ε(t)

Here, d represents the order of differencing required to make the time series stationary.

3. Seasonality and SARIMA (Seasonal ARIMA): SARIMA models are used to account for seasonal patterns in time series data. They incorporate additional seasonal AR and MA terms, denoted P and Q respectively, together with seasonal differencing D and a seasonal period s. The SARIMA(p, d, q)(P, D, Q, s) model is expressed as a combination of AR, I, MA, and seasonal components (see the fitting sketch after this list).

4. Exogenous variables: ARMAX and ARIMA-X models involve the inclusion of exogenous variables (X) that are not part of the time series but are believed to influence it. These models extend ARMA and ARIMA models by including the effect of exogenous variables on the dependent variable.

5. ARCH (Autoregressive Conditional Heteroscedasticity) models: ARCH models are used to capture time-varying volatility or heteroscedasticity in a time series. ARCH models introduce a conditional variance term that depends on the past squared error terms, allowing for clustering of high or low volatility periods.

6. GARCH (Generalized Autoregressive Conditional Heteroscedasticity) models: GARCH models are an extension of ARCH models that capture both short-term and long-term volatility patterns. GARCH models incorporate both the lagged squared error terms and lagged conditional variances to model time-varying volatility.

These extensions and variations provide more flexible and powerful tools for modeling and forecasting time series data. They account for different characteristics such as seasonality, non-stationarity, volatility clustering, and the influence of exogenous variables, enabling a more comprehensive analysis of time series behavior.
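As a rough sketch of how several of these extensions are expressed with statsmodels' SARIMAX class, which covers ARIMA, seasonal ARIMA, and exogenous regressors in one interface; the orders, seasonal period, and exogenous variable below are purely illustrative:

import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(6)
y = rng.normal(size=240).cumsum()            # an illustrative non-stationary series
exog = rng.normal(size=240)                  # an illustrative exogenous regressor

# ARIMA(1, 1, 1): one AR lag, first differencing, one MA lag
arima_fit = SARIMAX(y, order=(1, 1, 1)).fit(disp=False)

# Seasonal ARIMA(1, 1, 1)(1, 1, 1, 12) with an exogenous variable (ARIMAX-style)
sarimax_fit = SARIMAX(y, exog=exog, order=(1, 1, 1),
                      seasonal_order=(1, 1, 1, 12)).fit(disp=False)

print(arima_fit.aic, sarimax_fit.aic)        # information criteria for comparison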

8. Combining time series models with machine learning

At Umaneo, we combine time series models with machine learning techniques in several ways to improve forecasting accuracy and capture complex patterns in the data. Considering characteristics such as seasonality, trend, and autocorrelation is essential when combining time series models with machine learning. Here are some approaches:

- Feature Engineering: Time series data can be transformed into a tabular format by extracting relevant features that capture important characteristics of the data. These features can include lagged values, moving averages, seasonality indicators, and other domain-specific attributes. Once the features are created, traditional machine learning algorithms can be applied (see the sketch after this list).

- Model Stacking and Hybrid Models: Model stacking involves combining the predictions from several models by training a meta-model on top of their outputs. Training each base model can involve using diverse subsets of the data or employing varying algorithms. The final forecast is made by the meta-model learning to weigh or combine these predictions.

- Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs): RNNs and CNNs are commonly used to model time series data. RNNs can process sequences of inputs and capture temporal dependencies through a feedback mechanism. They can be combined with other layers or architectures to build more complex models, trained using backpropagation through time.
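A minimal sketch of the feature-engineering approach mentioned above: lagged values and a short moving average are turned into tabular features and fed to a standard regressor. The lag choices, the synthetic seasonal series, and the scikit-learn model are illustrative assumptions, not a prescription:

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(7)
y = pd.Series(np.sin(np.arange(300) * 2 * np.pi / 12) + rng.normal(0, 0.3, 300))

features = pd.DataFrame({
    "lag_1": y.shift(1),                              # value one step back
    "lag_12": y.shift(12),                            # value one season back
    "rolling_mean_3": y.shift(1).rolling(3).mean(),   # short moving average
})
data = pd.concat([features, y.rename("target")], axis=1).dropna()

train, test = data.iloc[:-24], data.iloc[-24:]        # keep the time order in the split
model = GradientBoostingRegressor().fit(train.drop(columns="target"), train["target"])
pred = model.predict(test.drop(columns="target"))
print(np.mean(np.abs(pred - test["target"])))         # mean absolute error on the hold-out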

Conclusion

Statistics are at the heart of artificial intelligence. We discussed how time series models can address a wide variety of problems, such as fraud detection, speech recognition, anomaly detection, econometric analysis, stochastic process analysis, and the prediction of stock price movements, exchange rate changes, and weather patterns.

The concepts we discussed in the present article can be used to enhance the performance of AI models to solve our clients’ business problems. The various time series analysis techniques discussed in this article embody a selection of tools that can be leveraged to address a variety of problem domains. The careful selection of such techniques can reliably produce powerful predictive models by exposing features in the data that are otherwise difficult to characterize.

If you have complex and challenging problems to solve, we would love to hear from you, and we would be happy to discuss with you over a cup of coffee :)
