CS&IE Data Consulting GmbH

1. Application Area

Time Series Analysis can be used for many applications such as:

Economic Forecasting
Sales Forecasting
Budgetary Analysis
Stock Market Analysis
Yield Projections
Process and Quality Control
Inventory Studies
Workload Projections
Utility Studies
Census Analysis
and many more.

Most time series patterns can be described in terms of two basic classes of components: trend and seasonality.

2. Trend Analysis

There are no proven "automatic" techniques to identify the trend components in time series data. However, as long as the trend is monotonic (consistently increasing or decreasing), this part of the analysis is typically not very difficult. If the time series data contain considerable error, data smoothing should be applied as the first step in trend identification.

2.1. Smoothing

Smoothing always involves some form of local averaging of data such that the nonsystematic components of individual observations cancel each other out. The most common technique is moving average smoothing, which replaces each element of the series by either the simple or weighted average of n surrounding elements, where n is the width of the smoothing "window". Medians can be used instead of means. The main advantage of median smoothing over moving average smoothing is that its results are less biased by outliers within the smoothing window. Thus, if there are outliers in the data (e.g., due to measurement errors), median smoothing typically produces smoother or at least more "reliable" curves than a moving average based on the same window width. The main disadvantages of median smoothing are that, in the absence of clear outliers, it may produce more "jagged" curves than a moving average, and that it does not allow for weighting.
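The difference between the two smoothers can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name smooth, the sample data, and the choice to truncate the window at the ends of the series are all assumptions made for the example.

```python
from statistics import mean, median


def smooth(series, n, stat=mean):
    """Replace each element by stat() of the n-wide window centered on it.

    Assumption: the window is simply truncated at both ends of the series;
    other edge treatments are possible.
    """
    half = n // 2
    out = []
    for i in range(len(series)):
        window = series[max(0, i - half): i + half + 1]
        out.append(stat(window))
    return out


# Hypothetical data: a slow upward trend with one outlier (50).
data = [10, 11, 12, 50, 14, 15, 16]

moving_average = smooth(data, 3, mean)
moving_median = smooth(data, 3, median)

print(moving_average)  # the outlier is spread over three neighboring points
print(moving_median)   # the outlier is suppressed
```

With a window of width 3, the outlier pulls three consecutive moving average values far above the trend, while the median-smoothed series stays close to the underlying values.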

In the relatively less common cases (in time series data) when the measurement error is very large, distance weighted least squares smoothing or negative exponentially weighted smoothing can be used. All of these methods filter out the noise and convert the data into a smooth curve that is relatively unbiased by outliers (see the respective sections on each of those methods for more details). Series with relatively few and systematically distributed points can be smoothed with bicubic splines.

2.2. Fitting a function

Many monotonic time series can be adequately approximated by a linear function; if there is a clear monotonic nonlinear component, the data first need to be transformed to remove the nonlinearity. Usually a logarithmic, exponential, or (less often) polynomial function can be used.
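As an illustration of transforming before fitting, the sketch below takes a synthetic series that grows by a constant percentage, applies a logarithmic transformation, and fits a straight line by ordinary least squares. The helper fit_line, the 5% growth rate, and the starting level of 100 are assumptions chosen for the example.

```python
import math


def fit_line(ys):
    """Ordinary least squares fit of y = a + b*t for t = 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    sxx = sum((t - t_mean) ** 2 for t in ts)
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


# Synthetic series (assumption): level 100 growing 5% per period.
series = [100 * 1.05 ** t for t in range(24)]

# The log transform turns exponential growth into a linear trend.
a, b = fit_line([math.log(y) for y in series])

growth = math.exp(b) - 1   # estimated growth rate per period
level = math.exp(a)        # estimated starting level
print(round(growth, 4), round(level, 2))
```

On real data the fit will not be exact, and the residuals of the transformed series should be inspected before trusting the linear approximation.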

3. Analysis of Seasonality

Seasonal dependency (seasonality) is another general component of the time series pattern. The concept was illustrated in the example of the airline passenger data above. It is formally defined as correlational dependency of order k between each i'th element of the series and the (i-k)'th element, and it is measured by autocorrelation (i.e., the correlation between those two terms); k is usually called the lag. If the measurement error is not too large, seasonality can be visually identified in the series as a pattern that repeats every k elements.

3.1. Autocorrelation correlogram

Seasonal patterns of time series can be examined via correlograms. The correlogram (autocorrelogram) displays graphically and numerically the autocorrelation function (ACF), that is, the serial correlation coefficients (and their standard errors) for consecutive lags in a specified range of lags (e.g., 1 through 30). Ranges of two standard errors for each lag are usually marked in correlograms, but typically the size of the autocorrelation is of more interest than its reliability, because we are usually interested only in very strong (and thus highly significant) autocorrelations.
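A numerical correlogram can be sketched as follows. The function autocorr uses the common sample estimator (lagged cross-products divided by the total sum of squared deviations); the synthetic series with period 4 and the band of 2/sqrt(n), which is the usual large-sample approximation for a series without serial dependence, are assumptions for the example.

```python
import math


def autocorr(series, k):
    """Sample autocorrelation of the series at lag k."""
    n = len(series)
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    num = sum((series[i] - m) * (series[i - k] - m) for i in range(k, n))
    return num / denom


# Synthetic seasonal series (assumption): a pattern repeating every 4 elements.
series = [1, 3, 6, 3] * 12

acf = {k: autocorr(series, k) for k in range(1, 7)}
band = 2 / math.sqrt(len(series))  # approximate two-standard-error range

for k, r in acf.items():
    flag = "*" if abs(r) > band else " "
    print(f"lag {k}: {r:+.3f} {flag}")

peak_lag = max(acf, key=acf.get)
```

The largest autocorrelation appears at lag 4, which matches the period of the repeating pattern.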

3.2. Examining correlograms

While examining correlograms, one should keep in mind that autocorrelations for consecutive lags are formally dependent. Consider the following example: if the first element is closely related to the second, and the second to the third, then the first element must also be somewhat related to the third one, and so on. This implies that the pattern of serial dependencies can change considerably after removing the first-order autocorrelation (i.e., after differencing the series with a lag of 1).
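The effect of lag-1 differencing on the correlogram can be illustrated with a synthetic series. In the sketch below, the series (a linear trend plus a pattern of period 4) and the helper names autocorr and difference are assumptions for the example; the autocorrelation estimator is the common sample version.

```python
def autocorr(series, k):
    """Sample autocorrelation of the series at lag k."""
    n = len(series)
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    num = sum((series[i] - m) * (series[i - k] - m) for i in range(k, n))
    return num / denom


def difference(series, lag=1):
    """Return the differenced series x[t] - x[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Synthetic series (assumption): linear trend plus a period-4 pattern.
pattern = [0, 5, 0, -5]
series = [2 * t + pattern[t % 4] for t in range(48)]
diffed = difference(series, 1)

raw_acf = {k: autocorr(series, k) for k in (1, 2, 4)}
diff_acf = {k: autocorr(diffed, k) for k in (1, 2, 4)}

print("raw:        ", {k: round(r, 2) for k, r in raw_acf.items()})
print("differenced:", {k: round(r, 2) for k, r in diff_acf.items()})
```

In the raw series the trend makes every short lag strongly positive. After differencing, the lag-2 autocorrelation becomes strongly negative and the lag-4 autocorrelation remains strongly positive, so the periodic structure is much easier to see.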

4. Measuring for Accuracy

One way of evaluating the accuracy of forecasts is to plot the observed values together with the one-step-ahead forecasts in order to identify the behavior of the residuals over time. The widely used statistical measures of error that can help select a method, or the optimum value of a parameter within a method, are:

4.1. Mean absolute error

The mean absolute error (MAE) is the average of the absolute error values. The closer this value is to zero, the better the forecast.

4.2. Mean squared error

Mean squared error (MSE) is computed as the average of the squared error values; the unnormalized sum of squared errors is sometimes reported instead. This is the most commonly used lack-of-fit indicator in statistical fitting procedures. Compared to the mean absolute error, this measure is very sensitive to outliers; that is, unique or rare large error values greatly affect the MSE value.
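The different sensitivity of the two measures can be shown with a small sketch. The observed values and the two hypothetical forecasts below are invented for illustration: one forecast is off by 1 everywhere, the other is exact except for a single error of 5.

```python
def mae(observed, forecast):
    """Mean absolute error: average of |observed - forecast|."""
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / len(observed)


def mse(observed, forecast):
    """Mean squared error: average of (observed - forecast) squared."""
    return sum((o - f) ** 2 for o, f in zip(observed, forecast)) / len(observed)


# Hypothetical data for illustration only.
observed = [10, 12, 11, 13, 12]
steady = [11, 11, 12, 12, 13]  # every error has absolute value 1
spiky = [10, 12, 11, 13, 17]   # four exact forecasts, one error of 5

print(mae(observed, steady), mse(observed, steady))
print(mae(observed, spiky), mse(observed, spiky))
```

Both forecasts have the same MAE, but the MSE of the second forecast is five times larger, because squaring magnifies the single large error.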