Introductory Time Series with R
After some algebra, Equations 2. The proof of 2. Equation 2. This result is essentially a generalisation of the result for the random walk, i.e. The following two examples illustrate the procedure for determining whether an AR process is stationary or non-stationary: 1. As the roots are greater than unity, this AR(2) model is stationary. It can be shown Section 2. The partial autocorrelation at lag k is the correlation that results after removing the effect of any correlations due to previous terms.
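The root-checking procedure can be carried out in R with polyroot. As a sketch (the coefficients below are our own illustrative choices, not those used in the text), for an AR(2) model x[t] = a1*x[t-1] + a2*x[t-2] + w[t], the model is stationary when all roots of the characteristic equation lie outside the unit circle:

```r
# Characteristic equation of the AR(2) model: 1 - a1*z - a2*z^2 = 0.
# The model is stationary if every root has modulus greater than 1.
a1 <- 0.5; a2 <- -0.06             # illustrative coefficients, not from the text
roots <- polyroot(c(1, -a1, -a2))  # coefficients given in increasing powers of z
Mod(roots)                         # moduli of the roots: 3.33 and 5
all(Mod(roots) > 1)                # TRUE, so this AR(2) model is stationary
```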

For example, the partial autocorrelation of an AR(1) process will be zero for all lags greater than 1. Hence, a plot of the partial autocorrelations can be useful when determining the order of an underlying AR process. The partial correlogram has no significant correlations, except the value at lag 1, as expected Figure 2. Note that in partial correlogram (c) only the first lag is significant, which is usually the case when the underlying process is AR(1).
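This behaviour is easy to verify by simulation. In the sketch below (the parameter 0.7 is an illustrative choice), only the lag 1 partial autocorrelation of a simulated AR(1) series should fall outside the significance lines:

```r
# Simulate an AR(1) series and examine its partial correlogram.
set.seed(1)
x <- arima.sim(n = 1000, model = list(ar = 0.7))
pacf(x)                       # only the lag 1 value should be significant
pacf(x, plot = FALSE)$acf[1]  # sample partial autocorrelation at lag 1, close to 0.7
```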

The correlogram of the residual series for the AR(1) model fitted to the exchange rate data. Thus the fitted AR(4) model 2. The correlogram of the residual series for the AR(4) model fitted to the annual global temperature series. As the AR model has no deterministic trend component, the trends in the data can be explained by serial correlation and random variation, implying that it is possible that these trends are stochastic or could arise from a purely stochastic process.
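The fit-then-check-residuals procedure can be sketched as follows. The data here are simulated (the exchange rate and temperature series discussed in the text are not reproduced), and the order is selected automatically by AIC:

```r
# Fit an AR model, letting AIC choose the order, then examine the
# residual correlogram; white-noise-like residuals indicate an adequate fit.
set.seed(1)
x <- arima.sim(n = 500, model = list(ar = c(0.6, 0.2)))  # simulated stand-in data
x.ar <- ar(x, method = "mle")
x.ar$order                          # order selected by AIC
acf(x.ar$resid[-(1:x.ar$order)])    # drop leading NAs, then plot residual correlogram
```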

Of course, this does not imply that there is no underlying reason for the trends — if a valid scientific explanation is known, such as a link with the increased use of fossil fuels, then this information would clearly need to be included in any future forecasts of the series. Comment on the plots. Do the model parameters fall within the confidence intervals? Explain your results.

Justify your answer. Comment on the plot. Compare the confidence intervals to the parameters used to simulate the data and explain the results. Create a residual series from the difference between the predicted value and the observed value and verify that within machine accuracy your residual series is identical to the series extracted from the fitted model in R.

Comment on the final plot and on any potential inadequacies in the fitted model. Prove Equation 2. A trend is stochastic when the underlying cause is not understood and can only be attributed to high serial correlation with random error. Trends of this type, which are common in financial series, can be simulated in R using models such as the random walk or autoregressive process (Chapter 2).

Other trends are deterministic, so-called because there is a greater understanding of the underlying cause of the trend. For example, a deterministic increasing trend in the data may be related to an increasing population, or a regular cycle may be related to a known seasonal frequency. Deterministic trends and seasonal variation can be modelled using regression, which is the main focus in this chapter.

Time series regression usually differs from a standard regression analysis because the residuals of the fitted model also form a time series and therefore tend to be serially correlated. In this chapter, we use generalised least squares to allow for autocorrelation in the residual series and to obtain improved estimates of the standard errors of the fitted model parameters. We begin the chapter by looking at linear models for trends, and then introduce regression models that account for seasonal variation using indicator and harmonic variables.

The logarithmic transformation, which is often used to stabilise the variance, is also considered along with an appropriate inverse transform and bias correction factor needed for making forecasts.
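The idea behind the bias correction can be sketched in R. If log(x) is modelled and the residuals are approximately normal with variance sigma2, the naive back-transform exp(prediction) underestimates the conditional mean, and multiplying by exp(sigma2/2) corrects this. The data and model below are invented for illustration, not the series used in the text:

```r
# Log-transform regression with a bias-corrected inverse transform.
set.seed(1)
t <- 1:100
x <- exp(0.01 * t + rnorm(100, sd = 0.2))  # series with multiplicative error
fit <- lm(log(x) ~ t)                      # model fitted on the log scale
sigma2 <- summary(fit)$sigma^2             # residual variance on the log scale
naive     <- exp(predict(fit, data.frame(t = 101)))
corrected <- naive * exp(sigma2 / 2)       # bias-adjusted forecast; always larger
```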

An introduction to non-linear models is also included in this chapter. In this case, the value of the line at time t is the trend mt. For the more general polynomial, the trend at time t is the value of the underlying polynomial evaluated at t, so in 3.

Many non-linear models can be transformed to linear models. However, this usually has the effect of biasing the predictions, as the least squares residual errors are also transformed within the non-linear function for xt Equation 3. In Section 3. Most natural processes that generate time series are probably non-linear. In addition, many non-linear functions cannot be transformed to linear functions through a simple operation such as taking logs.

Hence, a linear model will often provide a good approximation even when the underlying process is non-linear. Consequently, linear models play a central role in the study of time series. However, differencing can often transform a non-stationary series with a deterministic trend to a stationary series. In Section 2. If the underlying trend is a polynomial of order p, then pth-order differencing will be required to remove it.
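A short simulation makes the last point concrete. A quadratic (order 2) trend vanishes after second-order differencing, since the second difference of 0.05*t^2 is the constant 0.1 (the coefficients here are illustrative):

```r
# Second-order differencing removes a quadratic deterministic trend.
set.seed(1)
t <- 1:200
x <- 0.05 * t^2 + rnorm(200)                # quadratic trend plus white noise
plot(diff(x, differences = 2), type = "l")  # trend removed; series fluctuates about 0.1
```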

The function summary can be used to obtain a standard regression output; to observe this, type summary(x). Specific information can also be extracted using summary. In Figure 3. Again, this should not be surprising, given that we used an AR(1) process to simulate this series.

The residual correlogram for the fitted straight-line model. The residual partial correlogram for the fitted straight-line model. The significant value at lag 1 indicates that the residuals follow an AR(1) process, as expected. Residual correlogram for the regression model fitted to the global temperature series. Hence, based on this interval, there is statistical evidence that the slope is positive, i.e. of an increasing trend.

However, this deduction may be premature because the correlogram indicates that the residuals are autocorrelated Figure 3. A mathematical explanation follows. Equation 3. For positive serial correlation in the residual series, this implies that the standard errors of the estimated regression parameters are likely to be underestimated Equation 3.

A fitting procedure known as generalised least squares (GLS) can be used to provide better estimates of the standard errors of the regression parameters, accounting for autocorrelation in the residuals. The procedure is essentially based on maximising the likelihood function, conditional on the autocorrelation, and is implemented in the R function gls in the nlme library. We illustrate the use of gls in the following example. GLS fit to simulated series: in Section 3. For example, the slope is estimated as 3.
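A minimal sketch of such a fit, assuming AR(1) errors with a lag 1 autocorrelation of 0.8 (the simulated straight line with intercept 10 and slope 2 stands in for the series used in the text):

```r
# GLS fit of a straight line with AR(1) errors, using gls from nlme.
library(nlme)
set.seed(1)
Time <- 1:100
z <- arima.sim(n = 100, model = list(ar = 0.8))    # AR(1) errors
x <- 10 + 2 * Time + z                             # straight line plus correlated noise
x.gls <- gls(x ~ Time, correlation = corAR1(0.8))  # supply the lag 1 autocorrelation
coef(x.gls)               # slope estimate close to 2
sqrt(diag(vcov(x.gls)))   # standard errors allowing for the autocorrelation
```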

Where discrepancies in the model parameter estimates exist, the GLS estimates should generally be preferred, since they allow for the autocorrelation in the residual series and give more reliable standard errors.

For an historical series, the lag 1 autocorrelation would first need to be estimated from the correlogram of the residuals of a fitted linear model. In the next example, we illustrate this procedure using the global temperature series. In the gls function, we approximate the residual series as an AR(1) process with a lag 1 autocorrelation of 0.

This implies, as before, that the slope is statistically significant and that there is statistical evidence of an increasing trend in the global temperatures over the period. For example, monthly global temperatures were available for the period. If the sampling interval is smaller than a year, e.g. monthly, then seasonal effects must also be accounted for. In this section, we consider two regression methods suitable for seasonally varying time series. The first uses seasonal indicator variables; the second is based on using sine and cosine functions of time to account for seasonal changes.

This model takes the same form as the additive classical decomposition model Equation 1. Instead, the constant for the model will effectively be included within the seasonal terms st. Equation 3. The factor function can be used with cycle to create the seasonal indicator explanatory variables for the regression model.

The procedure is fairly straightforward, as the following example illustrates. If this zero is left out, one of the seasonal terms would be dropped during the estimation procedure and an estimate for an overall constant would appear in the output. Both approaches give equivalent results, and it is largely a matter of personal preference which you use.
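A sketch of the indicator-variable fit on a simulated monthly series (the series itself is invented; the "0 +" in the formula is the zero referred to above, which suppresses the intercept so that all twelve seasonal terms are estimated):

```r
# Seasonal indicator regression: one coefficient per month plus a trend term.
set.seed(1)
x <- ts(rnorm(120) + rep(sin(2 * pi * (1:12) / 12), 10),
        start = 1990, freq = 12)      # ten years of simulated monthly data
Seas <- factor(cycle(x))              # month-of-year indicators
Time <- time(x)
x.lm <- lm(x ~ 0 + Time + Seas)       # '0 +' drops the overall constant
coef(x.lm)                            # one trend coefficient and twelve seasonal terms
```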

The parameters can also be estimated by GLS by replacing lm with gls in the above code. To forecast a future value, we can apply the above fitted model to a new series of times. For example, in the code below, a 2-year-ahead future prediction for the temperature series is obtained by creating a new set of times using the seq function, beginning at the appropriate year, and then applying the fitted model to these times.

In the last line of the code below, the forecasts for the next six months in the series are extracted. The predict function is a generic function that can be applied to a range of different types of fitted time series models, so it is important to learn how to use predict correctly. We will revisit this function in Section 3.
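The mechanics can be sketched as follows (the series here is simulated rather than the temperature data; the key point is that the new data frame must contain variables with the same names and factor levels as those used in the fit):

```r
# Forecasting from a seasonal regression model with predict.
set.seed(1)
x <- ts(rnorm(120), start = 1990, freq = 12)      # invented monthly series
Time <- as.vector(time(x)); Seas <- factor(cycle(x))
x.lm <- lm(x ~ 0 + Time + Seas)
new.t <- seq(2000, by = 1/12, length = 24)        # times for the next 2 years
new.dat <- data.frame(Time = new.t, Seas = factor(rep(1:12, 2)))
x.pred <- predict(x.lm, new.dat)                  # 24 monthly forecasts
head(x.pred, 6)                                   # forecasts for the next six months
```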

However, seasonal effects often vary smoothly over a year, so it may be more parameter-efficient to use a smooth function instead of separate indices for each season. Sine and cosine functions can be used to build smooth variation into a seasonal model. The expression on the right-hand side of Equation 3. Hence, the expression on the right-hand side is preferred in the formulation of a seasonal regression model, so that OLS can be used to estimate the parameters.

The trend may take a polynomial form as in Equation 3. Hence, with a constant term included, the maximum number of parameters in the harmonic model equals that of the seasonal indicator variable model Equation 3. However, the addition of further harmonics has the effect of perturbing the underlying wave to make it less regular than a standard sine wave of period s.

This usually still gives a dominant seasonal pattern of period s, but with a more realistic underlying shape. In most practical cases s is even, and so the integer-part brackets in [s/2] can be omitted. Two possible underlying seasonal patterns for monthly series based on the harmonic model Equation 3. Plot (a) is the first harmonic evaluated over a year, which would usually be too regular for the seasonal variation in most practical applications.

Plot b shows the same wave with a further two harmonics added, and illustrates just one of many ways in which an underlying sine wave can be perturbed to produce a less regular, but still dominant, seasonal pattern of period 12 months.

This model has the same seasonal harmonic components as the model represented in Figure 3. In the code below, a monthly series of length 10 years is simulated and plotted Figure 3.
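The simulation code itself did not survive extraction; the following is a plausible reconstruction of its form, with a linear trend, two harmonics of period 12, and Gaussian noise. The coefficients are our own illustrative choices, not necessarily those used in the text:

```r
# Simulate ten years of monthly data from a harmonic seasonal model.
set.seed(1)
TIME <- 1:(10 * 12)
w <- rnorm(10 * 12, sd = 0.5)                      # Gaussian white noise
trend <- 0.1 + 0.005 * TIME                        # linear trend
seasonal <- sin(2 * pi * TIME / 12) +              # first harmonic
            0.2 * sin(4 * pi * TIME / 12) +        # second harmonic (sine part)
            0.1 * cos(4 * pi * TIME / 12)          # second harmonic (cosine part)
x <- ts(trend + seasonal + w, start = 1990, freq = 12)
plot(x)
```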

Ten years of simulated data for the model in Equation 3. In most cases the order of the harmonics and polynomial trend will be unknown.

This information, along with the estimated parameters, can be extracted using the summary function.

Second, the model is described and defined. The model is then used to simulate synthetic data, using R code that closely reflects the model definition. Finally, the model is fitted to the simulated data. By using R, the whole procedure can be reproduced by the reader, and it is recommended that students work through most of the examples.

Mathematical derivations are provided in separate frames and starred sections. (We used the R package Sweave to ensure that, in general, your code will produce the same output as ours. However, for stylistic reasons we sometimes edited our code; e.g.) The term filtering is also used for smoothing, particularly in the engineering literature. A more specific use of the term filtering is the process of obtaining the best estimate of some variable now, given the latest measurement of it and past measurements.

The measurements are subject to random error and are described as being corrupted by noise. Filtering is an important part of control algorithms, which have a myriad of applications. Nesting the function within plot, e.g. plot(decompose(...)), produces the decomposition plot directly. For example, with the electricity data, additive and multiplicative decomposition plots are given by the commands below; the last plot, which uses lty to give different line types, is the superposition of the seasonal effect on the trend Fig.
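The commands can be sketched as follows. The electricity series itself is not bundled with base R, so the built-in AirPassengers data stand in for it here; the structure of the calls is the same:

```r
# Additive and multiplicative decomposition plots of a seasonal series.
plot(decompose(AirPassengers))                     # additive decomposition
AP.mult <- decompose(AirPassengers, type = "mult") # multiplicative decomposition
trend <- AP.mult$trend
# Superimpose the multiplicative seasonal effect on the trend,
# using lty to distinguish the two lines.
ts.plot(cbind(trend, trend * AP.mult$seasonal), lty = 1:2)
```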

Electricity production data: trend with superimposed multiplicative seasonal effects. A correlation of 0. The expectation in this definition is an average taken across the ensemble of all the possible time series that might have been produced by the time series model. (In statistics, asymptotically means as the sample size approaches infinity.) The ensemble constitutes the entire population. If we have a time series model, we can simulate more than one time series (see Chapter 4). However, with historical data, we usually only have a single time series, so all we can do, without assuming a mathematical structure for the trend, is to estimate the mean at each sample point by the corresponding observed value.

In practice, we make estimates of any apparent trend and seasonal effects in our data and remove them, using decompose for example, to obtain time series of the random component. Then time series models with a constant mean will be appropriate.

If the mean function is constant, we say that the time series model is stationary in the mean, and the sample mean taken over time provides an estimate of it. Models for which this time average converges to the ensemble mean are known as ergodic, and the models in this book are all ergodic.

Given that we usually only have a single time series, you may wonder how a time series model can fail to be ergodic, or why we should want a model that is not ergodic. Environmental and economic time series are single realisations of a hypothetical time series model, and we simply define the underlying model as ergodic. There are, however, cases in which we can have many time series arising from the same time series model.

Suppose we investigate the acceleration at the pilot seat of a new design of microlight aircraft in simulated random gusts in a wind tunnel. Even if we have built two prototypes to the same design, we cannot be certain they will have the same average acceleration response because of slight differences in manufacture. In such cases, the number of time series is equal to the number of prototypes. Another example is an experiment investigating turbulent flows in some complex system.

It is possible that we will obtain qualitatively different results from different runs because they do depend on initial conditions. It would seem better to run an experiment involving turbulence many times than to run it once for a much longer time. The number of runs is the number of time series. It is straightforward to adapt the definitions to this case. [Figure: An ensemble of time series. The expected value E(xt) at a particular time t is the average taken over the entire population.]

But we cannot estimate a different variance at each time point from a single time series. To progress, we must make some simplifying assumption. If the correlation is positive, Var(x) will tend to underestimate the population variance in a short series because successive observations tend to be relatively similar. In most cases, this does not present a problem, since the bias decreases rapidly as the length n of the series increases. Similarly, in the study of time series models, a key role is played by the second-order properties, which include the mean, variance, and serial correlation described below.

Consider a time series model that is stationary in the mean and the variance. The variables may be correlated, and the model is second-order stationary if the correlation between variables depends only on the number of time steps separating them. The number of time steps between the variables is known as the lag.

A correlation of a variable with itself at different times is known as autocorrelation or serial correlation. This definition follows naturally from Equation 2. It is possible to set up a second-order stationary time series model that has skewness; for example, one that depends on time t. The term strictly stationary is reserved for more rigorous conditions. The acvf and acf can be estimated from a time series by their sample equivalents. The sampling interval is 0.

The waves were generated by a wave maker driven by a pseudo-random signal that was programmed to emulate a rough sea. There is no trend and no seasonal period, so it is reasonable to suppose the time series is a realisation of a stationary process.

There are no outlying values. The lower plot is of the first sixty wave heights. We can see that there is a tendency for consecutive values to be relatively similar and that the form is like a rough sea, with a quasi-periodicity but no fixed frequency.

Wave height at centre of tank sampled at 0. A scatter plot, such as Figure 2. In a similar way, we can draw a scatter plot corresponding to each autocorrelation. For example, for lag 1 we plot waveht[],waveht[] to obtain Figure 2. Autocovariances are obtained by adding an argument to acf.
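The wave-height data are not bundled with R, so a simulated stationary series stands in below; the indices are our own, since those in the text were lost in extraction. The scatter plot of x[t] against x[t+1] visualises the lag 1 autocorrelation, and acf with type = "covariance" returns autocovariances:

```r
# Lag 1 scatter plot and autocovariance for a stationary series.
set.seed(1)
x <- arima.sim(n = 396, model = list(ar = 0.7))    # stand-in for the wave heights
n <- length(x)
plot(x[1:(n - 1)], x[2:n], xlab = "x[t]", ylab = "x[t+1]")  # lag 1 scatter plot
acf(x, type = "covariance", plot = FALSE)$acf[2]   # sample autocovariance at lag 1
```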

Wave height pairs separated by a lag of 1. For example, Figure 2. Correlogram of wave heights. The unit of lag is the sampling interval, 0. Correlation is dimensionless, so there is no unit for the y-axis. However, we should be careful about interpreting multiple hypothesis tests. First, even for a white noise series, about 5% of the estimated autocorrelations can be expected to fall outside the dashed lines by chance. Secondly, the rk are correlated, so if one falls outside the lines, the neighbouring ones are more likely to be statistically significant.

This will become clearer when we simulate time series in Chapter 4. In the meantime, it is worth looking for statistically significant values at specific lags that have some practical meaning for example, the lag that corresponds to the seasonal period, when there is one.

For monthly series, a significant autocorrelation at lag 12 might indicate that the seasonal adjustment is not adequate. The lag 0 autocorrelation is always 1 and is shown on the correlogram; its inclusion helps us compare the values of the other autocorrelations relative to the theoretical maximum of 1. This is useful because, if we have a long time series, small values of rk that are of no practical consequence may be statistically significant. However, some discernment is required to decide what constitutes a noteworthy autocorrelation from a practical viewpoint.

Squaring the autocorrelation can help, as this gives the percentage of variability explained by a linear relationship between the variables. For example, a lag 1 autocorrelation of 0. It is a common fallacy to treat a statistically significant result as important when it has almost no practical consequence. This is typical of correlograms of time series generated by an autoregressive model of order 2. We cover autoregressive models in Chapter 4.

If you look back at the plot of the air passenger bookings, there is a clear seasonal pattern and an increasing trend Fig. It is not reasonable to claim the time series is a realisation of a stationary model. But, whilst the population acf was defined only for a stationary time series model, the sample acf can be calculated for any time series, including deterministic signals.

Usually a trend in the data will show in the correlogram as a slow decay in the autocorrelations, which are large and positive due to similar values in the series occurring close together in time. This can be seen in the correlogram for the air passenger bookings, acf(AirPassengers) Fig.

If there is seasonal variation, seasonal spikes will be superimposed on this pattern. The annual cycle appears in the air passenger correlogram as a cycle of the same period superimposed on the gradually decaying ordinates of the acf.

Conversely, because the seasonal trend is approximately sinusoidal, values separated by a period of 6 months will tend to have a negative relationship. For example, higher values tend to occur in the summer months followed by lower values in the winter months. A dip in the acf therefore occurs at lag 6 months or 0. Although this is typical for seasonal variation that is approximated by a sinusoidal curve, other series may have patterns, such as high sales at Christmas, that contribute a single spike to the correlogram.

Correlogram for the air passenger bookings over the period — The gradual decay is typical of a time series containing a trend. The peak at 1 year indicates seasonal variation. In the code below, the air passenger series is seasonally adjusted and the trend removed using decompose.

To plot the random component and draw the correlogram, we need to remember that a consequence of using a centred moving average of 12 months to smooth the time series, and thereby estimate the trend, is that the first six and last six terms in the random component cannot be calculated and are thus stored in R as NA.
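The extraction can be sketched as follows, dropping the six NA values at each end of the random component with window before plotting its correlogram (the AirPassengers series runs from January 1949 to December 1960):

```r
# Extract the random component of the air passenger series, discard the
# NA values at each end, and plot it together with its correlogram.
AP.decom <- decompose(AirPassengers)
random <- window(AP.decom$random, start = c(1949, 7), end = c(1960, 6))
plot(random)
acf(random)
```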

The random component and correlogram are shown in Figures 2. A plot of the data is shown in Figure 2. There was a slight decreasing trend over this period, and substantial seasonal variation. The trend and seasonal variation have been estimated by regression, as described in Chapter 5, and the residual series adflow, which we analyse here, can reasonably be considered a realisation from a stationary time series model.

The main difference between the regression approach and using decompose is that the former assumes a linear trend, whereas the latter smooths the time series without assuming any particular form for the trend. The correlogram is plotted in Figure 2. Adjusted inflows to the Font Reservoir, — There is a statistically significant correlation at lag 1. The physical interpretation is that the inflow next month is more likely than not to be above average if the inflow this month is above average.

The explanation is that the groundwater supply can be thought of as a slowly discharging reservoir. If groundwater is high one month it will augment inflows, and is likely to do so next month as well.

[Figure: Correlogram for adjusted inflows to the Font Reservoir.] The explanation for this is that most of the inflow is runoff following rainfall, and in Northumberland there is little correlation between seasonally adjusted rainfall in consecutive months. An exponential decay in the correlogram is typical of a first-order autoregressive model (Chapter 4). The correlogram of the adjusted inflows is consistent with an exponential decay. However, given the sampling errors for a time series of this length, estimates of autocorrelation at higher lags are unlikely to be statistically significant.

This is not a practical limitation because such low correlations are inconsequential. When we come to identify suitable models, we should remember that there is no one correct model and that there will often be a choice of suitable models.

The result tells us that the covariance of two sums of variables is the sum of all possible covariance pairs of the variables. The proof of Equation 2. Draw a scatter plot for each set and then calculate the correlation. Comment on your results. Can you see a pattern? Can you guess what they represent?
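In symbols, the result reads (with generic summation limits, since the original equation did not survive extraction):

$$\operatorname{Cov}\!\left(\sum_{i=1}^{n} x_i,\; \sum_{j=1}^{m} y_j\right) \;=\; \sum_{i=1}^{n}\sum_{j=1}^{m} \operatorname{Cov}(x_i,\, y_j)$$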

Plot these components. Would you expect these data to have a substantial seasonal component? Compare the standard deviation of the original series with the deseasonalised series. Produce a plot of the trend with a superimposed seasonal effect. Comment on the plot, with particular reference to any statistically significant correlations. Use decompose on the time series and then plot the correlogram of the random component. Compare this with Figure 2. A very efficient method of forecasting one variable is to find a related variable that leads it by one or more time intervals.

The closer the relationship and the longer the lead time, the better this strategy becomes. The trick is to find a suitable lead variable. This provides valuable information on the likely demand over the next few months for all sectors of the building industry. A variation on the strategy of seeking a leading variable is to find a variable that is associated with the variable we need to forecast and easier to predict. In many applications, we cannot rely on finding a suitable leading variable and have to try other methods.

A second approach, common in marketing, is to use information about the sales of similar products in the past. The influential Bass diffusion model is based on this principle. A third strategy is to make extrapolations based on present trends continuing and to implement adaptive estimates of these trends. The statistical technicalities of forecasting are covered throughout the book, and the purpose of this chapter is to introduce the general strategies that are available.

One source of such information is World Shipyard Monitor, which gives brief details of orders in shipyards around the world. The paint company has set up a database of ship types and sizes from which it can construct its forecasts. The company monitors its market share closely and uses the forecasts for planning production and setting prices. The data in the file ApprovActiv.

We start by reading the data into R and then construct time series objects and plot the two series on the same graph using ts.plot. Building approvals (solid line) and building activity (dotted line). In Figure 3. The cross-correlation function, 3. A plot of the cross-correlation function against lag is referred to as a cross-correlogram. Cross-correlation: suppose we have time series models for variables x and y that are stationary in the mean and the variance.

The variables may each be serially correlated, and correlated with each other at different time lags. If x is the input to some physical system and y is the response, the cause will precede the effect, y will lag x, the ccvf will be 0 for positive k, and there will be spikes in the ccvf at negative lags. Some textbooks define ccvf with the variable y lagging when k is positive, but we have used the definition that is consistent with R.

If ts. Correlogram and cross-correlogram for building approvals and building activity. The time unit for lag is one year, so a correlation at a lag of one quarter appears at 0. Several of the cross-correlations at negative lags do pass these lines, indicating that the approvals time series is leading the activity.
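A minimal sketch of a cross-correlogram with ccf, using two invented quarterly series in which x leads y by one quarter; the spike appears at a negative lag, as described above:

```r
# Cross-correlogram of two simulated quarterly series where x leads y.
set.seed(1)
x <- rnorm(100)
y <- c(0, x[1:99]) + rnorm(100, sd = 0.5)   # y lags x by one time step
xy.ccf <- ccf(ts(x, freq = 4), ts(y, freq = 4), plot = FALSE)
xy.ccf$acf[xy.ccf$lag == -0.25]             # large correlation at lag -1 quarter
```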

Numerical values can be printed using the print function, and are 0. The ccf can be calculated for any two time series that overlap, but if they both have trends or similar seasonal effects, these will dominate Exercise 1.


