Autoregressive models are a core tool for analyzing time series data in econometrics. They capture how past values of a variable influence its current value, which lets you model persistence in economic data and generate short-term forecasts.
The central idea is straightforward: the current value of a variable depends linearly on its own past values plus a random error term. By estimating these relationships, you can quantify how strongly the past predicts the present.
Definition of autoregressive models
An autoregressive (AR) model expresses the current value of a time series as a linear combination of its own past values plus an error term. For example, a first-order autoregressive model, AR(1), looks like this:

Y_t = c + φ_1 Y_{t-1} + ε_t

Here, Y_t is the current value, Y_{t-1} is the value one period ago, φ_1 is the autoregressive coefficient, c is a constant, and ε_t is the error term. A second-order model, AR(2), adds another lag:

Y_t = c + φ_1 Y_{t-1} + φ_2 Y_{t-2} + ε_t

The general AR(p) model includes p lagged values as predictors. AR models are widely used in econometrics to model and forecast economic and financial time series.
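To make the definition concrete, here is a minimal simulation sketch of the AR(2) equation above (NumPy assumed; the function name and parameter values are illustrative):

```python
import numpy as np

def simulate_ar2(c, phi1, phi2, n, sigma=1.0, seed=0):
    """Simulate n observations of Y_t = c + phi1*Y_{t-1} + phi2*Y_{t-2} + eps_t."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, n)  # the random error term
    y = np.zeros(n)
    for t in range(2, n):
        y[t] = c + phi1 * y[t - 1] + phi2 * y[t - 2] + eps[t]
    return y

series = simulate_ar2(c=0.5, phi1=0.6, phi2=0.2, n=500)
```

Each observation is built from the two previous ones plus a fresh shock, which is exactly the persistence mechanism the model is meant to capture.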
Components of AR models
Dependent variable vs lagged variables
The dependent variable is the current value of the series you're modeling. The lagged variables (autoregressive terms) are past values like Y_{t-1}, Y_{t-2}, etc., used as predictors.
The number of lags you include determines the order of the model. An AR(1) uses one lag, an AR(3) uses three, and so on. Choosing the right order matters a lot for model performance.
Autoregressive coefficients
The autoregressive coefficients (φ_1, φ_2, …) determine how strongly each past value influences the current value. A coefficient of 0.8 on the first lag, for instance, means that 80% of last period's value carries forward (after accounting for the constant and error).
- The sign tells you the direction of influence (positive means the series tends to continue in the same direction; negative means it tends to reverse).
- These coefficients are typically estimated using ordinary least squares (OLS) or maximum likelihood estimation (MLE).
Error term assumptions
The error term ε_t represents the random shock not explained by the lagged values. For valid estimation and inference, you need several assumptions to hold:
- ε_t has a mean of zero and constant variance (homoscedasticity)
- ε_t is uncorrelated across time periods (no autocorrelation in the errors)
- ε_t is uncorrelated with the lagged dependent variables
- For hypothesis testing, normality of ε_t is also assumed
If these assumptions break down, your coefficient estimates or standard errors may be unreliable.
Stationarity in AR models
Definition of stationarity
Stationarity means the statistical properties of the time series (its mean, variance, and autocovariance structure) don't change over time. A stationary series fluctuates around a constant mean with constant spread, and the correlation between any two observations depends only on the time gap between them, not on when they occur.
This is a crucial assumption for AR models. For an AR(1), stationarity requires |φ_1| < 1; more generally, the roots of the AR characteristic polynomial must lie outside the unit circle. If the data-generating process is shifting over time, the relationships you estimate won't be stable or meaningful.
Importance for valid inference
Non-stationary data can produce spurious regression results, where you find statistically significant relationships that don't actually exist. With non-stationary series, coefficient estimates and standard errors can be biased and inconsistent, leading to incorrect conclusions.
Stationarity ensures that the coefficients you estimate reflect a genuine, stable relationship. It also makes forecasting meaningful, since you're projecting from a process whose behavior doesn't fundamentally change.
Testing for stationarity
Before fitting an AR model, you should test whether your series is stationary. The most common tests look for unit roots, which indicate non-stationarity:
- Augmented Dickey-Fuller (ADF) test: Tests the null hypothesis that a unit root is present (series is non-stationary). Rejection suggests stationarity.
- Phillips-Perron (PP) test: Similar to ADF but uses a non-parametric correction for serial correlation.
- KPSS test: Reverses the null hypothesis, testing whether the series is stationary. This is a useful complement to the ADF test.
If your series turns out to be non-stationary, you can often achieve stationarity by differencing (subtracting consecutive values: ΔY_t = Y_t − Y_{t-1}) or applying other transformations like log differencing.
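As an illustrative NumPy sketch, differencing a random walk (a classic non-stationary series with a unit root) recovers the underlying shock sequence, which is stationary white noise:

```python
import numpy as np

rng = np.random.default_rng(42)
shocks = rng.normal(size=1000)
random_walk = np.cumsum(shocks)   # Y_t = Y_{t-1} + eps_t: non-stationary
diffed = np.diff(random_walk)     # Delta Y_t = Y_t - Y_{t-1}

# Differencing undoes the cumulative sum, leaving the stationary shocks
assert np.allclose(diffed, shocks[1:])
```

In practice you would confirm stationarity of the differenced series with an ADF or KPSS test rather than by construction as here.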
Estimation of AR models
Ordinary least squares (OLS)
OLS is the most common method for estimating AR model coefficients. It minimizes the sum of squared residuals between observed and predicted values.
For AR models, OLS works well when the error term assumptions hold (especially no autocorrelation in the errors). Under those conditions, OLS gives you unbiased and consistent estimates. It's also computationally simple, which is a practical advantage.
One thing to keep in mind: with lagged dependent variables as regressors, OLS is biased in small samples but remains consistent as the sample size grows. This is why having a reasonably long time series matters.
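A minimal NumPy sketch of OLS estimation for an AR(1), regressing the series on a constant and its own first lag (names and parameter values are illustrative):

```python
import numpy as np

def fit_ar1_ols(y):
    """Estimate c and phi_1 in Y_t = c + phi_1*Y_{t-1} + eps_t by least squares."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])  # constant + first lag
    coefs, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coefs  # [c_hat, phi1_hat]

# Simulate an AR(1) with c = 0.3, phi_1 = 0.7 and check we roughly recover them
rng = np.random.default_rng(1)
y = np.zeros(2000)
eps = rng.normal(size=2000)
for t in range(1, 2000):
    y[t] = 0.3 + 0.7 * y[t - 1] + eps[t]

c_hat, phi_hat = fit_ar1_ols(y)
```

With 2,000 observations the estimates land close to the true values, illustrating the consistency of OLS; with short samples the small-sample bias mentioned above would be more visible.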
Maximum likelihood estimation (MLE)
MLE finds the parameter values that maximize the probability of observing your data given the model. It's an alternative to OLS that becomes especially useful when:
- The error term may not follow a normal distribution
- There are missing observations in the data
- You want asymptotically efficient estimates (smallest possible variance among consistent estimators)
MLE estimates are consistent and asymptotically normal, meaning they behave well in large samples. In practice, for a standard AR model with normally distributed errors, OLS and MLE produce very similar results.
Model selection for AR models
Determining AR order
Picking the right number of lags is one of the most important decisions when building an AR model. Too few lags and you miss important dynamics; too many and you overfit the data and waste degrees of freedom.
Two main approaches:
- Partial autocorrelation function (PACF): Plot the PACF of your series. The PACF measures the correlation between Y_t and Y_{t-k} after removing the effects of the intermediate lags. For an AR(p) process, the PACF cuts off sharply after lag p. So if the PACF is significant at lags 1 and 2 but drops to near zero at lag 3, an AR(2) is a good candidate.
- Information criteria: Use AIC or BIC to compare models of different orders (see below).
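One way to see the PACF cut-off numerically: the lag-k partial autocorrelation equals the coefficient on lag k in an OLS regression of Y_t on its first k lags. A rough NumPy sketch (illustrative only; in practice you'd use a library routine such as statsmodels' pacf):

```python
import numpy as np

def pacf_via_ols(y, max_lag):
    """Lag-k PACF = coefficient on lag k when regressing Y_t on a
    constant and lags 1..k."""
    vals = []
    for k in range(1, max_lag + 1):
        target = y[k:]
        lags = [y[k - j: len(y) - j] for j in range(1, k + 1)]  # lags 1..k
        X = np.column_stack([np.ones(len(target))] + lags)
        coefs, *_ = np.linalg.lstsq(X, target, rcond=None)
        vals.append(coefs[-1])  # last coefficient = lag-k PACF
    return np.array(vals)

# Simulate an AR(2): the PACF should be sizable at lags 1-2, near zero after
rng = np.random.default_rng(0)
y = np.zeros(3000)
eps = rng.normal(size=3000)
for t in range(2, 3000):
    y[t] = 0.5 + 0.6 * y[t - 1] + 0.2 * y[t - 2] + eps[t]

pacf_vals = pacf_via_ols(y, max_lag=4)
```

The estimated PACF at lags 3 and 4 hovers near zero, which is the visual cut-off pattern you'd look for in a PACF plot.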
Information criteria (AIC, BIC)
Information criteria balance goodness of fit against model complexity. Both AIC and BIC penalize you for adding more parameters, which discourages overfitting.
- AIC (Akaike Information Criterion): Tends to select slightly larger models. Better when your priority is forecast accuracy.
- BIC (Bayesian Information Criterion): Imposes a heavier penalty for extra parameters, so it tends to select more parsimonious models. BIC is consistent, meaning it picks the true model order as the sample size grows.
In practice, you estimate AR(1), AR(2), AR(3), etc., compute AIC and BIC for each, and choose the model with the lowest value. If AIC and BIC disagree, BIC's choice is often preferred for its parsimony.
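Under Gaussian errors the log-likelihood of an AR fit is a function of the residual sum of squares, so AIC and BIC can be computed directly from an OLS fit. A hedged sketch using a common textbook form of the criteria (NumPy assumed; function names are illustrative):

```python
import numpy as np

def fit_ar_ols(y, p):
    """OLS fit of an AR(p); returns the residual sum of squares and
    the number of estimated parameters (constant + p lags)."""
    target = y[p:]
    lags = [y[p - j: len(y) - j] for j in range(1, p + 1)]
    X = np.column_stack([np.ones(len(target))] + lags)
    coefs, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = np.sum((target - X @ coefs) ** 2)
    return rss, X.shape[1]

def aic_bic(y, p):
    # Gaussian log-likelihood reduces to a function of RSS; for a strict
    # comparison you'd fit every order on the same estimation sample.
    rss, k = fit_ar_ols(y, p)
    n = len(y) - p
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return aic, bic

# Data generated by an AR(2): both criteria should prefer p=2 over p=1
rng = np.random.default_rng(2)
y = np.zeros(3000)
eps = rng.normal(size=3000)
for t in range(2, 3000):
    y[t] = 0.5 + 0.6 * y[t - 1] + 0.2 * y[t - 2] + eps[t]

aic1, bic1 = aic_bic(y, 1)
aic2, bic2 = aic_bic(y, 2)
```

Dropping the true second lag costs far more fit than the complexity penalty saves, so both criteria fall when moving from AR(1) to AR(2).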
Diagnostic checking of AR models
Residual analysis
After fitting your AR model, check the residuals (the differences between observed and fitted values) to see if the model is adequate. Well-behaved residuals should:
- Show no autocorrelation (they should look like white noise)
- Have roughly constant variance over time
- Be approximately normally distributed
Plot the residuals against time and against fitted values to look for patterns. If you see trends, clusters of large residuals, or fanning patterns, the model may be misspecified. You can also use the Durbin-Watson test or the Breusch-Godfrey test to formally test for remaining autocorrelation in the residuals.
Ljung-Box test for autocorrelation
The Ljung-Box test is a widely used diagnostic that checks whether any of the first m autocorrelations of the residuals are significantly different from zero.
- Null hypothesis: The residuals are independently distributed (no autocorrelation up to lag m).
- Alternative hypothesis: At least some autocorrelation exists in the residuals.
A significant result (low p-value) means your model hasn't fully captured the time series dynamics, and you may need to increase the AR order or consider a different model specification.
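The Ljung-Box Q statistic itself is straightforward to compute from the residual autocorrelations: Q = n(n+2) Σ_{k=1..m} ρ̂_k² / (n−k). A NumPy sketch (in practice you'd use a library routine such as statsmodels' acorr_ljungbox, and compare Q to a chi-squared critical value with m degrees of freedom):

```python
import numpy as np

def ljung_box_q(resid, m):
    """Ljung-Box Q statistic over the first m residual autocorrelations."""
    n = len(resid)
    e = resid - resid.mean()
    denom = np.sum(e ** 2)
    q = 0.0
    for k in range(1, m + 1):
        rho_k = np.sum(e[k:] * e[:-k]) / denom  # lag-k autocorrelation
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

# White-noise residuals should yield a small Q relative to chi2(m) critical values
rng = np.random.default_rng(7)
q_stat = ljung_box_q(rng.normal(size=1000), m=10)
```

A Q far above the chi-squared critical value for m degrees of freedom signals leftover autocorrelation, i.e. an inadequate model.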
Forecasting with AR models
One-step ahead forecasts
To forecast one period ahead, plug the most recent observed values and estimated coefficients into the AR equation. For an AR(1):

Ŷ_{t+1} = c + φ_1 Y_t

where Ŷ_{t+1} denotes the forecast and c and φ_1 take their estimated values.
This is straightforward because you're using actual observed data as inputs. One-step ahead forecasts tend to be the most accurate.
Multi-step ahead forecasts
For forecasts further into the future, you generate predictions iteratively. The two-step ahead forecast for an AR(1) would be:

Ŷ_{t+2} = c + φ_1 Ŷ_{t+1}

Notice that Ŷ_{t+1} is itself a forecast, not an observed value. Each step forward uses previously forecasted values as inputs, so forecast errors accumulate and uncertainty grows with the forecast horizon.
Techniques like bootstrapping or simulation can help you construct prediction intervals that reflect this growing uncertainty. As a general rule, AR models are most reliable for short-term forecasts.
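Iterated multi-step forecasting for an AR(1) is a short loop that feeds each forecast back in as the next input (a sketch; c, phi, and the horizon are assumed values):

```python
import numpy as np

def forecast_ar1(y_last, c, phi, horizon):
    """Iterated h-step-ahead forecasts for Y_t = c + phi*Y_{t-1} + eps_t."""
    forecasts = []
    y = y_last
    for _ in range(horizon):
        y = c + phi * y          # each step reuses the previous forecast
        forecasts.append(y)
    return np.array(forecasts)

path = forecast_ar1(y_last=2.0, c=0.5, phi=0.8, horizon=20)
# With |phi| < 1, the path converges to the unconditional mean c / (1 - phi)
```

This also shows why long-horizon AR forecasts are uninformative: the point forecast decays geometrically toward the unconditional mean, so far enough out the model predicts little more than the historical average.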
Advantages vs disadvantages of AR models
Advantages:
- Simple to estimate and interpret
- Capture linear persistence and momentum in time series
- Work well for short-term forecasting when assumptions hold
- Serve as building blocks for more complex models (ARMA, VAR)
Disadvantages:
- Cannot capture non-linear relationships
- Struggle with structural breaks (sudden shifts in the data-generating process)
- Forecast accuracy degrades quickly over longer horizons
- Ignore external factors that may influence the variable of interest
Extensions of AR models
Autoregressive moving average (ARMA)
ARMA models combine AR terms with moving average (MA) terms. While AR terms capture dependence on past values of the series, MA terms capture dependence on past error terms:

Y_t = c + φ_1 Y_{t-1} + θ_1 ε_{t-1} + ε_t

This is an ARMA(1,1) model. The general ARMA(p,q) includes p AR terms and q MA terms. ARMA models can represent a wider range of time series patterns more parsimoniously than a pure AR model with many lags.
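A brief NumPy sketch simulating an ARMA(1,1) of this form (parameter values are illustrative):

```python
import numpy as np

def simulate_arma11(c, phi, theta, n, seed=3):
    """Simulate Y_t = c + phi*Y_{t-1} + theta*eps_{t-1} + eps_t."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(size=n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = c + phi * y[t - 1] + theta * eps[t - 1] + eps[t]
    return y

series = simulate_arma11(c=0.2, phi=0.5, theta=0.4, n=500)
```

The only change from the AR(1) loop is the extra theta term on the previous shock, which is what lets ARMA models fit patterns that a pure AR model would need many lags to approximate.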
Vector autoregressive (VAR) models
VAR models extend the AR framework to multiple time series simultaneously. Each variable is modeled as a function of its own past values and the past values of every other variable in the system.
For example, a VAR model of GDP growth and inflation would let GDP depend on lagged GDP and lagged inflation, and simultaneously let inflation depend on lagged inflation and lagged GDP. This captures dynamic interactions and feedback effects between variables, making VAR models a staple of macroeconomic analysis.
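In matrix form a VAR(1) is Y_t = c + A Y_{t-1} + ε_t, where Y_t stacks the variables and A holds the cross-lag coefficients. A minimal two-variable sketch (all numbers are made up for illustration; stability requires the eigenvalues of A to lie inside the unit circle):

```python
import numpy as np

rng = np.random.default_rng(5)
A = np.array([[0.5, 0.1],    # variable 1 on its own lag and variable 2's lag
              [0.2, 0.6]])   # variable 2 on variable 1's lag and its own lag
c = np.array([0.3, 0.2])

Y = np.zeros((200, 2))
for t in range(1, 200):
    Y[t] = c + A @ Y[t - 1] + rng.normal(scale=0.1, size=2)
```

The off-diagonal entries of A are what encode the feedback effects described above: each series responds to the other's past, not just its own.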
Applications of AR models in econometrics
Modeling economic time series
AR models are commonly applied to series like GDP growth, inflation rates, unemployment rates, and exchange rates. These variables often exhibit persistence (high values tend to follow high values), which AR models are designed to capture.
By estimating an AR model, you can quantify how quickly a variable returns to its mean after a shock and analyze the cyclical behavior embedded in the data.
Forecasting financial variables
In financial econometrics, AR models help forecast stock returns, volatility, and trading volumes. Financial series often display autocorrelation that AR models can exploit for short-term prediction.
AR models are frequently combined with GARCH models to handle time-varying volatility, which is a common feature of financial data. These combined models support applications in risk management, portfolio optimization, and trading strategy development.