ARCH (autoregressive conditional heteroskedasticity) models describe how financial time series like stock returns and exchange rates alternate between stretches of calm and turbulence. By letting the conditional variance of the error term depend on past shocks, these models formalize volatility clustering, one of the most robust stylized facts in finance. This section covers the model structure, how parameters are estimated, how to read the output, and where ARCH falls short.

Structure of ARCH models

A standard time series regression assumes constant variance in the error term. ARCH relaxes that assumption by making the conditional variance depend on recent squared errors.

An ARCH(q) model has two equations:

Conditional mean equation:

$y_t = \mu_t + \varepsilon_t$

$\mu_t$ is the conditional mean of $y_t$ given past information. It could be a constant, a moving average, or a regression model.
$\varepsilon_t$ is the error term, whose conditional variance $\sigma_t^2$ changes over time instead of staying constant.

Conditional variance equation:

$\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \alpha_2 \varepsilon_{t-2}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2$

$\alpha_0 > 0$ and $\alpha_i \geq 0$ for $i = 1, \ldots, q$. These constraints guarantee that the conditional variance stays positive.
$q$ is the order of the model. An ARCH(1) uses only the most recent squared error; an ARCH(3) looks back three periods.

The error term is assumed conditionally normal:

$\varepsilon_t \mid \Omega_{t-1} \sim N(0,\, \sigma_t^2)$

where $\Omega_{t-1}$ is the information set available at time $t-1$ (past returns, past volatility, etc.). Notice that unconditionally the distribution of $\varepsilon_t$ will have heavier tails than a normal, which aligns with what we actually see in financial returns.

The key intuition: when a large shock (positive or negative) hits at time $t-1$, the squared error $\varepsilon_{t-1}^2$ is large, which pushes $\sigma_t^2$ up. That's volatility clustering in equation form.
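
The two equations can be simulated directly. The following is a minimal sketch, assuming a zero conditional mean and illustrative parameter values ($\alpha_0 = 0.2$, $\alpha_1 = 0.5$); the function names are hypothetical and only the Python standard library is used. It prints three diagnostics: the excess kurtosis of $\varepsilon_t$ and the lag-1 autocorrelation of both $\varepsilon_t$ and $\varepsilon_t^2$.

```python
import math
import random


def simulate_arch1(alpha0, alpha1, n, seed=0, burn_in=500):
    """Simulate n values of eps_t and sigma_t^2 from an ARCH(1) process."""
    rng = random.Random(seed)
    eps_prev = 0.0
    eps, sigma2 = [], []
    for t in range(n + burn_in):
        s2 = alpha0 + alpha1 * eps_prev ** 2      # conditional variance equation
        e = math.sqrt(s2) * rng.gauss(0.0, 1.0)   # eps_t | past ~ N(0, s2)
        if t >= burn_in:                          # drop the start-up period
            eps.append(e)
            sigma2.append(s2)
        eps_prev = e
    return eps, sigma2


def excess_kurtosis(x):
    """Sample excess kurtosis; the normal distribution has value 0."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0


def lag1_autocorr(x):
    """Sample autocorrelation of x at lag 1."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, n))
    den = sum((v - m) ** 2 for v in x)
    return num / den


# Illustrative parameter values, not estimates from any real series.
eps, sigma2 = simulate_arch1(alpha0=0.2, alpha1=0.5, n=20000)
squared = [e * e for e in eps]
print("excess kurtosis of eps:", round(excess_kurtosis(eps), 2))
print("lag-1 autocorrelation of eps:", round(lag1_autocorr(eps), 3))
print("lag-1 autocorrelation of eps^2:", round(lag1_autocorr(squared), 3))
```

If the simulation behaves as the theory suggests, the errors themselves should look roughly uncorrelated while their squares are clearly positively autocorrelated, and the excess kurtosis should be positive even though each draw is conditionally normal.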

Parameter estimation for ARCH

ARCH parameters are estimated by maximum likelihood estimation (MLE).

The log-likelihood function for an ARCH(q) model is:

$\ln L(\theta) = -\frac{1}{2} \sum_{t=1}^{T} \left[ \ln(2\pi) + \ln(\sigma_t^2) + \frac{\varepsilon_t^2}{\sigma_t^2} \right]$

$\theta = (\mu,\, \alpha_0,\, \alpha_1,\, \ldots,\, \alpha_q)$ is the full parameter vector, written here for a constant conditional mean $\mu_t = \mu$.
$T$ is the number of observations (daily, weekly, or monthly returns).

The estimation process in practice:

Choose starting values for $\theta$.
For each observation $t$, compute $\varepsilon_t = y_t - \mu_t$ and then $\sigma_t^2$ from the variance equation. The first $q$ observations have no lagged errors inside the sample, so they are either conditioned on or given pre-sample values such as the sample variance.
Evaluate the log-likelihood using the formula above.
Use a numerical optimizer (common choices: BFGS, Nelder-Mead) to find the $\theta$ that maximizes $\ln L$.
Obtain standard errors as the square roots of the diagonal of the inverse negative Hessian (the Hessian being the matrix of second derivatives of the log-likelihood) evaluated at the optimum.

Because there's no closed-form solution, software does the heavy lifting. But understanding the steps helps you diagnose convergence problems when they arise.
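
The steps above can be sketched for an ARCH(1) with a constant mean. This is an illustration under stated assumptions, not how any particular package implements it: the data are simulated with made-up parameters, the likelihood conditions on the first observation, and a small hand-written compass search stands in for BFGS or Nelder-Mead so the example needs only the standard library. All function names are hypothetical, and the Hessian step for standard errors is left out.

```python
import math
import random


def simulate_arch1(mu, alpha0, alpha1, n, seed=0):
    """Hypothetical helper: y_t = mu + eps_t with ARCH(1) errors."""
    rng = random.Random(seed)
    eps_prev, y = 0.0, []
    for _ in range(n):
        sigma2 = alpha0 + alpha1 * eps_prev ** 2
        eps_prev = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        y.append(mu + eps_prev)
    return y


def neg_loglik(theta, y):
    """Negative Gaussian log-likelihood of a constant-mean ARCH(1) model.

    Conditions on the first observation, so the sum runs over t = 2, ..., T.
    """
    mu, alpha0, alpha1 = theta
    if alpha0 <= 0.0 or alpha1 < 0.0:      # enforce the positivity constraints
        return math.inf
    total = 0.0
    for t in range(1, len(y)):
        eps_prev = y[t - 1] - mu
        eps = y[t] - mu
        sigma2 = alpha0 + alpha1 * eps_prev ** 2
        total += 0.5 * (math.log(2.0 * math.pi) + math.log(sigma2) + eps ** 2 / sigma2)
    return total


def compass_search(f, x0, step=0.1, tol=1e-5, max_iter=2000):
    """Derivative-free minimizer: move +/- step along each coordinate,
    and halve the step whenever no move improves the objective."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step /= 2.0
            if step < tol:
                break
    return x, fx


# Simulated data with illustrative true values mu=0.05, alpha0=0.2, alpha1=0.5.
y = simulate_arch1(mu=0.05, alpha0=0.2, alpha1=0.5, n=2000, seed=42)
mean_y = sum(y) / len(y)
var_y = sum((v - mean_y) ** 2 for v in y) / len(y)
start = [mean_y, var_y, 0.1]               # starting values for (mu, alpha0, alpha1)
theta_hat, nll = compass_search(lambda th: neg_loglik(th, y), start)
print("mu, alpha0, alpha1 estimates:", [round(v, 3) for v in theta_hat])
print("maximized log-likelihood:", round(-nll, 2))
```

Maximizing $\ln L$ is the same as minimizing its negative, which is why the objective is written as `neg_loglik`. In practice the same function would more likely be handed to a library optimizer, for example SciPy's `scipy.optimize.minimize`, rather than to a hand-rolled search.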

Interpretation of ARCH coefficients

$\alpha_0$ represents the baseline level of variance. It's the portion of volatility that persists regardless of recent shocks.
$\alpha_i$ measures how much the squared error from $i$ periods ago contributes to today's conditional variance. A larger $\alpha_i$ means past shocks of that lag have a stronger influence on current volatility.

For an ARCH(1), the unconditional variance of $\varepsilon_t$ is $\frac{\alpha_0}{1 - \alpha_1}$, which is finite only when $\alpha_1 < 1$. For a general ARCH(q), the corresponding condition is $\alpha_1 + \cdots + \alpha_q < 1$. If $\alpha_1$ is close to 1, shocks die out very slowly and volatility is highly persistent.
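
The ARCH(1) formula follows from a short derivation, sketched here under the assumption that the process is covariance-stationary, so that $E[\varepsilon_t^2]$ is the same for every $t$. By the law of iterated expectations:

$E[\varepsilon_t^2] = E\big[E[\varepsilon_t^2 \mid \Omega_{t-1}]\big] = E[\sigma_t^2] = \alpha_0 + \alpha_1 E[\varepsilon_{t-1}^2]$

Writing $\sigma^2$ for the common value $E[\varepsilon_t^2] = E[\varepsilon_{t-1}^2]$ and solving:

$\sigma^2 = \alpha_0 + \alpha_1 \sigma^2 \;\Rightarrow\; \sigma^2 = \frac{\alpha_0}{1 - \alpha_1}$

With illustrative values $\alpha_0 = 0.2$ and $\alpha_1 = 0.5$, the unconditional variance is $0.2 / (1 - 0.5) = 0.4$, twice the baseline level $\alpha_0$.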

Goodness-of-fit tools:

Likelihood ratio test compares the ARCH model against a restricted constant-variance model. A significant test statistic means the ARCH terms add explanatory power.
AIC and BIC balance fit against complexity. Lower values indicate a better model. BIC penalizes extra parameters more heavily than AIC, so it tends to favor more parsimonious specifications.
Ljung-Box test on squared standardized residuals checks whether the model has captured all the volatility clustering. If the test is significant, autocorrelation remains in the squared residuals, and you may need a higher order or a different model.
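
The Ljung-Box check can be sketched by hand. The example below is an illustration under assumptions: the errors are simulated from an ARCH(1) with made-up parameters, and the true $\sigma_t^2$ is used for standardizing where a real application would use fitted values. The statistic is $Q = n(n+2) \sum_{k=1}^{m} \hat\rho_k^2 / (n-k)$, compared against a chi-squared critical value with $m$ degrees of freedom (some texts adjust the degrees of freedom for the fitted ARCH order). The function name `ljung_box_q` is hypothetical.

```python
import math
import random


def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic for autocorrelation in x up to max_lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    q = 0.0
    for k in range(1, max_lag + 1):
        rho_k = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q


# Simulated ARCH(1) errors; alpha0 and alpha1 are illustrative values.
rng = random.Random(7)
alpha0, alpha1 = 0.2, 0.5
eps, sigma2, eps_prev = [], [], 0.0
for _ in range(2000):
    s2 = alpha0 + alpha1 * eps_prev ** 2
    eps_prev = math.sqrt(s2) * rng.gauss(0.0, 1.0)
    eps.append(eps_prev)
    sigma2.append(s2)

raw_sq = [e ** 2 for e in eps]                        # squared errors, no variance model
std_sq = [e ** 2 / s2 for e, s2 in zip(eps, sigma2)]  # squared standardized residuals

CRITICAL_95_DF10 = 18.307  # 95% quantile of chi-squared with 10 degrees of freedom
q_raw = ljung_box_q(raw_sq, 10)
q_std = ljung_box_q(std_sq, 10)
print("Q on raw squared errors:", round(q_raw, 1), "reject:", q_raw > CRITICAL_95_DF10)
print("Q on squared standardized residuals:", round(q_std, 1), "reject:", q_std > CRITICAL_95_DF10)
```

The contrast is the point of the sketch: the raw squared errors should carry strong autocorrelation, while dividing by the correct conditional variance should remove most of it.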

Limitations of ARCH models

ARCH models are a foundational tool, but they have well-known drawbacks:

Non-negativity constraints are restrictive. Every $\alpha_i$ must be non-negative. In practice this can prevent the model from fitting certain volatility patterns, especially when the number of lags is large.
High orders are often needed. Capturing slow-decaying volatility persistence may require a large $q$, which leads to many parameters and potential convergence difficulties during estimation. This is the main motivation for GARCH models, which capture persistent volatility with far fewer parameters by adding lagged conditional variances to the variance equation.
Symmetric treatment of shocks. The variance equation uses $\varepsilon_{t-1}^2$, so a positive shock and a negative shock of the same size have identical effects on future volatility. In equity markets, this is unrealistic.
No leverage effect. Empirically, negative returns tend to increase future volatility more than positive returns of the same magnitude (Black, 1976). Standard ARCH cannot capture this asymmetry. Extensions like EGARCH and GJR-GARCH were specifically designed to address it.
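
The symmetry is easy to see numerically. The sketch below compares the ARCH(1) variance response with a GJR-style response that adds an extra coefficient for negative shocks. It is an illustration only: all parameter values are made up, `gamma` is a hypothetical asymmetry coefficient, and the lagged-variance term of a full GJR-GARCH model is omitted.

```python
def arch1_variance(shock, alpha0=0.2, alpha1=0.5):
    """Next-period conditional variance under ARCH(1)."""
    return alpha0 + alpha1 * shock ** 2


def gjr_style_variance(shock, alpha0=0.2, alpha1=0.3, gamma=0.4):
    """ARCH(1) plus a GJR-style term that switches on for negative shocks."""
    indicator = 1.0 if shock < 0 else 0.0
    return alpha0 + (alpha1 + gamma * indicator) * shock ** 2


# Same shock size, opposite signs.
for shock in (-2.0, 2.0):
    print(shock, arch1_variance(shock), gjr_style_variance(shock))
```

Under ARCH(1) the two signs give the same variance because only the squared shock enters. With the extra term, the negative shock raises next-period variance by more than the positive one.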

These limitations don't make ARCH useless. They make it a building block. Understanding where ARCH falls short is the clearest path to understanding why GARCH and its variants exist.
