The Durbin-Watson test detects autocorrelation in regression residuals, specifically whether errors in a time series regression are correlated with their own lagged values. When autocorrelation is present, your OLS standard errors become unreliable, which means hypothesis tests and confidence intervals can't be trusted. This section covers how the test works, how to calculate and interpret the statistic, and what to do when autocorrelation shows up.
Overview of Durbin-Watson test
The Durbin-Watson test checks whether the residuals from a regression are autocorrelated. In a well-specified OLS model, residuals should be independent of one another. If they're not, you've violated a key Gauss-Markov assumption, and your coefficient estimates, while still unbiased, will have incorrect standard errors.
The test was developed by James Durbin and Geoffrey Watson in papers published in 1950 and 1951, and it remains one of the most common diagnostic checks in time series econometrics.
Purpose of the test
Autocorrelation means the error term in one period is correlated with the error term in a previous period. Think of it this way: if your model consistently underpredicts for several periods in a row, then overpredicts for several periods, the residuals are following a pattern rather than bouncing randomly. That pattern is autocorrelation.
Why does this matter? OLS assumes errors are independent. When that assumption fails:
- Standard errors are typically underestimated, making t-statistics too large
- You'll reject null hypotheses too often (false positives)
- Coefficient estimates remain unbiased but are no longer efficient (no longer the best you can get)
Assumptions behind the test
The Durbin-Watson test is valid only under specific conditions:
- The regression model includes an intercept term
- The explanatory variables are non-stochastic (fixed in repeated sampling)
- The errors follow a first-order autoregressive process (AR(1)), meaning u_t = ρ u_{t−1} + ε_t, where ε_t is white noise
- The model does not include a lagged dependent variable (e.g., y_{t−1}) as a regressor
If any of these conditions are violated, the test results may be misleading.
Test for autocorrelation
The Durbin-Watson test is built to detect first-order autocorrelation, which is the correlation between consecutive residuals. It can identify both positive and negative autocorrelation.
Positive vs negative autocorrelation
Positive autocorrelation means a positive residual in one period tends to be followed by another positive residual, and negative residuals tend to follow negative ones. If you plot the residuals over time, you'll see smooth, wave-like patterns. This is the more common type in economic time series data.
Negative autocorrelation means residuals tend to alternate in sign: a positive residual is likely followed by a negative one, and vice versa. The residual plot looks like a rapid zigzag pattern.
First-order autocorrelation
First-order autocorrelation is the correlation between e_t and e_{t−1}. The autoregressive parameter ρ captures the strength and direction of this relationship:
- ρ > 0: positive autocorrelation
- ρ < 0: negative autocorrelation
- ρ = 0: no autocorrelation
The Durbin-Watson statistic is directly related to ρ. Approximately, d ≈ 2(1 − ρ̂), which is why the statistic centers around 2 when there's no autocorrelation.
Higher-order autocorrelation
The Durbin-Watson test cannot detect higher-order autocorrelation, such as the correlation between e_t and e_{t−2} or e_{t−4}. If you suspect autocorrelation at longer lags (common with quarterly or monthly data where seasonal patterns exist), use the Breusch-Godfrey LM test instead. The Breusch-Godfrey test is more flexible and also works when lagged dependent variables are in the model.
Calculating Durbin-Watson statistic
Formula for test statistic
The Durbin-Watson statistic is computed from the OLS residuals:

d = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t²

where e_t is the residual at time t and n is the number of observations.
The numerator sums the squared differences between each residual and the one before it. The denominator is just the residual sum of squares. If consecutive residuals are similar to each other (positive autocorrelation), the numerator will be small relative to the denominator, pushing d toward 0.
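A minimal sketch of the calculation on simulated AR(1) residuals (the sample size, seed, and ρ value are illustrative), checked against the `durbin_watson` helper in statsmodels:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(42)

# Simulate AR(1) residuals with rho = 0.7 (positive autocorrelation)
n, rho = 500, 0.7
e = np.zeros(n)
eps = rng.normal(size=n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + eps[t]

# d = sum of squared first differences / residual sum of squares
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(round(d, 3))                 # close to 2 * (1 - 0.7) = 0.6
print(round(durbin_watson(e), 3))  # statsmodels gives the same value
```

Note that both values land well below 2, as the approximation d ≈ 2(1 − ρ) predicts for ρ = 0.7.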

Range of possible values
The statistic always falls between 0 and 4:
- d ≈ 2: No autocorrelation (residuals are independent)
- d close to 0: Strong positive autocorrelation (consecutive residuals move together)
- d close to 4: Strong negative autocorrelation (consecutive residuals alternate in sign)
A quick rule of thumb: if d is between roughly 1.5 and 2.5, autocorrelation is probably not severe. But you should always check against the formal critical values.
Interpreting the test statistic
You can't just compare d to a single critical value. The Durbin-Watson distribution depends on the specific design matrix X, so exact critical values vary. Instead, Durbin and Watson established lower (d_L) and upper (d_U) bounds that apply regardless of the data configuration. This creates zones of rejection, non-rejection, and inconclusiveness.
Critical values for the test
Lower and upper bounds
The critical values d_L and d_U are found in published Durbin-Watson tables. They depend on three things:
- The significance level (α)
- The number of observations (n)
- The number of regressors (k), excluding the intercept
For testing positive autocorrelation:
- If d < d_L: reject H₀ (evidence of positive autocorrelation)
- If d > d_U: do not reject H₀
- If d_L ≤ d ≤ d_U: the test is inconclusive
The inconclusive region is a real drawback of this test. With small samples or many regressors, this zone can be quite wide.
Significance level
The significance level α is typically set at 0.05 (5%) or 0.01 (1%). A smaller α makes the test more conservative, meaning you need stronger evidence to reject the null. Most Durbin-Watson tables provide bounds for both levels.
Number of regressors
As k increases, the gap between d_L and d_U widens, making the inconclusive region larger. With many regressors and a small sample, the test becomes less useful because you're more likely to land in the inconclusive zone.
Testing procedure
Here's how to carry out the Durbin-Watson test step by step:
Null and alternative hypotheses
- H₀: ρ = 0 (no first-order autocorrelation)
- H₁: ρ > 0 for a one-sided test of positive autocorrelation
- H₁: ρ < 0 for a one-sided test of negative autocorrelation
Most applications test for positive autocorrelation first, since it's far more common in economic time series.

Rejection regions
For a two-sided test (checking for both positive and negative autocorrelation), the decision rules are:
- 0 ≤ d < d_L: Reject H₀, conclude positive autocorrelation
- 4 − d_L < d ≤ 4: Reject H₀, conclude negative autocorrelation
- d_U < d < 4 − d_U: Do not reject H₀, no evidence of autocorrelation
- d_L ≤ d ≤ d_U or 4 − d_U ≤ d ≤ 4 − d_L: Test is inconclusive
Notice the symmetry around 2. The test for negative autocorrelation uses 4 − d_U and 4 − d_L as bounds.
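The decision rules above can be sketched as a small helper function. The bounds passed in below are illustrative values, not taken from any particular table:

```python
def dw_decision(d, dl, du):
    """Two-sided Durbin-Watson decision given tabulated bounds dl < du."""
    if d < dl:
        return "reject H0: positive autocorrelation"
    if d > 4 - dl:
        return "reject H0: negative autocorrelation"
    if du < d < 4 - du:
        return "do not reject H0"
    return "inconclusive"


# Hypothetical bounds dl = 1.39, du = 1.60 for illustration
print(dw_decision(1.20, 1.39, 1.60))  # reject H0: positive autocorrelation
print(dw_decision(2.05, 1.39, 1.60))  # do not reject H0
print(dw_decision(1.50, 1.39, 1.60))  # inconclusive
```

Writing the rules out this way makes the symmetry around 2 explicit: the negative-autocorrelation region is just the positive one reflected to the interval (4 − d_L, 4].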
Examples of test application
Suppose you estimate a consumption function with quarterly data (n = 40, k = 2) and compute d = 1.20. At α = 0.05, the table gives d_L ≈ 1.39 and d_U ≈ 1.60. Since d < d_L, you reject H₀ and conclude there's positive autocorrelation in the residuals. Your next step would be to address it using one of the methods below.
Limitations of the test
Inconclusive regions
The inconclusive zone is the most frustrating aspect of the Durbin-Watson test. When falls between and , you can't make a definitive call. In practice, many researchers treat values in the inconclusive region as suggestive of autocorrelation and run additional tests (like Breusch-Godfrey) to confirm.
Lagged dependent variables
If your model includes a lagged dependent variable (e.g., y_{t−1}) on the right-hand side, the Durbin-Watson test is biased toward 2. This means it will tend to suggest no autocorrelation even when autocorrelation exists. For dynamic models, use the Durbin h-test or the Breusch-Godfrey test instead.
Misspecification of the model
A significant Durbin-Watson result doesn't always mean autocorrelation is the real problem. Omitted variables, incorrect functional form, or structural breaks can all produce patterns in the residuals that look like autocorrelation. Before applying a correction for autocorrelation, check whether your model specification is correct. Fixing the specification often resolves the apparent autocorrelation.
Addressing autocorrelation
If the Durbin-Watson test confirms autocorrelation, you have several options for producing reliable inference.
Generalized least squares
Generalized least squares (GLS) transforms the model to eliminate the autocorrelation structure. If you know ρ, you can transform each observation (for t = 2, …, n):

y_t* = y_t − ρ y_{t−1},  x_t* = x_t − ρ x_{t−1}

Then run OLS on the transformed data. The resulting estimates are efficient (best linear unbiased). In practice, ρ is usually unknown and must be estimated, which leads to Feasible GLS (FGLS).
Cochrane-Orcutt procedure
The Cochrane-Orcutt procedure is an iterative approach to FGLS:
- Run OLS on the original model and obtain residuals
- Regress e_t on e_{t−1} to estimate ρ
- Transform the data using (as shown above)
- Run OLS on the transformed data
- Repeat steps 2-4 until ρ̂ converges (stops changing meaningfully)
One drawback: this procedure drops the first observation. The Prais-Winsten method is a variant that retains it, which matters more in small samples.
Newey-West standard errors
If your main concern is valid inference rather than efficiency, Newey-West (HAC) standard errors are a practical alternative. They adjust the standard errors to be robust to both autocorrelation and heteroskedasticity, without transforming the model.
- The coefficient estimates stay the same as OLS
- Only the standard errors (and therefore t-statistics and p-values) change
- You need to choose a bandwidth (maximum lag length), often set by a rule of thumb such as 4(n/100)^{2/9}, rounded down
Newey-West standard errors are widely used because they don't require you to specify the exact autocorrelation structure. They're especially common in applied time series work.