Why This Matters
Econometric models are the workhorses of quantitative analysis—they're how you move from "these variables seem related" to "here's the precise magnitude and statistical significance of that relationship." In Advanced Quantitative Methods, you're being tested on your ability to select the right model for a given data structure, diagnose violations of key assumptions, and interpret output in ways that answer real research questions. The models here span linear regression, limited dependent variables, time series, panel data, and causal inference techniques.
Don't just memorize formulas and model names. For each technique, know when to use it, what assumptions it requires, and what happens when those assumptions fail. Exam questions will present you with a scenario—endogeneity, censored data, autocorrelation—and expect you to identify the appropriate estimation strategy. Master the why behind each model, and the mechanics will follow.
Linear Regression Foundations
These models form the backbone of econometrics. OLS and its extensions assume a linear relationship between predictors and the outcome, with well-behaved error terms. Understanding when these assumptions hold—and when they don't—is fundamental to everything else.
Ordinary Least Squares (OLS) Regression
- Minimizes the sum of squared residuals—this is the optimization criterion that produces the "best fit" line through your data
- Gauss-Markov assumptions (linearity in parameters, zero conditional mean of errors, homoscedasticity, no autocorrelation, no perfect multicollinearity) must hold for OLS to be BLUE—Best Linear Unbiased Estimator
- Coefficient interpretation: a one-unit change in X produces a β change in Y, holding other variables constant
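A minimal sketch of running OLS in Python with statsmodels on simulated data; the variable names and true coefficients below are purely illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 0.5 * x + rng.normal(size=200)   # true intercept 2.0, slope 0.5

X = sm.add_constant(x)          # statsmodels does not add an intercept automatically
results = sm.OLS(y, X).fit()    # minimizes the sum of squared residuals
print(results.summary())        # coefficients, standard errors, R-squared
```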
Multiple Linear Regression
- Extends OLS to multiple predictors—allows you to isolate the effect of each independent variable while controlling for confounders
- Multicollinearity becomes a key concern; high correlation among predictors inflates standard errors and destabilizes coefficient estimates
- Adjusted R² penalizes adding weak predictors, making it preferable to raw R² for model comparison
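A sketch of diagnosing multicollinearity with variance inflation factors (VIFs), assuming statsmodels and pandas; the predictors and their correlation structure are simulated for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 300
educ = rng.normal(12, 2, n)
exper = 0.8 * educ + rng.normal(0, 1, n)   # deliberately correlated with educ
tenure = rng.normal(5, 2, n)
df = pd.DataFrame({"educ": educ, "exper": exper, "tenure": tenure})

X = sm.add_constant(df)
vifs = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vifs)   # large values (a common rule of thumb: > 10) flag inflated standard errors
```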
Generalized Least Squares (GLS)
- Corrects for heteroskedasticity or autocorrelation—when OLS assumptions about error terms fail, GLS reweights observations to restore efficiency
- Requires specifying the variance-covariance structure of errors; feasible GLS (FGLS) estimates this structure from the data
- More efficient than OLS when error assumptions are violated, but consistency still depends on correct model specification
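A sketch of GLS when the error variance differs by observation, assuming the variance structure is known; feasible GLS would instead estimate it from OLS residuals. Data are simulated.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 10, n)
sigma_i = 0.5 * x                          # error spread grows with x (heteroskedastic)
y = 1.0 + 0.3 * x + rng.normal(0, sigma_i)

X = sm.add_constant(x)
ols_fit = sm.OLS(y, X).fit()               # unbiased here, but inefficient
gls_fit = sm.GLS(y, X, sigma=np.diag(sigma_i**2)).fit()   # reweights by the error covariance
print(ols_fit.params, gls_fit.params)
print(ols_fit.bse, gls_fit.bse)            # compare the reported standard errors
```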
Compare: OLS vs. GLS—both estimate linear relationships, but OLS assumes homoscedastic, uncorrelated errors while GLS explicitly models error structure. If an exam question mentions "clustered standard errors" or "serial correlation," GLS or robust alternatives are your answer.
Limited Dependent Variable Models
When your outcome isn't continuous and unbounded, standard OLS breaks down. These models handle binary outcomes, censored data, and bounded distributions by specifying appropriate functional forms.
Logistic Regression
- Models binary outcomes (0/1) by estimating the probability that Y=1 given the predictors
- Uses the logit link function: log(p/(1−p)) = Xβ, transforming probabilities to an unbounded scale
- Coefficients represent log-odds; exponentiate to get odds ratios for interpretation
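A minimal logistic regression sketch with statsmodels; exponentiating the fitted coefficients gives odds ratios. The binary outcome is simulated from a known logit model.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(-0.3 + 0.8 * x)))   # logit link: log(p/(1-p)) = -0.3 + 0.8x
y = rng.binomial(1, p)

X = sm.add_constant(x)
logit_res = sm.Logit(y, X).fit()
print(logit_res.params)           # log-odds coefficients
print(np.exp(logit_res.params))   # odds ratios
```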
Probit Models
- Alternative to logit for binary outcomes—uses the cumulative normal distribution Φ(Xβ) instead of the logistic function
- Marginal effects must be computed at specific values of X since the relationship between predictors and probability is nonlinear
- Produces nearly identical results to logit in most applications; choice often depends on disciplinary convention
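A probit sketch with statsmodels; get_margeff() reports average marginal effects, since the raw coefficients do not translate directly into probability changes. Data are simulated.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = rng.binomial(1, norm.cdf(-0.2 + 0.6 * x))   # P(Y=1) = Phi(Xb)

X = sm.add_constant(x)
probit_res = sm.Probit(y, X).fit()
print(probit_res.params)                        # index coefficients, not probabilities
print(probit_res.get_margeff().summary())       # average marginal effects
```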
Tobit Models
- Handles censored dependent variables—when outcomes pile up at a boundary rather than being observed freely (e.g., hours worked can't be negative, so zeros accumulate)
- Jointly estimates the probability of censoring and the conditional mean of the latent variable
- OLS on censored data yields biased estimates; Tobit corrects by modeling the censoring mechanism explicitly
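Tobit is not built into statsmodels, so this is a hand-rolled maximum likelihood sketch for left-censoring at zero (observed y = max(0, latent y*)), using scipy on simulated data; illustrative only.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y_star = -0.5 + 1.0 * x + rng.normal(size=n)    # latent outcome
y = np.maximum(y_star, 0.0)                      # censored at zero
X = np.column_stack([np.ones(n), x])

def neg_loglik(params):
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)                    # keeps sigma positive
    xb = X @ beta
    censored = y <= 0
    ll = np.where(
        censored,
        norm.logcdf(-xb / sigma),                        # P(y* <= 0) for censored obs
        norm.logpdf((y - xb) / sigma) - np.log(sigma),   # density of uncensored obs
    )
    return -ll.sum()

res = minimize(neg_loglik, x0=np.zeros(X.shape[1] + 1))
print(res.x)   # [beta_0, beta_1, log(sigma)]; compare with naive OLS on the censored y
```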
Compare: Logit vs. Probit—both model binary outcomes, but logit uses the logistic distribution while probit uses the normal. Logit's odds ratio interpretation is often more intuitive; probit's marginal effects integrate more naturally with other normal-distribution-based models.
Quantile Regression
- Estimates conditional quantiles (median, 25th percentile, etc.) rather than the conditional mean
- Robust to outliers and reveals how predictors affect different parts of the outcome distribution
- Essential for heterogeneous effects—if you suspect the relationship between X and Y varies across the distribution, quantile regression captures this
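A quantile regression sketch with statsmodels' QuantReg, fitting the median and the 90th percentile; in the simulated data the error spread grows with x, so the estimated slopes differ across quantiles.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, 0.2 + 0.1 * x, n)   # heteroskedastic errors

X = sm.add_constant(x)
median_fit = sm.QuantReg(y, X).fit(q=0.5)   # conditional median
p90_fit = sm.QuantReg(y, X).fit(q=0.9)      # conditional 90th percentile
print(median_fit.params, p90_fit.params)    # slopes differ across the distribution
```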
Time Series and Dynamic Models
When observations are ordered in time, standard cross-sectional methods fail because errors are correlated across periods. These models exploit the autocorrelation structure to forecast and understand dynamic relationships.
Time Series Models (ARIMA, SARIMA)
- ARIMA(p, d, q) combines autoregression (p lags), differencing (d times), and moving average (q error lags) to model non-stationary series
- Stationarity is required—the differencing component transforms trending or unit-root series into stationary ones
- SARIMA adds seasonal components—use when data shows regular periodic patterns (quarterly GDP, monthly sales)
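An ARIMA sketch with statsmodels on a simulated random walk with drift; order=(1, 1, 1) means one AR lag, one difference, and one MA lag, and the orders here are illustrative rather than the product of a real identification exercise.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = np.cumsum(0.1 + rng.normal(size=300))   # non-stationary in levels, stationary after differencing

model = ARIMA(y, order=(1, 1, 1))           # for seasonal data, add seasonal_order=(P, D, Q, s)
fit = model.fit()
print(fit.summary())
print(fit.forecast(steps=12))               # 12-step-ahead forecast
```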
Vector Autoregression (VAR)
- Multivariate time series model—each variable is regressed on its own lags and lags of all other variables in the system
- No need to specify endogenous vs. exogenous; all variables are treated symmetrically
- Impulse response functions trace how a shock to one variable propagates through the system over time
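A VAR sketch with statsmodels: lag length chosen by AIC, then impulse responses traced over 10 periods. The two series and their names (gdp_growth, inflation) are simulated and purely illustrative.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
n = 300
data = np.zeros((n, 2))
for t in range(1, n):
    data[t, 0] = 0.5 * data[t - 1, 0] + 0.2 * data[t - 1, 1] + rng.normal()
    data[t, 1] = 0.3 * data[t - 1, 0] + 0.4 * data[t - 1, 1] + rng.normal()

df = pd.DataFrame(data, columns=["gdp_growth", "inflation"])
results = VAR(df).fit(maxlags=4, ic="aic")   # lag order selected by information criterion
irf = results.irf(10)                        # impulse response functions over 10 periods
print(results.summary())
```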
Compare: ARIMA vs. VAR—ARIMA models a single series using its own history; VAR captures interdependencies among multiple series. Use ARIMA for univariate forecasting, VAR when you need to understand dynamic interactions between variables.
Panel Data Methods
Panel data combines cross-sectional and time-series dimensions, creating powerful opportunities to control for unobserved heterogeneity. The key challenge is handling entity-specific effects that might bias estimates.
Panel Data Models
- Fixed effects control for time-invariant unobserved heterogeneity by including entity-specific intercepts (or demeaning)
- Random effects assume unobserved heterogeneity is uncorrelated with predictors, allowing more efficient estimation
- Hausman test compares fixed and random effects; rejection suggests fixed effects is preferred due to correlation between effects and regressors
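A fixed-effects sketch via the within transformation: demean each variable by firm and run OLS on the demeaned data (dedicated panel packages wrap this up, but the demeaning makes the logic explicit). The simulated firm effect is correlated with x, so pooled OLS is biased while the within estimator is not; all names are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
firms, years = 50, 10
firm_effect = np.repeat(rng.normal(size=firms), years)    # unobserved, time-invariant
x = firm_effect + rng.normal(size=firms * years)           # correlated with the effect
y = 1.0 * x + firm_effect + rng.normal(size=firms * years)
df = pd.DataFrame({"firm": np.repeat(np.arange(firms), years), "x": x, "y": y})

demeaned = df[["x", "y"]] - df.groupby("firm")[["x", "y"]].transform("mean")
fe_fit = sm.OLS(demeaned["y"], demeaned[["x"]]).fit()       # within (fixed effects) estimator
pooled = sm.OLS(df["y"], sm.add_constant(df["x"])).fit()    # ignores firm effects, biased here
print(fe_fit.params["x"], pooled.params["x"])
```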
Causal Inference and Endogeneity Solutions
The biggest threat to causal claims is endogeneity—when your independent variable is correlated with the error term. These techniques provide consistent estimates even when standard regression fails.
Instrumental Variables (IV) Regression
- Addresses endogeneity by finding an instrument Z that affects Y only through X—the exclusion restriction
- Instrument must be relevant (correlated with X) and exogenous (uncorrelated with the error term)
- Weak instruments produce estimates biased toward OLS and unreliable inference; always check first-stage F-statistics
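A sketch of the first-stage relevance check: regress the endogenous regressor on the instrument and inspect the F-statistic (a common rule of thumb flags values below roughly 10 as weak). The instrument, confounder, and coefficients here are simulated.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=n)                        # instrument
ability = rng.normal(size=n)                  # unobserved confounder
x = 0.5 * z + ability + rng.normal(size=n)    # endogenous regressor
y = 1.0 * x + ability + rng.normal(size=n)

first_stage = sm.OLS(x, sm.add_constant(z)).fit()
print(first_stage.fvalue)                     # first-stage F-statistic for instrument relevance
```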
Two-Stage Least Squares (2SLS)
- Stage 1: Regress the endogenous variable X on instruments Z to get predicted values X̂
- Stage 2: Regress Y on X̂—this isolates the exogenous variation in X
- Standard errors must be adjusted since X̂ is estimated; software handles this automatically
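A manual two-stage least squares sketch on simulated data with an unobserved confounder. The second-stage standard errors printed by plain OLS here are not correct; a dedicated IV routine would adjust them, so treat this as an illustration of the mechanics only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
z = rng.normal(size=n)                        # instrument
ability = rng.normal(size=n)                  # unobserved confounder
x = 0.5 * z + ability + rng.normal(size=n)    # endogenous regressor
y = 1.0 * x + ability + rng.normal(size=n)    # true effect of x is 1.0

Z = sm.add_constant(z)
x_hat = sm.OLS(x, Z).fit().predict()                 # Stage 1: fitted values of X
second = sm.OLS(y, sm.add_constant(x_hat)).fit()     # Stage 2: Y on the fitted values
naive = sm.OLS(y, sm.add_constant(x)).fit()          # ignores endogeneity
print(second.params[1], naive.params[1])      # 2SLS lands near 1.0; naive OLS is biased upward
```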
Simultaneous Equation Models
- Multiple equations where dependent variables appear as regressors in other equations—creates feedback loops that violate OLS exogeneity
- Identification requires restrictions—you need enough excluded instruments in each equation to identify parameters
- 2SLS or 3SLS estimation provides consistent estimates by accounting for the simultaneity
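A sketch of estimating one equation of a simultaneous system: quantity and price are determined jointly by demand and supply, so price is endogenous in the demand equation, but an excluded supply shifter (cost) identifies it and serves as the instrument. The structural coefficients below are made up for the simulation.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
cost = rng.normal(size=n)                      # supply shifter, excluded from demand
u, v = rng.normal(size=n), rng.normal(size=n)  # demand and supply shocks
# Structural system: demand q = 10 - 1.0*p + u ; supply q = 2 + 0.5*p - 1.0*cost + v
p = (8 + cost + u - v) / 1.5                   # equilibrium price from solving the system
q = 10 - 1.0 * p + u                           # equilibrium quantity

p_hat = sm.OLS(p, sm.add_constant(cost)).fit().predict()   # first stage: price on cost
demand = sm.OLS(q, sm.add_constant(p_hat)).fit()           # 2SLS for the demand equation
print(demand.params)                           # slope should be near the true -1.0
```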
Compare: IV vs. 2SLS—2SLS is the specific estimation procedure for IV regression. All 2SLS is IV, but IV can be implemented through other methods (LIML, GMM). If asked about "solving endogeneity," IV is the concept; 2SLS is your go-to implementation.
Flexible Estimation Frameworks
These advanced techniques provide greater flexibility when standard parametric assumptions are too restrictive. They're especially valuable when you have many moment conditions or complex error structures.
Generalized Method of Moments (GMM)
- Uses moment conditions E[g(X,θ)]=0 to estimate parameters without fully specifying the distribution
- With more instruments than endogenous variables (overidentification), efficient GMM can improve on 2SLS when errors are heteroskedastic; under homoskedasticity the two coincide
- Hansen's J-test checks overidentifying restrictions—rejection suggests invalid instruments or misspecification
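A minimal two-step GMM sketch for the linear IV moment condition E[z(y − x'β)] = 0, written with NumPy and SciPy so the weighting matrix and Hansen's J-statistic are explicit. The simulated design has two instruments for one endogenous regressor, so the model is overidentified; everything here is illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
z1, z2 = rng.normal(size=n), rng.normal(size=n)            # two instruments
ability = rng.normal(size=n)                                # unobserved confounder
x = 0.8 * z1 + 0.4 * z2 + ability + rng.normal(size=n)      # endogenous regressor
y = 1.5 * x + ability + rng.normal(size=n)                  # true slope 1.5
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])                   # 3 moments, 2 parameters

def gbar(beta):
    """Sample average of the moment conditions: Z'(y - X beta) / n."""
    return Z.T @ (y - X @ beta) / n

obj = lambda b, W: gbar(b) @ W @ gbar(b)

# Step 1: identity weighting matrix
b1 = minimize(obj, x0=np.zeros(X.shape[1]), args=(np.eye(Z.shape[1]),)).x
# Step 2: optimal weighting matrix built from step-1 residuals
u = y - X @ b1
S = (Z * u[:, None]).T @ (Z * u[:, None]) / n
W2 = np.linalg.inv(S)
b2 = minimize(obj, x0=b1, args=(W2,)).x

J = n * gbar(b2) @ W2 @ gbar(b2)   # Hansen's J-statistic, df = moments - parameters = 1
print(b2, J)
```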
Maximum Likelihood Estimation (MLE)
- Maximizes the likelihood function—finds parameter values that make observed data most probable
- Asymptotically efficient—achieves the Cramér-Rao lower bound for variance as sample size grows
- Requires correct distributional assumptions; misspecification leads to inconsistent estimates
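A bare-bones MLE sketch: fit a logit model by numerically maximizing the log-likelihood with SciPy, making the "parameters that make the observed data most probable" idea concrete. The data are simulated; in practice you would use a packaged estimator.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit   # logistic CDF

rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.2])
y = rng.binomial(1, expit(X @ beta_true))

def neg_loglik(beta):
    p = expit(X @ beta)
    # Bernoulli log-likelihood, negated because minimize() minimizes
    return -np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))

res = minimize(neg_loglik, x0=np.zeros(2))
print(res.x)   # should be close to beta_true in large samples
```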
Compare: GMM vs. MLE—MLE requires specifying the full distribution of errors; GMM only needs moment conditions. GMM is more robust to distributional misspecification, but MLE is more efficient when the distribution is correctly specified.
Quick Reference Table
| Situation | Appropriate Models |
| --- | --- |
| Linear continuous outcomes | OLS, Multiple Regression, GLS |
| Binary/categorical outcomes | Logistic Regression, Probit |
| Censored/truncated data | Tobit |
| Distributional heterogeneity | Quantile Regression |
| Single time series | ARIMA, SARIMA |
| Multiple time series | VAR |
| Cross-sectional + time variation | Panel Data (Fixed/Random Effects) |
| Endogeneity correction | IV, 2SLS, GMM |
| Simultaneous relationships | Simultaneous Equations, 2SLS |
| Flexible estimation | GMM, MLE |
Self-Check Questions
- You have panel data on firms over 10 years and suspect that unobserved firm culture affects both your independent variable and outcome. Should you use fixed effects or random effects, and why?
- Compare logistic regression and probit models: what do they have in common, and when might you prefer one over the other?
- A researcher finds that OLS residuals show a fan-shaped pattern when plotted against fitted values. Which assumption is violated, and what estimation approach would correct it?
- You want to estimate the effect of education on wages, but ability is unobserved and correlated with both. Describe how instrumental variables would address this, and give an example of a potentially valid instrument.
- When would you choose GMM over 2SLS for instrumental variables estimation? What diagnostic test would you use to evaluate your overidentifying restrictions?