
🎳 Intro to Econometrics Unit 4 – Gauss-Markov Assumptions & OLS Properties

The Gauss-Markov assumptions are crucial for understanding the properties of Ordinary Least Squares (OLS) estimators in linear regression. These assumptions ensure that OLS estimators are the Best Linear Unbiased Estimators (BLUE), providing a foundation for reliable statistical inference. Linear regression models the relationship between variables, with OLS minimizing the sum of squared residuals. Key concepts include linearity, random sampling, no perfect collinearity, zero conditional mean, and homoscedasticity. Understanding these assumptions helps identify and address potential violations in real-world applications.

Key Concepts and Definitions

  • Gauss-Markov assumptions: a set of conditions that ensure OLS estimators are BLUE (Best Linear Unbiased Estimators)
  • Linear regression: a statistical method for modeling the linear relationship between a dependent variable and one or more independent variables
  • Ordinary Least Squares (OLS): a method for estimating the parameters of a linear regression model by minimizing the sum of squared residuals
  • Estimator: a rule or formula used to estimate the value of an unknown parameter based on sample data
  • Unbiasedness: an estimator is unbiased if its expected value equals the true value of the parameter being estimated
  • Efficiency: an estimator is efficient if it has the smallest variance among all unbiased estimators
  • Homoscedasticity: the assumption that the variance of the error term is constant across all observations
  • Multicollinearity: the presence of high correlation among the independent variables in a regression model

Linear Regression Basics

  • Linear regression models the relationship between a dependent variable and one or more independent variables
  • The goal is to find the line of best fit that minimizes the sum of squared residuals (differences between observed and predicted values)
  • Simple linear regression involves one independent variable, while multiple linear regression involves two or more independent variables
  • The regression equation is $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$, where:
    • $Y$ is the dependent variable
    • $\beta_0$ is the intercept
    • $\beta_1, \beta_2, \dots, \beta_k$ are the slope coefficients for each independent variable
    • $X_1, X_2, \dots, X_k$ are the independent variables
    • $\varepsilon$ is the error term
  • The slope coefficients represent the change in the dependent variable for a one-unit change in the corresponding independent variable, holding other variables constant
  • The intercept represents the value of the dependent variable when all independent variables are zero
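
As a concrete illustration of the regression equation above, here is a minimal sketch that simulates data from a two-regressor linear model and fits it by OLS with statsmodels. The variable names, coefficient values, and sample size are illustrative assumptions, not taken from the text.

```python
# A minimal sketch (simulated data; parameter values are illustrative assumptions).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
X1 = rng.normal(size=n)               # first independent variable
X2 = rng.normal(size=n)               # second independent variable
eps = rng.normal(scale=1.0, size=n)   # error term
Y = 2.0 + 1.5 * X1 - 0.8 * X2 + eps   # true model: beta0 = 2, beta1 = 1.5, beta2 = -0.8

X = sm.add_constant(np.column_stack([X1, X2]))  # add the intercept column
results = sm.OLS(Y, X).fit()

# Estimated intercept and slopes; each slope is the change in Y for a
# one-unit change in that regressor, holding the other regressor fixed.
print(results.params)
```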

The Gauss-Markov Assumptions Explained

  • The Gauss-Markov assumptions ensure that OLS estimators are BLUE (Best Linear Unbiased Estimators)
  • Assumption 1 (Linearity): the relationship between the dependent variable and the independent variables is linear in the parameters
  • Assumption 2 (Random sampling): observations are randomly sampled from the population
  • Assumption 3 (No perfect collinearity): no independent variable is a perfect linear combination of the other independent variables
  • Assumption 4 (Zero conditional mean): the expected value of the error term is zero given any values of the independent variables ($E[\varepsilon|X] = 0$)
  • Assumption 5 (Homoscedasticity): the variance of the error term is constant across all levels of the independent variables ($Var[\varepsilon|X] = \sigma^2$); the sketch after this list checks Assumptions 4 and 5 on simulated data
    • Violation of this assumption is called heteroscedasticity
  • When these assumptions hold, OLS estimators are BLUE, meaning they are unbiased and have the smallest variance among all linear unbiased estimators
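
One way to see Assumptions 4 and 5 in data is the following sketch: it generates observations that satisfy them by construction and then checks that residuals average near zero and have roughly constant spread within bins of $X$. The parameter values and the binning scheme are illustrative assumptions.

```python
# A minimal sketch (simulated data satisfying the Gauss-Markov assumptions by construction).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
X = rng.uniform(0, 10, size=n)
eps = rng.normal(scale=2.0, size=n)       # error variance does not depend on X
Y = 1.0 + 0.5 * X + eps

res = sm.OLS(Y, sm.add_constant(X)).fit()
resid = res.resid

# Group residuals by deciles of X: means near zero suggest E[eps|X] = 0,
# and roughly equal standard deviations suggest homoscedasticity.
edges = np.quantile(X, np.linspace(0, 1, 11)[1:-1])
bins = np.digitize(X, edges)
for b in range(10):
    r = resid[bins == b]
    print(f"bin {b}: mean(resid) = {r.mean():+.3f}, std(resid) = {r.std():.3f}")
```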

Ordinary Least Squares (OLS) Method

  • OLS is a method for estimating the parameters of a linear regression model by minimizing the sum of squared residuals
  • The OLS estimators of the intercept and slope coefficients solve the minimization problem $\min_{\beta_0, \beta_1, \dots, \beta_k} \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_{1i} - \dots - \beta_k X_{ki})^2$
  • In simple linear regression, the OLS slope estimator is $\hat{\beta}_1 = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2}$, where $\bar{X}$ and $\bar{Y}$ are the sample means of $X$ and $Y$; with multiple regressors, each slope takes the same form after partialling out the other regressors
  • The OLS estimator of the intercept is $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}_1 - \dots - \hat{\beta}_k\bar{X}_k$ (see the sketch after this list for both computations)
  • OLS estimators are consistent and asymptotically normal under the Gauss-Markov assumptions
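
A minimal sketch of the estimation formulas above, assuming simulated simple-regression data: it computes the OLS estimates once via the matrix normal equations and once via the closed-form slope and intercept formulas, and the two should agree.

```python
# A minimal sketch (simulated data; names and values are illustrative).
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y = 3.0 + 2.0 * x + rng.normal(size=n)

# Matrix form: beta_hat = (X'X)^{-1} X'y, with a column of ones for the intercept.
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Summation form for the slope and intercept (simple regression).
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print(beta_hat)   # [intercept, slope] from the normal equations
print(b0, b1)     # the same values from the closed-form formulas
```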

Properties of OLS Estimators

  • Under the Gauss-Markov assumptions, OLS estimators have several desirable properties:
    • Unbiasedness: the expected value of the OLS estimator equals the true parameter value ($E[\hat{\beta}_j] = \beta_j$)
    • Efficiency: OLS estimators have the smallest variance among all linear unbiased estimators (the Gauss-Markov theorem)
    • Consistency: as the sample size increases, the OLS estimators converge in probability to the true parameter values
    • Asymptotic normality: as the sample size grows, the sampling distribution of the OLS estimators is approximately normal
  • The variance of the OLS estimator of $\beta_j$ is $Var[\hat{\beta}_j] = \frac{\sigma^2}{\sum_{i=1}^n (X_{ji} - \bar{X}_j)^2 \, (1 - R_j^2)}$, where $\sigma^2$ is the variance of the error term and $R_j^2$ is the R-squared from regressing $X_j$ on the other regressors (the factor $1 - R_j^2$ drops out in simple regression); the Monte Carlo sketch below illustrates unbiasedness and this variance formula
  • The standard errors of the OLS estimators are the square roots of their variances and are used for hypothesis testing and constructing confidence intervals
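
To illustrate unbiasedness and the variance formula above, here is a small Monte Carlo sketch on a simulated fixed design; the sample size, number of replications, and parameter values are arbitrary choices. Across repeated samples, the average slope estimate should be close to the true value and its sampling variance close to the theoretical expression.

```python
# A minimal Monte Carlo sketch (simulated design; illustrative parameters).
import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 5000
beta0, beta1, sigma = 1.0, 0.7, 1.5
x = rng.uniform(0, 5, size=n)              # regressor held fixed across replications

estimates = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(scale=sigma, size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates[r] = b1

theoretical_var = sigma**2 / np.sum((x - x.mean()) ** 2)
print("mean of estimates:", estimates.mean(), "(true beta1 =", beta1, ")")
print("simulated variance:", estimates.var(), "theoretical variance:", theoretical_var)
```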

Violations and Consequences

  • Violation of the Gauss-Markov assumptions can lead to biased, inefficient, or inconsistent OLS estimators
  • Omitted variable bias occurs when a relevant variable is excluded from the model, leading to biased and inconsistent estimators
  • Measurement error in the independent variables can cause attenuation bias, where the OLS estimators are biased towards zero
  • Heteroscedasticity (non-constant variance of the error term) leads to inefficient estimators and invalid standard errors
    • Heteroscedasticity can be detected using tests like the Breusch-Pagan test or White test
    • Robust standard errors (e.g., White's heteroscedasticity-consistent standard errors) can be used to obtain valid inference in the presence of heteroscedasticity
  • Autocorrelation (correlation between error terms across observations) leads to inefficient estimators and invalid standard errors
    • Autocorrelation can be detected using tests like the Durbin-Watson test or Breusch-Godfrey test
    • Generalized Least Squares (GLS) or heteroscedasticity- and autocorrelation-consistent (HAC, Newey-West) standard errors can be used to address autocorrelation; the sketch below demonstrates both the heteroscedasticity and autocorrelation diagnostics
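
A minimal diagnostic sketch, assuming simulated data whose error variance grows with the regressor: it runs the Breusch-Pagan test, compares conventional and heteroscedasticity-robust (HC1) standard errors, and reports the Durbin-Watson statistic. The function names come from statsmodels; the data-generating process is an illustrative assumption.

```python
# A minimal sketch of heteroscedasticity and autocorrelation diagnostics (simulated data).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
n = 1000
x = rng.uniform(1, 10, size=n)
eps = rng.normal(scale=0.5 * x, size=n)     # error spread grows with x (heteroscedastic)
y = 2.0 + 1.0 * x + eps

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()

lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(ols.resid, X)
print("Breusch-Pagan LM p-value:", lm_pval)        # small p-value flags heteroscedasticity

robust = sm.OLS(y, X).fit(cov_type="HC1")          # White-type robust standard errors
print("conventional SEs:", ols.bse)
print("robust SEs:      ", robust.bse)

print("Durbin-Watson:", durbin_watson(ols.resid))  # near 2 suggests no first-order autocorrelation
```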

Real-World Applications

  • Linear regression is widely used in various fields, such as economics, finance, social sciences, and natural sciences
  • Examples of applications include:
    • Estimating the impact of education on earnings (Mincer equation)
    • Analyzing the relationship between advertising expenditure and sales
    • Examining the effect of price and income on consumer demand
    • Predicting housing prices based on property characteristics (hedonic pricing models)
    • Assessing the impact of government policies on economic outcomes
  • In practice, researchers must carefully consider the assumptions underlying the linear regression model and address any violations to obtain reliable and meaningful results
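
As one hedged illustration of the first application above, here is a sketch of a Mincer-style wage regression on simulated data. The coefficient values and variable construction are invented purely for illustration and are not estimates from any real dataset.

```python
# A minimal sketch of a Mincer-style regression: log earnings on schooling,
# experience, and experience squared (all values are illustrative assumptions).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 2000
educ = rng.integers(8, 21, size=n)           # years of schooling
exper = rng.uniform(0, 40, size=n)           # years of labor-market experience
log_wage = (0.5 + 0.08 * educ + 0.04 * exper - 0.0006 * exper**2
            + rng.normal(scale=0.3, size=n))

X = sm.add_constant(np.column_stack([educ, exper, exper**2]))
res = sm.OLS(log_wage, X).fit()
print(res.params)   # the educ coefficient approximates the return to a year of schooling
```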

Common Pitfalls and How to Avoid Them

  • Misspecification of the functional form: assuming a linear relationship when the true relationship is nonlinear can lead to biased estimates
    • Use scatterplots and residual plots to assess the linearity assumption
    • Consider transformations (e.g., logarithmic, polynomial) or nonlinear regression methods if necessary
  • Omitting relevant variables can cause omitted variable bias and lead to incorrect conclusions
    • Carefully consider the theoretical foundations and previous literature to identify important variables
    • Use techniques like stepwise regression or Lasso to select relevant variables
  • Ignoring multicollinearity can lead to unstable and difficult-to-interpret estimates
    • Check for high correlations among independent variables using a correlation matrix
    • Use variance inflation factors (VIF) to detect multicollinearity
    • Consider dropping one of the highly correlated variables or using techniques like principal component analysis (PCA) to combine them
  • Failing to check for heteroscedasticity or autocorrelation can result in invalid inference
    • Always perform diagnostic tests for heteroscedasticity and autocorrelation
    • Use robust standard errors or appropriate estimation methods (e.g., GLS) to address these issues
  • Overfitting the model by including too many independent variables can lead to poor out-of-sample performance
    • Use model selection criteria like adjusted R-squared, AIC, or BIC to balance goodness-of-fit and model complexity
    • Perform cross-validation to assess the model's performance on unseen data
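
A short sketch of two of the diagnostics mentioned above, assuming simulated data with a deliberately collinear pair of regressors: variance inflation factors to flag multicollinearity, and adjusted R-squared, AIC, and BIC to compare a full and a reduced specification.

```python
# A minimal sketch of VIFs and model-selection criteria (simulated, collinear regressors).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(6)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)   # x2 is nearly collinear with x1
x3 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.3 * x3 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for j in range(1, X.shape[1]):               # skip the constant column
    print(f"VIF for regressor {j}:", variance_inflation_factor(X, j))

full = sm.OLS(y, X).fit()
small = sm.OLS(y, sm.add_constant(np.column_stack([x1, x3]))).fit()
print("full model:  adj R2 =", full.rsquared_adj, "AIC =", full.aic, "BIC =", full.bic)
print("small model: adj R2 =", small.rsquared_adj, "AIC =", small.aic, "BIC =", small.bic)
```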


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
