🎳 Intro to Econometrics Unit 4 – Gauss-Markov Assumptions & OLS Properties
The Gauss-Markov assumptions are crucial for understanding the properties of Ordinary Least Squares (OLS) estimators in linear regression. These assumptions ensure that OLS estimators are the Best Linear Unbiased Estimators (BLUE), providing a foundation for reliable statistical inference.
Linear regression models the relationship between variables, with OLS minimizing the sum of squared residuals. Key concepts include linearity, random sampling, no perfect collinearity, zero conditional mean, and homoscedasticity. Understanding these assumptions helps identify and address potential violations in real-world applications.
Gauss-Markov assumptions: a set of conditions that ensure OLS estimators are BLUE (Best Linear Unbiased Estimators)
Linear regression: a statistical method for modeling the linear relationship between a dependent variable and one or more independent variables
Ordinary Least Squares (OLS): a method for estimating the parameters of a linear regression model by minimizing the sum of squared residuals
Estimator: a rule or formula used to estimate the value of an unknown parameter based on sample data
Unbiasedness: an estimator is unbiased if its expected value equals the true value of the parameter being estimated
Efficiency: an estimator is efficient if it has the smallest variance among all unbiased estimators
Homoscedasticity: the assumption that the variance of the error term is constant across all observations
Multicollinearity: the presence of high correlation among independent variables in a regression model
Linear Regression Basics
Linear regression models the relationship between a dependent variable and one or more independent variables
The goal is to find the line of best fit that minimizes the sum of squared residuals (differences between observed and predicted values)
Simple linear regression involves one independent variable, while multiple linear regression involves two or more independent variables
The regression equation is represented as $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$, where:
$Y$ is the dependent variable
$\beta_0$ is the intercept
$\beta_1, \beta_2, \dots, \beta_k$ are the slope coefficients for each independent variable
$X_1, X_2, \dots, X_k$ are the independent variables
$\varepsilon$ is the error term
The slope coefficients represent the change in the dependent variable for a one-unit change in the corresponding independent variable, holding other variables constant
The intercept represents the value of the dependent variable when all independent variables are zero
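To make the equation concrete, the sketch below simulates data from a known linear model and recovers the intercept and slopes with OLS. It assumes Python with numpy and statsmodels available; the coefficient values and variable names are illustrative, not taken from any particular dataset.

```python
# Minimal simulation: generate data from a known linear model and recover
# the coefficients with OLS. All names and parameter values are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
eps = rng.normal(scale=1.0, size=n)             # homoscedastic error term

y = 2.0 + 1.5 * x1 - 0.8 * x2 + eps             # true model: beta0=2, beta1=1.5, beta2=-0.8

X = sm.add_constant(np.column_stack([x1, x2]))  # adds the intercept column
model = sm.OLS(y, X).fit()
print(model.params)   # estimates should land close to [2.0, 1.5, -0.8]
```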
The Gauss-Markov Assumptions Explained
The Gauss-Markov assumptions ensure that OLS estimators are BLUE (Best Linear Unbiased Estimators)
Assumption 1 (Linearity): the model is linear in its parameters, so the dependent variable is a linear function of the coefficients plus an error term
Assumption 2 (Random sampling): observations are randomly sampled from the population
Assumption 3 (No perfect collinearity): no independent variable is a perfect linear combination of the other independent variables
Assumption 4 (Zero conditional mean): the expected value of the error term is zero given any values of the independent variables ($E[\varepsilon \mid X] = 0$)
Assumption 5 (Homoscedasticity): the variance of the error term is constant across all levels of the independent variables ($\mathrm{Var}[\varepsilon \mid X] = \sigma^2$)
Violation of this assumption is called heteroscedasticity
When these assumptions hold, OLS estimators are BLUE, meaning they are unbiased and have the smallest variance among all linear unbiased estimators
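A quick way to see unbiasedness in action is a Monte Carlo sketch: draw many samples satisfying the assumptions above and check that the slope estimates average out to the true slope. The sample sizes and parameter values below are illustrative.

```python
# Monte Carlo sketch of unbiasedness: under the assumptions above, the average
# of the OLS slope estimates across many samples should sit near the true slope.
import numpy as np

rng = np.random.default_rng(0)
true_beta = 1.5
estimates = []
for _ in range(2000):
    x = rng.normal(size=100)
    y = 2.0 + true_beta * x + rng.normal(size=100)   # E[eps | x] = 0, constant variance
    # closed-form simple-regression slope
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates.append(b1)

print(np.mean(estimates))   # ~1.5, consistent with E[beta_hat] = beta
```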
Ordinary Least Squares (OLS) Method
OLS is a method for estimating the parameters of a linear regression model by minimizing the sum of squared residuals
The OLS estimators for the slope coefficients and intercept are obtained by solving the minimization problem: $\min_{\beta_0, \beta_1, \dots, \beta_k} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_{1i} - \dots - \beta_k X_{ki})^2$
In simple linear regression, the OLS slope estimator has the closed form $\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$, where $\bar{X}$ and $\bar{Y}$ are the sample means of $X$ and $Y$; with multiple regressors, the estimates solve the normal equations, $\hat{\beta} = (X'X)^{-1}X'Y$ in matrix form
The OLS estimator for the intercept is given by: $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \dots - \hat{\beta}_k \bar{X}_k$
OLS estimators are consistent and asymptotically normal under the Gauss-Markov assumptions
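The sketch below, with illustrative simulated data, computes the normal-equations solution $\hat{\beta} = (X'X)^{-1}X'Y$ directly and checks it against statsmodels.

```python
# Sketch: the normal-equations solution beta_hat = (X'X)^{-1} X'y, compared
# against statsmodels. Data and names are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([2.0, 1.5, -0.8])
y = X @ beta + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # minimizes the sum of squared residuals
print(beta_hat)
print(sm.OLS(y, X).fit().params)               # should match beta_hat
```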
Properties of OLS Estimators
Under the Gauss-Markov assumptions, OLS estimators have several desirable properties:
Unbiasedness: the expected value of the OLS estimator equals the true value of the parameter ($E[\hat{\beta}_j] = \beta_j$)
Efficiency: OLS estimators have the smallest variance among all linear unbiased estimators (the Gauss-Markov theorem)
Consistency: as the sample size increases, the OLS estimators converge in probability to the true parameter values
Asymptotic normality: as the sample size increases, the sampling distribution of the OLS estimators approaches a normal distribution
The variance of the OLS slope estimator in simple regression is $\mathrm{Var}[\hat{\beta}_1] = \frac{\sigma^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$, where $\sigma^2$ is the variance of the error term; in matrix form, $\mathrm{Var}[\hat{\beta}] = \sigma^2 (X'X)^{-1}$
The standard errors of the OLS estimators are the square roots of their variances and are used for hypothesis testing and constructing confidence intervals
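The sketch below computes the classical covariance matrix $\hat{\sigma}^2 (X'X)^{-1}$ by hand, using $\hat{\sigma}^2 = \mathrm{SSR}/(n - k)$ with $k$ regressors including the intercept, and checks the resulting standard errors against statsmodels. The data are simulated for illustration.

```python
# Sketch of the classical standard errors: sigma2_hat * (X'X)^{-1}, with
# sigma2_hat = SSR / (n - k); these match statsmodels' .bse under homoscedasticity.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, k = 300, 3                                   # k counts the intercept column
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

res = sm.OLS(y, X).fit()
resid = y - X @ res.params
sigma2_hat = resid @ resid / (n - k)            # unbiased estimate of error variance
cov = sigma2_hat * np.linalg.inv(X.T @ X)
print(np.sqrt(np.diag(cov)))                    # manual standard errors
print(res.bse)                                  # should agree
```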
Violations and Consequences
Violation of the Gauss-Markov assumptions can lead to biased, inefficient, or inconsistent OLS estimators
Omitted variable bias occurs when a relevant variable is excluded from the model, leading to biased and inconsistent estimators
Measurement error in the independent variables can cause attenuation bias, where the OLS estimators are biased towards zero
Heteroscedasticity (non-constant variance of the error term) leads to inefficient estimators and invalid standard errors
Heteroscedasticity can be detected using tests like the Breusch-Pagan test or White test
Robust standard errors (e.g., White's heteroscedasticity-consistent standard errors) can be used to obtain valid inference in the presence of heteroscedasticity
Autocorrelation (correlation between error terms across observations) leads to inefficient estimators and invalid standard errors
Autocorrelation can be detected using tests like the Durbin-Watson test or Breusch-Godfrey test
Generalized Least Squares (GLS) or heteroscedasticity- and autocorrelation-consistent (HAC, e.g., Newey-West) standard errors can be used to address autocorrelation (see the diagnostic sketch below)
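The diagnostic sketch below simulates heteroscedastic data, runs the Breusch-Pagan test, reports the Durbin-Watson statistic, and compares classical with heteroscedasticity-robust (HC1) standard errors. The test functions are from statsmodels; the data-generating process is illustrative.

```python
# Diagnostic sketch: heteroscedastic errors, the Breusch-Pagan test, the
# Durbin-Watson statistic, and robust (HC1) vs. classical standard errors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(1, 5, size=n)
eps = rng.normal(scale=x, size=n)               # error variance grows with x
y = 1.0 + 2.0 * x + eps

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(res.resid, X)
print(f"Breusch-Pagan p-value: {lm_pval:.4f}")          # small p flags heteroscedasticity
print(f"Durbin-Watson: {durbin_watson(res.resid):.2f}") # ~2 means no autocorrelation

robust = res.get_robustcov_results(cov_type="HC1")
print(res.bse, robust.bse)   # robust SEs differ when the variance is non-constant
```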
Real-World Applications
Linear regression is widely used in various fields, such as economics, finance, social sciences, and natural sciences
Examples of applications include:
Estimating the impact of education on earnings (Mincer equation)
Analyzing the relationship between advertising expenditure and sales
Examining the effect of price and income on consumer demand
Predicting housing prices based on property characteristics (hedonic pricing models)
Assessing the impact of government policies on economic outcomes
In practice, researchers must carefully consider the assumptions underlying the linear regression model and address any violations to obtain reliable and meaningful results
Common Pitfalls and How to Avoid Them
Misspecification of the functional form: assuming a linear relationship when the true relationship is nonlinear can lead to biased estimates
Use scatterplots and residual plots to assess the linearity assumption
Consider transformations (e.g., logarithmic, polynomial) or nonlinear regression methods if necessary (see the residual-plot sketch below)
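As an illustration, the sketch below fits a linear model to data generated from an exponential relationship, then refits after a log transform; the first residual plot shows a systematic curve, the second does not. The data are simulated, and matplotlib is assumed to be available.

```python
# Sketch: residual plots to eyeball nonlinearity. A linear fit in levels to
# exponential data leaves curved residuals; a log transform of y restores
# linearity. Names and data are illustrative.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
x = rng.uniform(0, 3, size=300)
y = np.exp(0.5 + 1.0 * x + rng.normal(scale=0.2, size=300))  # nonlinear in levels

X = sm.add_constant(x)
res_levels = sm.OLS(y, X).fit()        # misspecified: linear in levels
res_logs = sm.OLS(np.log(y), X).fit()  # correctly specified after the log transform

fig, axes = plt.subplots(1, 2)
axes[0].scatter(res_levels.fittedvalues, res_levels.resid, s=5)
axes[0].set_title("levels: curved residuals")
axes[1].scatter(res_logs.fittedvalues, res_logs.resid, s=5)
axes[1].set_title("logs: patternless residuals")
plt.show()
```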
Omitting relevant variables can cause omitted variable bias and lead to incorrect conclusions
Carefully consider the theoretical foundations and previous literature to identify important variables
Use techniques like stepwise regression or the Lasso to select relevant variables (a Lasso sketch follows below)
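A minimal Lasso sketch, assuming scikit-learn is available (the text names the Lasso but no library): LassoCV picks the penalty by cross-validation, and regressors whose coefficients are shrunk exactly to zero are dropped.

```python
# Lasso-based variable selection sketch with scikit-learn; data are simulated
# so that only the first two regressors actually matter.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(5)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(size=n)

lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)   # indices of retained variables
print(selected)                          # typically [0, 1]
```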
Ignoring multicollinearity can lead to unstable and difficult-to-interpret estimates
Check for high correlations among independent variables using a correlation matrix
Use variance inflation factors (VIF) to detect multicollinearity
Consider dropping one of the highly correlated variables or using techniques like principal component analysis (PCA) to combine them (a VIF check is sketched below)
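A VIF sketch using statsmodels: $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing $X_j$ on the other regressors. The near-collinear data and the common rule-of-thumb threshold of about 10 are illustrative.

```python
# VIF sketch: the nearly collinear pair (x1, x2) should show large VIFs,
# while the independent x3 stays near 1.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(6)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for j in range(1, X.shape[1]):            # skip the intercept column
    print(f"VIF for x{j}: {variance_inflation_factor(X, j):.1f}")
```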
Failing to check for heteroscedasticity or autocorrelation can result in invalid inference
Always perform diagnostic tests for heteroscedasticity and autocorrelation
Use robust standard errors or appropriate estimation methods (e.g., GLS) to address these issues
Overfitting the model by including too many independent variables can lead to poor out-of-sample performance
Use model selection criteria like adjusted R-squared, AIC, or BIC to balance goodness of fit and model complexity (compared in the sketch below)
Perform cross-validation to assess the model's performance on unseen data
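The sketch below compares a lean specification against an over-specified one on simulated data using adjusted R-squared, AIC, and BIC; lower AIC/BIC is better, and the added noise regressors should not win. The data and model names are illustrative.

```python
# Model-selection sketch: a lean model vs. one padded with irrelevant
# regressors, compared on adjusted R-squared, AIC, and BIC.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 200
x = rng.normal(size=n)
noise = rng.normal(size=(n, 5))                  # irrelevant regressors
y = 1.0 + 2.0 * x + rng.normal(size=n)

lean = sm.OLS(y, sm.add_constant(x)).fit()
bloated = sm.OLS(y, sm.add_constant(np.column_stack([x, noise]))).fit()

for name, res in [("lean", lean), ("bloated", bloated)]:
    print(f"{name}: adj R2={res.rsquared_adj:.3f}, AIC={res.aic:.1f}, BIC={res.bic:.1f}")
```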