🎳 Intro to Econometrics Unit 3 – Econometric Model Design
Econometric model design is a crucial skill for analyzing economic phenomena and testing hypotheses. It combines economic theory, mathematics, and statistical inference to create models that explain relationships between variables. This unit covers key concepts, types of models, and techniques for specification and evaluation.
Students learn about dependent and independent variables, error terms, and estimation methods like OLS. They explore various model types, including linear regression, time series, and panel data models. The unit also covers data collection, model specification techniques, and methods for evaluating model performance and addressing common pitfalls.
Key Concepts
Econometrics combines economic theory, mathematics, and statistical inference to analyze economic phenomena and test hypotheses
Dependent variable (Y) represents the outcome or effect being studied and is influenced by independent variables (X)
Independent variables (X) are factors that explain or predict changes in the dependent variable
Stochastic error term (ε) captures the unexplained variation in the dependent variable not accounted for by the independent variables
Ordinary Least Squares (OLS) is a common estimation method that minimizes the sum of squared residuals to find the best-fit line
Hypothesis testing evaluates the statistical significance of estimated coefficients using t-tests or F-tests
Null hypothesis (H₀) represents the default assumption that there is no significant relationship between variables
Alternative hypothesis (Hₐ) suggests a significant relationship exists
Multicollinearity occurs when independent variables are highly correlated with each other, leading to unreliable coefficient estimates
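The OLS idea above can be sketched in a few lines of NumPy. This is a minimal illustration with made-up data (the intercept 2, slope 3, seed, and sample size are all arbitrary choices, not from the unit): the slope is Cov(X, Y)/Var(X) and the intercept makes the residuals average to zero.

```python
import numpy as np

# Made-up data: Y = 2 + 3*X plus noise (numbers are illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=100)
Y = 2.0 + 3.0 * X + rng.normal(0, 1, size=100)

# Closed-form OLS for simple regression:
# beta1 = Cov(X, Y) / Var(X),  beta0 = mean(Y) - beta1 * mean(X)
beta1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
beta0 = Y.mean() - beta1 * X.mean()

# Residuals are the unexplained part of Y; with an intercept they sum to zero
residuals = Y - (beta0 + beta1 * X)
```

With enough data the estimates land close to the true values used to generate the sample, which is exactly what "minimizing the sum of squared residuals" buys you under the classical assumptions.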
Types of Econometric Models
Simple linear regression models the relationship between one dependent variable and one independent variable: Y = β₀ + β₁X + ε
Multiple linear regression extends simple regression to include multiple independent variables: Y = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ + ε
Time series models analyze data collected over regular time intervals (daily stock prices, monthly unemployment rates)
Autoregressive (AR) models use lagged values of the dependent variable as independent variables
Moving Average (MA) models use lagged values of the error term as independent variables
Panel data models combine cross-sectional and time series data (household income across multiple years)
Fixed effects models control for unobserved, time-invariant factors within each cross-sectional unit
Random effects models assume unobserved factors are uncorrelated with the independent variables
Logistic regression models binary dependent variables using a logistic function to estimate probabilities (pass/fail, yes/no)
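To make the AR idea concrete, here is a hedged sketch (made-up coefficient and seed) that simulates an AR(1) process and recovers its coefficient by regressing Yₜ on its own lag, exactly as the bullet above describes:

```python
import numpy as np

# Illustrative AR(1) process: Y_t = 0.7 * Y_{t-1} + e_t
# (0.7 is an arbitrary choice for demonstration)
rng = np.random.default_rng(1)
n = 500
phi = 0.7
e = rng.normal(0, 1, size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + e[t]

# Estimate phi by OLS of Y_t on its lag (no intercept, since the mean is zero)
y_lag, y_cur = y[:-1], y[1:]
phi_hat = (y_lag @ y_cur) / (y_lag @ y_lag)
```

The same "lagged value as regressor" trick generalizes to AR(p) by stacking p lags as columns of the design matrix.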
Data Collection and Preparation
Primary data is collected directly by the researcher through surveys, experiments, or observations
Secondary data is obtained from existing sources such as government databases, financial reports, or published research
Data cleaning involves identifying and correcting errors, inconsistencies, and missing values in the dataset
Outliers are extreme values that can significantly influence the results and should be carefully examined
Transforming variables (logarithmic, square root) can improve model fit and interpretation
Logarithmic transformations are useful for variables with skewed distributions or to interpret coefficients as elasticities
Interaction terms capture the combined effect of two independent variables on the dependent variable
Standardizing variables by subtracting the mean and dividing by the standard deviation allows for comparison of coefficients across different scales
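The log-transform and standardization steps above can be sketched directly; the income data here is simulated from a lognormal distribution purely to give a right-skewed example:

```python
import numpy as np

# Made-up right-skewed variable (simulated "income" data)
rng = np.random.default_rng(2)
income = rng.lognormal(mean=10, sigma=1, size=1000)

# Log transform reduces the right skew; in a log-log model the
# coefficient on a logged regressor reads as an (approximate) elasticity
log_income = np.log(income)

# Standardize: subtract the mean, divide by the standard deviation,
# so coefficients on different scales become comparable
z = (log_income - log_income.mean()) / log_income.std(ddof=1)
```

After standardizing, the variable has mean 0 and standard deviation 1 by construction, which is what makes cross-variable coefficient comparisons meaningful.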
Model Specification Techniques
Economic theory guides the selection of relevant variables and the expected relationships between them
Stepwise regression iteratively adds or removes variables based on statistical criteria (forward selection, backward elimination)
Best subset selection evaluates all possible combinations of independent variables to find the optimal model
Ramsey RESET test assesses whether the functional form of the model is correctly specified
A significant test result suggests the presence of omitted variables or incorrect functional form
Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) balance model fit and complexity to select the best model
Lower AIC or BIC values indicate a better trade-off between fit and parsimony
Dummy variables represent categorical or qualitative factors (gender, region) and take values of 0 or 1
Lagged variables account for the delayed impact of independent variables on the dependent variable (Yₜ = β₀ + β₁Xₜ₋₁ + εₜ)
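The AIC/BIC trade-off above can be illustrated by comparing a model with and without an irrelevant regressor. This sketch uses the Gaussian-likelihood formulas up to an additive constant (a common textbook form; the data and seed are made up):

```python
import numpy as np

# Gaussian-likelihood information criteria, up to an additive constant:
# AIC = n*ln(RSS/n) + 2k,  BIC = n*ln(RSS/n) + k*ln(n)
def aic_bic(rss, n, k):
    return n * np.log(rss / n) + 2 * k, n * np.log(rss / n) + k * np.log(n)

def ols_rss(X, y):
    # Residual sum of squares from an OLS fit
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

# Made-up data: x1 matters, x2 is pure noise (illustrative only)
rng = np.random.default_rng(3)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

rss_small = ols_rss(np.column_stack([np.ones(n), x1]), y)
rss_big = ols_rss(np.column_stack([np.ones(n), x1, x2]), y)

aic_small, bic_small = aic_bic(rss_small, n, k=2)
aic_big, bic_big = aic_bic(rss_big, n, k=3)
```

Adding the noise regressor always lowers RSS a little, but the penalty terms (2k for AIC, k·ln n for BIC) typically push the criteria toward the smaller, more parsimonious model.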
Estimation Methods
Ordinary Least Squares (OLS) is the most common estimation method for linear regression models
OLS assumes the error terms are independently and identically distributed with a mean of zero and constant variance
Maximum Likelihood Estimation (MLE) finds the parameter values that maximize the likelihood function of the observed data
MLE is often used for non-linear models (logistic regression) or when the errors are assumed to follow a known non-normal distribution
Instrumental Variables (IV) estimation addresses endogeneity issues when independent variables are correlated with the error term
Valid instruments are correlated with the endogenous variable but uncorrelated with the error term
Generalized Method of Moments (GMM) is a more flexible estimation approach that allows for heteroskedasticity and autocorrelation in the error terms
Two-Stage Least Squares (2SLS) is an IV estimation method that first regresses the endogenous variable on the instruments and then uses the predicted values in the main regression
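The two stages of 2SLS can be written out by hand. In this sketch (all coefficients, the instrument strength, and the endogeneity structure are made up), X is correlated with the error, Z is a valid instrument, and naive OLS is biased upward while 2SLS recovers the true coefficient:

```python
import numpy as np

# Illustrative endogeneity: X depends on the error component U,
# while the instrument Z affects X but not U (numbers are made up)
rng = np.random.default_rng(4)
n = 2000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 1.0 * z + 0.8 * u + rng.normal(size=n)   # endogenous regressor
y = 2.0 * x + u                               # true coefficient is 2

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

ones = np.ones(n)

# Stage 1: regress the endogenous X on the instrument Z
X1 = np.column_stack([ones, z])
x_hat = X1 @ ols(X1, x)

# Stage 2: use the fitted values of X in the main regression
X2 = np.column_stack([ones, x_hat])
beta_2sls = ols(X2, y)[1]

# For comparison: naive OLS is biased upward here (Cov(X, U) > 0)
beta_ols = ols(np.column_stack([ones, x]), y)[1]
```

One caveat worth remembering: the standard errors from running stage 2 by hand are wrong, which is why software computes 2SLS standard errors jointly.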
Model Evaluation and Testing
R-squared measures the proportion of variation in the dependent variable explained by the independent variables
Adjusted R-squared penalizes the addition of irrelevant variables and is preferred for model comparison
F-test assesses the overall significance of the model by testing the joint hypothesis that all coefficients (except the intercept) are zero
t-tests evaluate the individual significance of each coefficient, testing the null hypothesis that the coefficient is zero
Durbin-Watson test detects the presence of first-order autocorrelation in the error terms
Test statistic values close to 2 indicate no autocorrelation; values close to 0 suggest positive autocorrelation, and values close to 4 suggest negative autocorrelation
Breusch-Pagan test checks for heteroskedasticity, which occurs when the variance of the error terms is not constant across observations
Variance Inflation Factor (VIF) measures the degree of multicollinearity among independent variables
VIF values greater than 5 or 10 indicate severe multicollinearity and may require variable transformation or removal
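The VIF definition can be computed from scratch: regress each regressor on the others and apply VIF = 1/(1 − R²). In this sketch the data are made up so that x2 is nearly a copy of x1, producing the severe multicollinearity the bullet above warns about:

```python
import numpy as np

def vif(X):
    # VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    # column j on the remaining columns (with an intercept)
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        tss = ((target - target.mean()) ** 2).sum()
        r2 = 1 - (resid @ resid) / tss
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Made-up regressors: x2 is nearly a copy of x1 (severe multicollinearity),
# while x3 is independent of both
rng = np.random.default_rng(5)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
vifs = vif(np.column_stack([x1, x2, x3]))
```

Here the VIFs for x1 and x2 blow up well past the 5–10 rule-of-thumb thresholds, while the independent x3 stays near 1.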
Practical Applications and Case Studies
Demand estimation models the relationship between price, income, and quantity demanded for a product (elasticities)
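A stock demand-estimation exercise is the log-log specification, where the coefficient on log price is the price elasticity directly. This sketch simulates such a model (the elasticity −1.2, income effect 0.8, and all distributional choices are illustrative, not from any real dataset):

```python
import numpy as np

# Made-up log-log demand: ln(Q) = 5 - 1.2*ln(P) + 0.8*ln(Income) + e,
# so the true price elasticity of demand is -1.2 (illustrative numbers)
rng = np.random.default_rng(6)
n = 300
log_p = rng.normal(1.0, 0.3, size=n)
log_inc = rng.normal(10.0, 0.5, size=n)
log_q = 5.0 - 1.2 * log_p + 0.8 * log_inc + rng.normal(0, 0.1, size=n)

# OLS on the logged variables recovers the elasticities
X = np.column_stack([np.ones(n), log_p, log_inc])
beta, *_ = np.linalg.lstsq(X, log_q, rcond=None)
price_elasticity = beta[1]
income_elasticity = beta[2]
```

This is why log transformations were flagged earlier as useful for elasticity interpretation: a 1% price increase changes quantity demanded by roughly β₁ percent.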