Machine Learning and Instrumental Variables Analysis

with Edward Rubin and Glen Waddell

 
  • Machine learning (ML) primarily evolved to solve “prediction problems.” The first stage of two-stage least squares (2SLS) is a prediction problem, which suggests potential gains from using ML in 2SLS's first stage. However, little guidance exists on when ML helps 2SLS, or when it hurts. We investigate the implications of inserting ML into 2SLS, decomposing the bias into three informative components. Mechanically, ML-in-2SLS procedures face issues common to prediction settings, issues common to causal-inference settings, and issues arising from their interaction. Through simulation, we show that linear ML methods (e.g., post-Lasso) work “well,” while nonlinear methods (e.g., random forests, neural nets) generate substantial bias in second-stage estimates, in some cases exceeding the bias of endogenous OLS.

  • Soon to come…
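
To make the setup in the abstract concrete, here is a minimal sketch of a "plug-in" 2SLS estimator whose first stage is a swappable prediction function. It is not the authors' simulation design: the data-generating process, the coefficient values, and the names `simulate`, `linear_first_stage`, `memorizing_first_stage`, and `plug_in_2sls` are all assumptions made for illustration. The "memorizing" first stage is a deliberately extreme stand-in for an overfit learner; it shows only one mechanism by which a flexible first stage can pull the second-stage estimate back toward endogenous OLS.

```python
import numpy as np


def simulate(n=5000, beta=1.0, seed=0):
    """Hypothetical DGP: x is endogenous because the confounder u enters x and y."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 3))                      # instruments
    u = rng.normal(size=n)                           # unobserved confounder
    x = z @ np.array([1.0, 0.5, 0.25]) + u + rng.normal(size=n)
    y = beta * x + 2.0 * u + rng.normal(size=n)
    return z, x, y


def add_const(a):
    """Prepend an intercept column to a 1-D or 2-D array."""
    a = a.reshape(len(a), -1)
    return np.column_stack([np.ones(len(a)), a])


def ols_slope(x, y):
    """Slope from regressing y on a constant and a single regressor x."""
    coef, *_ = np.linalg.lstsq(add_const(x), y, rcond=None)
    return coef[1]


def linear_first_stage(z, x):
    """Standard 2SLS first stage: in-sample fitted values from OLS of x on z."""
    Z = add_const(z)
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    return Z @ coef


def memorizing_first_stage(z, x):
    """Extreme overfitting: 'predict' x perfectly in sample (illustration only)."""
    return x.copy()


def plug_in_2sls(z, x, y, first_stage=linear_first_stage):
    """Second stage: regress y on the first-stage prediction of x."""
    x_hat = first_stage(z, x)
    return ols_slope(x_hat, y)


z, x, y = simulate()
beta_ols = ols_slope(x, y)                                   # biased by the confounder
beta_2sls = plug_in_2sls(z, x, y)                            # linear first stage
beta_memorized = plug_in_2sls(z, x, y, memorizing_first_stage)

print(f"OLS: {beta_ols:.3f}  2SLS: {beta_2sls:.3f}  memorized: {beta_memorized:.3f}")
```

In this sketch the true coefficient is 1.0. With a linear first stage the estimate should land near that value, while a first stage that reproduces `x` exactly returns the OLS estimate by construction. Any other learner (for example, a random forest) could be passed as `first_stage`, provided it takes `(z, x)` and returns predictions of `x`.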

 