Document Type

Discussion Paper

Publication Date

12-9-2024

CFDP Number

2421

CFDP Pages

70

Journal of Economic Literature (JEL) Code(s)

C11, C51, C55

Abstract

It has become common practice for researchers to use AI-powered information retrieval algorithms or other machine learning methods to estimate variables of economic interest, then use these estimates as covariates in a regression model. We show both theoretically and empirically that naively treating AI- and ML-generated variables as “data” leads to biased estimates and invalid inference. We propose two methods to correct bias and perform valid inference: (i) an explicit bias correction with bias-corrected confidence intervals, and (ii) joint maximum likelihood estimation of the regression model and the variables of interest. Through several applications, we demonstrate that the common approach generates substantial bias, while both corrections perform well.
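
The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's model or either of its proposed corrections. It assumes the simplest textbook case, in which the generated variable equals the true variable plus independent noise, and every name and parameter value in it (`simulate`, `ols_slope`, `beta`, `noise_sd`) is hypothetical. It compares the regression slope obtained with the true variable against the slope obtained when the noisy estimate is treated as data.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (intercept included)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


def simulate(n=100_000, beta=2.0, noise_sd=1.0, seed=0):
    """Return (oracle_slope, naive_slope) for one simulated sample.

    Assumption for illustration only: the generated variable is the true
    variable plus independent Gaussian noise (classical measurement error).
    """
    rng = np.random.default_rng(seed)
    x_true = rng.normal(size=n)                # latent variable of interest
    y = beta * x_true + rng.normal(size=n)     # outcome depends on the true variable
    x_hat = x_true + rng.normal(scale=noise_sd, size=n)  # stand-in for an ML estimate
    return ols_slope(x_true, y), ols_slope(x_hat, y)


oracle, naive = simulate()
print(f"slope using the true variable:    {oracle:.3f}")
print(f"slope using the noisy estimate:   {naive:.3f}")
```

Under these assumptions the naive slope is pulled toward zero by the factor var(x) / (var(x) + var(noise)), which is one half for the default values, so the regression on the generated variable recovers roughly half of `beta`. Errors produced by real AI or ML pipelines need not be classical, and the direction and size of the bias can differ from this case.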

Included in

Economics Commons
