Date of Award
Spring 1-1-2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Management
First Advisor
Barberis, Nicholas
Abstract
This dissertation consists of three essays on machine learning in finance, focusing on its applications to understanding investor behavior and asset prices. The first essay is titled "Professional Investors and Media Narratives." I investigate the impact of media narratives on the portfolio strategies of active equity mutual funds. Applying ChatGPT to 1.5 million Wall Street Journal articles from 1984 to 2023, I distill media narratives into 59 distinct topics and quantify each topic's time-varying share of news attention and sentiment. I then define a fund as having exposure to a topic if it overweights stocks expected to perform well when the topic grows in importance and therefore attracts more attention. I find that fund managers choose high exposure to high-sentiment topics, but not to high-attention topics. This strategy leads to mutual fund underperformance but attracts investor flows. Topic-oriented strategies account for 37% of mutual fund tilts and are a key driver of the underperformance associated with active tilts.
The second essay, joint with Bryan Kelly and Semyon Malamud, is titled "The Virtue of Complexity in Return Prediction." Much of the extant literature predicts market returns with "simple" models that use only a few parameters. Contrary to conventional wisdom, we theoretically prove that simple models severely understate return predictability compared to "complex" models in which the number of parameters exceeds the number of observations. We empirically document the virtue of complexity in US equity market return prediction. Our findings establish the rationale for modeling expected returns through machine learning.
The third essay, joint with Bryan Kelly and Semyon Malamud, is titled "The Virtue of Complexity Everywhere." We investigate the performance of non-linear return prediction models in the high complexity regime, i.e., when the number of model parameters exceeds the number of observations. We document a "virtue of complexity": return prediction R-squared and optimal portfolio Sharpe ratio generally increase with model parameterization in all asset classes that we study (US equities, international equities, bonds, commodities, currencies, and interest rates). The virtue of complexity is present even in extremely data-scarce environments, e.g., for predictive models with fewer than twenty observations and tens of thousands of predictors. The empirical association between model complexity and out-of-sample model performance is strikingly consistent with theoretical predictions.
Recommended Citation
Zhou, Kangying, "Machine Learning in Finance: Applications to Understanding Investor Behavior and Asset Prices" (2025). Yale Graduate School of Arts and Sciences Dissertations. 1686.
https://elischolar.library.yale.edu/gsas_dissertations/1686