"Interpretable Machine Learning and Causal Inference in Medicine and Public Health" by Colleen Elise Chan

Date of Award

Spring 2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Statistics and Data Science

First Advisor

Sekhon, Jasjeet

Abstract

Machine learning (ML) methods have seen rapidly increasing adoption across many industries due to their ability to learn complex patterns from large amounts of data with high accuracy. However, these models often lack transparency, operating as opaque "black boxes", which is especially concerning in high-stakes domains. Two key avenues to improve transparency and user trust are 1) developing interpretable ML models that explain their reasoning to users and 2) developing causal methods to estimate the effects of interventions from observational data when randomized controlled trials are infeasible. This dissertation addresses challenges and advances methods in interpretable ML and causal inference, with a focus on applications in medicine and public health. Chapter 1 provides background on interpretable ML, large language models (LLMs), causal inference, and their interconnections, and presents an overview of the dissertation. Chapter 2 illustrates these methods in practice, developing an interpretable ML risk prediction model for acute gastrointestinal bleeding from electronic health records. Chapter 3 discusses the growing prominence of LLMs and their potential for enhancing clinical decision support while outlining key challenges around mitigating hallucinations and building clinician trust. It describes GutGPT, an LLM-based system to aid gastrointestinal bleeding management, and details a randomized controlled trial evaluating GutGPT's impact on physician trust compared to traditional interpretability methods, such as the ML model described in Chapter 2. The dissertation then shifts focus to causal inference, which is essential for understanding intervention effects using observational data. Chapter 4 introduces nonparametric methods to estimate the potential impact fraction, which quantifies the proportion of cases preventable by modifying an exposure, using both individual-level and aggregated data. Chapter 5 introduces a design-based framework for two-stage randomized experiments to estimate direct and spillover effects under interference. Through novel methods and applications, this work advances the fields of interpretable ML and causal inference to ultimately improve evidence-based decision-making in healthcare.
