On Uncertainty Quantification and Bayesian Reasoning in Clinical Applications of Large Language Models

Date of Award

Spring 1-1-2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computational Biology and Bioinformatics

First Advisor

Taylor, Richard

Abstract

This thesis evaluates clinical uncertainty in black-box large language models (LLMs) through experiments conducted across diverse clinical contexts, leveraging principles of cognitive heuristics and biases. We begin by evaluating an LLM’s ability to manage qualitative uncertainty in a real-world task: deprescribing medications based on emergency department (ED) clinical notes. While the LLM effectively identifies deprescribing criteria, its assigned role adversely influences its reasoning, leading to logical errors and unwarranted recommendations as it attempts to align with the perceived requirements of the task. Next, we investigate how clinical context influences the LLM’s internal model of uncertainty by examining its ability to estimate post-test probabilities following diagnostic test results. Our findings reveal substantial disease-specific variability in the LLM’s estimation of diagnostic test performance, highlighting its limited understanding of real-world clinical probabilities, even within a probabilistic framework. In the final chapter, we integrate these findings in a conversational diagnostic task involving chest pain patients presenting to the emergency department. Evaluating two prompting strategies, (1) removing disease-specific role information and (2) incorporating probabilistic information via prevalence data for rare, life-threatening conditions, we demonstrate that LLMs remain poorly calibrated to real-world clinical probabilities and struggle to adapt their uncertainty models to assigned roles. Although the internal cognitive structures of LLMs differ fundamentally from those of humans, identifying their heuristics and biases through a cognitive framework offers valuable insights into their reasoning strengths and limitations. By uncovering systematic error patterns, we identify gaps in their reasoning processes and propose hypotheses about their internal reasoning structures, which future mechanistic studies could explore. These findings underscore the importance of integrating probabilistic information into LLMs to align them more closely with real-world clinical evidence, enhancing human-AI collaboration and supporting advancements in patient care outcomes.
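
For context, the post-test probability referenced above is conventionally obtained from Bayes' rule. The sketch below assumes a binary diagnostic test summarized by its sensitivity and specificity and a pre-test probability given by prevalence; the symbols are illustrative and not taken from the thesis itself.

P(D \mid T^{+}) = \frac{\mathrm{sens} \cdot p}{\mathrm{sens} \cdot p + (1 - \mathrm{spec})\,(1 - p)}

where p is the pre-test probability (prevalence), sens the test's sensitivity, and spec its specificity; disease-specific variability in an LLM's implied sens and spec propagates directly into miscalibrated post-test estimates.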
