Date of Award

January 2024

Document Type


Degree Name

Medical Doctor (MD)



First Advisor

Dennis Shung


Recurrent gastrointestinal bleeding (GIB) occurs in up to 20% of patients hospitalized with GIB and is a major cause of morbidity and mortality. Early identification of recurrent bleeding may improve patient management and outcomes, especially for patients on anticoagulant, antiplatelet, and antithrombotic therapies who experience acute upper GIB. However, criteria for defining recurrent bleeding are complex and require close monitoring for a combination of overt signs of GIB (hematemesis, melena, hematochezia), vital signs, and/or hemoglobin levels. Identifying new or persistent signs of overt GIB can be particularly challenging because descriptions of stool are typically documented in medical notes as free text. Recently, large language models (LLMs) have been shown to outperform traditional natural language processing (NLP) methods at extracting information from medical texts. Few studies have used NLP and machine learning methods to identify overt signs of GIB, and, to our knowledge, no studies have evaluated the robustness of an LLM-based approach. We propose an electronic health record (EHR)-based algorithm using LLMs for automated identification of recurrent bleeding after endoscopy in patients hospitalized with GIB.

Training and internal validation were performed on a cohort of 546 patients who presented for acute GIB and underwent endoscopy from July 2014 to June 2023 at an academic medical center. External validation was performed on 562 patients who presented for acute GIB and underwent endoscopy at separate hospitals during the same timeframe. Gold-standard labels for recurrent bleeding were derived via manual EHR review. Automated decision rules were constructed based on six recurrent bleeding criteria adapted from international consensus guidelines. The rules used heart rate, blood pressure, and hemoglobin trends extracted directly from structured EHR tables, as well as documented signs of overt GIB extracted from nursing plan-of-care notes using a hybrid NLP pipeline based on regular expressions and an LLM (Platypus2-Instruct LLaMA 2). The binary outputs of the automated decision rules were combined in an ensemble decision rule algorithm. A machine learning model was trained on the binary outputs of the automated decision rules and trends in hemoglobin, heart rate, systolic blood pressure, and documented symptoms of overt GIB.
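As a rough illustration of the two rule-based pieces described above, the following Python sketch shows a regular-expression screen for overt GIB terms and a simple combination of binary rule outputs. It is a minimal sketch under stated assumptions, not the thesis's implementation: the keyword patterns, the function names `screen_note` and `ensemble_rule`, the criterion names, and the logical-OR combination are all hypothetical, since the abstract does not specify them. In a hybrid pipeline of the kind described, notes flagged by such a screen would still need an LLM (or similar) step to resolve context such as negation.

```python
import re
from typing import Dict

# Hypothetical keyword patterns for overt GIB signs. The actual regular
# expressions used in the thesis are not given in the abstract.
OVERT_SIGN_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "melena": re.compile(
        r"\b(melena|melenic|black,? tarry stools?)\b", re.IGNORECASE
    ),
    "hematochezia": re.compile(
        r"\b(hematochezia|bright red blood per rectum|brbpr)\b", re.IGNORECASE
    ),
    "hematemesis": re.compile(
        r"\b(hematemesis|coffee[- ]ground emesis|vomit(?:ing|ed)? blood)\b",
        re.IGNORECASE,
    ),
}


def screen_note(note_text: str) -> Dict[str, bool]:
    """Flag which overt GIB terms appear in a note (regex stage only).

    This does not handle negation ("no melena"); it only marks notes
    that would be passed on for further classification.
    """
    return {
        sign: bool(pattern.search(note_text))
        for sign, pattern in OVERT_SIGN_PATTERNS.items()
    }


def ensemble_rule(rule_outputs: Dict[str, bool]) -> bool:
    """Combine binary rule outputs.

    Assumption: the ensemble is positive if any single criterion fires
    (logical OR). The abstract does not state the combination logic.
    """
    return any(rule_outputs.values())


# Example usage with made-up note text and made-up criterion names.
flags = screen_note("Pt had large melenic stool overnight, denies nausea.")
decision = ensemble_rule({"criterion_a": False, "criterion_b": flags["melena"]})
```

A logical OR favors sensitivity over specificity, which is one plausible choice for a screening tool; other combinations (for example, requiring two criteria) would trade in the opposite direction.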

The hybrid NLP pipeline achieved high positive predictive value (PPV) and sensitivity for identifying persistent melena (PPV = 0.972; sensitivity = 0.900), hematochezia (PPV = 0.900; sensitivity = 0.908), and hematemesis (PPV = 0.871; sensitivity = 0.915). The ensemble decision rule identified recurrent bleeding with a PPV of 0.838, negative predictive value (NPV) of 0.998, sensitivity of 0.984, and specificity of 0.975. The machine learning model for identification of recurrent bleeding achieved an area under the receiver operating characteristic curve (AUROC) of 0.986 on external validation. Compared to the ensemble algorithm at a matched 97.5% specificity, the machine learning model had a higher PPV (0.868 vs. 0.838), although the difference was not statistically significant.
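For readers less familiar with the metrics reported above, the sketch below shows how PPV, NPV, sensitivity, and specificity follow from the four cells of a confusion matrix. The function name `diagnostic_metrics` and the counts in the example are hypothetical and chosen only for illustration; they are not the study's data.

```python
import math
from typing import Dict


def _safe_div(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute standard binary-classification metrics from confusion counts.

    tp, fp, tn, fn: true positives, false positives,
    true negatives, false negatives.
    """
    return {
        "ppv": _safe_div(tp, tp + fp),          # P(disease | positive call)
        "npv": _safe_div(tn, tn + fn),          # P(no disease | negative call)
        "sensitivity": _safe_div(tp, tp + fn),  # P(positive call | disease)
        "specificity": _safe_div(tn, tn + fp),  # P(negative call | no disease)
    }


# Hypothetical counts for illustration only (not taken from the thesis).
example = diagnostic_metrics(tp=8, fp=2, tn=88, fn=2)
```

Because PPV and NPV depend on how common recurrent bleeding is in the cohort, they can shift between internal and external validation sets even when sensitivity and specificity stay the same.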

An automated EHR-based machine learning algorithm using LLMs can robustly and efficiently identify recurrent bleeding in GIB patients after endoscopy. This model allows ongoing, real-time monitoring for recurrent bleeding, providing the opportunity for more timely identification and intervention in these high-risk patients with less utilization of personnel and resources than in current clinical practice.


This thesis is restricted to Yale network users only. It will be made publicly available on 04/30/2025