Date of Award

January 2025

Document Type

Open Access Thesis

Degree Name

Medical Doctor (MD)

Department

Medicine

First Advisor

R. Andrew Taylor

Abstract

Objective: Incarceration is a significant social driver of health. Patients with a history of incarceration face systemic healthcare disparities, higher morbidity and mortality, and racialized health inequities. Yet incarceration status is largely invisible in the electronic health record (EHR) because it is poorly captured there. In this thesis, we aim to develop, train, and validate a novel natural language processing (NLP) technique that identifies incarceration status in the EHR more effectively. We then apply this method in the emergency department (ED) setting to demonstrate proof of concept and to elucidate disparities in care processes.

Methods: The study population consisted of adult patients (≥ 18 years old) who presented to the ED between June 2013 and August 2021. EHR notes were filtered for specific incarceration-related terms. A random sample of 1,000 of these notes was then annotated for incarceration and further stratified into prior, recent, and current incarceration. For NLP model development, 80% of the notes were used to train Longformer-based and RoBERTa models. The remaining 20% of the notes were analyzed with GPT-4. The fine-tuned Clinical-Longformer model was subsequently applied to 480,374 ED notes. Sociodemographics, comorbidities, and care processes were compared between patients with and without a history of incarceration, as identified by the LLM. We used multivariable logistic regression to assess the independent association between incarceration history and ED care processes.

Results: Manual annotation showed that 559 of 1,000 notes (55.9%) contained evidence of incarceration history. ICD-10 codes (sensitivity: 4.8%, specificity: 99.1%, F1-score: 0.09) performed worse than RoBERTa (sensitivity: 78.6%, specificity: 73.3%, F1-score: 0.79), Longformer (sensitivity: 94.6%, specificity: 87.5%, F1-score: 0.93), and GPT-4 (sensitivity: 100%, specificity: 61.1%, F1-score: 0.86). In a separate cohort of 177,987 ED encounters, 1,734 involved patients with a history of incarceration. These patients were more likely to be:

- male;
- Black, Hispanic, or of other race/ethnicity;
- unemployed or disabled;
- and to have a history of smoking or substance use.

Compared with patients without a history of incarceration, they had higher odds of eloping (OR: 3.59 [2.41–5.12]) and of leaving against medical advice (AMA) (OR: 2.39 [1.46–3.67]). They also had higher odds of being subjected to sedation (OR: 3.89 [3.19–4.70]) and restraints (OR: 3.76 [3.06–4.57]). After adjusting for covariates, only the association with elopement remained significant (aOR: 1.65 [1.08–2.43]).

Conclusions: Our fine-tuned LLM identifies incarceration status from clinical notes with a high degree of accuracy. Using this method to identify highly representative cohorts of ED patients with a history of incarceration demonstrates the feasibility of NLP-based identification. The method also delineates differences in patient characteristics and ED care processes for individuals with incarceration histories. This underscores the utility of NLP for uncovering care disparities in underserved and stigmatized populations.
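The abstract compares identification methods by sensitivity, specificity, and F1-score, and reports odds ratios with bracketed intervals, presumably 95% confidence intervals. The Python below is a minimal sketch of how such figures are typically computed; it is not the thesis's own code. It derives the three classification metrics from confusion-matrix counts. It also computes an unadjusted odds ratio with a Wald interval from a 2×2 table.

- The names `Confusion` and `odds_ratio` and all example counts are hypothetical.
- The abstract does not state which interval method was used.
- The adjusted odds ratio (aOR) comes from multivariable logistic regression, which this 2×2 calculation does not reproduce.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing a method's labels against manual annotation."""
    tp: int  # incarceration documented, method flags it
    fp: int  # incarceration not documented, method flags it
    fn: int  # incarceration documented, method misses it
    tn: int  # incarceration not documented, method does not flag it

    def sensitivity(self) -> float:
        # Share of truly positive notes that the method flags (recall).
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of truly negative notes that the method leaves unflagged.
        return self.tn / (self.tn + self.fp)

    def f1(self) -> float:
        # Harmonic mean of precision and recall, written in count form
        # (2TP / (2TP + FP + FN)) so it stays defined when TP is zero.
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0


def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald interval (an assumed method).

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    Assumes all four cells are non-zero.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


if __name__ == "__main__":
    # Illustrative counts only; these are NOT the thesis data.
    m = Confusion(tp=90, fp=10, fn=10, tn=90)
    print(f"sensitivity={m.sensitivity():.3f} "
          f"specificity={m.specificity():.3f} F1={m.f1():.3f}")
    print("OR=%.2f [%.2f-%.2f]" % odds_ratio(20, 80, 10, 90))
```

With these illustrative counts, sensitivity, specificity, and F1 all print as 0.900. The odds ratio prints as 2.25 [0.99-5.09].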

Comments

This is an Open Access Thesis.
