Date of Award

January 2024

Document Type

Open Access Thesis

Degree Name

Master of Public Health (MPH)


School of Public Health

First Advisor

Leah M. Ferrucci

Second Advisor

Chenxi Huang


Background: Percutaneous coronary intervention (PCI) is widely used to treat coronary artery disease, but it carries a risk of acute kidney injury (AKI). Accurate risk stratification for post-PCI AKI is essential for improving patient outcomes. Traditional AKI risk models do not fully capture complex interactions among risk factors or the breadth of clinical data available in electronic health records (EHRs). We therefore sought to enrich an existing AKI risk prediction model, the National Cardiovascular Data Registry (NCDR) model, by integrating additional risk factors derived from EHR data.

Methods: This retrospective cohort study used data from the CathPCI registry and the Yale New Haven Hospital (YNHH) EHR for patients who underwent PCI between 2018 and 2022. We developed predictive models using multivariable logistic regression, Random Forests, and Gradient Boosting Machine (GBM). Variables were selected through backward selection, and model performance was evaluated using the area under the receiver operating characteristic curve (AUC). Data were randomly split into a training set and a test set, and the test set was not used for variable selection or model development. Model performance was compared using bootstrapped samples.
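The abstract states that model AUCs were compared using bootstrapped samples but does not give the exact procedure. The sketch below shows one common approach, assuming a paired bootstrap on the held-out test set: resample test-set patients with replacement, recompute both models' AUC on each resample, and summarize the difference. The function names `auc` and `paired_bootstrap_auc_diff`, the percentile interval, and the doubled one-sided p-value are illustrative assumptions rather than the thesis's code, and the scores are assumed to be each model's predicted AKI probabilities.

```python
import random


def auc(labels, scores):
    """Rank-based (Mann-Whitney) AUC; tied scores receive their average rank.

    Equals the probability that a randomly chosen positive case (AKI = 1)
    scores higher than a randomly chosen negative case, counting ties as 1/2.
    """
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores order[i..j] and give each its average 1-based rank.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def paired_bootstrap_auc_diff(labels, scores_a, scores_b, n_boot=1000, seed=0):
    """Hypothetical helper: paired bootstrap comparison of AUC(model B) - AUC(model A).

    Returns (observed difference, approximate 95% percentile interval, two-sided p-value).
    """
    rng = random.Random(seed)
    n = len(labels)
    observed = auc(labels, scores_b) - auc(labels, scores_a)
    diffs = []
    for _ in range(n_boot):
        # Resample patients with replacement; both models are scored on the same resample.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that contain only one outcome class
            diffs.append(auc(y, [scores_b[i] for i in idx])
                         - auc(y, [scores_a[i] for i in idx]))
    if not diffs:
        raise ValueError("no usable bootstrap resamples")
    diffs.sort()
    m = len(diffs)
    interval = (diffs[int(0.025 * (m - 1))], diffs[int(0.975 * (m - 1))])
    # Two-sided p-value: twice the smaller share of resampled differences on either side of 0.
    p_value = min(1.0, 2 * min(sum(d <= 0 for d in diffs), sum(d >= 0 for d in diffs)) / m)
    return observed, interval, p_value


if __name__ == "__main__":
    # Synthetic test set (not thesis data): model B is informative, model A is pure noise.
    rng = random.Random(42)
    y_test = [1 if rng.random() < 0.074 else 0 for _ in range(2000)]
    scores_a = [rng.random() for _ in y_test]
    scores_b = [0.5 * y + 0.7 * rng.random() for y in y_test]
    diff, (lo, hi), p = paired_bootstrap_auc_diff(y_test, scores_a, scores_b, n_boot=200)
    print(f"AUC difference {diff:.3f}, 95% interval ({lo:.3f}, {hi:.3f}), p = {p:.3f}")
```

Resampling patients, rather than resampling each model's predictions separately, keeps the comparison paired, so the correlation between two models scored on the same patients is preserved. DeLong's test is a common analytic alternative to the bootstrap for comparing correlated AUCs.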

Results: Among 8,636 PCI procedures, AKI was observed in 639 cases (7.4%). Incorporating EHR-derived variables into the traditional logistic regression models significantly improved the AUC (all p < 0.001). The Random Forest model improved the AUC from 0.840 to 0.853 (p < 0.001), and the GBM model increased it further to 0.855 (p < 0.001).

Conclusions: Our study confirms that enriching the traditional NCDR AKI risk model with EHR-derived variables substantially enhances predictive accuracy with both logistic regression and advanced tree-based machine learning techniques. This integration could facilitate more dynamic, real-time risk stratification and support highly personalized patient care strategies. Future research should focus on multicenter validation and prospective studies to confirm these findings and to explore additional variables that could further refine the models' predictive performance and clinical applicability.

