Date of Award

January 2019

Document Type

Open Access Thesis

Degree Name

Medical Doctor (MD)

First Advisor

Richard A. Taylor

SEARCHING FOR PHENOTYPES OF SEPSIS: AN APPLICATION OF MACHINE LEARNING TO ELECTRONIC HEALTH RECORDS. Michael J. Boyle (Sponsored by R. Andrew Taylor). Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT.

Sepsis has historically been divided into discrete subsets based on expert consensus-driven definitions, but there is evidence to suggest it is better described as a continuum. The goal of this study was to perform an exhaustive search for distinct phenotypes of sepsis by applying several unsupervised machine learning techniques to the electronic health record (EHR) data of 41,843 Yale New Haven Health System emergency department patients with infection seen between 2013 and 2016. Specifically, the aims were to develop an autoencoder that reduces the high-dimensional EHR data to a latent representation amenable to clustering, then to search for clusters within that representation using partitional, hierarchical, and density-based clustering methods, and to assess the quality of those clusters with standard evaluation metrics. The autoencoder was trained by minimizing the mean squared error of the reconstruction. This exhaustive search found no convincing, consistent clusters. The different methods produced various clustering patterns, but all had poor quality metrics, and the metrics intended to identify the optimal number of clusters did not agree on a consistent number, instead tending to suggest fewer than two clusters. Inspection of one promising eight-cluster arrangement revealed no statistically significant difference in admission rate among the clusters. Although it is impossible to prove a negative, these results suggest that there are no distinct phenotypic clusters of sepsis.
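
The cluster search described above can be illustrated with a small sketch. It assumes the autoencoder has already mapped each patient to a latent vector, and it uses k-means (one partitional method) scored by the mean silhouette coefficient. The abstract does not name its specific metrics, so the silhouette, the hypothetical function names kmeans, silhouette and sweep_k, and the toy data are illustrative assumptions, not the thesis's actual pipeline. Only the Python standard library is used so the sketch runs without dependencies.

```python
# Illustrative sketch only: k-means and the silhouette coefficient stand in for
# the thesis's partitional clustering and its (unnamed) evaluation metrics.
import math
import random
from typing import Dict, Iterable, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two latent vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's algorithm: return a cluster label for every point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    def nearest(p: Vector) -> int:
        return min(range(k), key=lambda c: sq_dist(p, centroids[c]))

    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [nearest(p) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points: List[Vector], labels: List[int]) -> float:
    """Mean silhouette coefficient in [-1, 1]; values near 0 mean weak separation."""
    if len(set(labels)) < 2:
        raise ValueError("the silhouette is undefined for fewer than two clusters")
    total = 0.0
    for i, p in enumerate(points):
        # Distances from point i to every other point, grouped by cluster label.
        by_cluster: Dict[int, List[float]] = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.sqrt(sq_dist(p, q)))
        own = by_cluster.get(labels[i])
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(sum(d) / len(d) for c, d in by_cluster.items() if c != labels[i])
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def sweep_k(points: List[Vector], k_values: Iterable[int], seed: int = 0) -> Dict[int, float]:
    """Silhouette score of a k-means solution for each candidate k."""
    return {k: silhouette(points, kmeans(points, k, seed=seed)) for k in k_values}


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy 2-D "latent" data: two tight, well-separated blobs vs. diffuse noise.
    blobs = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(50)]
    blobs += [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(50)]
    noise = [[rng.uniform(0, 5), rng.uniform(0, 5)] for _ in range(100)]
    print("separated:", sweep_k(blobs, range(2, 6)))
    print("diffuse:  ", sweep_k(noise, range(2, 6)))
```

On two well-separated synthetic blobs the score at k = 2 is high, while on uniformly scattered points every k scores noticeably lower. Note that the silhouette is undefined for a single cluster, so it can only rank k >= 2; judging whether fewer than two clusters is preferable requires a criterion, such as the gap statistic, that compares against a one-cluster reference.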

