Date of Award

January 2020

Document Type


Degree Name

Medical Doctor (MD)



First Advisor

Richard A. Taylor


In the emergency department (ED), patients are first sorted by acuity to prioritize those requiring urgent medical intervention. This sorting process, called "triage", is typically performed by a member of the nursing staff based on the patient's demographics, chief complaint, and vital signs. The Emergency Severity Index (ESI), a 5-level triage system developed in 1999 and most recently updated in 2012, is recommended by the American College of Emergency Physicians as the national standard for triage and has become the most widely used triage system in the United States.

With the advent of electronic health records (EHR), healthcare systems now store comprehensive elements of a patient’s medical history in both structured and free-text formats. Concurrent advances in machine learning have made it possible to extract information from these large datasets. A new triage system that uses all elements of a patient’s medical history, including but not limited to prior labs and vitals, hospital usage statistics, and outpatient medications, may more accurately identify patients requiring urgent care and improve patient flow in the ED. Moreover, triage could be extended into the patient’s ED course by incorporating variables collected during the current visit, informing the provider’s decision about the patient’s ultimate disposition. Finally, because patient care in the ED is guided primarily by the chief complaint, a triage system that maps patients onto a standard ontology of chief complaints may improve patient care. In this thesis, we present a two-part blueprint for such a system.

In Part I, we use gradient boosting and deep neural networks on a dataset of 700,000 adult ED visits from the Yale New Haven Health System to predict hospital admission at the time of triage, as well as 30-day all-cause mortality and 72-hour and 9-day ED returns at the time of discharge. We show that machine learning can accurately predict these outcomes and that adding patient history significantly improves predictive performance compared with using triage information alone. We also identify the most important variables for each outcome to build low-dimensional models suitable for implementation in EHR systems.

In Part II, we use Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art natural language processing model, on a dataset of 1.8 million adult and pediatric ED chief complaints to learn contextual embeddings for chief complaints. We show that the learned embeddings accurately predict provider-assigned chief complaint labels and map semantically similar chief complaints to nearby points in vector space. Such a model may be used to automatically map free-text chief complaints to structured fields and to derive a standardized, data-driven ontology of chief complaints for healthcare institutions.

Our study does not address the implementation and efficacy barriers present in clinical practice. Although we propose a low-dimensional model with the explicit intent of facilitating implementation in an EHR system, there is no uniform method by which clinical decision support tools are implemented. Future work will be required to analyze methods of implementation and their effects on patient outcomes, with the ultimate goal of improving patient care in the ED.


This thesis is restricted to Yale network users only. This thesis is permanently embargoed from public release.