Date of Award

January 2022

Document Type


Degree Name

Medical Doctor (MD)



First Advisor

Jonathan Grauer

Second Advisor

Michael O'Brien


The American College of Surgeons National Surgical Quality Improvement Program (NSQIP) database is a national quality improvement database that is frequently used for outcomes studies in fields such as orthopedics. However, a number of variables collected by this database changed between 2010 and 2011, coinciding with a rapid increase in the number of patients included per year. The effects of such changes are studied in this thesis to better understand the impact of variable definitions, data collection practices, and sample populations on related research.

The aim of this thesis was to address the above-noted questions in the context of three novel, previously published studies conducted by the authors of this thesis, as well as a review of the literature on missing data treatment and cross-database comparisons.

Studies 1 and 2 aimed to investigate the influence of changing data elements and growth of the NSQIP database on the results of longitudinal lumbar fusion outcomes studies and total hip arthroplasty (THA) studies, respectively. In Study 1, the NSQIP database was retrospectively queried to identify 19,755 patients who underwent elective posterior lumbar fusion surgery with or without interbody fusion between 2005 and 2014. In Study 2, NSQIP was similarly queried to identify 102,411 THA patients from 2005 to 2015. In both studies, patients were split into two groups based on year of surgery, and demographic data and 30-day perioperative outcomes were compared between the groups. As an example analysis, multivariate Poisson regression was used to determine the association between age and perioperative outcomes for each group. In both studies, a number of preoperative characteristics and perioperative outcomes differed significantly between era groups, as did the results of the multivariate analyses.

Study 3 aimed to investigate the influence of changes in the NSQIP database on the calculation of the modified Frailty Index (mFI) and the modified Charlson Comorbidity Index (mCCI) for posterior lumbar fusion studies. The mFI was calculated for each patient using three methods: treating conditions for which data was missing as not present, dropping patients with missing values, and normalizing by dividing the raw score by the number of variables collected. The mCCI was calculated by the first two of these methods. Mean American Society of Anesthesiologists (ASA) scores were used for comparison. Systematic changes in the NSQIP database resulted in missing data for many of the variables included in the mFI and mCCI, and differing methods of calculation yielded different results.
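The three approaches to missing data described above can be illustrated with a minimal sketch. It is not the code used in the thesis: the function names, the encoding of each index item as 1 (present), 0 (absent), or None (not collected), and the five-item example patient are all assumptions made for illustration.

```python
from typing import Optional, Sequence

# Hypothetical encoding: 1 = condition present, 0 = absent, None = not collected.
Item = Optional[int]


def score_missing_as_absent(items: Sequence[Item]) -> int:
    """Method 1: treat conditions with missing data as not present."""
    return sum(1 for v in items if v == 1)


def score_complete_case(items: Sequence[Item]) -> Optional[int]:
    """Method 2: drop the patient (return None) if any item is missing."""
    if any(v is None for v in items):
        return None
    return sum(v for v in items if v is not None)


def score_normalized(items: Sequence[Item]) -> Optional[float]:
    """Method 3: divide the raw score by the number of variables collected."""
    collected = [v for v in items if v is not None]
    if not collected:
        return None  # nothing was collected, so no score can be computed
    return sum(collected) / len(collected)


# Illustrative patient with two of five items not collected.
example_patient = [1, 0, None, 1, None]
```

For the illustrative patient, the first method counts two conditions, the second excludes the patient entirely, and the third divides two by the three items actually collected. This is the sense in which the same record can yield different index values depending on how missing variables are handled.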

These three studies add to the body of existing literature showing differing results for outcomes studies conducted with varying methods of missing data treatment, as well as for studies conducted using different large, national databases. As a whole, these studies suggest that systematic changes in the NSQIP database through the years can lead to different study results and conclusions. Although studied in one specific dataset, NSQIP, the principles highlighted here apply to large database studies in general and should be considered whenever possible when performing related studies.


This thesis is restricted to Yale network users only. This thesis is permanently embargoed from public release.