Date of Award

January 2022

Document Type


Degree Name

Medical Doctor (MD)



First Advisor

Sanjay Aneja


The application of machine learning methods to problems in medicine, with the aim of enabling precision medicine, is attracting growing interest. One area in which machine learning could offer significant clinical utility is prognostics, whether in modelling patient survival or in predicting the likelihood that a given therapy will succeed in a specific patient. In breast cancer treatment, new algorithmic approaches to predicting pathologic complete response (PCR) to neoadjuvant chemotherapy (NAC) could be critical for refining treatment guidelines. We hypothesized that a convolutional neural network (CNN) using pre-treatment MRI as input would be able to accurately predict PCR following NAC.

We analyzed 126 tumors, each from a distinct patient, treated with NAC for T3-stage breast tumors at a single institution from 2002 to 2006, for a total of 3780 MRI slices. Each slice served as an individual input to our single-input 2-D CNN. The primary outcome variable was PCR. After training the CNN for 50 epochs on the training set, we tested its ability to predict PCR on the 30% of samples reserved as the test set.
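The 70/30 partition described above can be sketched as follows. This is an illustration only, not the thesis's actual code: the abstract does not say whether the split was made per slice or per tumor, so the sketch assumes a tumor-level split (all slices from one tumor stay on the same side). It also assumes a uniform 30 slices per tumor, which is merely consistent with 3780 / 126 = 30. The function name `split_by_tumor`, the seed, and the index layout are hypothetical.

```python
import random


def split_by_tumor(slice_tumor_ids, test_fraction=0.3, seed=0):
    """Split slice indices into train/test sets at the tumor level.

    slice_tumor_ids[i] is the tumor that slice i belongs to. Every slice
    of a given tumor ends up on the same side of the split, so the test
    set contains no tumor that was seen during training.
    """
    tumor_ids = sorted(set(slice_tumor_ids))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(tumor_ids)
    n_test = round(len(tumor_ids) * test_fraction)
    test_tumors = set(tumor_ids[:n_test])
    train_idx = [i for i, t in enumerate(slice_tumor_ids) if t not in test_tumors]
    test_idx = [i for i, t in enumerate(slice_tumor_ids) if t in test_tumors]
    return train_idx, test_idx


# Assumed layout (not stated in the abstract): 126 tumors x 30 slices each.
slice_tumor_ids = [tumor for tumor in range(126) for _ in range(30)]
train_idx, test_idx = split_by_tumor(slice_tumor_ids)
```

Splitting at the tumor level is a common precaution when neighboring slices of the same tumor are highly correlated. If the original split was made per slice, the grouping step would simply be dropped.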

Average prediction accuracy for models using a single phase of contrast (pre-contrast, immediate post-contrast, or delayed post-contrast) was 90.4%, rising to 97.6% when discordant cases were excluded. Stratifying the test set by age and by tumor size showed that the model performed similarly across these subsets despite differences in their PCR rates. When the test set was stratified by hormone receptor profile, however, the model was significantly less accurate on the triple-negative subset, which had a higher PCR rate. Overall, our findings demonstrate the promise of deep learning algorithms for providing physicians and patients with personalized prognostic data on the utility of NAC before treatment for breast cancer begins.
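One way to read the two accuracy figures is sketched below. The abstract does not define "discordant cases"; this sketch assumes they are cases on which the three single-phase models disagree, so that excluding them leaves only cases where all phases give the same prediction. The function names, phase keys, and toy labels are hypothetical and are not data from the thesis.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def concordant_accuracy(phase_preds, labels):
    """Accuracy restricted to cases where every phase's model agrees.

    phase_preds maps a contrast-phase name to a list of 0/1 predictions.
    Returns (accuracy on concordant cases, number of concordant cases);
    accuracy is None if no case is concordant.
    """
    phases = list(phase_preds.values())
    kept = [i for i in range(len(labels)) if len({p[i] for p in phases}) == 1]
    if not kept:
        return None, 0
    # On concordant cases all phases predict the same value, so any one works.
    correct = sum(phases[0][i] == labels[i] for i in kept)
    return correct / len(kept), len(kept)


# Toy data, invented for illustration (1 = PCR, 0 = no PCR).
labels = [1, 0, 1, 0, 1]
phase_preds = {
    "pre": [1, 0, 1, 1, 1],
    "immediate_post": [1, 0, 0, 0, 1],
    "delayed_post": [1, 0, 1, 0, 0],
}
mean_single_phase = sum(accuracy(p, labels) for p in phase_preds.values()) / len(phase_preds)
conc_acc, n_concordant = concordant_accuracy(phase_preds, labels)
```

Under this reading, the higher figure comes at the cost of making no prediction for the excluded cases, so the number of concordant cases is worth reporting alongside it.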


This thesis is restricted to Yale network users only. This thesis is permanently embargoed from public release.