Date of Award

January 2022

Document Type


Degree Name

Medical Doctor (MD)



First Advisor

Sanjay Aneja


Deep learning models are known to be powerful image classifiers and have demonstrated excellent performance on medical image datasets. However, a major limitation is that their performance can degrade on unseen datasets. The difference between a model's performance on seen and unseen data is known as the generalization gap, and it is valuable to be able to predict this gap before deploying a model on unseen or real-world data. We analyzed 1,696 scanned film mammograms from the Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM) and 3,306 lung nodule CT images from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI). Multiple VGG16 models with varying hyperparameters were trained to predict the presence of malignancy, and their performances on the training and validation datasets were used to calculate their respective generalization gaps. A margin signature was calculated at four evenly spaced layers and used, along with the training accuracy, in a linear regression to predict the generalization gap. Margin signatures combined with training accuracy predicted the generalization gap accurately: the adjusted R² was 0.914 for the models analyzing the breast mammogram dataset and 0.912 for the models analyzing the lung nodule CT dataset. This represents a promising method for predicting model performance in real-world clinical settings before implementation, which could have important implications for patient safety and aid regulatory approval.
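The regression step described above can be sketched as follows. This is a minimal illustration with synthetic data, not the thesis's actual pipeline: the number of models, the margin-signature values, and the coefficients are all hypothetical, and ordinary least squares via NumPy stands in for whatever regression implementation was used. The predictors are the margin-signature summaries at four layers plus the training accuracy; the target is the generalization gap (training accuracy minus validation accuracy), and fit quality is reported as adjusted R², the metric quoted in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix: one row per trained VGG16 model, with the
# margin signature at four evenly spaced layers plus the training accuracy.
n_models = 40
margin_signature = rng.normal(size=(n_models, 4))
train_acc = rng.uniform(0.8, 1.0, size=(n_models, 1))
X = np.hstack([margin_signature, train_acc])

# Synthetic generalization gap (train accuracy minus validation accuracy),
# generated from made-up coefficients plus a small amount of noise.
true_coef = np.array([0.05, -0.03, 0.02, -0.04, 0.10])
gap = X @ true_coef + rng.normal(scale=0.005, size=n_models)

# Ordinary least squares with an intercept column.
A = np.hstack([np.ones((n_models, 1)), X])
coef, *_ = np.linalg.lstsq(A, gap, rcond=None)
pred = A @ coef

# Adjusted R^2: R^2 penalized for the number of predictors p.
ss_res = np.sum((gap - pred) ** 2)
ss_tot = np.sum((gap - gap.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
p = X.shape[1]
adj_r2 = 1 - (1 - r2) * (n_models - 1) / (n_models - p - 1)
print(f"adjusted R^2 = {adj_r2:.3f}")
```

Because the target here is constructed from the predictors, the fit is nearly perfect; on real model collections the adjusted R² reflects how well margin signatures and training accuracy jointly explain the observed generalization gaps.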


This thesis is restricted to Yale network users only. This thesis is permanently embargoed from public release.