Date of Award

January 2011

Document Type

Open Access Thesis

Degree Name

Medical Doctor (MD)



First Advisor

Annette M. Molinaro

Subject Area(s)

Oncology, Biostatistics



Elliot James Rapp, Jena P. Giltnane, David L. Rimm, Annette Molinaro. Department of Biostatistics, Yale School of Public Health, Yale University School of Medicine, New Haven, CT.

Our hypothesis was that prognostic models for breast cancer that incorporate both clinical variables and biomarkers in the PI3 Kinase molecular pathway would improve upon the clinical models of TNM staging and the Nottingham Prognostic Index (NPI). Our specific aim was to develop models that misclassify fewer patients than TNM and NPI for the outcome of dead of disease at ten years. Our study population was the YTMA49 cohort, a series of 688 samples of invasive ductal breast carcinoma collected between 1961 and 1983 by the Yale University Department of Pathology. Tissue microarray (TMA) analysis was performed, and biomarker expression levels were determined using Automated Quantitative Analysis (AQUA) technology for thirteen biomarkers in the PI3 Kinase pathway, including an overall expression level and expression levels by subcellular compartment. Eleven clinical variables were also assembled from our cohort. Exhaustively searching the multivariate space, we used logistic regression to predict our outcome of dead of disease at ten years. Validation was performed using leave-one-out cross-validation (LOOCV). Misclassification estimates provided the means to compare models, with a lower misclassification estimate indicating a superior model. Confidence intervals were constructed using bootstrapping with one thousand iterations. We developed a helper computer program, named Combination Magic, that enabled us to build sophisticated models including both interactions between variables and transformations of variables (e.g., the logarithm).
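The validation procedure described above can be illustrated with a minimal sketch. It is not the thesis's code: the gradient-descent fitting routine, the function names (fit_logistic, loocv_misclassification, bootstrap_ci), the 0.5 classification cutoff, the percentile form of the bootstrap interval, and the synthetic toy data are all assumptions made for illustration. The abstract does not state how the bootstrap resampling was combined with LOOCV, so the scheme below is only one plausible reading.

```python
import math
import random


def fit_logistic(X, y, lr=1.0, steps=100):
    """Fit logistic regression by batch gradient descent; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w, xi):
    """Predict class 1 when the fitted probability is at least 0.5 (assumed cutoff)."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1 if z >= 0.0 else 0


def loocv_misclassification(X, y):
    """Hold out each sample once, refit on the rest, and return the error fraction."""
    errors = 0
    for i in range(len(X)):
        w = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors += predict(w, X[i]) != y[i]
    return errors / len(X)


def bootstrap_ci(X, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the LOOCV misclassification estimate."""
    rng = random.Random(seed)
    n = len(X)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        estimates.append(
            loocv_misclassification([X[i] for i in idx], [y[i] for i in idx])
        )
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Synthetic toy data (NOT from the YTMA49 cohort): one feature, two separated classes.
X = [[-(0.2 + 0.1 * k)] for k in range(10)] + [[0.2 + 0.1 * k] for k in range(10)]
y = [0] * 10 + [1] * 10

me = loocv_misclassification(X, y)
ci_low, ci_high = bootstrap_ci(X, y, n_boot=10)  # few iterations to keep the demo fast
print(me, ci_low, ci_high)
```

The demo uses only ten bootstrap iterations for speed; the abstract reports one thousand. In practice a statistics package would replace the hand-written fitting routine.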

Overall, our best univariate models were NPI (misclassification estimate (ME): 0.326, confidence interval (CI): 0.292 to 0.359), nodal status (ME: 0.353, CI: 0.322 to 0.493), and TNM (ME: 0.367, CI: 0.313 to 0.447). Our best univariate models from the PI3 Kinase biomarkers were Fox01_NU (ME: 0.369, CI: 0.336 to 0.415), AKT1_TM (ME: 0.373, CI: 0.335 to 0.412), and PI3Kp110_TM (ME: 0.377, CI: 0.343 to 0.431). Our best bivariate models were pTumor*PathER (ME: 0.328, CI: 0.308 to 0.443), pNode + NuGrade (ME: 0.333, CI: 0.305 to 0.434), and AKT1_NN + Fox01_NU (ME: 0.338, CI: 0.307 to 0.391). Our best trivariate models were pTumor + mTOR_NN + PI3Kp110_TM + pTumor*PI3Kp110_TM (ME: 0.296, CI: 0.273 to 0.375), pTumor + AKT1_NU + Fox01_NU + pTumor*AKT1_NU (ME: 0.298, CI: 0.275 to 0.38), and pTumor + mTOR_TM + PI3Kp110_TM + pTumor*PI3Kp110_TM (ME: 0.299, CI: 0.276 to 0.378). Our best multivariate model was Fox01_NU + AKT1_NU + mTOR_MB + p70S6K_NU + AVG_BCL2_TM + Fox01_NU*AKT1_NU*mTOR_MB (ME: 0.295, CI: 0.274 to 0.393). None of these models was statistically superior to the clinical models of TNM and NPI.
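The model formulas listed above combine main effects with interaction terms. As a rough sketch of how such candidates could be enumerated in an exhaustive search, the hypothetical helper below (candidate_formulas, an illustrative name that is not taken from Combination Magic) generates every additive model of a given size together with variants that add one pairwise interaction. Whether the thesis restricted its search in this way is an assumption.

```python
from itertools import combinations


def candidate_formulas(variables, size):
    """Yield additive models of `size` variables, each also extended
    with one pairwise interaction term between two of its variables."""
    for combo in combinations(variables, size):
        main = " + ".join(combo)
        yield main
        for a, b in combinations(combo, 2):
            yield f"{main} + {a}*{b}"


# Variable names taken from the abstract; the grouping is illustrative only.
formulas = list(candidate_formulas(["pTumor", "mTOR_NN", "PI3Kp110_TM"], 3))
for formula in formulas:
    print(formula)
```

Each generated formula would then be fitted and scored by its misclassification estimate, and the lowest-scoring candidates retained.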