Development and validation of a risk prediction model for premenopausal breast cancer in 19 cohorts.
Brantley KD., Jones ME., Tamimi RM., Rosner BA., Kraft P., Nichols HB., O'Brien KM., Adami H-O., Aizpurua A., de Gonzalez AB., Blot WJ., Braaten T., Chen Y., DeHart JC., Dossus L., Elias S., Fortner RT., Garcia-Closas M., Gram IT., Håkansson N., Hankinson SE., Kitahara CM., Koh W-P., Linet MS., MacInnis RJ., Masala G., Mellemkjær L., Milne RL., Muller DC., Park HL., Ruddy KJ., Sandin S., Shu X-O., Tin Tin S., Truong T., Vachon CM., Vatten LJ., Visvanathan K., Weiderpass E., Willett W., Wolk A., Yuan J-M., Zheng W., Sandler DP., Schoemaker MJ., Swerdlow AJ., Eliassen AH.
BACKGROUND: Incidence of premenopausal breast cancer (BC) has risen in recent years, though most existing BC prediction models are not generalizable to young women due to underrepresentation of this age group in model development. METHODS: Using questionnaire-based data from 19 prospective studies harmonized within the Premenopausal Breast Cancer Collaborative Group (PBCCG), representing 783,830 women, we developed a premenopausal BC risk prediction model. The data were split into training (2/3) and validation (1/3) datasets with equal distribution of cohorts in each. In the training dataset, variables were chosen from known and hypothesized risk factors: age, age at menarche, age at first birth, parity, breastfeeding, height, BMI, young adulthood BMI, recent weight change, alcohol consumption, first-degree family history of BC, and personal history of benign breast disease (BBD). Hazard ratios (HR) and 95% confidence intervals (CI) were estimated by Cox proportional hazards regression with age as the time scale, stratified by cohort. Because complete information on all risk factors was not available in all cohorts, coefficients were estimated separately in groups of cohorts with the same available covariate information, adjusted to account for the correlation between missing and non-missing variables, and meta-analyzed. Absolute risk of BC (in situ or invasive) within 5 years was determined using country-, age-, and birth cohort-specific incidence rates. Discrimination (area under the curve, AUC) and calibration (Expected/Observed, E/O) were evaluated in the validation dataset. We compared our model with a literature-based model for women