Breast Cancer Risk Prediction Modeling Project Group

Project Details

Project Title

Project Type

Standalone Project

Project Group

(N/A)

Project Status

Active

Primary Contact Information

Name

Phillip Kraft

Title within Organization

Senior Investigator

Email

phillip.kraft@nih.gov

Organization / Institution

Division of Cancer Epidemiology and Genetics, National Cancer Institute

Cohort Affiliation

Alternate Contact Information

Name

Mia Gaudet

Title within Organization

Senior Scientist

Email

mia.gaudet@nih.gov

Organization / Institution

DCEG/NCI

Cohort Affiliation

N/A

Additional Point(s) of Contact

Nilanjan Chatterjee: nchatte2@jhu.edu
Montserrat Garcia-Closas: montse.garcia-closas01@icr.ac.uk

Project Details

Tumor Site(s)

Breast

Plan for Funding

We have submitted a pre-application for funding to the Department of Defense Breast Cancer Research Program that would support data preparation by US-based cohorts. We will seek additional funding to support non-US cohorts interested in participating.

Background and Significance

Breast cancer risk models are used in clinical settings to identify women at elevated risk of developing breast cancer who could benefit from preventive therapies or enhanced screening. Subtype-specific risk estimation is important because the effectiveness of these strategies varies by subtype (Muñoz et al. 2014; Cuzick et al. 2013). However, current models predict only the risk of overall disease (Cintolo Gonzalez et al. 2017). Evaluation of risk models across multiple ethnic/racial populations is of particular importance, as most research to date has been conducted in women of European descent. We have shown that polygenic risk scores (PRS), which aggregate risk from multiple common susceptibility loci, can substantially improve risk stratification for breast cancer overall and by ER status (Mavaddat et al. 2015). Integrated risk models that incorporate genetics and other known risk factors, such as reproductive/hormonal history, lifestyle factors and mammographic density, are important to improve our ability to identify women at elevated risk of breast cancer (Garcia-Closas et al. 2014).

An important impediment to the continued development and improvement of risk models is the limited availability of large study populations with information on a comprehensive set of risk factors for estimation of risk, as well as of independent study populations for the validation of risk models. The large sample sizes required to obtain accurate estimates of risk can only be attained in pooled analyses or meta-analyses of multiple studies. This is particularly important for subtype-specific risk. We have previously developed a breast cancer risk model using relative risks from a multivariate logistic regression analysis of eight prospective cohort studies in women of European descent, 50 years of age or older (Maas et al. 2016a). We used a flexible modeling approach that integrated information on classical risk factors and PRS for overall breast cancer risk prediction (Maas et al. 2016b). Further model development and validation are required for subtype-specific risk (e.g. in-situ and invasive breast cancer by ER status) in women of different ethnicities with a broader age range. Additional improvements in model development include incorporating age interactions, accounting for temporal changes in modifiable risk factors (e.g. BMI and HRT use), and incorporating improved subtype-specific PRS based on recent GWAS discoveries, mammographic breast density and benign breast disease.

The proposed large-scale project will greatly advance and transform clinical risk management by delivering a much-needed tool for overall and subtype-specific risk prediction in women across race/ethnicity groups. The risk tool will integrate a comprehensive set of risk factors, including reproductive and lifestyle factors, genetic testing, and mammographic density.

References:
Cintolo Gonzalez JA, et al. Breast Cancer Res Treat. 2017;164(2):263-284.
Cuzick J, et al. The Lancet. 2013;381(9880):1827-1834.
Garcia-Closas M, et al. J Natl Cancer Inst. 2014;106(11).
Maas P, et al. JAMA Oncol. 2016a;2(10):1295-1302.
Maas P, et al. bioRxiv. 2016b. https://www.biorxiv.org/content/early/2016/10/12/079954/
Mavaddat N, et al. J Natl Cancer Inst. 2015;107(5).
Muñoz D, et al. J Natl Cancer Inst. 2014;106(11):dju289.

Overall Goal

The overall goal of this proposal is to pool data from prospective cohort studies to develop and validate overall and subtype-specific breast cancer risk models for risk-based prevention and screening.

Specific Aims

To build an integrated breast cancer risk prediction model for overall and subtype-specific risk through a large-scale pooled analysis of prospective cohort studies, including women of European, Asian, African and Hispanic/Latino ancestry.

Design and Analysis Plan

We will conduct a pooled analysis of data from participating cohorts. We propose to build a cloud-hosted data environment to support data governance as agreed by the collaborative network, data aggregation/harmonization, data sharing, and analyses through remote access. We will work in collaboration with investigators leading existing breast cancer pooling projects and use the Cohort Metadata Repository (CMR) to avoid duplication of effort in data cleaning and harmonization. Data management and analysis working groups will be formed with interested cohort PIs/delegates.

Proportional hazards models will be used to build risk models. Subtypes will be defined as in-situ vs invasive, ER-positive vs ER-negative, and by other clinically relevant features (aggressive vs non-aggressive phenotype, interval vs screen-detected). We will use a novel statistical method (gmeta) to combine models from cohorts with different sets of risk factors available (https://arxiv.org/abs/1708.03818). This will allow efficient use of data from all cohorts, without the need to restrict analyses to cohorts with information on all variables of interest. Risk model development and validation will be conducted using the Individualized Coherent Absolute Risk Estimation (iCARE) R package (https://www.biorxiv.org/content/biorxiv/early/2016/10/12/079954.full.pdf). This tool will integrate information on 1) relative risk estimates for risk factors, including polygenic risk scores, 2) population-specific disease and mortality rates, and 3) risk factor distributions specific to a given target population. For individualized risk projection, the tool accommodates missing data on risk factors by using the risk factor distribution in the target population. The iCARE package has been extended to include a validation component implementing standardized model validation methods (Choudhury et al., in preparation). Cohorts will be divided into training and test sets to assess model calibration and discrimination.
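As a rough illustration of how relative risks, population incidence rates and competing mortality rates can be combined into an absolute risk, the following minimal Python sketch uses a piecewise-constant hazard over one-year age intervals. It is not the iCARE implementation or its API. The function name, its arguments and the simple rescaling of incidence by the population mean relative risk are assumptions made for illustration only.

```python
import math


def absolute_risk(rel_risk, incidence, mortality, mean_rel_risk=1.0):
    """Approximate absolute risk over consecutive one-year age intervals.

    All names here are hypothetical and chosen for illustration.
    rel_risk      -- individual's relative risk versus the baseline profile
    incidence     -- population breast cancer incidence rate for each year
    mortality     -- competing mortality rate for each year
    mean_rel_risk -- average relative risk in the target population, used to
                     rescale population incidence to a baseline hazard
                     (a simplifying assumption)
    """
    if len(incidence) != len(mortality):
        raise ValueError("incidence and mortality must have the same length")
    if mean_rel_risk <= 0:
        raise ValueError("mean_rel_risk must be positive")

    event_free = 1.0  # probability of being alive and disease-free so far
    risk = 0.0
    for inc, mort in zip(incidence, mortality):
        hazard = rel_risk * inc / mean_rel_risk  # individual's disease hazard
        total = hazard + mort                    # hazard of leaving the risk set
        if total > 0:
            # Probability that disease occurs during this year, given that the
            # woman was event-free at the start of the year.
            p_event = hazard / total * (1.0 - math.exp(-total))
            risk += event_free * p_event
        event_free *= math.exp(-total)
    return risk


# Example with made-up rates: five-year risk for a woman with relative risk 1.5.
five_year = absolute_risk(1.5, [0.003] * 5, [0.005] * 5)
```

The competing mortality term lowers the projected risk, because a woman who dies of another cause can no longer develop breast cancer.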

Why a Cohort Consortium Approach is Necessary

Development and validation of risk models for subtype- and ethnicity-specific risk prediction require very large sample sizes to obtain precise estimates of relative risk. These can only be attained through large pooling projects. A prospective design is required to minimize biases, to evaluate changing exposures, and to validate risk scores prospectively.

Minimum Number of Cases Per Cohort Needed to Answer Primary Aim

We would include only studies with a minimum of 200 incident breast cancer cases with information on pathology characteristics of the tumors (at least ER status).

Required Outcome Data

Incident breast cancer (in-situ and invasive) with variables to define clinically relevant subtypes: age at diagnosis, stage, grade and histology, ER, PR, HER2, Ki67, and mode of detection, if available. Follow-up variables: age at study exit (i.e. last contact/linkage or death). For breast cancer cases, time to event will be defined as the time in years from study entry to breast cancer diagnosis; for non-cases, time to event will be defined as the time in years from study entry to study exit.
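The time-to-event definition above can be sketched as a small helper. This is a minimal Python illustration. The function name and the choice to derive follow-up time from ages at entry, diagnosis and exit are assumptions, not part of the proposal.

```python
def time_to_event(age_entry, age_exit, age_diagnosis=None):
    """Return (follow-up time in years, case indicator) for one participant.

    Hypothetical helper: cases are followed from study entry to breast cancer
    diagnosis; non-cases are followed from study entry to study exit
    (last contact/linkage or death).
    """
    is_case = age_diagnosis is not None
    end_age = age_diagnosis if is_case else age_exit
    if end_age < age_entry:
        raise ValueError("follow-up cannot end before study entry")
    return end_age - age_entry, is_case


# A case diagnosed at 58 after entering at 50, and a non-case exiting at 62.5.
case_record = time_to_event(50.0, 62.5, age_diagnosis=58.0)
non_case_record = time_to_event(50.0, 62.5)
```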

Required Exposure Data

Reproductive and hormonal history, height, pre- and postmenopausal BMI, weight change, alcohol, smoking, diet, physical activity, family history of cancer, benign breast disease, mammographic breast density and PRS. Data will be collected at baseline and updated at follow-up if available. The level of detail of the data will be discussed and agreed upon with participating cohorts. Having data on all risk factors (including genotypes) is not required.

Required Covariate Data

Age at study entry, study center (if applicable), race/ethnicity

Are Biospecimens Required?

Last Updated: 05 Jan, 2026