Epidemiology and Genomics Research Program

Colorectal Cancer Pooling Project (C2P2)

Project Title

Colorectal Cancer Pooling Project (C2P2)

Primary Contact Information

Peter T. Campbell


Albert Einstein College of Medicine

Cancer Prevention Studies (CPS I, CPS II, CPS III, & CPS II Nutrition Cohort)

Alternate Contact Information

Marc Gunter



European Prospective Investigation into Cancer and Nutrition (EPIC)

Project Details


Conditional on ACS intramural approval, we plan to have funding to support each cohort's efforts toward data preparation. At present, existing ACS funds will support data harmonization and analysis of existing data. This work will be aligned with ongoing projects to avoid duplication of efforts (e.g., GECCO and Diabetes and Cancer Initiative, through sharing computer code to harmonize data). After we demonstrate feasibility and scientific relevance (through data harmonization, analysis and publishing papers), we aim to identify new blood-based biomarkers (e.g., gut dysbiosis, systemic inflammation, metabolome and proteome) and submit a grant for extramural funding or secure ACS intramural funding.

Colorectal cancer (CRC) is a leading cause of morbidity and mortality worldwide. From 1975 to 2015, CRC incidence and mortality rates in the U.S. declined overall by approximately 37% and 50%, respectively. These trends are largely attributed to increased uptake of early detection, temporal changes in CRC risk factors, and improved CRC treatments (for mortality) (PMID: 28248415). However, these rate reductions are observed only among adults aged 50 years and older, for whom CRC screening/early detection is broadly recommended. Conversely, CRC incidence rates in men and women aged 20-49 years ('early-onset') have been increasing since the late 1980s in the U.S. and in other areas of the world, including Asia, Europe and Australia (PMID: 26667886). From 1988 to 2015 in the U.S., age-adjusted early-onset CRC incidence rates increased from 7.9 to 12.9 cases per 100,000 people, corresponding to a 63% increase, with less discernible increases in mortality. Early-onset CRC patients, compared to older patients, are more likely to be diagnosed with distant-metastatic disease (26% versus 19%) and with rectal cancer (37% versus 28%). Limited data also suggest that tumors from early-onset CRC patients are histologically aggressive and possess distinct somatic mutational features (PMID: 29615301). Collectively, these data suggest that early-onset CRC may have unique etiologic profiles that distinguish it from older-onset disease.

Although recent trends for early-onset CRC are concerning, early-onset cases comprise only ~5%-10% of all CRC cases. If these trends continue, however, this proportion is expected to approximately double by 2030 (PMID: 25372703). The etiology of early-onset CRC is generally poorly understood: estimates suggest that 15-30% of early-onset CRC is explained by high-penetrance genetic syndromes, including Familial Adenomatous Polyposis and Lynch syndrome (PMID: 27978560, 29146522, 30520562). An additional <=20% of early-onset CRC is attributed to familial CRC, a situation where a positive family history of CRC is reported but no known germline variants/mutations are identified. The remaining >=50% of early-onset CRC is unexplained by cancer predisposition syndromes or familial clustering (PMID: 30520562). This group is predominantly responsible for the recent trends in early-onset CRC, albeit without clear explanation.

Prospective cohort and case-control studies have identified many risk factors associated with CRC incidence overall, including body mass index (BMI), waist circumference, height, physical inactivity, diabetes, cigarette smoking, lack of NSAID/aspirin use, red/processed meat intake, Western diet pattern, low fiber/fruit/vegetable/calcium intake, alcohol use, family history of CRC, and inflammatory bowel disease. Of these risk factors, increasing trends in BMI, diabetes, and physical inactivity somewhat mirror the trends in early-onset CRC, but they are insufficient alone to explain these trends. For example, these risk factors are associated with higher risks of colon than rectal cancers, and more so for men than women; however, the increasing trends are steeper for rectal than colon cancers and approximately equal in men and women. More provocative hypotheses include secular changes to a diverse range of factors beginning decades ago that may have resulted in long-term changes to the gut microbiota and/or gene methylation patterns (PMID: 30659375; 30396471). To date, these potential risk factors are unexplored in the context of early-onset CRC.

This proposal will clarify the associations of established CRC risk factors overall and, most importantly, stratified by age at diagnosis [<50y vs. >=50y (and <50y vs. 50-65y vs. >65y, if numbers allow)] to better understand the etiology of early-onset disease. This work will also create a resource for future studies of blood-based biomarkers.
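The age-at-diagnosis strata named above can be written down as a small helper. This is only a sketch: the function name and the `three_level` flag are hypothetical, and assigning age 65 exactly to the middle stratum is an assumption based on reading "50-65y" as inclusive.

```python
def age_stratum(age_at_diagnosis: float, three_level: bool = False) -> str:
    """Assign a CRC case to an age-at-diagnosis stratum.

    Two-level scheme:   <50 vs. >=50
    Three-level scheme: <50 vs. 50-65 vs. >65 (for use if numbers allow)
    """
    if age_at_diagnosis < 0:
        raise ValueError("age_at_diagnosis must be non-negative")
    if age_at_diagnosis < 50:
        return "<50"          # early-onset
    if not three_level:
        return ">=50"         # all later-onset cases together
    # Assumption: age 65 exactly belongs to the 50-65 stratum.
    return "50-65" if age_at_diagnosis <= 65 else ">65"


if __name__ == "__main__":
    for age in (34, 50, 65, 71):
        print(age, age_stratum(age), age_stratum(age, three_level=True))
```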

The overall goal is to understand the role of modifiable (BMI, waist circumference, physical inactivity, cigarette smoking, lack of NSAID/aspirin use, red/processed meat intake, Western diet pattern, low fiber/fruit/vegetable/calcium intake, alcohol use, sleep patterns, antibiotic use) and non-modifiable (height, diabetes, allergies, family history of CRC, and inflammatory bowel disease) risk factors for early-onset versus late-onset CRC and to create a resource for future studies of blood-based biomarkers of CRC.

1) Describe the demographic, clinical, molecular pathological and epidemiologic characteristics of early-onset versus late-onset CRC.
2) Investigate associations of emerging and established risk factors (as defined above in Overall goal) with CRC overall and when stratified by early- vs. late-onset disease.
3) Explore opportunities to investigate more novel risk factors that may have been captured by some cohorts, including antibiotic use, allergies, newer medications, and sleep.
4) Take a census of pre-diagnostic blood samples from early-onset cases and gauge interest in future studies of metabolomics, proteomics, inflammation, and gut dysbiosis.

All cohort studies that have identified a minimum of 100 verified, invasive-CRC cases (ICD10: C18, C19, C20; ICD9: 153, 154) will be invited to participate in this project, regardless of the age distribution of cases. We recognize that early-onset cases will be relatively rare.
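As a rough illustration of the eligibility rule above, the ICD categories and the 100-case minimum could be checked as follows. All names here are hypothetical, and the sketch only matches on code prefixes; it does not check that a case is verified or invasive, which would need additional fields (e.g., behavior codes) that each cohort records differently.

```python
# ICD categories taken from the eligibility text; everything else is illustrative.
ICD10_CRC_PREFIXES = ("C18", "C19", "C20")
ICD9_CRC_PREFIXES = ("153", "154")
MIN_CRC_CASES = 100


def is_crc_code(code: str, icd_version: int) -> bool:
    """Return True if an ICD code falls under the CRC categories listed above."""
    if icd_version == 10:
        prefixes = ICD10_CRC_PREFIXES
    elif icd_version == 9:
        prefixes = ICD9_CRC_PREFIXES
    else:
        raise ValueError("icd_version must be 9 or 10")
    # str.startswith accepts a tuple of prefixes, so "C18.2" matches "C18".
    return code.strip().upper().startswith(prefixes)


def cohort_is_eligible(case_codes, icd_version: int) -> bool:
    """A cohort qualifies with at least MIN_CRC_CASES matching cases, at any age."""
    n_crc = sum(is_crc_code(code, icd_version) for code in case_codes)
    return n_crc >= MIN_CRC_CASES


if __name__ == "__main__":
    codes = ["C18.2"] * 60 + ["C20"] * 45 + ["C21.0"] * 10  # C21 is not counted
    print(cohort_is_eligible(codes, icd_version=10))
```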

After the appropriate Data Transfer/Collaborative Agreements are in place, participating cohort studies will be requested to share their anonymized individual-level study data with the American Cancer Society (ACS) Epidemiology program through a secure web portal. ACS staff will use previously established (e.g., GECCO and the Diabetes and Cancer Initiative) study-specific pseudo-code and data dictionaries to harmonize all study data. Studies that have not participated in these previous projects will still be invited to participate; harmonization of new data will follow similar iterative processes as the previous efforts (e.g., Maelstrom Research guidelines). Each study will be invited to nominate two members (or more, if interest is high) to the Analytic Working Group.

Descriptive statistics (e.g., categorical counts, means) will accomplish Aim 1. Cox proportional hazards (PH) regression models stratified on attained age will accomplish Aim 2 via two complementary approaches. First, we will (a) estimate study-specific hazard ratios (HRs) between a given exposure and outcome using harmonized data and (b) pool these results to estimate summary HRs using random-effects models. Second, we will pool data from all cohorts into one dataset and conduct analyses on the combined dataset. Regardless of the approach, Cox PH models that are stratified on attained age ensure maximal use of follow-up time and age-stratum-appropriate inferences from each HR. Heterogeneity in associations will be tested via fixed-effects meta-analyses or joint/competing risks Cox models, as appropriate.
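Step (b) of the first approach, pooling study-specific HRs with a random-effects model, might look like the sketch below. The text does not name an estimator, so the DerSimonian-Laird method used here is an assumption, as are the function name, the recovery of standard errors from 95% confidence limits, and the made-up input values in the usage example.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def pool_random_effects(hrs, ci_lowers, ci_uppers):
    """Pool study-specific HRs on the log scale (DerSimonian-Laird, assumed)."""
    y = [math.log(h) for h in hrs]
    # Standard error of each log-HR, recovered from its 95% confidence limits.
    se = [(math.log(u) - math.log(l)) / (2 * Z_975)
          for l, u in zip(ci_lowers, ci_uppers)]
    w = [1 / s ** 2 for s in se]                      # inverse-variance weights
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    # Cochran's Q and the method-of-moments between-study variance tau^2.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (s ** 2 + tau2) for s in se]          # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    return {
        "hr": math.exp(y_re),
        "ci_lower": math.exp(y_re - Z_975 * se_re),
        "ci_upper": math.exp(y_re + Z_975 * se_re),
        "tau2": tau2,
        "q": q,
    }


if __name__ == "__main__":
    # Hypothetical study-specific results, for illustration only.
    result = pool_random_effects(
        hrs=[1.30, 1.10, 1.55],
        ci_lowers=[1.05, 0.90, 1.10],
        ci_uppers=[1.61, 1.34, 2.18],
    )
    print(result)
```

When the between-study variance estimate is zero, the random-effects weights reduce to the inverse-variance weights and the summary HR equals the fixed-effects estimate.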

Early-onset CRC is rare, and we anticipate that no cohort will have more than 200 cases. We further anticipate weak-to-moderate HRs (e.g., 1.2-2) for exposure-disease associations. Thus, this important research question is best addressed through a large, collaborative approach to identify meaningful associations, maximize precision, and reduce potential biases.

A minimum of 100 CRC cases will be required from each cohort to participate, regardless of the age distribution of those cases. At present, most established cohort studies have identified at least 2000 to 4000 CRC cases. Assuming conservative estimates of 30 participating studies with 2000 CRC cases each and 5% early-onset cases, we will have 3000 early-onset cases in C2P2. In turn, with 3000 cases, 60,000 non-cases, and assuming 80% power with a 2-tailed test and exposure prevalences of 0.1 and 0.5, our minimal detectable HRs are 1.18 (or 0.83) and 1.11 (or 0.90), respectively.
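The arithmetic behind these figures can be sketched as follows. The expected case count follows directly from the stated assumptions. The minimal detectable HR uses Schoenfeld's events-based approximation for a binary exposure, which is an assumption here: the text does not say which method produced its figures, and this approximation ignores the number of non-cases, so it gives values near the quoted ones but is not expected to reproduce them exactly. Function names and the default alpha of 0.05 are also assumptions.

```python
import math
from statistics import NormalDist


def expected_early_onset_cases(n_studies, cases_per_study, early_fraction):
    """Expected early-onset cases under the stated planning assumptions."""
    return n_studies * cases_per_study * early_fraction


def minimal_detectable_hr(n_events, exposure_prevalence, power=0.80, alpha=0.05):
    """Schoenfeld approximation (assumed method) for a binary exposure.

    Solves  n_events = (z_{1-alpha/2} + z_{power})^2 / (p (1 - p) (ln HR)^2)
    for HR > 1. The protective counterpart under this approximation is 1 / HR.
    """
    p = exposure_prevalence
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed test
    z_power = NormalDist().inv_cdf(power)
    log_hr = (z_alpha + z_power) / math.sqrt(n_events * p * (1 - p))
    return math.exp(log_hr)


if __name__ == "__main__":
    n_cases = expected_early_onset_cases(30, 2000, 0.05)
    print(f"expected early-onset cases: {n_cases:.0f}")
    for prevalence in (0.1, 0.5):
        hr = minimal_detectable_hr(n_cases, prevalence)
        print(f"prevalence {prevalence}: HR {hr:.2f} (or {1 / hr:.2f})")
```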

If a cohort has data on the existence of high-penetrance germline syndromes for CRC, we will exclude those cases. We expect such data to be rare.

For older-onset CRC, and probably the middle age group (50-65 y), we will be able to stratify participants by sex and site in the colorectum to better define these exposure-subgroup associations. If numbers allow, we will also stratify early-onset CRC by sex and site.

Colon or rectal cancer outcome: ICD diagnosis code, histology, grade and stage (if available). Time at risk. Age at diagnosis. Date/age at death or end of study/censoring.

Optional tumor molecular phenotype data, including: MSI status (stable, low, high; MSI markers measured; MSI markers unstable), dMMR status (deficient, proficient; IHC proteins measured; affected protein(s)), CIMP status (positive, low, negative; markers measured; markers methylated), BRAF/KRAS mutation status.

BMI and smoking status at study entry. Ideally, updated body weight, height (separate from BMI), waist circumference, updated smoking status, and smoking duration and intensity data would also be available. Age at study entry. Alcohol intake. Sex.

Race/ethnicity, physical activity (MET-hours), diabetes, diabetes medications, history of medical conditions (stroke, MI, hypertension, lung disease), diet (e.g., red/processed meat, Western diet pattern, servings per week of fruit/vegetables, total caloric intake, milk, calcium, folate, etc.), family history of CRC, IBD, screening/early detection history, NSAID/aspirin use, sleep, antibiotic use. Germline mutation status (if known).