Epidemiology and Genomics Research Program

NCI Cohort Consortium 2015 Annual Meeting

The annual NCI Cohort Consortium meeting, sponsored by EGRP and the Division of Cancer Epidemiology and Genetics (DCEG), was held on November 4-6, 2015, at the Natcher Conference Center on the NIH campus in Bethesda, MD. Project/Working Group meetings were also held during this time.


Day 1: Wednesday, November 4, 2015

Refer to Working Group Schedule
Time Event
8:00 a.m. - 9:00 a.m. Registration
9:00 a.m. - 6:00 p.m. Cohort Consortium Working Group Meetings
6:00 p.m. Adjourn

Day 2: Thursday, November 5, 2015

Balcony A and B
Time Event
7:30 a.m. - 8:00 a.m. Registration
8:00 a.m. - 9:00 a.m.

Session I: Welcome and Introductions

Elio Riboli, MD, HonFPH, FMedSci., Imperial College London
8:30 a.m. - 8:45 a.m. Opening Remarks
Stephen Chanock, MD, National Cancer Institute
Debbie Winn, PhD, National Cancer Institute
Kathy Helzlsouer, MD, MHS, National Cancer Institute
8:45 a.m. - 9:00 a.m. NIH Precision Medicine Initiative Update
Debbie Winn, PhD, National Cancer Institute
9:00 a.m. - 10:00 a.m.

Session II: Microbiome

Leslie Bernstein, PhD, City of Hope
Meir Stampfer, MD, DrPH, Harvard School of Public Health
9:00 a.m. - 9:20 a.m. Human Microbiome and Orodigestive Cancers
Jiyoung Ahn, PhD, NYU School of Medicine
9:20 a.m. - 9:40 a.m. The Human Gut Microbiome as a Modulator of Adiposity-Related Biomarkers in the Multiethnic Cohort (MEC): Initial Insights
Meredith Hullar, PhD, Fred Hutchinson Cancer Research Center
9:40 a.m. - 10:00 a.m. Open Discussion
10:00 a.m. - 10:30 a.m. Break
10:30 a.m. - 11:30 a.m.

Session III: Cohort Studies of Cancer Survivorship and Related Outcomes

Wei Zheng, MD, PhD, Vanderbilt University
Joanne Elena, PhD, MPH, National Cancer Institute
10:30 a.m. - 10:50 a.m. WHI's Life and Longevity After Cancer Study (LILAC)
Bette Caan, PhD, Kaiser Permanente
10:50 a.m. - 11:10 a.m. Childhood Cancer Survivor Cohorts: Lessons from CCSS and St. Jude LIFE
Leslie Robison, PhD, St. Jude Children's Research Hospital
11:10 a.m. - 11:30 a.m. Open Discussion
11:30 a.m. - 1:15 p.m. Session IV (Natcher Atrium)
11:30 a.m. - 12:30 p.m. Poster Session
Presenters are asked to be by their posters during this time
12:30 p.m. - 1:15 p.m. Lunch
Attendees are responsible for meals and light refreshments on their own, at their own cost. Posters will remain on display during lunch.
1:15 p.m. - 2:15 p.m.

Session V: Data Harmonization

Mattias Johansson, PhD, International Agency for Research on Cancer
Marc Gunter, PhD, Imperial College London
1:15 p.m. - 1:35 p.m. From Individual Cohorts to Collaborative Data Sharing Infrastructures
Isabel Fortier, PhD, McGill University
1:35 p.m. - 1:55 p.m. Rethinking CEC Data: the California Teachers Study
James Lacey, PhD, City of Hope
1:55 p.m. - 2:15 p.m. Open Discussion
2:15 p.m. - 2:30 p.m. Break
2:30 p.m. - 5:30 p.m.

Session VI: Cohort Consortium Signature and Related Projects

Anthony Swerdlow, D., PhD, DSc, University of London, Institute of Cancer Research
Susan Gapstur, MPH, PhD, American Cancer Society, Inc.
2:30 p.m. - 3:00 p.m. OncoArray/GAME ON Initiative
Peter Kraft, PhD, Harvard School of Public Health
3:00 p.m. - 3:10 p.m. Discussion
3:10 p.m. - 3:40 p.m. Lung Cancer Cohort Consortium (LC3)
Paul Brennan, PhD, International Agency for Research on Cancer
3:40 p.m. - 3:50 p.m. Discussion
3:50 p.m. - 4:20 p.m. Ovarian Cancer Cohort Consortium (OC3)
Shelley Tworoger, PhD, Harvard Medical School
Nicolas Wentzensen, PhD, National Cancer Institute
4:20 p.m. - 4:30 p.m. Discussion
4:30 p.m. - 4:45 p.m. Open Mic: Ideas for New Signature Studies
4:45 p.m. - 5:30 p.m. Networking Opportunity, Posters
5:30 p.m. Adjourn

Day 3: Friday, November 6, 2015

Refer to Working Group Schedule
Time Event
9:00 a.m. - 12:00 p.m. Cohort Consortium Working Group Meetings
12:00 p.m. Adjourn

Cohort Consortium Project / Working Group Meeting Agenda

Wednesday, November 4, 2015
Rooms: Room B, Room C1/C2, Room D, Room E1/E2, Room F1/F2, Room G1/G2
Time Event
10:00 - 11:30 a.m. Cohort Infrastructure Grantees Meeting [Closed - WG Members Only]
10:30 a.m. - 12:30 p.m. Lung Cancer Cohort Consortium
10:30 a.m. - 12:00 p.m. Diabetes and Cancer Initiative
1:00 - 2:00 p.m. African American Pooling Project [Closed - Members Only]
1:00 - 3:00 p.m. Diet and Cancer Pooling Project
1:00 - 3:30 p.m. Breast and Prostate Cancer Cohort Consortium (BPC3)
1:00 - 3:00 p.m. Markers of HPV Infection and Risk of Head and Neck Cancer
1:00 - 3:15 p.m. Biomarkers and Breast Cancer Risk Prediction in Younger Women [Closed - WG Members Only]
2:30 - 3:30 p.m. Tumor Tissue Working Group
2:30 - 3:30 p.m. Physical Activity and Risk of Cancer in the Cohort Consortium
3:30 - 5:15 p.m. Premenopausal Breast Cancer Collaboration Group [Closed - WG Members Only]
3:30 - 5:30 p.m. Ovarian Cancer Cohort Consortium
4:00 - 6:00 p.m. Lymphoid Malignancies Working Group [Closed - WG Members Only]
4:00 - 6:00 p.m. Vitamin D Pooling Project of Breast and Colorectal Cancer [Closed - WG Members Only]
4:00 - 6:00 p.m. NCI-NHLBI Working Group Meeting

Friday, November 6, 2015
Room D
Time Event
10:00 - 11:30 a.m. Second Cancers/Survivorship Working Group

Meeting Summary

The 2015 Annual Meeting of the NCI Cohort Consortium was held on the NIH campus in Bethesda, MD on November 4-6. This summary reflects the portion of the meeting involving all participants on November 5, 2015. The other two days of the meeting were dedicated to multiple, simultaneous working group meetings.

This year is the 15th anniversary of the NCI Cohort Consortium. The Consortium now includes 57 cohorts. Since last year, two new projects have joined the Consortium: the Mexican Teachers' Cohort and Alberta's Tomorrow Project. The Mexican Teachers' Cohort has recruited female teachers from 12 different Mexican states and began recruiting male teachers in 2013. The Canadian Partnership for Tomorrow Project includes cohorts in different Canadian provinces. The Alberta cohort is the largest and is more advanced than the other Tomorrow cohorts.

Session I: Introductions

Moderator: Dr. Elio Riboli

Dr. Riboli delivered the introduction and noted that the format of the Cohort Consortium Steering Committee (SC) meetings has changed. Cohort working groups deliver presentations at every other SC meeting. The purpose of the informal presentations is to exchange ideas and share lessons learned and accomplishments. Last year, 10 cohort principal investigators (PIs) delivered presentations to the SC. These presentations have been successful and provide useful input for the cohorts and the SC.

Dr. Stephen Chanock
Dr. Chanock, Director of NCI's Division of Cancer Epidemiology and Genetics (DCEG), noted that Dr. Douglas Lowy, Acting Director of NCI, has made epidemiologic and genetic studies a priority. The Precision Medicine Initiative (PMI) for Oncology, with its focus on targeted therapy, is an important priority. PMI for prevention (as opposed to PMI for Oncology, which focuses on treatment) also provides an important opportunity for cohorts and is likely to have a substantial impact on cancer care in the long term.

Dr. Debbie Winn
Dr. Winn introduced Dr. Kathy Helzlsouer, the new Associate Director of NCI's Epidemiology and Genomics Research Program (EGRP), which manages the NCI Cohort Consortium. Dr. Helzlsouer also is the Chief Medical Officer for NCI's Division of Cancer Control and Population Sciences (DCCPS). She has expertise in epidemiology, cancer genetic counseling, and clinical research.

Dr. Kathy Helzlsouer
Dr. Helzlsouer discussed changes to the NCI Cohort Consortium SC. The NCI Cohort Consortium SC selected Dr. Susan Gapstur as its new Chair and Dr. Anthony Swerdlow as Vice Chair for 2016. The SC is seeking nominations for two new members. SC members must be PIs of one of the Consortium cohorts. Members are expected to serve a 3-year term and participate in the monthly SC teleconferences.

NIH Precision Medicine Initiative Update

Dr. Debbie Winn

President Obama announced the PMI in January 2015. Precision medicine was defined in the 2011 Institute of Medicine (IOM) report, Toward Precision Medicine. The PMI includes two major NIH programs: 1) the PMI Cohort Program and 2) the PMI for Oncology Program.

The PMI will seek one million volunteers in the United States who are willing to share their medical information, provide a biospecimen, and be contacted in the future. The goal is for the PMI participants to represent the diversity of the U.S. population. The PMI will use a highly interactive and proactive participation model. Participants will be recruited directly and through health care provider organizations (large health care systems that can collect biospecimens and provide electronic health records [EHRs]).

In preparation for the PMI cohort, analyses are being conducted to generate risk estimates for a range of diseases, identify determinants of individual variation in efficacy and safety of common therapies, and discover biomarkers. Studies also are using mobile technology to correlate activity, physiologic measures, and environmental exposures with health outcomes and to empower participants by providing them with data to improve their own health. NCI will participate in the PMI cohort but will focus on PMI for Oncology. The latter effort will focus on understanding drug resistance, laboratory models of human cancer, and integration of genomic and clinical information. Treatment in the PMI for Oncology trials will focus on genetic abnormalities in tumors.

Recruitment is beginning for NCI's Molecular Analysis for Therapy Choice (NCI-MATCH) trial. Trial participants will have their tumors genetically sequenced to determine whether they contain genetic abnormalities for which a targeted drug exists and will be treated according to actionable mutations that are detected. Participants who do not respond to the first treatment will be retested for other mutations and other treatments will be attempted. NCI-MATCH is collaborating with commercial organizations that want to test drugs. Successful drugs can be studied in Phase 3 trials.

Session II: Microbiome

Moderators: Drs. Leslie Bernstein and Meir Stampfer

Interest in the role of the microbiome in disease is increasing. The human microbiome is larger and more complex than the human genome. Session II focused on the role of the microbiome in epidemiologic research and ways to integrate microbiome studies into the NCI Cohort Consortium.

Human Microbiome and Orodigestive Cancers

Dr. Jiyoung Ahn

The microbiome affects digestion, the immune system, and vitamin synthesis and varies by body site. The human microbiome has a common structure, but the relative amounts of different microbes vary among individuals in a population. DNA sequencing is necessary to comprehensively and accurately study the human microbiota because 80 percent of human bacteria do not grow outside the body.

Two types of assays commonly are used to measure the microbiome: 1) targeted 16S rRNA gene sequencing and 2) whole-genome shotgun metagenomic sequencing. Dr. Ahn used 16S rRNA sequencing in two studies. This assay allows amplification and taxonomic identification and has adequate validity and reproducibility. Dr. Ahn described 16S sequence processing, which allows investigators to compare microbial diversity between cancer and non-cancer phenotypes.

Dr. Ahn tested three different methods for collecting fecal samples to measure the microbiome: fecal occult blood test (FOBT) card, RNAlater, and fresh frozen. She found no major differences in the efficacy of the three collection methods.

Dr. Ahn also tested various DNA extraction methods (Qiagen, Qiagen plus lysozyme, and four MO BIO methods). She examined the proportion of microbial to human DNA and found that the MO BIO methods were most effective (particularly PowerLyzer microbial).

Animal studies have shown an association between poor oral health and increased pancreatic cancer risk because oral bacteria can migrate to the pancreas and accelerate carcinogenesis. Dr. Ahn is conducting a nested case-control study of the association between the oral microbiome and pancreatic cancer risk in humans. The study has revealed that smokers have a different oral microbiome than non-smokers. Proteobacteria are significantly depleted in current smokers.

Dr. Ahn received an R03 grant to conduct a case-control study of the gut microbiome, inflammation, and colorectal cancer (CRC). The study found that the gut microbiome is less diverse in CRC patients than in controls. Taxonomic comparisons also showed differences between cases and controls. Fusobacterium, a gram-negative inflammatory bacterium, was increased in CRC cases. On the other hand, Clostridia, a class of bacteria involved in fermentation and digestion of dietary fiber, were depleted in the CRC cases. Higher fiber intake also was associated with higher levels of Clostridia. Higher levels of Clostridia, which have anti-carcinogenic components, might protect against CRC.
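The summary does not say which diversity measure was used in the CRC study. As an illustration only, the sketch below assumes the Shannon index, a common measure of within-sample diversity, and applies it to made-up read counts per taxon. The function name and the counts are hypothetical and are not taken from the presentation.

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total          # relative abundance of this taxon
            h -= p * math.log(p)
    return h

# Hypothetical read counts per taxon (illustrative only, not study data).
even_sample = [25, 25, 25, 25]    # four taxa, evenly represented
skewed_sample = [85, 5, 5, 5]     # one taxon dominates

h_even = shannon_diversity(even_sample)
h_skewed = shannon_diversity(skewed_sample)
print(f"even: {h_even:.3f}  skewed: {h_skewed:.3f}")
```

A sample dominated by one taxon scores lower than an evenly distributed one, which is the sense in which a microbiome can be described as "less diverse."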

R21 funding was obtained to develop a repository of gut microbiome samples from colonoscopy clinics. Analyses of these samples, to date, have found that the gut microbiome differs by adenoma type.

A prospective cohort is needed to study the microbiome at the population level. Dr. Ahn is starting the Food and Microbiome Longitudinal Investigation (FAMiLI) study to meet this need. The study will involve approximately 20,000 individuals recruited from the New York City area. Participants will complete questionnaires (including annual follow-up) on demographics and diet; contribute fecal, oral, and tissue samples; and have their medical chart information verified.

Microbiome research is important to 1) develop knowledge about the causes of cancer, 2) identify people at high risk for certain cancers, and 3) examine interventions that might prevent cancer. The next steps for this field include whole-genome shotgun metagenomics, integration with other -omics data, replication studies using consortia, and randomized controlled trials.

The Human Gut Microbiome as a Modulator of Adiposity-Related Biomarkers in the Multiethnic Cohort (MEC): Initial Insights

Dr. Meredith Hullar

Dr. Hullar discussed the gut microbiome (the aggregate genes and genomes of the gut bacteria) and adiposity. The prevalence of obesity is high in the Western world. Obesity is more common among people with lower socioeconomic status, people with less education, and certain ethnic/racial groups. This is a public health concern, not only because of the adverse health effects of obesity, but also because 20 percent of all cancers in the US are linked to obesity. However, the current standard used to assess obesity, body mass index (BMI), does not accurately assess disease risk in ethnic/racial minority groups. In these groups, the distribution of adiposity differs from that of Caucasian study participants and is not accurately reflected by BMI. Therefore, new models that incorporate other predictors of adiposity need to be developed to accurately assess the health risks of obesity in ethnic/racial groups.

The gut bacteria may be considered a regulatory agent in obesity because they can alter exposure to both dietary metabolites and antigens, which increase the inflammation associated with disease risk. Changes in the composition and functional capacity of the gut microbiome have been associated with obesity. Studies show that disruption of the normal microbiota (dysbiosis) may be a biomarker for adverse outcomes in obesity-related diseases such as diabetes and colon cancer. Studies of the microbiome in obesity, however, have been confounded by weight loss and dietary changes. Studies have also addressed the association of the microbiome and obesity in racial/ethnic groups. In non-Westernized populations, the microbiome is more diverse and Bacteroidetes abundance relative to Firmicutes is greater than in populations from Europe, the USA, and Canada. However, these findings are limited by small sample sizes and confounded by genetics and environmental exposures such as diet. No studies have measured the association between the microbiome in different ethnic groups and the distribution of adiposity, which may be a better predictor of health risk than BMI.

Dr. Hullar discussed unique aspects of the microbiome that need to be accommodated in the design of human population studies. Considerations include eligibility criteria and confounders that affect the microbiome of the study participants, sampling frequency, nucleic acid extraction, and shipping and storage of microbiome samples. Dr. Hullar has developed questionnaires and sampling protocols to facilitate measuring the microbiome in large human population studies. Early studies by Dr. Hullar showed that nucleic acid extraction approaches adapted for the unique microbial communities in stool are required to represent the microbial diversity accurately. Dr. Hullar's group also developed a bioinformatic approach, Microbial Nucleic Acid Signatures (MNS), a rapid and inexpensive diagnostic of microbial community structure that can be applied when recruiting participants in large epidemiologic studies. Using this approach, monthly sampling of the human microbiome revealed that variation across samples from the same person was low. Whether the required sampling frequency of the microbiome is influenced by gender and ethnicity is currently being examined in the MEC. Tests of the effects of storage media and shipping temperature on the gut microbiome showed that storage media are necessary to maintain the integrity of the sample and that within-person variation is less than between-person variation regardless of the shipping temperatures tested. Other sources of protocols and procedures can be found at the Human Microbiome Project Data Analysis and Coordination Center (DACC).

Disease risk due to obesity is underestimated in ethnic groups because fat distribution varies by gender and ethnic group. Dr. Hullar and colleagues are attempting to integrate the microbiome with other markers to better understand the influence of adiposity on racial/ethnic disparities in cancer risk. In a sub-cohort of the Multiethnic Cohort (MEC), they are investigating links between obesity, fat distribution, and disease risk. They are developing predictors of obesity in five ethnic groups by gender using variables from the exposome, genome, metabolome, and gut microbiome in relation to measures of the distribution of adiposity using DXA and MRI (n~2000). They are currently recruiting ~6000 participants to measure gut microbiome composition and genome-wide associations across gender and ethnic groups. They will use a nested case-control design to measure the circulating concentrations of lipopolysaccharide binding protein (LBP) in the five MEC ethnic groups with breast or colon cancer outcomes (n~2000). Preliminary data showed that although BMI was not significantly different, measures of fat distribution using DXA or MRI differed significantly across gender and ethnic groups. The microbial community was measured using 16S rRNA genes. Adiposity distribution and gut bacterial genera were correlated. In addition, the human gut microbiome community structure varied by individual and clustered into four groups, although the factors driving these separations still need to be determined. While the diversity of the gut microbiome was similar across racial groups, within the Japanese American and Black populations, gut microbiome diversity varied by gender. Human population studies that incorporate the microbiome will establish better predictors of disease risk in ethnic groups. Future studies will expand to -omics platforms that measure the functional capacity of the microbiome associated with disease risk.


Participants asked presenters for their opinion about the best type of sample for studying the microbiome. The best type of sample depends on the research question, but both presenters expressed a preference for fecal samples, which provide information about systemic exposures due to gut microbial metabolism and antigens that may influence disease risk. Another type of sample is mouthwash, but oral microbiome samples are contaminated with eukaryotic DNA. Presenters also questioned the degree to which buccal cell samples represent the oral microbiome. These samples do not include anaerobic periodontal pockets in the mouth, which support a completely different microbiome than the tongue or the rest of the mouth. Tissue samples from areas such as the colon can be helpful for examining possible disease outcomes. Collection of tissue from healthy individuals, however, is difficult. Fecal occult blood tests (FOBT) provide some information about tissue in healthy individuals.

Participants asked about the utility of blood samples for studying the microbiome because many cohorts have collected blood samples. A case-control study is underway in the MEC to examine the association between the microbiome and CRC risk by measuring lipopolysaccharide binding protein (LBP) in blood samples. LBP binds to bacterial cell wall components; this binding stimulates inflammation, which may be associated with increased cancer risk. Other studies have amplified microbial DNA in the blood, but the signal is low. In the future, analyses of blood samples might provide clues about the diversity and community structure of the microbiome. Many metabolites produced by the microbiome in the gut also can be found circulating throughout the body, which offers the opportunity to study whole-host effects.

Participants asked about studies of the microbiome within families and implications for hereditary cancer. A study has been conducted that found strong correlations between mother and infant microbiomes. Studies also have shown that people who are cohabitating have more similar microbiomes than people who are not living together.

A participant suggested that Dr. Hullar consider physical activity (PA) in her study. PA is measured in the larger MEC study and could be included in final analyses.

Participants asked about the feasibility of sampling the microbiome of the aerodigestive tract. Researchers have sampled bacteria in air by placing filter units in an environment for several days. Air pollution also might influence the microbiome and could be measured simultaneously. Researchers currently are studying air conditioning units and environmental bacteria by sampling the microbiome in sputum.

Participants discussed the need to collect microbiome samples as a series until the question about sample variability over time is resolved. The microbiome fluctuates daily and over time, but between-person variations remain greater than within-person variations at different points in time. Longitudinal samples might help to establish causal relationships and to learn how the microbiome evolves over the life span.
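The comparison of within-person and between-person variation can be made concrete with a dissimilarity measure. The sketch below assumes Bray-Curtis dissimilarity, a common choice that the summary does not attribute to either presenter, and uses invented taxon counts for two hypothetical people sampled three times each.

```python
from itertools import combinations

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors (0 = identical, 1 = no shared taxa)."""
    denominator = sum(a) + sum(b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Hypothetical repeated samples per person (counts per taxon; not study data).
samples = {
    "person_A": [[40, 30, 20, 10], [38, 32, 19, 11], [42, 28, 21, 9]],
    "person_B": [[10, 15, 25, 50], [12, 14, 27, 47], [9, 17, 24, 50]],
}

# Within-person: all pairs of samples taken from the same person.
within = mean(
    bray_curtis(x, y)
    for series in samples.values()
    for x, y in combinations(series, 2)
)

# Between-person: every sample from person A against every sample from person B.
between = mean(
    bray_curtis(x, y)
    for x in samples["person_A"]
    for y in samples["person_B"]
)

print(f"within-person: {within:.3f}  between-person: {between:.3f}")
```

In this toy data the within-person mean is much smaller than the between-person mean, mirroring the pattern described in the discussion. A real analysis would need many more participants and time points.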

Session III: Cohort Studies of Cancer Survivorship and Related Outcomes

Moderators: Drs. Wei Zheng and Joanne Elena

The NCI Cohort Consortium has existed for a long enough period of time to examine factors that affect progression, survival, and recurrence. Many methodological issues impede the study of these outcomes.

Women's Health Initiative Life and Longevity after Cancer Study (LILAC)

Dr. Bette Caan

Dr. Caan was a Principal Investigator in the Women's Health Initiative (WHI) Clinical Trials (Diet, Hormone Replacement, Calcium and Vitamin D) and Observational Study, which together provided a large population of cancer cases that formed the basis of the LILAC cohort. LILAC is a survivorship cohort from WHI that supports studies of cancer survival, survivorship, and molecular epidemiology. In addition to data collected through the larger WHI, LILAC, through infrastructure money from NCI, has added information on first treatment, first recurrence, and treatment side effects as well as tumor tissue collection.

The LILAC cohort focused on eight cancers that were selected based on interest in the research community and WHI investigators. Cancers also were chosen based on gaps in the survival and survivorship literature. Treatment and recurrence data were obtained through Medicare linkage and medical record review (for women diagnosed under the age of 65 since 2002). Fixed tumor tissue is being requested for solid tumors diagnosed since 2002. Large numbers of cases for each cancer have treatment, recurrence, WHI questionnaire, and tissue data. A rich GWAS database also exists for many of these cancers.

LILAC collected questionnaire data on several long-term health outcomes associated with cancer survival and treatment. Baseline data show that more than half of ovarian cancer (OC) survivors have nerve problems.

Challenges experienced in the LILAC study include inability to obtain older data, especially for women under age 65. Another challenge was the need to work with multiple providers to obtain treatment information. Radiology information was particularly difficult to obtain. A time/cost study was performed and found that cost per medical record was relatively high, particularly if hard copies were obtained. Because of these limitations, less detailed medical record information had to be requested. Medicare data also were limited. This information was less detailed than medical records data and levels of missing data for certain variables were high. For example, 20 percent of breast cancer cases were missing information about chemotherapy agents. Algorithms were developed to obtain recurrence information from electronic data from Medicare and electronic medical records from managed care organizations. Information on test results and reasons for procedures also was missing from all sources because of reliance on billing codes.

Investigators also experienced challenges in performing analyses. As an example, mortality and survival analyses generated different results regarding the relationship between body mass index (BMI) and outcomes. Incidence influenced these results. These conflicting findings also could have been due to collider or selection biases. One approach to control for these types of biases would be to examine effects on survival with updated exposures and control for baseline measures.

Several ancillary studies are underway using WHI/LILAC resources. LILAC provides information for most WHI cases on second cancers, adjudicated CHD outcomes, treatment dates, chemotherapy agents, radiation modes and doses, pre-diagnosis exposures, and survival in addition to tumor tissue. Information about the process for collaboration is available online. A Cancer Scientific Interest Group also conducts bi-monthly calls to discuss projects and available data (e-mail to join).

Childhood Cancer Survivor Cohorts: Lessons from the Childhood Cancer Survivor Study (CCSS) and St. Jude Lifetime Cohort (LIFE)

Dr. Leslie Robison

Pediatric cancer comprises less than one percent of cancers but makes up a larger proportion of survivors. Survival rates for childhood cancer have increased rapidly since the 1960s, with the 5-year survival rate now exceeding 80 percent. It is estimated that approximately half a million childhood cancer survivors will be living in the United States by 2020.

Several studies have looked at late effects in childhood cancer survivors. The Five Center Study used registries to examine pediatric cancer survivors. The Late Effects Study Group focused on Hodgkin survivors. Other cohort studies collected data on genetics, exposures, and treatment. Some of these studies now are collecting medical record data.

The CCSS is international, but Dr. Robison focused on the retrospective cohort study in the United States, which initially included childhood cancer survivors diagnosed from 1970 through 1999. Any patient who survives childhood cancer five years or more is eligible to participate. Detailed treatment information was collected for more than 90 percent of the cases and biologic samples were collected for more than 7,000 cases. CCSS has generated 257 manuscripts, a third of them in journals with impact factors above 10. Many investigator-initiated projects have used this cohort. Knowledge generated by studies using CCSS data has translated into clinical recommendations for follow-up of childhood cancer survivors.

Childhood cancer survivors are more difficult to locate and contact than adult cancer survivors. Recruitment of the expanded cohort (individuals diagnosed since 1999) required almost four times the resources required to contact the initial cohort. Participation and retention were high, however, once contact was made. Investigators also have had moderate success collecting biologic samples and obtaining more complete treatment data. A cohort is being developed to collect health-related outcomes for both childhood cancer survivors and their siblings. Buccal cell and blood samples are being collected and analyzed for all participants who report a second cancer. Investigators are working to develop a genotype repository and conduct whole exome sequencing.

The St. Jude Lifetime Cohort (SJLIFE) is a long-term study of childhood cancer survivors. The study examines late effects through 100 percent medical abstraction, clinical assessments, and questionnaires. The clinical assessment involves risk-based screening, testing in a human performance lab, collection of biological samples, and neurocognitive and psychosocial testing. Follow-up is conducted with all participants after the clinical visit. Whole genome and exome sequencing also is performed on all patient samples. Many researchers use this cohort for external ancillary research studies as well as pilot studies.

SJLIFE studies found high mortality from second cancers and cardiac or pulmonary disease as well as high incidence of second neoplasms among childhood cancer survivors. Dose-specific risk of new neoplasms was examined. Glioma and meningioma risk increased with radiation dose in a linear pattern. Thyroid cancer risk, however, increased up to a certain point then began to decline at higher doses of radiation received during childhood cancer treatment. Patients who developed subsequent cancers often had a third neoplasm.

Childhood cancer survivors in the SJLIFE cohort generally had a high prevalence of life-threatening health conditions, which increased rapidly with age. It is estimated that, by age 45, most childhood cancer survivors will have at least one chronic health condition which, in most cases, will be seriously disabling or life threatening. Although many late effects experienced by this population have been linked to high doses of radiation or certain chemotherapy agents, genetic risk factors also have been identified. Childhood cancer survivors might present a phenotype of premature aging.


Participants asked the presenters how they effectively managed and analyzed the large childhood cancer data sets. CCSS has an NCI-funded data and statistics center located at Fred Hutchinson Cancer Research Center. Several other statistical groups also contribute to CCSS data analyses. A researcher who wants to analyze CCSS data can submit a concept proposal and request rapid review on the study website. These researchers also have the option to have CCSS staff perform their analyses. This approach generally is preferable because statisticians need in-depth knowledge of analytic approaches for the longitudinal CCSS data. When the investigators have appropriate statistical support, CCSS staff will create a data set so that they can do their own analyses, with the caveat that CCSS PIs must approve the analyses before they are submitted for publication. Support of outside investigators interested in using study data is built into the CCSS and SJLIFE budgets. They do not charge for biospecimens or data. Ancillary studies that will collect additional data must have their own support for the additional analyses.

WHI receives support from multiple NIH Institutes to operate coordinating and regional centers that have biostatisticians. Researchers can work with these centers to perform analyses or obtain data sets to perform their own analyses. Researchers must request additional resources to obtain biospecimens. WHI provides researchers with tumor blocks but does not construct tissue microarrays (TMAs).

In response to participant questions about specific plans for following the survival cohort, Dr. Caan indicated that new information still is being collected from the women who participated in WHI. Updated exposure data have been collected from this cohort (including post-diagnosis exposures such as exercise) and the data have been incorporated into the LILAC database. LILAC investigators do not plan to collect new biospecimens. Some WHI sites are collecting additional biospecimens, and some of the participants will be cancer survivors.

Participants discussed the most effective techniques for recruiting survivors. Many pediatric cancer survivors want education, particularly regarding risk reduction. As they develop chronic health conditions, these survivors become more interested in education and support. Loyalty to the institution where they were treated also is important to many of these survivors. Many WHI participants are committed to this study, and LILAC is considered part of the WHI study. A general challenge to recruitment and retention of long-term cancer survivors is that, as the amount of time since diagnosis increases, these individuals are less inclined to think of themselves as cancer survivors.

Session V: Data Harmonization

Moderators: Drs. Mattias Johansson and Marc Gunter

NCI is interested in developing a cohort metadata repository. Tools for data harmonization are needed that can be used across studies to avoid the need to create a new process with each study.

From Individual Cohorts to Collaborative Data Sharing Infrastructure

Dr. Isabel Fortier

Maelstrom-Research has developed methods and open-source software to support data cataloguing, harmonization, and analysis for several national and international projects. The increase in the number of cohort consortia is creating a pressing demand for tools to support retrospective data harmonization. Research networks need access to efficient tools and methods that help them easily but formally identify available data, determine the potential to generate harmonized data, support processing of information into a compatible format, and co-analyze data.

Co-analysis of data across studies can be achieved using a number of approaches including: study-specific data analysis (independent analyses followed by a meta-analysis combining the study-level estimates); pooled data analysis (data transferred to a central server – warehouse – and pooled to be analyzed); and federated data analysis (centralized analysis, but the individual-level participant data remain on local servers). However, to ensure content equivalence across studies and minimize measurement or assessment error that can impair statistical power, all these approaches require usage of harmonized data. Each harmonized variable should reflect a satisfactory balance between targeting very high precision but limiting applicability to some studies and acceptance of some degree of heterogeneity to permit inclusion of a larger number of studies.
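The study-specific approach described above (independent analyses followed by a meta-analysis of the study-level estimates) can be illustrated with a minimal sketch. It assumes a fixed-effect, inverse-variance weighted meta-analysis, which is one common choice and not necessarily the method used by any consortium discussed here. The function name and the input values are hypothetical.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Pool study-level estimates with inverse-variance weights.

    Each study contributes its own estimate (for example, a log hazard
    ratio computed on harmonized variables) and its standard error.
    No individual-level data leave the studies.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical log hazard ratios from two cohorts (illustration only).
study_log_hr = [0.10, 0.30]
study_se = [0.10, 0.10]
pooled_log_hr, pooled_se = fixed_effect_meta(study_log_hr, study_se)
print(f"pooled log HR = {pooled_log_hr:.3f} (SE {pooled_se:.3f})")
```

The pooled estimate is only meaningful if each study-level estimate was computed on variables with equivalent content, which is why harmonization is required even when no data are transferred.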

An effective harmonization process requires comprehensive knowledge of the input data and can be supported by interactive catalogues that provide detailed information on the individual studies and on the variables and biological samples available. Such catalogues are essential but should be complemented by mechanisms that facilitate data access, harmonization, and integration if the potential of existing research data is to be fully realized. Because of the complexity and inevitable heterogeneity of the information collected across pre-existing studies, valid comparison, integration, and co-analysis of information present major challenges. Maelstrom Research aims to develop tools to support such processes and thus optimize the use of available data and foster a collaborative approach to research.

Rethinking CEC Data: The California Teachers Study (CTS)

Dr. James Lacey

Cancer Epidemiology Centers (CECs) are required to share data. Data sharing and harmonization, however, are expensive and time consuming. Older data are particularly difficult to harmonize and share. Legal and administrative issues (e.g. consent, data transfer limitations) also complicate data sharing. Many investigators want to know how their data are used, which often is difficult to determine.

CTS is a multi-site consortium in California with a data coordinating center. Efforts to harmonize CTS data across its four sites are similar to efforts to harmonize data across CECs. Every CTS site had become its own silo and collected slightly different types of information. CTS decided to eliminate all silos. This change involved implementing a new cloud-based biobank, requesting updates to cancer endpoint data at all sites, and creating a single warehouse for all study data with analytic space and secure, on-demand, real-time access by authorized users from any location.

A data governance strategy and policies are being developed to ensure that CTS data will be shared in a secure, HIPAA-compliant, encrypted manner. Both private and public data workspaces will be created. Requestors will be given access to the CTS data "guest room" where they can explore, code, and harmonize data within the CTS environment. This approach will be more efficient and more visible, and it will shift more of the costs to the requestor.

New consortia should establish approaches to reduce the burden of future data harmonization. Harmonization is facilitated when all sites collect as much data as possible with the deepest possible level of detail (e.g. open ended questions, long lists of options, continuous numerical variables). New technologies and methods permit the collection of detailed information at lower cost. For example, electronic questionnaires that tailor response options and apply skip patterns as appropriate minimize the burden on participants while collecting detailed information. Medical record information and biobanking also provide detailed, quantifiable information that can be more easily converted during data harmonization. Another strategy for facilitating data harmonization is to agree on a common set of data elements that all sites will collect and share. Examples of variables that could be collected by all Cohort Consortium studies include BMI, smoking, menopausal status, and family history of cancer.
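The point that detailed data convert more easily during harmonization can be sketched as follows. This is a hypothetical illustration, not the CTS or Cohort Consortium process: the study names, raw codes, and function names are invented, and only the BMI cut points (the standard WHO adult categories) come from outside the example.

```python
def bmi_from_height_weight(weight_kg, height_cm):
    """Derive continuous BMI (kg/m^2) from measured weight and height."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def bmi_category(bmi):
    """Collapse continuous BMI into the standard WHO adult categories."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


# Hypothetical study-specific codings mapped to one common data element.
SMOKING_MAP = {
    "study_a": {0: "never", 1: "former", 2: "current"},
    "study_b": {"N": "never", "Ex": "former", "Y": "current"},
}


def harmonize_smoking(study, raw_value):
    """Return the common smoking status, or None when no mapping exists."""
    return SMOKING_MAP[study].get(raw_value)


print(bmi_category(bmi_from_height_weight(70, 175)))
print(harmonize_smoking("study_b", "Ex"))
```

A study that stored only a coarse BMI grouping with different cut points could not be mapped into these categories without loss, whereas continuous values can be collapsed into any scheme a later project requires.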

Data harmonization can be improved by removing negative aspects of data silos, increasing visibility of data activities, shifting burden from source CECs to data users, and planning data collection. Setting up automated linkages also facilitates harmonization.


Participants asked what would happen if the company that provides the biobanking software platform went under. If this occurred, CTS investigators would immediately obtain access to the biobanking data. The software platform is used widely and is scalable and flexible. The data easily could be moved to another company's platform with a similar structure.

Participants asked about the availability of the resources discussed by Dr. Fortier. The Maelstrom-Research software is open source and the data in the repository are free. Most of the organization's resources are publicly available but investigators likely will need guidance from the organization's staff to use them. Maelstrom-Research is working on making all resources developed for its projects available on its web site. Cohort Consortium resources also can be added to the Maelstrom-Research repository. Investigators interested in adding their study information, data, and methods to the repository should contact Dr. Fortier. Dr. Fortier emphasized the importance of obtaining feedback from investigators who use resources available through the organization's website. Users who expand on those resources should share what they have developed with Maelstrom-Research and give credit to the organization in presentations and publications.

Tools are available to collect dietary assessment data that are more amenable to harmonization. For example, 24-hour recall or food record data would be preferable to a food frequency questionnaire because the former obtain direct information that does not require that the collection tool be tailored to the specific population. Software applications that are used to collect data from participants also can be loaded into a data warehouse.

Metadata should improve the ability of outside investigators to understand and use study data. This approach would allow collaborative study investigators to dedicate more time to scientific rather than data administration issues.

Participants discussed how collecting data with a greater level of detail might affect response rates. Presenters have not investigated this question but have noticed that attempts to obtain great depth of detail by providing less restrictive response options result in lower levels of missing data (respondents still have the option to respond "don't know" or refuse). Responses appear to depend more on how information is presented and how participants are asked to provide that information. Investigators should know the study population and the best ways to ask questions without overwhelming participants.

In response to a question, presenters discussed approaches for eliminating silos. In some situations, data should be kept in silos. The physical data storage location, however, is no longer important. The critical issues are data access and security.

Participants asked about quality measures for data harmonization. Dr. Fortier developed some algorithms for effective harmonization. As in research studies, data must be carefully checked and cleaned before analysis.

Session VI: Cohort Consortium Signature and Related Projects

Moderators: Drs. Anthony Swerdlow and Susan Gapstur

OncoArray Network

Dr. Peter Kraft

The OncoArray Network was formed to discover new cancer susceptibility variants for breast, ovarian, prostate, colorectal, and lung cancers. Through fine mapping and high-density genotyping, the Network also offers an opportunity to determine variants in known loci. The OncoArray Network has assembled more than 400,000 samples from existing studies and several biobanks to create the OncoArray, a custom array being manufactured by Illumina. OncoArray includes approximately 570,000 single nucleotide polymorphism (SNP) markers proposed by many different institutions. SNPs relating to quantitative phenotypes such as BMI, height, and breast density that are related to common cancers also are included.

Sample collection, quality control, genotyping, and analysis occur at multiple sites. Crowdsourcing techniques also are used for quality control. Structures are in place to facilitate sharing of genotyping data across consortia, including a single data use agreement (DUA).

Dr. Kraft presented preliminary OncoArray analysis results for a few cancers. For breast cancer, according to results presented at the American Society of Human Genetics Meeting, 63 new loci were identified that explained about five percent of familial relative risk. In addition, 157 loci were found to be associated with overall breast cancer risk, accounting for about 19 percent of familial relative risk. Eight new loci also were found that were associated with risk of estrogen receptor (ER)-negative breast cancer.

Cross-cancer pathway analyses were performed. Various risk alleles were associated with multiple cancers and some were protective against certain cancers (or cancer subtypes).


An understanding of cancer heterogeneity (subtypes) is important for understanding shared genetic components. A barrier to examining genetic components by cancer subtype is small sample sizes. Dr. Kraft is exploring clear markers for well-studied cancer subtypes because there are enough cases to study. OncoArray might provide opportunities for studying biomarkers of rarer cancer subtypes.

Participants asked about future directions for studying the genetic epidemiology of cancer. Some work is underway in this area, such as studies of lower frequency variation. Many known high penetrance, low-frequency alleles are included in the OncoArray. OncoArray, however, is not ideal for identifying new alleles. Many large sequencing projects are underway that could identify new alleles. In addition, many samples overlap and could be built into the OncoArray analyses. Tumor tissue obtained for OncoArray has been preserved, allowing investigators to perform microarrays and tumor sequencing for molecular characterization in epidemiology studies. In the future, large data sets of genomic information could be used to stratify study participants.

Participants emphasized the importance of disseminating information about OncoArray to ensure that this work is rapidly applied to public health and health care. Funding mechanisms might be available to support projects to communicate about OncoArray activities. A prospective study in Quebec has a risk communication component examining ways to help people understand and use OncoArray information. Dr. Kraft also is working with investigators at Dana Farber to pilot a communication intervention at mammography screening clinics. They plan to analyze 100 SNPs and report results back to the patients to examine their reactions and how they use the information. The findings of these communication studies will have implications for the PMI.

Lung Cancer Cohort Consortium (LC3)

Dr. Paul Brennan

Dr. Brennan provided some statistics on lung cancer (LC) and strategies to control the disease. Approximately 50 percent of LCs occur in former or never smokers, so preventing LC in non-smokers is a priority.

The LC3 examines LC in several cohorts internationally with 12,000 potential cases. The study is attempting to examine LC with and without the effect of tobacco. Dr. Brennan and colleagues are examining one-carbon metabolism, inflammation, fat-soluble vitamins, renal function, smoking, and kynurenine pathways in LC.

LC3 examined the effects of vitamin B6, folate, and methionine on LC risk but did not see the dramatic protective effects found in the European Prospective Investigation into Cancer (EPIC) study. The effects of these nutrients on risk were not significant in the United States, although they were on other continents. This finding could be due to generally high intake of these nutrients in the United States. A threshold of B6, folate, and possibly methionine intake might exist so that additional intake beyond this threshold has no effect on LC risk. Next steps include examining genetic determinants of one-carbon metabolism and vitamins using a Mendelian randomization analysis. Preliminary data from the EPIC study suggest that certain SNPs are associated with vitamin B6 levels.

In the EPIC study, Mendelian randomization analyses were used to examine the relationship of BMI to LC risk. LC risk in smokers appeared to increase with lower body weight, but this finding could have been confounded by smoking. LC3 used 97 SNPs that together explained 2.7 percent of the variance in BMI. Further analyses revealed that, after controlling for the effects of smoking on weight, only adenocarcinoma was slightly associated with low body weight.
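The logic of a Mendelian randomization analysis with many SNPs can be sketched with the inverse-variance weighted (IVW) estimator, which combines per-SNP ratios of the SNP-outcome association to the SNP-exposure association. The summary does not state which estimator LC3 used, so this choice is an assumption, and the function name and all numbers below are hypothetical.

```python
import math


def ivw_mendelian_randomization(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the exposure's effect on the outcome.

    beta_exposure: per-SNP associations with the exposure (e.g., BMI)
    beta_outcome:  per-SNP associations with the outcome (e.g., log odds of LC)
    se_outcome:    standard errors of the SNP-outcome associations
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("inputs must have one entry per SNP")
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error


# Hypothetical summary statistics for three SNPs (illustration only).
bx = [0.02, 0.03, 0.05]
by = [0.010, 0.015, 0.025]
se = [0.01, 0.01, 0.02]
effect, effect_se = ivw_mendelian_randomization(bx, by, se)
print(f"IVW estimate = {effect:.3f} (SE {effect_se:.3f})")
```

Because genotype is fixed at conception, an estimate built this way is less exposed to confounding by behaviors such as smoking than a direct comparison of body weight and LC risk, provided the SNPs affect the outcome only through the exposure.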

A new OncoArray GWAS resource will become available in the near future. This resource will include new loci for overall LC and some subtypes. Loci related to addiction also will be included.

Dr. Brennan is interested in studying improved methods for early detection of LC. The Cohort Consortium could be used to examine secondary prevention of LC. Survival is good when LC is detected at an early stage, but early detection is rare. The National Lung Screening Trial (NLST) found that early detection through low-dose computed tomography (CT) could decrease LC mortality by 20 percent. The problems with CT screening were that about 95 percent of positive results were false positives and that the screening was not cost-effective. Screening efficacy improves and cost declines when screening is offered to patients at higher risk of LC, so only some individuals are eligible for CT LC screening. Nevertheless, more than half of LC patients are not eligible for screening using NLST criteria as determined by questionnaire responses. Biomarkers for LC risk could be used to more accurately identify individuals who could benefit from secondary prevention of LC. For example, cotinine can be used as a marker of current smoking intensity and AHRR methylation could be used as a marker of past smoking intensity. AHRR methylation is a strong indicator of LC risk, particularly in former smokers. LC3 investigators plan to validate all LC risk biomarkers that have been identified to date. The best biomarkers will be incorporated into risk models and tested in CT scan studies. Using a stratified approach, the Cohort Consortium could identify the best subgroup for testing the LC risk model.

LC3 will play an important role in identifying and evaluating etiological factors in LC, identifying risk biomarkers, and refining risk models. The next meeting of the LC3 will take place at the 50th anniversary conference of the International Agency for Research on Cancer in June 2016.


Observational studies also found that folate was associated with lower risk for cardiovascular disease. Randomized trials later showed no benefit of folate. When the Mendelian randomization was stratified by areas of the world with and without adequate folate intake, however, folate was found to be associated with reduced risk of cardiovascular disease. Studies in China also found that folate helped prevent cardiovascular disease in regions with pronounced folate deficiency. These findings highlight the importance of threshold effects, which also appear to be relevant to LC.

Ovarian Cancer Cohort Consortium (OC3)

Drs. Shelley Tworoger and Nicolas Wentzensen

Ovarian cancer (OC) is a mixture of different diseases originating in different places that manifest on the ovary. The OC cohort consortium (OC3) was created to provide adequate power for identifying risk factors and studying biomarkers and pathways for OC subtypes. OC3 also was set up to conduct prospective studies to include more aggressive cases and reduce recall bias. More than 25 cohorts in the United States and internationally participate in OC3. Plasma and serum samples have been collected for a few studies and tumor tissue is being collected across studies.

A Department of Defense translational leverage grant was obtained to study known or suspected OC risk factors by histologic subtype, tumor dominance, and tumor fatality and to develop OC risk prediction models. The grant provided funding for OC3 infrastructure development. OC3 now has a data dictionary and publication guidelines.

OC3 participates in OncoArray, and genotyping for more than 1,500 cases and controls has been completed. An additional 1,000 cases and controls will be genotyped in the near future. Primary analysis is underway in the Ovarian Cancer Association Consortium (OCAC), combining all study types.

Data harmonization was a major challenge. OC3 did not have the tools developed by Maelstrom-Research. OC3 investigators, however, learned many lessons from the harmonization process including the need to 1) invest time in setting up the process at the beginning of the study, 2) hire a scientist/programmer, 3) perform data cleaning a few variables at a time with data checks, and 4) balance quality and impact on results.

Study investigators examined risk factors stratified by histology. They found much heterogeneity of established risk factors across tumor subtypes. Most established risk factors were more strongly associated with non-serous subtypes (less common and less fatal). BMI was associated with more rapidly fatal disease. In particular, death soon after diagnosis was associated with high BMI, but this finding could reflect lower efficacy of surgery and chemotherapy in patients with a large amount of adipose tissue. Smoking was not associated with time between diagnosis and death when histologies were combined but was positively associated with rapidly fatal serous tumors and inversely associated with clear cell tumors. Testosterone was found to be associated with endometrioid, mucinous, and possibly clear cell tumors.

OC3 investigators are working with OCAC to develop a risk prediction model based on known risk factors. An important challenge for this task is the problem of missing data, primarily due to questions that were not asked. Multiple imputation methods are being employed.
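Multiple imputation produces several completed data sets, each analyzed separately, and the results are then combined. The combining step is commonly done with Rubin's rules, sketched below. The summary does not describe how OC3 implements imputation, so this is a generic illustration, and the function name and input values are hypothetical.

```python
import statistics


def pool_with_rubins_rules(estimates, variances):
    """Combine results from m imputed data sets using Rubin's rules.

    estimates: the coefficient of interest from each imputed data set
    variances: the squared standard error of that coefficient in each set
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations, one variance each")
    pooled_estimate = statistics.mean(estimates)
    within_variance = statistics.mean(variances)
    between_variance = statistics.variance(estimates)  # divides by m - 1
    total_variance = within_variance + (1.0 + 1.0 / m) * between_variance
    return pooled_estimate, total_variance


# Hypothetical coefficients from three imputed data sets (illustration only).
pooled, total_var = pool_with_rubins_rules([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(f"pooled estimate = {pooled:.2f}, total variance = {total_var:.3f}")
```

The between-imputation term increases the variance to reflect uncertainty about the missing values, which matters when entire questions were never asked in some cohorts.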

OC3 now is beginning to examine early detection using prospectively collected samples to develop a biomarkers database. Investigators also are performing survival analyses. One challenge for the survival analyses is that baseline data were collected at different intervals. Exposures before and after diagnosis need to be considered.

Investigators made OC3 resources available to the OC research community. The availability of resources varies by cohort, including tissue resources. A system was developed for accessing data through the coordinating center with strict data protection controls and options for amending DUAs when needed. Templates were developed for sending data to the coordinating center and for requesting access to the study database (by external investigators). Using OC3 data, 12 different investigators across eight institutions have proposed 16 projects. In spite of these successes, OC3 investigators found the DUA process to be complex. They are seeking ideas for improving this process.

OC3 investigators plan to expand the infrastructure to include biomarker data from existing and new assays and OncoArray information. They are seeking funding to develop new assays. Follow-up questionnaire and dietary data also will be added to the database.

The cost of maintaining OC3 resources is high, so investigators are seeking funding from multiple sources. Many cohorts are willing to expand participation beyond contributing baseline data.


The LC and OC cohort consortia experienced similar challenges, including DUAs. DUAs are time-consuming and expensive, and as a result, many consortia have developed DUA templates. Participants would like NCI to work with the Cohort Consortium PIs to develop a standard DUA template or templates (for different types of institutions). Presenters also emphasized the importance of having a lawyer who understands all aspects of the project. Projects also require professionals who understand legal and administrative differences in how research is conducted in various countries. Lawyers from different institutions need to collaborate to establish more consistency in practices across institutions.

Other consortia have experienced challenges in getting collaborating centers to agree to allow outside investigators to access and analyze data in the data coordinating center. Prior to developing DUAs, the OC consortium leaders had multiple discussions with investigators at the various centers to achieve consensus regarding data access. External investigators who wish to use OC3 data must agree to all the terms of the existing DUA, which can be challenging and has led to amendments in some cases. This experience has led to the development of an OC3 outgoing DUA template for external investigators wanting to log in to the system.

Participants asked about plans for identifying more defined OC subtypes and risk factors for those subtypes. Investigators anticipate that risk prediction models will work better for some OC subtypes (e.g., clear cell), which have many known risk factors, than for the serous subtype (most common and most fatal). The OC consortium investigators plan to examine all conceivable risk factors and are adding SNPs to the risk model. Investigators also plan to perform a detailed histologic evaluation of all tissue, using pathology and exposure data, to discover improved approaches for classifying tumors.

Open Mic: Ideas for New Signature Studies

Participants discussed the possibility of leveraging the Cohort Consortium for small, focused clinical trials within the cohorts, for example, to identify biomarker endpoints. New trial designs should be considered to improve efficiency. For example, a new WHI trial has a design that takes advantage of existing follow-up data for cardiovascular endpoints. Only the intervention group will need to be approached for consent because all required information already has been obtained from the control group. Investigators for the Black Women's Health Study would be interested in collaborating with other cohorts to conduct small clinical trials focused on specific endpoints. The best approach might be to conduct these types of trials across multiple cohorts at the same time. Methods development studies also could be conducted in a randomized fashion.

Participants also discussed the development of common questions to be used across multiple cohorts. Many cohorts continue to actively administer questionnaires to participants, and might collaborate to develop a set of common questions. Many investigators are interested in prospective harmonization of questionnaire data. NCI's Behavioral Research Program has access to standardized questionnaires and clearinghouses of behavioral measures. Patient Reported Outcomes also has done much work on standardization of instruments.

Standard processes for biospecimen collection also could be developed. The Nurses' Health Study has begun to post protocols for sample collection. Investigators for this study are interested in sharing these protocols. Participants suggested creating a location on the Cohort Consortium website to share protocols.

Contact Us

If you have questions about the meeting, contact at the National Cancer Institute.