2011 Cohort Consortium Annual Meeting

The annual Cohort Consortium meeting, sponsored by EGRP and the Division of Cancer Epidemiology and Genetics (DCEG), was held October 26, 2011 in Boston, MA. Project meetings were held on October 25, 2011. If you have questions about the meeting, contact Nonye Harvey at the National Cancer Institute.

Main Cohort Consortium Meeting Agenda

The Colonnade Hotel, Boston, Massachusetts - Boston Ballroom
Wednesday, Oct. 26, 2011 Topic
7:00 a.m. – 8:00 a.m. REGISTRATION
8:00 a.m. – 8:30 a.m. Welcome Remarks
Julie R. Palmer, Sc.D.
Chair, NCI Cohort Consortium, 2010-2011
Senior Epidemiologist, Slone Epidemiology Center at Boston University
Professor of Epidemiology, Boston University School of Public Health

Opening Remarks
Robert T. Croyle, Ph.D.
Director, Division of Cancer Control and Population Sciences
National Cancer Institute

Robert N. Hoover, M.D., Sc.D.
Director, Epidemiology and Biostatistics Program
Division of Cancer Epidemiology and Genetics
National Cancer Institute


Moderator: James R. Cerhan, M.D., Ph.D.
Professor and Chair, Division of Epidemiology
Mayo Clinic College of Medicine

8:30 a.m. – 9:00 a.m. Sequencing of Risk Regions for Prostate Cancer in 2,000 African Americans
David Reich, Ph.D.
Professor of Genetics, Harvard Medical School / Broad Institute of Harvard and MIT
9:00 a.m. – 9:30 a.m. Gene-Environment Interactions: Lessons Learned
Peter Kraft, Ph.D.
Associate Professor of Epidemiology, Department of Epidemiology and Department of Biostatistics
Harvard School of Public Health
9:30 a.m. – 10:00 a.m. Exposure Linkage in Cohorts: Linking Agricultural Health Study and Iowa Women's Health Study to Environmental Data
Mary Ward, Ph.D.
Senior Investigator, Occupational and Environmental Epidemiology Branch
Division of Cancer Epidemiology and Genetics
National Cancer Institute
10:00 a.m. – 10:30 a.m. BREAK
10:30 a.m. – 11:00 a.m. Unique Methodology Used in New Cohorts

The Million Veteran Program (MVP)
J. Michael Gaziano, M.D., M.P.H.
Director, Massachusetts Veterans Epidemiology, Research and Information Center (MAVERIC), VA Boston Healthcare System
Chief, Division of Aging, Brigham and Women's Hospital
Professor of Medicine, Harvard Medical School

The Millennium Cohort Study
Cynthia LeardMann, MPH
Senior Biostatistician, Deployment Health Research Department
Naval Health Research Center


Moderator: Anne Zeleniuch-Jacquotte, M.D.
Associate Professor, Department of Environmental Medicine
New York University School of Medicine

11:00 a.m. – 11:30 a.m. Overview from the NCI Office of Biorepositories and Biospecimen Research
Jim Vaught, Ph.D.
Deputy Director, Office of Biorepositories and Biospecimen Research
National Cancer Institute
11:30 a.m. – 12:00 p.m. Tumor Tissue Collection: The Experience of the Harvard Cohorts
Lorelei Mucci, Sc.D.
Associate Professor of Epidemiology, Department of Epidemiology
Harvard School of Public Health
12:00 p.m. – 1:00 p.m. LUNCH
1:00 p.m. – 1:10 p.m.

Moderator: Julie E. Buring, Sc.D.
Senior Epidemiologist, Brigham and Women's Hospital
Professor of Medicine, Harvard Medical School
Professor of Epidemiology, Harvard School of Public Health

1:10 p.m. – 1:40 p.m. NCI Cohort Infrastructure Program Announcement
Daniela Seminara, Ph.D., M.P.H.
Senior Scientist and Consortia Coordinator, Epidemiology and Genetics Research Program, Division of Cancer Control and Population Sciences
National Cancer Institute
1:40 p.m. – 1:50 p.m. One-Carbon Metabolism Pathway and Lung Cancer: a New Cohort Consortium Project
Paul Brennan, Ph.D.
International Agency for Research on Cancer
The World Health Organization, Lyon, France
1:50 p.m. – 2:00 p.m. HPV and Head and Neck Cancer
Aimee Kreimer, Ph.D.
Investigator, Infections and Immunoepidemiology Branch
Division of Cancer Epidemiology and Genetics
National Cancer Institute
2:00 p.m. – 2:10 p.m. African American Working Group
Nonye Harvey, M.P.H.
Public Health Advisor, Epidemiology and Genetics Research Program
Division of Cancer Control and Population Sciences
National Cancer Institute
2:10 p.m. – 2:20 p.m. Updates from the Obesity Working Group
Amy Berrington de Gonzalez, D.Phil.
Investigator, Radiation Epidemiology Branch
Division of Cancer Epidemiology and Genetics
National Cancer Institute
2:20 p.m. – 2:50 p.m. BREAK

Moderator: Julie R. Palmer, Sc.D.
Senior Epidemiologist, Slone Epidemiology Center at Boston University
Professor of Epidemiology, Boston University School of Public Health

2:50 p.m. – 3:20 p.m. Feedback from Cohorts: Communications, Opportunities, Burden

Julie E. Buring, Sc.D.
Senior Epidemiologist, Brigham and Women's Hospital
Professor of Medicine, Harvard Medical School
Professor of Epidemiology, Harvard School of Public Health

Anne Zeleniuch-Jacquotte, M.D.
Associate Professor, Department of Environmental Medicine
New York University School of Medicine

3:20 p.m. – 4:20 p.m. Brainstorming: Best thing to do next

Patricia Hartge, Sc.D.
Deputy Director, Epidemiology and Biostatistics Program
Division of Cancer Epidemiology and Genetics
National Cancer Institute

Bill Blot, Ph.D.
Professor of Medicine, Vanderbilt University
Chief Executive Officer, International Epidemiology Institute

Deborah M. Winn, Ph.D.
Deputy Director, Division of Cancer Control and Population Sciences
National Cancer Institute

4:20 p.m. – 4:30 p.m. Meeting Wrap Up
Julie R. Palmer, Sc.D.
Senior Epidemiologist, Slone Epidemiology Center at Boston University
Professor of Epidemiology, Boston University School of Public Health
4:30 p.m. ADJOURN

Project Meetings

The Colonnade Hotel, Boston, Massachusetts
Tuesday, October 25, 2011
Room - Colonnade West Room Time
Liver Cancer Pooling Project Working Group
POC(s): Katherine McGlynn
11:00 a.m. – 12:00 p.m.
Room - Braemore Time
Lymphoid Malignancies Working Group
POC(s): Nat Rothman, Brenda Birmann, Paolo Boffetta
12:00 p.m. – 1:00 p.m.
Vitamin D and Breast and Colorectal Cancer Working Group
POC: Stephanie Smith-Warner
1:00 p.m. – 2:00 p.m.
Obesity Working Group
POC(s): Amy Berrington, Patricia Hartge
3:00 p.m. – 4:30 p.m.
African American Working Group
POC(s): Julie Palmer, Bill Blot
4:30 p.m. – 6:00 p.m.
Ovarian Cancer Cohort Consortium Working Group
POC(s): Nicolas Wentzensen, Shelley Tworoger, Alan Arslan
6:00 p.m. – 7:00 p.m.
Room - Kenmore Time
POC(s): Rachael Stolzenberg-Solomon, Charles Fuchs, Patricia Hartge
12:00 p.m. – 1:00 p.m.
Head and Neck Cancer Working Group
POC(s): Aimee Kreimer, Paul Brennan
1:00 p.m. – 2:30 p.m.
Kidney Cancer GWAS Working Group
POC(s): Mark Purdue, Ghislaine Scelo
2:30 p.m. – 4:00 p.m.
One-Carbon Metabolism and Colorectal Cancer Working Group
POC(s): Paolo Vineis, Rachael Stolzenberg-Solomon
4:00 p.m. – 5:00 p.m.
One-Carbon Metabolism Biomarker and Lung Cancer Working Group
POC: Paul Brennan
5:00 p.m. – 7:00 p.m.

The Breast and Prostate Cancer Cohort Consortium (BPC3) meeting will be held on Thursday, October 27th and Friday, October 28th at Harvard School of Public Health. For more information, please contact Peter Kraft.

Summary of 2011 Annual Meeting

Welcome Remarks
Julie R. Palmer, Sc.D., Chair of the National Cancer Institute (NCI) Cohort Consortium

Dr. Palmer welcomed the participants to the annual meeting of the NCI Cohort Consortium. She thanked the Division of Cancer Control and Population Sciences (DCCPS) and Ms. Nonye Harvey for their continued support of the Cohort Consortium. Dr. Palmer noted that NCI will hold a special lecture in honor of Dr. Arthur Schatzkin, a Cohort Consortium investigator who recently passed away from cancer. The lecture will be held on April 16, 2012 at 3:00 p.m. in the Lipsett Amphitheater, on the NIH main campus in Bethesda, Maryland. Dr. Schatzkin initiated the American Association of Retired Persons (AARP) study and developed novel methods for assessment of diet and physical activity.

The Cohort Consortium's mission is to promote communication and collaboration between member cohorts, identify common problems, and recommend possible solutions. Thirty-two projects have been approved by the Cohort Consortium Secretariat since 2003. Most of these projects use cancer as the outcome; however, five projects use mortality, one project uses serum vitamin D levels, and a methodological study is using diet selectivity as the outcome. Fifteen studies use only questionnaire data for exposure variables. There are seven genome-wide association studies (GWAS), two non-GWAS genetic studies, and eight studies that use circulating biomarkers as exposures.

A few projects, including the Breast and Prostate Cancer Cohort Consortium (BPC3), the endometrial cancer GWAS, and the lung cancer and 1-carbon metabolism study, have secured funding through R01 grants. For other projects, investigators are pursuing funding through a variety of grant and contract mechanisms. Funding has not been secured for approximately half of the projects; however, the cohorts are committed to collaborating and participating in projects even when funding is not available.

Opening Remarks
Robert T. Croyle, Ph.D., Director, DCCPS, NCI
Robert N. Hoover, M.D., Sc.D., Director, Epidemiology and Biostatistics Program, Division of Cancer Epidemiology and Genetics (DCEG), NCI

This is the first year for submitting cohort infrastructure grants in response to the Cancer Epidemiology Cohort (CEC) Program Announcement. Under this new process, cohort infrastructure grants are separated from other applications during review. There have been two rounds of grant submissions, and the first round is scheduled to be discussed by the Scientific Program Leadership (SPL) at NCI on November 22, 2011. The R03 and R21 funding mechanisms at NCI will be transitioning from specific program announcements to broad, generic omnibus mechanisms. The Provocative Questions request for applications (RFA) has been released. Dr. Croyle encouraged investigators to examine the ancillary materials associated with each question for further clarification. Applicants to the Provocative Questions RFA also were encouraged to pay close attention to the overall grant portfolio, which is a significant factor in final funding decisions. The NCI-funded grant portfolio is publicly available on the Internet. A competitive supplement program announcement for the NIH and Food and Drug Administration (FDA) Center for Tobacco Products joint study will be released shortly. The FDA has committed $15-20 million to this effort.

Over the past 20 years, the tendency in cancer epidemiology research has been toward large collaborative projects. In the 1970s, some studies suggested that postmenopausal hormone use might be associated with increased breast cancer risk. Studies attempting to replicate and confirm this observation had largely inconsistent results until a large pooled analysis of 50,000 cases and 100,000 controls confirmed that recent hormone use was positively associated with breast cancer. Confirming the association of obesity with endometrial and gallbladder cancer similarly required large collaborative studies. Although 15 years of candidate gene studies failed to discover many genes associated with cancer risk, the collaborative GWAS have identified almost 200 single nucleotide polymorphisms (SNPs) definitively associated with cancer.

Large, well-designed studies are necessary to detect low relative risks (RR = 1.1-1.2), study subgroups of diseases and exposures, and simultaneously assess thousands of hypotheses in one experiment. Many of these types of studies could have been performed 10-20 years ago if large consortia had been formed earlier. Infrastructure funding is a challenge, however, for conducting large collaborative studies. The cancer epidemiology community needs to be able to communicate the value of these infrastructures to funding decisionmakers.

Dr. Hoover thanked Dr. Palmer for her service as the chairperson of the Cohort Consortium Secretariat and presented her with an outstanding service award in recognition of her leadership. Dr. James Cerhan will be serving as the chairperson of the Cohort Consortium Secretariat next year.

Moderator: James R. Cerhan, M.D., Ph.D., Professor and Chair, Division of Epidemiology, Mayo Clinic College of Medicine

Sequencing of Risk Regions for Prostate Cancer in 2,000 African Americans
David Reich, Ph.D., Professor of Genetics, Harvard Medical School and Broad Institute of Harvard and MIT

The Multiethnic Cohort (MEC) has been conducting large-scale capture and sequencing of genetic regions of interest for prostate cancer using technologies that can analyze large numbers of samples simultaneously. Thousands or tens of thousands of samples need to be analyzed to discover genetic risk factors in large cohort studies, but the cost of DNA sample preparation and processing is prohibitive.

For this project, protocols and experimental methods used for sequencing ancient DNA samples for evolutionary studies were adapted for use in a cohort study. These methods are able to process very small quantities of DNA efficiently while losing only a small portion of the DNA library's complexity. The commercial cost of DNA sample preparation for Illumina's sequencing technology is $500-800 per sample. Isolating genomic regions of interest, such as the whole exome, costs an additional $500 per sample. The initial cost for performing a sequencing study is typically around $1,000 per sample. The costs can be reduced by pooling many samples together.

The first step of DNA preparation involves using ultrasound waves to shear 96 DNA samples simultaneously. Thirty base pair (bp) adaptor sequences, each containing a 6bp barcode, are added to the resulting 100-300bp DNA fragments. There are 144 unique barcodes, which are sequenced along with the DNA fragments, allowing each individual sample to be identified after the samples are pooled. A single aliquot of capture reagent is used to capture 48-144 pooled samples at once. Illumina sequencing requires adaptor sequences that are 60bp long; however, using long adaptor sequences significantly reduces the capture efficiency, so the adaptor sequences are lengthened only after the capture step. DNA libraries produced in this way have sequence duplication rates of 38 percent. The very best libraries have sequence duplication rates of 25 percent; however, the method presented here is optimized for cost. A single technician can process 192 samples per day, and the reagent costs are $9 per sample. With 10-fold coverage, the sequencing cost per sample is approximately $100.
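
The barcode-based pooling described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the 6bp barcode is read at the start of each sequence and must match a known barcode exactly; the barcode strings, sample names, and the function name `demultiplex` are hypothetical and are not taken from the presenters' pipeline.

```python
from collections import defaultdict

BARCODE_LENGTH = 6  # the summary describes a 6bp barcode within each adaptor


def demultiplex(reads, barcode_to_sample):
    """Group pooled reads by sample using the leading barcode.

    Assumption (illustration only): the barcode is the first
    BARCODE_LENGTH bases of each read and must match exactly.
    Reads with an unrecognized barcode are collected under None.
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode = read[:BARCODE_LENGTH]
        insert = read[BARCODE_LENGTH:]
        sample = barcode_to_sample.get(barcode)  # None if barcode is unknown
        by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical barcodes, sample names, and reads
barcodes = {"ACGTAC": "sample_01", "TGCATG": "sample_02"}
pooled_reads = [
    "ACGTACGGATTACA",
    "TGCATGCCCTTTAA",
    "ACGTACTTTTGGGG",
    "NNNNNNACGTACGT",  # unreadable barcode
]
result = demultiplex(pooled_reads, barcodes)
```

A 6bp barcode has 4^6 = 4,096 possible sequences, so a set of 144 uses only a small fraction of them.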

In a pilot study, this technology was applied to 1,000 African American prostate cancer cases and 1,000 African American controls to sequence 2.2 megabases (Mb) at 4-fold coverage. The target capture region, around 8q24, has been associated with prostate cancer in GWAS. The sensitivity using this method was much higher than the sensitivity obtained in the 1000 Genomes Project. Almost all polymorphisms with a minor allele frequency (MAF) greater than 1 percent, as well as many polymorphisms with an MAF less than 1 percent, were detected. A total of 35,211 SNP variants were found in the target region. The allele frequencies, odds ratios (OR), and P-values were very highly correlated with results from a GWAS performed on the same samples, demonstrating the utility of the method. For variants with an MAF less than 5 percent, sequencing provides significantly increased power to detect tag SNPs over 1 million SNP arrays and imputation from the 1000 Genomes database.

Questions and Comments

Question: How much DNA is required and how much time does it take to perform your processing method?
Answer: Approximately 500 nanograms (ng) of DNA are needed for the analysis; however, extra DNA needs to be on hand in case the reaction fails. Ultrasound shearing of the DNA is the longest step and takes 21 hours for 96 samples. Currently, processing 4,000 samples takes 3-4 months.

Question: Does DNA extracted from buccal cells work as well as DNA extracted from blood?
Answer: Both sources of DNA work well, although the success of buccal DNA can depend on the level of bacterial contamination.

Question: What does the $100 per sample processing fee include?
Answer: It includes the cost of reagents for preparation, capture, and sequencing, but does not include the cost of equipment, technician time, or data analysis. Technician time is a minor component of the overall cost.

Gene-Environment Interaction and Genome-Wide Association Studies
Peter Kraft, Ph.D., Associate Professor of Epidemiology, Department of Epidemiology and Department of Biostatistics, Harvard School of Public Health

A literature review of gene-environment (GxE) interactions in breast cancer is underway. Using PubMed queries, more than 400 publications on GxE interactions and breast cancer were identified. Challenges associated with candidate gene studies also complicate GxE interaction studies, including low power and low probability of association. Almost 300 of the identified papers claim to have a positive finding of a GxE interaction; given the challenges of studying GxE, most of these reported observations are not likely to be true interactions.

It is well established that both genetic and environmental factors contribute to cancer risk. Studying GxE interactions in observational epidemiology studies can provide increased power to detect genetic associations by leveraging assumed effect modifiers. GxE studies can provide insights into biological mechanisms and improve risk prediction and prognostic models. Statistical interactions per se only offer circumstantial evidence to address any of these goals.

A GxE interaction is defined as a variation in some measure of the effect of an exposure on disease risks across the levels of a modifier. Statistically, an interaction between two factors refers to a departure from an additive effect on a particular scale. The definition is dependent on the measure of association being used (e.g., relative risk, absolute risk, or odds ratio). As an example, a disease with a single dichotomous genetic variable and a single dichotomous environmental variable can be modeled using additive (absolute risk) or multiplicative (log odds) scales. Using the same data, an additive model can demonstrate an interaction while the multiplicative model does not, and vice versa. Often, there is no obviously correct scale to use when testing for interactions; the relevant scale is dependent on the hypothesis being tested. Modeling interactions becomes even more complicated when phenotypes or exposures are continuous; detecting an interaction is dependent on the range of the exposure being sampled as well as the scale.
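
The scale dependence described above can be shown with a small worked example in Python. The four risks (unexposed non-carriers, carriers only, exposed only, and exposed carriers) are invented numbers chosen for illustration, and the function names are hypothetical.

```python
def odds(risk):
    """Convert a risk (probability) to odds."""
    return risk / (1.0 - risk)


def additive_interaction(r00, r10, r01, r11):
    """Interaction contrast on the absolute-risk scale; 0 means no interaction."""
    return r11 - r10 - r01 + r00


def multiplicative_interaction(r00, r10, r01, r11):
    """Ratio of odds ratios; 1 means no interaction on the log-odds scale."""
    or10 = odds(r10) / odds(r00)  # gene only vs. neither
    or01 = odds(r01) / odds(r00)  # exposure only vs. neither
    or11 = odds(r11) / odds(r00)  # both vs. neither
    return or11 / (or10 * or01)


# Scenario A (invented): risks are exactly additive
risks_a = (0.1, 0.2, 0.3, 0.4)

# Scenario B (invented): same single-factor risks, but the joint risk is
# chosen so that the odds ratios multiply exactly
joint_odds = odds(0.2) * odds(0.3) / odds(0.1)
risks_b = (0.1, 0.2, 0.3, joint_odds / (1.0 + joint_odds))

for label, risks in (("A", risks_a), ("B", risks_b)):
    print(label,
          "additive contrast:", round(additive_interaction(*risks), 4),
          "ratio of odds ratios:", round(multiplicative_interaction(*risks), 4))
```

In scenario A the additive contrast is zero while the ratio of odds ratios is below 1; in scenario B the ratio of odds ratios is 1 while the additive contrast is positive. The same pair of single-factor effects therefore shows an interaction on one scale and none on the other.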

If a genetic effect is restricted to a subgroup of exposure, the power to detect that effect can be increased by performing a joint test for genetic main effects and interactions. The joint test has two degrees of freedom, is not scale dependent, and is able to detect interactions using additive or multiplicative scales. Modeling the power of the joint test based on ORs demonstrates that the joint test has the power to detect main effects over a larger range of ORs than the marginal test. It also has much greater power to detect interactions than the usual GxE interaction test.
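
One common way to write such a two-degree-of-freedom joint test is sketched below. The summary does not state the model that was used, so the logistic form, the likelihood ratio statistic, and all symbols (D for disease status, G for genotype, E for exposure, and the beta coefficients) are assumptions introduced for illustration.

```latex
% Logistic model with a genetic main effect and a GxE product term
\operatorname{logit} P(D = 1 \mid G, E)
  = \beta_0 + \beta_G G + \beta_E E + \beta_{GE}\, G E

% Joint null hypothesis: no genetic effect at any level of exposure
H_0 : \beta_G = \beta_{GE} = 0

% Likelihood ratio statistic comparing the full model with the model
% containing only the intercept and the exposure term
T = 2 \left[ \ell(\hat\beta_0, \hat\beta_G, \hat\beta_E, \hat\beta_{GE})
           - \ell(\tilde\beta_0, 0, \tilde\beta_E, 0) \right]
  \;\sim\; \chi^2_2 \quad \text{asymptotically under } H_0
```

The marginal test, by contrast, would test only the genetic coefficient in a model without the product term, and the usual interaction test would test only the product-term coefficient.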

The joint test approach has been used to identify novel interactions such as the glutamate receptor gene GRIN2A and coffee intake for risk of Parkinson's disease, four susceptibility loci and sex for risk of type 2 diabetes, and 14 common gene variants and body mass index (BMI) for fasting insulin levels and glucose homeostasis. All of these studies required large sample sizes. Using the joint test, achieving 80 percent power to detect an interaction with an OR of 1.35 in a GWAS context requires 15,000 cases. Novel statistical methods may address power issues in some cases.

When a GxE interaction is observed, the chance of that association being real depends on the power of the study, the prior probability that the interaction is true, and biases for or against rejection of the null hypothesis. Experience from the study of marginal effects demonstrates that strong statistical evidence for association and precise replication are necessary to report a positive finding. The prior probabilities of true interactions are likely to be even smaller than for marginal effects. The reported interaction of the serotonin transporter gene 5-HTTLPR, early life stress, and risk of depression is a high-profile example of a statistically significant observed interaction with a plausible biological mechanism that could not be replicated in independent studies.
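
The dependence on power and prior probability can be made concrete with a short Python calculation based on Bayes' rule. This sketch ignores the bias term mentioned above, and the power, prior, and significance thresholds are illustrative assumptions rather than values from the presentation; the function name is hypothetical.

```python
def prob_true_given_significant(power, prior, alpha):
    """Probability that a statistically significant finding is a true effect.

    Bayes' rule with three inputs:
      power - probability of a significant result when the effect is real
      prior - prior probability that the effect is real
      alpha - probability of a significant result when there is no effect
    Bias is ignored in this simplified sketch.
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Illustrative assumptions: 80 percent power and a 1-in-1,000 prior
p_conventional = prob_true_given_significant(power=0.8, prior=0.001, alpha=0.05)
p_stringent = prob_true_given_significant(power=0.8, prior=0.001, alpha=5e-8)

print(round(p_conventional, 4))
print(round(p_stringent, 4))
```

Under these assumed inputs, a result that is significant at the 0.05 level has less than a 2 percent chance of reflecting a true effect, whereas the same result at a much more stringent threshold is very likely to be real. A smaller prior, as suggested for interactions, lowers both figures.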

There are many practical challenges when testing for GxE interactions. In some cases, a true statistical interaction may not have clinical significance. There are limits to the etiological inferences that can be made because there are often multiple mechanistic models that are all consistent with the observed statistical interaction. Genotypes can be measured very accurately, but there is often significant error when measuring phenotypes and environmental exposures. Reverse causation is a concern that is somewhat ameliorated in prospective cohort studies. Large sample sizes with prospectively collected and accurate exposure data will be needed to detect interactions effectively.

Questions and Comments

Question: In many cohorts, environmental exposures have been measured in sample sizes as large as 1.5 million people, but genotypes only will be available from a small number of participants nested within that sample. Can power to detect GxE be gained by assuming the association of the environmental factor with disease is known and fixed?
Answer: Some power can be gained by making that assumption, but it depends on the assumption being true.

Exposure Linkage in Cohorts: Linking the Agricultural Health Study and Iowa Women's Health Study to Environmental Data
Mary Ward, Ph.D., Senior Investigator, Occupational and Environmental Epidemiology Branch, DCEG, NCI

Linking study populations to exposure source data can be successful using geographic information systems (GIS). Examples of exposure assessments in which geographic information can be useful include monitoring agricultural pesticides, drinking water nitrate levels, and emissions from animal feeding operations. GIS software stores information in data layers, which can be overlaid on a map of residences, pollution sources, and geographic data.

Sources of geographic environmental data include the United States census data, land use data obtained through satellite imagery by the United States Geological Survey, and environmental monitoring databases, which include industrial emissions and drinking water measurement data from public water supplies. These data could be used to investigate whether air pollution modifies the association of physical activity with cancer risk or whether neighborhood characteristics (e.g., built environment, socioeconomic status) affect obesity and cancer risk. A wide range of risk factors can be adjusted for neighborhood characteristics, such as access to medical facilities.

The first step in using GIS to link environmental data is to locate the study population. The most accurate way to locate the population is to take GPS readings in individual homes. Although not feasible in large cohort studies, location information can be obtained by geocoding addresses. Automated geocoding is inexpensive and efficient, and thousands of addresses can be geocoded in hours. Full addresses are necessary to match to a georeference database. Latitude and longitude are estimated for each address by interpolating addresses between known points along street segments. It also is possible to estimate latitude and longitude by matching to parcel data or by using aerial photos.
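
The street-segment interpolation described above can be sketched in a few lines of Python. This is a simplified illustration, not the method of any particular geocoding service: it assumes a straight segment, evenly spaced house numbers, and no offset from the street centerline, and the function name and coordinates are hypothetical.

```python
def interpolate_address(house_number, seg_start, seg_end, first_number, last_number):
    """Estimate (latitude, longitude) for a house number on a street segment.

    seg_start and seg_end are (lat, lon) pairs for the segment endpoints;
    first_number and last_number are the house numbers at those endpoints.
    Assumes evenly spaced addresses along a straight segment.
    """
    if first_number == last_number:
        fraction = 0.5  # single-address segment: use the midpoint
    else:
        fraction = (house_number - first_number) / (last_number - first_number)
    fraction = min(max(fraction, 0.0), 1.0)  # keep the point on the segment
    lat = seg_start[0] + fraction * (seg_end[0] - seg_start[0])
    lon = seg_start[1] + fraction * (seg_end[1] - seg_start[1])
    return lat, lon


# Hypothetical segment covering house numbers 100 to 200
estimated = interpolate_address(
    house_number=150,
    seg_start=(42.0000, -93.0000),
    seg_end=(42.0000, -92.9900),
    first_number=100,
    last_number=200,
)
```

Because the estimate depends only on the house number's position within the address range, a home set far back from the road, or addresses that are unevenly spaced, will be placed incorrectly.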

The accuracy of geocoding is important. In rural areas especially, the position of a home can be significantly offset from the position of the address. A methodological study was conducted to compare locations measured at the residences by GPS with geocoded address locations returned by a commercial service. The median difference between the actual and geocoded locations was 50 meters in towns and 212 meters in rural areas. Town size did not affect positional accuracy for urban addresses. The accuracy of rural addresses could be improved to a median error of 88 meters by using an in-house geocoding method. Iowa has an e-911 system, which records a GPS reading taken at the end of the driveway of each residence. Another methodological study used digital orthophotography to measure the distances between the e-911 locations and the actual residences. The median distance between the e-911 location and the actual residence location was 44 meters; however, the range of the location error was 1 to 974 meters.

A body of literature indicates that homes in close proximity to crops are exposed to agricultural herbicides. Satellite imagery and historical farm service records show that 60 percent of all Iowa residences are within 500 meters of an agricultural crop. The risk of acute myeloid lymphoma in women living on farms is increased 2.25-fold over the general population. For non-farm residences with 60 or more acres of crops within 300 meters of the residence, the probability of detecting pesticides in carpet dust is similar to the probability of detecting pesticides in carpet dust from farm residences. The ability to use crop acreage near homes as a surrogate for pesticide exposure is being evaluated.

Monitoring data from U.S. community water supplies indicate that about 25 percent of the population in agricultural states is exposed to 5-10 mg/L of nitrate in their drinking water. These levels are about 50 percent of maximum contaminant levels (MCL) for nitrate set by the U.S. Environmental Protection Agency. Private wells used primarily in rural areas are not regulated, have little historical monitoring data, and can have much higher nitrate levels than community water sources; this is a major challenge for exposure assessment. By using satellite imagery to create crop maps to estimate levels of fertilizer applications, together with GIS regression models (e.g., land use, aquifer types), investigators were able to estimate that both community water sources and private wells in the Platte River Valley of Nebraska and in a region of southeastern Pennsylvania had drinking water nitrate levels that exceeded the MCL.

In another example, exposures from animal feeding operations were estimated in Carroll County, Iowa using GIS. Animal feeding facilities generate a lot of waste and their locations are maintained in a geological survey database. The distance of every residence to the closest animal feeding operation was calculated and EPA air dispersion models were used to estimate exposures for each residence within the county.
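
The distance step described above can be sketched in Python using the haversine great-circle formula. This is an illustration only: the coordinates are invented, the function names are hypothetical, a spherical Earth with a mean radius of 6,371 km is assumed, and the air dispersion modeling is not represented.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(point_a, point_b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_facility_distance_m(residence, facilities):
    """Distance in meters from a residence to the closest facility."""
    return min(haversine_m(residence, facility) for facility in facilities)


# Invented coordinates for two residences and three feeding operations
residences = {"home_1": (42.05, -94.85), "home_2": (42.10, -94.70)}
feeding_operations = [(42.06, -94.86), (42.00, -94.60), (42.20, -94.90)]

distances = {
    name: nearest_facility_distance_m(location, feeding_operations)
    for name, location in residences.items()
}
```

A brute-force search like this checks every facility for every residence, which is adequate for a single county; a spatial index would be a natural replacement for much larger datasets.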

GIS can add value to cohort studies by allowing assessment of exposures that are not obtained easily through questionnaires or biomonitoring. Uncertainty in spatial data needs to be quantified and GIS-based exposure metrics need to be validated. GIS will be increasingly useful as more data become available and technology improves. Development of new exposure assessment models is needed and requires an interdisciplinary approach including geographers, environmental engineers, chemists, environmental scientists, and epidemiologists.

Questions and Comments

Question: How scalable are these GIS methods for cohorts that may have participants scattered across many states?
Answer: Collecting GIS data from many states is a challenge. Some data, such as census data and land cover data, are available for the whole country.

Unique Methodology Used in New Cohorts: The Million Veteran Program (MVP)
J. Michael Gaziano, M.D., M.P.H., Director, Massachusetts Veterans Epidemiology, Research and Information Center (MAVERIC), VA Boston Healthcare System; Chief, Division of Aging, Brigham and Women's Hospital; Professor of Medicine, Harvard Medical School

The goal of the MVP is to enroll 1 million users of the Veterans Health Administration (VHA) into an observational mega-cohort. After surveying large-scale biobanks in Europe and North America, the approach selected to build the MVP cohort was to develop a model in which specimens and questionnaire data are merged with national health service data.

The VHA is an ideal setting for large-scale population research; it has a large pool of willing participants who tend to stay in the VHA system for long periods of time, electronic medical records, and a robust research infrastructure. Components of the MVP include the Office of Research and Development to provide oversight, a centralized institutional review board (IRB), a centralized biorepository, an informatics infrastructure, and the local sites.

To recruit participants, MVP sends VHA members an invitation letter, a baseline survey, and an MVP brochure. If a member submits the survey, a study visit appointment is made during which consent is obtained and a blood sample collected. Following the initial visit, the participants are sent a thank you letter and a lifestyle survey. Consent forms are sent to a teleform server and biospecimens are shipped to the central repository. The process is automated using SharePoint and the informatics infrastructure. Two plasma aliquots and one buffy coat aliquot are stored with a two-dimensional barcode; 15 percent of the buffy coat is sent for DNA extraction. A majority of the samples are processed within 1 day of collection.

More than 9,000 people, mostly individuals who are middle aged or older, white, and male, have been enrolled in the study. The recruitment goal for this year is 100,000. Mailings were sent to 125,000 people with a 22 percent response rate; 19,000 people completed the baseline survey and 13,000 people opted out of the study. The 5-year recruitment goal is 1 million participants. Oversampling of females and racial and ethnic groups is planned. Some technological advances will be necessary to meet the goal, including allowing participants to complete surveys, make appointments, and give informed consent online.

Unique Methodology Used in New Cohorts: The Millennium Cohort Study
Cynthia LeardMann, M.P.H., Senior Biostatistician, Deployment Health Research Department, Naval Health Research Center

The Millennium Cohort Study was initiated in 2001. Its purpose is to prospectively evaluate the impact of military experiences on long-term mental and physical health outcomes of U.S. service members. Earlier efforts to study these exposures were in the form of retrospective studies. The Millennium Cohort Study capitalizes on Department of Defense (DoD) surveillance and healthcare data that were not available during the first Gulf War. A study objective is to provide strategic policy recommendations and information to DoD and government leaders to guide future interventions.

The cohort consists of 170,000 population-based participants, current and former service members representing all service branches. Cohort members are requested to complete an in-depth questionnaire every 3 years through 2022, regardless of their military status at follow-up. The cohort includes both active duty and reservists; non-active duty members are followed up even if they are not part of the VHA. Stratified random samples of active duty service personnel were recruited in 2001, 2004, 2007, and 2011. The baseline response rate is 31 percent with a greater than 70 percent response rate at follow-up. Fifty-seven percent of the cohort was deployed in Iraq or Afghanistan and 33 percent had separated from the military.

The survey, which can be completed online or on paper, consists of more than 450 questions, and takes 30-40 minutes to complete. Informed consent can be granted online. The questionnaire uses standard measures of physical, behavioral, and mental health. It includes questions on military experiences and occupational exposures as well as behavioral outcomes such as alcohol and tobacco use. The questionnaire data can be linked to exposure and healthcare data including immunizations, deployment records, and mortality data.

Univariate analyses are used to investigate unadjusted associations. Multivariable regression models, survival analyses, factor analyses, and other multivariate techniques are conducted as appropriate. Because of the low response rates, weighting techniques and propensity scores are often employed.
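The summary does not describe how the weights are built, so the following is only a minimal sketch of one common approach, inverse-probability weighting by sampling stratum. The stratum names and counts are hypothetical and are not study data.

```python
# Hypothetical strata and counts, for illustration only; not study data.
invited = {"active_duty": 1000, "reserve_guard": 500}
responded = {"active_duty": 350, "reserve_guard": 125}


def nonresponse_weights(invited, responded):
    """Return one inverse-probability weight per stratum.

    Each respondent is weighted by 1 / (stratum response rate), so that
    respondents stand in for the non-respondents in their own stratum.
    """
    weights = {}
    for stratum, n_invited in invited.items():
        response_rate = responded[stratum] / n_invited
        weights[stratum] = 1.0 / response_rate
    return weights


weights = nonresponse_weights(invited, responded)

# Weighted respondent counts recover the invited totals in each stratum.
weighted_totals = {s: responded[s] * w for s, w in weights.items()}
```

A propensity-score version would replace the per-stratum response rate with a modeled probability of response for each individual; the weighting step itself stays the same.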

A number of techniques are employed to encourage continued participation of cohort members. One unique method involves placing statements about the Millennium Cohort on the pay stubs of military personnel. Social media (e.g., Wikipedia and Facebook) also are used. Postcards are sent to members on Memorial Day and Veterans Day. Challenges to recruitment and retention include military personnel being targets of phishing attacks, leading them to question the legitimacy of the messages about the study. It can be difficult to track former service members and keep them engaged in the study. A non-response analysis suggests that the Cohort is a representative sample of military personnel with minimal response bias.

After the current recruitment phase, the goal is to have more than 200,000 members in the cohort, including military spouses. The study has produced 45 peer-reviewed publications. Future plans include adding a family component as well as clinically relevant sub-studies. With a 21-year follow-up period, the study will continue to produce relevant data on health outcomes of military personnel.

Questions and Comments

Question: Information captured in electronic medical records (EMRs) is often not standardized in any way. How research friendly are EMRs used by the DoD and VHA?
Answer: EMRs are difficult to use for some diagnoses. Sometimes, data from the EMR are compared to self-reports. Working with EMR data requires careful validation.

Moderator: Anne Zeleniuch-Jacquotte, M.D., Associate Professor, Department of Environmental Medicine, New York University School of Medicine

Overview from the NCI Office of Biorepositories and Biospecimen Research
Jim Vaught, Ph.D., Deputy Director, Office of Biorepositories and Biospecimen Research (OBBR), NCI

In 2002 the National Dialogue on Cancer identified biospecimens as critically important to cancer research. It concluded that biospecimens were in short supply and available biospecimens were often of inferior quality. OBBR was established in 2005 and the Biospecimen Research Network (BRN) was formed in 2006. In 2007, OBBR published the "NCI Best Practices for Biospecimen Resources." After completing a needs assessment for a national biobanking network in the United States, strategic planning for the Cancer Human Biobank (caHUB) began in 2009.

There are a number of international biobanking initiatives in Europe and Asia with well-defined quality management programs and tissue collection protocols. As a result of several workshops, an update to the "NCI Best Practices for Biospecimen Resources" is scheduled for release in November 2011. The best practices include recommendations for technical, operational, and safety practices; quality assurance and quality control programs; implementation of informatics systems; establishment of reporting mechanisms; and administration and management structures. Best practices also address ethical, legal, and policy issues such as informed consent, access to biospecimens and data, privacy protection, custodianship, and intellectual property.

In the April 2010 issue of Cancer Epidemiology, Biomarkers & Prevention, Dr. Vaught, in collaboration with the International Agency for Research on Cancer (IARC), published a paper which outlines international efforts to develop biobanking best practices. Although sets of best practices borrow from one another, there is no single set of harmonized best practices. The paper "Biospecimen Reporting for Improved Study Quality," published in Biopreservation and Biobanking in early 2011, concerns standards for reporting on sample quality when a manuscript is submitted. Editorials are planned for other journals to promote the use of these standards.

A workflow for collecting tissue specimens requires careful control over processes through quality control and quality assurance programs, standard operating procedures, informed consent, and proper material transfer. These become more complicated with more complex collection and processing procedures. When collecting a biospecimen sample during surgery there are many pre-acquisition and post-acquisition variables that need to be considered, including administered drugs, temperature of the room, type of fixative used, the rate of freezing, aliquot size, and container type. These variables can affect sample quality, impact diagnosis and treatment, generate irreproducible research results, and lead to the misinterpretation of artifacts as biomarkers. Until clear sample handling guidelines were established for HER2 analysis in breast cancer, the false positive and false negative rates were almost 20 percent.

The OBBR handles sample issues in a number of ways. A Biospecimen Research Database was created that contains approximately 900 papers on biospecimen methods. New papers are added often. The database can be searched by tissue type and other variables. The OBBR hosts the annual symposium "Advancing Cancer Research through Biospecimen Science" and supports extramural biospecimen methods research. Collaborations with other NCI programs, such as Clinical Proteomics Technologies Assessment for Cancer (CPTAC) and The Cancer Genome Atlas (TCGA), also advance biospecimen methods research.

The cost of sample collection for TCGA was $2,000 per sample set: the tumor sample, matching normal tissue, blood sample, and clinical data. Four major processes need to be considered when developing a cost model for biobanking: biospecimen collection and shipping, biospecimen processing, biospecimen storage management, and biospecimen retrieval and distribution. In addition to setup and maintenance costs, other economic issues should be considered when establishing biorepositories. Is cost recovery possible at the host institution? What is the value of the specimens and data collected? What is the cost of implementing best practices? Can the economic benefits of efficiencies of scale, such as implementing best practices and efficient informatics systems, be quantified?

Questions and Comments

Question: In the past, some cohorts were storing their biospecimens at -20°C. Are hormones and other small molecules stable if they are stored at that temperature?
Answer: Data have been published on that subject by other groups. OBBR has no new information.

Tumor Tissue Collection: The Experience of the Harvard Cohorts
Lorelei Mucci, Sc.D., Associate Professor of Epidemiology, Department of Epidemiology, Harvard School of Public Health

Patho-epidemiology is the integration of molecular epidemiology and pathology, incorporating tumor biomarkers and pathological data into epidemiological studies. Patho-epidemiologic studies allow for the refinement of causal factors in etiology and progression of cancer; identification of cancer subtypes; and discovery of tumor biomarkers associated with cancer outcomes and response to therapeutic interventions. It is important to establish a multidisciplinary team in the early stages of designing patho-epidemiology studies. Cohort studies at Harvard include collaborations between groups of epidemiologists, pathologists, molecular geneticists, and biostatisticians. Identifying pathologists with expertise in the specific cancer of interest is particularly important.

Tissue collection has been ongoing in the Health Professionals Follow-up Study (HPFS) since 1998 and in the Nurses' Health Study (NHS) since 1993. Tumor tissue was collected for many types of cancers in each study. Requests for tumor tissue samples from hospitals are well defined and specific for each disease. For example, for prostate cancer, three representative tumor blocks targeted at the nodule with the highest grade, one representative normal tissue block, the original H&E slides, and the pathology report are requested. Occasionally, hospitals will send only unstained slides. The success rate in collecting tissue samples varies by cancer type, ranging from about 30 percent for bladder cancer to more than 80 percent for Barrett's esophagus in the NHS, and 55 percent for pancreatic cancer to almost 80 percent for colon cancer in the HPFS.

A major challenge in tumor tissue collection is the time between diagnosis and tissue collection; many hospitals retain tissue blocks for only 10 years. As time goes by, some blocks get selected for other studies and become unavailable. The success rate for obtaining tumor tissue for breast and prostate cancer is inversely proportional to the number of years after diagnosis; it is important to attempt to collect tissue samples soon after diagnosis.

Another major challenge is that some hospitals, particularly academic teaching hospitals, decline to ship tissue samples for research studies. In these situations, it is often possible for pathologists on the study and at the hospital to set up a collaboration so that the study can still access the tumor tissue. The cost of obtaining tumor tissues varies by hospital from no charge to hundreds of dollars per tumor block. It is often possible to negotiate with hospitals to reduce these fees for research studies.

An example of a patho-epidemiology collaborative project is a study using tumor biomarkers to understand the link between obesity and prostate cancer mortality. Men who are overweight or obese prior to cancer diagnosis are at increased risk of developing lethal prostate cancer over time compared to healthy weight men. Pathological examination of tumor slides revealed that chronic inflammation adjacent to the cancer was associated with increased BMI. In another study, chronic inflammation was associated with lethal prostate cancer. The TMPRSS2:ERG gene fusion commonly is associated with prostate cancer. In men carrying the gene fusion, risk of lethal prostate cancer increased with increasing waist circumference. An association of lethal prostate cancer with increasing waist circumference was not observed in men without the gene fusion.

Questions and Comments

Question: Reviewing and dispatching tissue blocks can be a significant burden on hospitals. Do the cohorts have any mechanisms to alleviate this burden?
Answer: When hospitals are reluctant to dispatch tumor blocks, we can try to include them as collaborators to reduce their workload.

Question: Do you return tumor blocks to the hospitals?
Answer: We keep the tissue blocks unless the hospital wants them returned. Only a small proportion of hospitals want the tissue blocks returned.

Question: Is it difficult to obtain tissue blocks for participants who have died and for whom you do not have informed consent?
Answer: In a few instances, hospitals ask for next-of-kin consent, but it is usually unnecessary.

Moderator: Julie E. Buring, Sc.D., Senior Epidemiologist, Brigham and Women's Hospital; Professor of Medicine, Harvard Medical School; Professor of Epidemiology, Harvard School of Public Health

NCI Cohort Infrastructure Program Announcement
Daniela Seminara, Ph.D., M.P.H., Senior Scientist and Consortia Coordinator, Epidemiology and Genomics Research Program (EGRP), DCCPS, NCI

The 29 EGRP grants that directly support cohorts make up 6 percent of the EGRP portfolio but account for 15 percent of total grant funding. The average cost of a grant supporting a cohort is twice the cost of a typical R01 grant. The oldest funded cohort is the Nurses' Health Study I, which has been funded for 35 years; the newest cohort is Lessons in Epidemiology and Genetics of Adult Cancer from Youth (LEGACY), which has been funded for 1 year; the average length of funding for a cohort study is 12 years. Supported core functions include high-throughput assessment of established genetic markers and data sharing across centers.

The NCI cohort infrastructure program announcement (PAR) was initiated 1 year ago. The goals of the PAR are to establish, maintain, or upgrade new or existing cohorts, and support their core functions; to support methodological research to evaluate and improve core infrastructure functions; and to effectively manage the EGRP cohort portfolio, addressing gaps and scientific needs. To be eligible to apply for the PAR, an epidemiologic risk or survivor cohort must have at least 10,000 participants. Epidemiologic risk cohorts must examine the effects of multiple exposures on the risks of multiple types of incident cancers and on cancer mortality. Survivor cohorts must examine the determinants of cancer prognosis, progression, and cancer-related outcomes among cancer survivors. Occupational cohorts are excluded from responding to this announcement.

Epidemiologic cohorts are required to collect blood samples or other sources of germline DNA. Survivor cohorts are required to collect tumor tissue with detailed characterization of cancer subtype and histopathology information, germline DNA, and cancer treatment information. All cohorts are required to have a robust data sharing plan and are encouraged to comply with NCI best practices for biospecimen resources, apply for Cohort Consortium membership, and participate in cross-cohort data harmonization. Cohorts will be evaluated on their ability to address anticipated scientific questions.

Applications requesting less than $500,000 per year require no pre-approval, applications requesting between $500,000 and $2.499 million require pre-approval from the Division Director, and applications requesting $2.5 million or more require pre-approval from the Division Director and the senior staff. Criteria for pre-review clearance include cost, uniqueness, and relevance to EGRP priorities. Past performance for ongoing cohorts also is considered. A majority of the applications submitted to date proposed annual budgets ranging from $500,000 to $2.5 million.

Fifteen applications are for new survivor cohorts, three are ongoing survivor cohorts, five are new epidemiologic risk cohorts, and four are continuing epidemiologic risk cohorts. A few of the applications propose converting cohorts designed for other diseases, such as cardiovascular disease and diabetes, into cancer cohorts. There are ethnic, twin, and familial cohorts among the applications, as well as patient cohorts focused on breast, prostate, lung, and testicular cancers.

The cohort infrastructure PAR is part of an overall strategic plan to support cohort research. Other future needs include improved communication tools such as web portals and a web-based descriptive cohort database; funding for data cleaning and harmonization across cohorts; and mechanisms to support collaborative research projects.

Questions and Comments

Question: Indirect costs vary greatly among institutions. Since pre-approval is based on direct costs, when are indirect costs considered?
Answer: Total costs, including the indirect costs, are considered by the senior program leadership (SPL) after a grant application is reviewed and is eligible for funding.

Question: There are instances where epidemiology cohorts can also be used for survivorship studies. Have you received any applications for mixed epidemiology and survivor cohorts?
Answer: The proposals closest to mixed epidemiology and survivor cohorts to date are for familial cohorts.

One-Carbon Metabolism Pathway and Lung Cancer: a New Cohort Consortium Project
Paul Brennan, Ph.D., International Agency for Research on Cancer, The World Health Organization, Lyon, France

The initial hypothesis of the Lung Cancer Cohort Consortium (LCCC) was that compounds related to one-carbon metabolism (OCM), e.g., B-vitamins, might have important protective effects against the onset of lung cancer. To test this hypothesis, serum samples were needed from a large, prospective cohort study, including ex-smokers and never-smokers. Pilot data were generated from the European Prospective Investigation Into Cancer and Nutrition (EPIC) study, with B-vitamins and other OCM factors measured in 900 lung cancer cases and 1,800 controls. Results from the pilot study showed associations of levels of vitamin B6, methionine, and folate with a lower risk of lung cancer. The strength of the associations indicates possible utility in risk prediction, particularly among former smokers. Participants with higher than median levels of vitamin B6 and methionine had a 2.5-fold reduction in risk over those with lower than median levels of both. Taking into account regression dilution between blood samples, this risk reduction could be as large as 5-fold.
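The summary does not say how the regression dilution adjustment was made. As a rough sketch, assuming the standard correction in which the log relative risk is divided by a regression dilution ratio (the reliability of a single blood measurement, a value not given in the source), the step from 2.5-fold to 5-fold can be illustrated as follows. The function name and the back-calculated ratio are illustrative assumptions.

```python
import math


def corrected_relative_risk(observed_rr, dilution_ratio):
    """Correct a relative risk for regression dilution.

    Assumes the standard approach: divide the log relative risk by the
    regression dilution ratio, which lies between 0 and 1.
    """
    if not 0 < dilution_ratio <= 1:
        raise ValueError("dilution_ratio must be in (0, 1]")
    return math.exp(math.log(observed_rr) / dilution_ratio)


# Back-calculated, not reported in the source: the dilution ratio under which
# an observed 2.5-fold association corresponds to a 5-fold one (about 0.57).
implied_ratio = math.log(2.5) / math.log(5.0)

corrected = corrected_relative_risk(2.5, implied_ratio)
```

The 1,000 repeat samples planned for the full project are the kind of data from which such a ratio would be estimated rather than assumed.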

The specific aims for an R01 proposal for the LCCC project were to combine 24 cohorts to study 12,000 cases prospectively, collect biosamples from 5,100 cases and 5,100 controls (including 1,000 repeat samples for regression dilution analysis), measure a large panel of OCM markers, and provide cumulative risk estimates for OCM markers. Whether OCM markers are associated with histology, stage, and survival also will be evaluated. Genes associated with OCM biomarkers will be identified using a Mendelian randomization approach in collaboration with The International Lung Cancer Consortium (ILCCO). A subgroup of 336 samples will be used to identify OCM biomarkers associated with white blood cell methylation patterns. The biochemical analyses of serum samples will measure more than 40 biomarkers related to OCM, including folate, B-vitamins, homocysteine, and methionine, as well as other biomarkers such as cotinine and C-reactive protein.

The LCCC can be used as a platform for other biomarker studies, genetic studies, and studies of lifestyle and dietary factors. A mechanism will be established allowing investigators to propose additional studies. Unique features of the cohort include equal numbers of current, former, and never smokers; a large proportion of subjects with GWA data; two populations (Asian and European descent) with distinct genetic and dietary backgrounds; and high statistical power.

Human Papillomavirus (HPV) and Head and Neck Cancer
Aimée Kreimer, Ph.D., Investigator, DCEG, NCI

Studies indicate that HPV16 plays a causal role in oral cavity and oropharyngeal (OP) cancers; however, only limited evidence indicates that HPV16 causes laryngeal tumors. Most studies have been cross-sectional, case-series, or case-control designs. One prospective study, published a decade ago by the Nordic Cancer Registries, showed a 14-fold increased risk of OP cancers and a 2-fold increase in tongue cancer associated with HPV16 L1 seropositivity. A 2-fold increased risk of oral cavity cancer was detected, but was not statistically significant.

Using tumors to identify HPV status is the gold standard; however, using serology allows for the analysis of cases before disease onset. Serology is not susceptible to contamination and is inexpensive. High-throughput analysis allows testing of multiple HPV genotypes and proteins to be conducted in parallel. HPV16 L1 antibodies are markers of prior infection, whereas HPV E6 and E7 antibodies are markers of HPV-related cancer.

The aims of this study were to prospectively investigate the association of HPV infection and cancer risk at several sites of the head and neck, the interaction between HPV and tobacco use for head and neck cancers, the survival advantage among HPV-associated tumors, and the roles of other serologic markers. A pilot study was conducted within the EPIC study, and serologic screening for 14 HPV genotypes and antibodies was completed in September 2011. The statistical analyses are ongoing. Cases were incident cancers of the oral cavity, OP, or larynx with at least one prediagnostic serum/plasma specimen available. Controls were individually matched for age and sex from most centers. HPV serologic results were obtained from 530 cases and 892 controls. The median time between blood draw and cancer diagnosis was 5.5 years and the median age at diagnosis was 62.8 years. More than 70 percent of the study population was male, 50 percent were 51-65 years old, and 30 percent were over 65. More than 50 percent of the cases were current smokers compared to 25 percent of the controls.

Detection of HPV16 antibodies was associated with significantly increased risk of OP cancers. HPV types 31 and 33 also were associated with increased OP cancer risk. The HPV16 E6 antibody was detected in 34.8 percent of the cases and 0 percent of the controls, giving a positive predictive value of 100 percent. No significant associations were detected between HPV16 or other HPV types and cancers of the oral cavity or the larynx.
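The positive predictive value quoted above follows directly from the absence of seropositive controls. A minimal sketch of the arithmetic is given below; the counts are hypothetical round numbers chosen to match the reported 34.8 percent and 0 percent, because the summary does not give the underlying counts.

```python
def positive_predictive_value(true_positives, false_positives):
    """Share of positive test results that come from true cases."""
    total_positive = true_positives + false_positives
    if total_positive == 0:
        raise ValueError("no positive results to evaluate")
    return true_positives / total_positive


# Hypothetical counts chosen to match the reported percentages.
cases = 1000
seropositive_cases = 348      # 34.8 percent of cases
seropositive_controls = 0     # 0 percent of controls

sensitivity = seropositive_cases / cases
ppv = positive_predictive_value(seropositive_cases, seropositive_controls)
```

With no false positives the ratio is 1.0 whatever the number of positive cases, while sensitivity stays at 34.8 percent. Because the case-to-control ratio in a nested case-control study is fixed by design, a within-sample value like this does not translate directly into a population-level predictive value.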

A proposal will be circulated to all members of the Cohort Consortium in 2011 to identify cohorts willing to participate in a larger study. Preliminary budgets will be drafted, protocols will be developed, and a working group will be formalized in 2012 pending the acquisition of grant funding. Participating cohorts will need prediagnostic plasma or serum from incident head and neck cancer cases and questionnaire data on tobacco and alcohol use. Tumor specimens from cases, repeat plasma or serum samples, follow-up data on vital status and cause of death, and demographic data are preferred but not required.

Questions and Comments

Question: The CDC recently extended their HPV vaccination recommendations to include boys. What role can cohort studies play in evaluating the efficacy of vaccination against head and neck cancers?
Answer: The Cohort Consortium cohorts cannot address the question because HPV vaccination is too new. Vaccine efficacy trials would be the best study design to address the question.

African American Working Group
Nonye Harvey, M.P.H., Public Health Advisor, EGRP, DCCPS, NCI

The African American working group is focused on the analysis of pooled data from seven cohort studies examining the association of BMI and all-cause mortality in African American populations. Results from previous studies of BMI and mortality in African Americans have been inconsistent. The goals of this working group include resolving reasons for inconsistencies in the previous studies and comparing results to those in Caucasian and Asian populations.

Participating cohorts are limited to those that have at least 500 deaths and 5,000 African American participants. The seven cohorts represent about 45,000 total deaths. Barriers to performing this work include lack of funding for preparation of data files, data pooling and harmonization, and data analyses. The participating cohorts agreed to prepare their own data files. EGRP is providing some support for a contractor to harmonize a limited dataset across the seven studies and prepare an analytic data file. EGRP staff coordinate material transfer agreements, and the analysis will be carried out at the International Epidemiology Institute.

Progress to date includes defining the project scope, identifying the inclusion and exclusion criteria and essential covariates, and listing the available data elements from each cohort. Material transfer agreements and the analysis plan will be finalized soon.

Questions and Comments

Comment: This is a project in its early stages. The next step is to use large cohorts and incidence data to study rare cancers in African Americans. A rare cancer R01 grant application is in development.

Question: In this very large consortium do you have a subset of participants with data on waist circumference?
Answer: Of the seven cohorts, only one has data on waist circumference.

Updates from the Obesity Working Group
Amy Berrington de Gonzalez, D.Phil., Investigator, Radiation Epidemiology Branch, DCEG, NCI

The Obesity working group was initiated approximately 3 years ago with the aim of investigating the association of BMI with mortality in Caucasian populations. The working group currently consists of more than 20 cohorts, which include more than 1.5 million participants. Exposure data are collected on BMI, waist circumference, and physical activity. Outcomes of interest include rare cancers as well as mortality. Potential confounders include smoking and alcohol use, education, and co-morbidities.

A recent study examining the Asian subpopulation within the cohort showed an increase in mortality with increasing BMI that was similar to patterns observed in Caucasian populations. This study population consisted of 10 cohorts, with 25,000 participants and 2,000 deaths. The study population was primarily Asians living in the United States. The observations are distinct from associations of BMI with mortality observed in Asian populations living in Asia.

BMI is an imperfect measure of body composition and body fat. Eleven of the cohorts collected waist circumference data and included 650,000 participants with 78,000 deaths. The risk of mortality increases almost linearly with increasing waist circumference, with hazard ratios of 1.5 for the largest segment of waist circumference. Increasing waist circumference profoundly increases risk of death by cardiovascular disease; however, increased risk of cancer death and other cause-specific deaths also contribute to the increased risk of all-cause mortality.

The physical activity subgroup includes 6 cohorts with 650,000 participants and 82,000 deaths. Physical activity was measured in metabolic equivalent task hours of leisure time physical activity per week. Compared to participants with normal BMI and high activity levels, high BMI and low activity levels were associated with increased mortality. The average number of life years lost due to low activity was similar regardless of BMI.

A new initiative within the Obesity Working Group is studying incident rare cancers. It includes 23 cohorts with 3,000 cases of thyroid cancer, 1,000 cases of gallbladder cancer, and 4,000 cases of head and neck cancers. The physical activity subgroup also is examining incident cases of all cancers; the 11 participating cohorts include 200,000 incident cancer cases.

Moderator: Julie R. Palmer, Sc.D., Senior Epidemiologist, Slone Epidemiology Center at Boston University; Professor of Epidemiology, Boston University School of Public Health

Feedback from the Cohorts: Communications, Opportunities, Burden
Julie E. Buring, Sc.D., Senior Epidemiologist, Brigham and Women's Hospital; Professor of Medicine, Harvard Medical School; Professor of Epidemiology, Harvard School of Public Health
Anne Zeleniuch-Jacquotte, M.D., Associate Professor, Department of Environmental Medicine, New York University School of Medicine

A survey of Cohort Consortium investigators solicited feedback in three major areas: (1) participation, leadership, and authorship; (2) funding; and (3) communications. The investigators provided positive feedback for the annual meeting, the collaborative spirit of Cohort Consortium members, and the support from EGRP program staff. Feedback also favored bundling research proposals so they can be reviewed in batches.

The need to clarify the process for submitting research proposals and amending ongoing projects was identified as a barrier to collaboration. The criteria for approving projects also are not clear. Some respondents suggested there should be more formal peer review. Engaging active participation of investigators who contribute data to a consortium project was identified as a challenge. It also was suggested that the questionnaires used by each cohort be made available to all investigators.

Many comments concerned authorship issues. Respondents suggested that authorship should be discussed at the beginning stages of a project and that authorship agreements be put in writing. Prime authorship should not be guaranteed without commensurate work.

Often, consortia projects have insufficient funding for data dispatches, sample preparation and shipping, and investigator time. Consortia projects can require significant amounts of time, especially dealing with email and teleconferences. Some respondents wondered if investigators who submit proposals should be required to obtain sufficient funds to support the project during the approval process.

A suggestion to improve communication was developing a mechanism for more frequent updates on the activities of the individual groups, perhaps using email or a web portal. There needs to be better communication when a cohort makes a decision in the context of one project that will affect other ongoing projects. In addition, the Cohort Consortium website should be updated more frequently.


Establishing Data Use Agreements (DUAs) with the various institutions associated with the Cohort Consortium can be lengthy and is a barrier to collaboration. Despite improvements in the process in recent years, some projects still experience significant delays. For example, establishing DUAs for the diet pooling project took 9 months. This delay was primarily caused by a legal disagreement between two different institutions over terms of the agreement. It was suggested that the Cohort Consortium could adopt a standard DUA. It was unclear if a standard DUA was feasible, however, since different institutions have different policies and priorities when making these agreements. Different collaborative projects also have different nuances, which may require different agreement terms.

Second to funding, authorship was the most frequent issue cited by respondents to the Cohort Consortium feedback survey. The order of the names of authors is important in some countries. There were concerns that the authorship positions were not always representative of the contributions of those authors to manuscripts. There was a consensus that it would be helpful to have written, flexible authorship guidelines. Many consortia have established authorship guidelines that provide useful examples. It was suggested that access to the authorship policies of other consortia might be helpful in addressing authorship issues within the Cohort Consortium. Common themes could be isolated from them to use as a basis for creating authorship guidelines for the Cohort Consortium.

Brainstorming: Best Thing to Do Next
Moderator: Patricia Hartge, Sc.D., Epidemiology and Biostatistics Program, DCEG, NCI
Moderator: William J. Blot, Ph.D., Professor of Medicine, Vanderbilt University; Chief Executive Officer, International Epidemiology Institute
Moderator: Deborah M. Winn, Ph.D., Deputy Director, DCCPS, NCI

The following points were raised during this session.

  • NIH Institute directors disagree on whether to establish new cohorts or use the existing ones. There is little consensus on the relative merits of issues such as data access policies, new data collection methods, and population representativeness. Some directors support the idea of a national cohort, while others believe that no resources should be devoted to it. Establishing a national cohort, whether a consortium of existing cohorts or a new cohort, will require support from the Institute directors who control most of the research funding at NIH. The Cohort Consortium and its accomplishments are not broadly known and are not informing this debate. It may be useful for the next meeting of the Cohort Consortium to be a symposium on the NIH campus, open to all Institutes, highlighting the methods and accomplishments of the consortium.
  • It is difficult to obtain a truly random sample in a new cohort. One of the criticisms of using existing cohorts is that they are not diverse enough. This meeting has demonstrated, through initiatives like the African American Working Group, that although the cohort populations may not be geographically diverse, they are ethnically diverse.
  • A common criticism of using existing cohorts is that most exposure data have been collected through questionnaires and not technological measures, such as GPS devices to measure physical activity. On the other hand, methodological studies have not shown that technological measures of exposures are significantly better than questionnaires.
  • GWAS are not a strength of the Cohort Consortium, but studying multiple outcomes is very feasible; almost half of the cohorts have active follow-up.
  • Studying multiple endpoints is interesting; however, validation is an issue. There are significant scientific advantages in using existing cohorts for these studies; they have decades of follow-up data. Self-reported outcomes are often not reliable. Linking cohort data to Medicare data might be particularly useful for survivorship studies, although Medicare linkage can be very costly and difficult.
  • A lack of clinical information is a significant challenge for studying outcomes that those cohorts were not designed to study. Some outcomes are too rare to study prospectively, and the death certificate is not an accurate source of data. When studying multiple outcomes, the endpoints should be strategically chosen.
  • GWAS have demonstrated that well-powered epidemiologic studies can detect associations without meticulous documentation of the phenotype.
  • Because of the extensive biospecimen repositories, significant opportunities exist for the Cohort Consortium to study biomarkers and metabolomics.
  • Increasing awareness of the Cohort Consortium at other institutes could be important for forming collaborations examining non-cancer outcomes.
  • The Cohort Consortium could leverage its position as a large group to work with industry in developing assays for biomarkers and metabolomics.
  • The cancer cohorts tend to be much larger than cohorts for other diseases. They may be more powerful for smoking, heart disease, or diabetes studies than cohorts designed for those exposures and outcomes.
  • In the next year it should be possible to initiate three new projects, write a white paper, and hold a symposium.
