2017 Annual Meeting
The annual NCI Cohort Consortium meeting, sponsored by EGRP and the Division of Cancer Epidemiology and Genetics (DCEG), was held on November 13-14, 2017, at the NCI Shady Grove Campus in Rockville, MD. Project/Working Group meetings were also held during this time.
Day 1: Monday, November 13, 2017
|12:30 p.m. – 1:20 p.m.||SESSION I: Opening Plenary|
Welcome and Introductions
|12:35 p.m. – 12:45 p.m.||Opening Remarks and NIH Updates|
|12:45 p.m. – 1:10 p.m.||
|1:10 p.m. – 1:20 p.m.||Open Discussion|
|1:20 p.m. – 4:15 p.m.||SESSION II: Setting New Directions for the Cohort Consortium|
|1:20 p.m. – 1:25 p.m.||World Café – Introduction|
|1:25 p.m. – 1:30 p.m.||Charge to the World Café Table Groups|
|1:30 p.m. – 2:00 p.m.||World Café – Round 1|
|2:05 p.m. – 2:20 p.m.||Break|
|2:25 p.m. – 2:45 p.m.||World Café – Round 2|
|2:50 p.m. – 3:05 p.m.||World Café – Round 3|
|3:05 p.m. – 3:20 p.m.||Break|
|3:20 p.m. – 4:10 p.m.||World Café Report Out and Open Discussion|
|4:10 p.m. – 4:15 p.m.||
|4:15 p.m. – 5:25 p.m.||SESSION III: Methodologic Insights into Cohort Consortium Studies|
|4:15 p.m. – 4:40 p.m.||Ghost-Time Bias from Imperfect Mortality Ascertainment in Aging Cohorts|
|4:40 p.m. – 4:50 p.m.||Open Discussion|
|4:50 p.m. – 5:15 p.m.||Innovative Data Capture for Cancer Surveillance and Cohort Studies|
|5:15 p.m. – 5:25 p.m.||Open Discussion|
Day 2: Tuesday, November 14, 2017
|8:30 a.m. – 9:05 a.m.||SESSION III (Continued): Methodologic Insights into Cohort Consortium Studies|
|8:30 a.m. – 8:55 a.m.||Cohort Consortium Data for the Development, Validation and Estimation of the Utility of Risk Models for Disease Prevention|
|8:55 a.m. – 9:05 a.m.||Open Discussion|
|9:05 a.m. – 10:05 a.m.||SESSION IV: Cohort Consortium Project / Working Group Reports|
|9:05 a.m. – 9:15 a.m.||Liver Cancer Pooling Project|
|9:15 a.m. – 9:25 a.m.||
|9:25 a.m. – 9:35 a.m.||Pooling Project of Prospective Studies of Diet and Cancer|
|9:35 a.m. – 9:45 a.m.||Open Discussion|
|9:45 a.m. – 9:55 a.m.||Biomarkers and Breast Cancer Risk Prediction|
|9:55 a.m. – 10:05 a.m.||Open Discussion|
|10:05 a.m. – 11:15 a.m.||SESSION V: Poster Session|
|11:15 a.m. – 12:00 p.m.||SESSION VI: New Concept Proposals and Next Steps|
|11:15 a.m. – 11:25 a.m.||
|11:25 a.m. – 11:50 a.m.||Open Mic – New Concept Proposals|
|11:50 a.m. – 12:00 p.m.||
|12:00 p.m. – 1:00 p.m.||Lunch|
Cohort Consortium Project / Working Group Meeting Agenda
Monday, November 13, 2017
|9:00 a.m.||Premenopausal Breast Cancer Collaboration Group (Tony Swerdlow, Hazel Nichols)|
|9:30 a.m.||Second Cancers / Survivorship Working Group (Joanne Elena)||Ovarian Cancer Cohort Consortium (Shelley Tworoger, Nico Wentzensen)|
|11:00 a.m.||Steering Committee Working Lunch||Vitamin D Pooling Project of Breast and Colorectal Cancer (Stephanie S-Warner)|
Tuesday, November 14, 2017
|1:00 p.m.||Prostate Cancer Working Group (Lorelei Mucci, Eric Jacobs, Michael Cook)||Markers of HPV Infection and Risk of Head and Neck Cancer (Aimee Kreimer, Mattias Johansson)||Lymphoid Malignancies Working Group (Lauren Teras, Jon Hofmann)|
|2:00 p.m.||Diet and Cancer Pooling Project (Stephanie S-Warner)|
|2:30 p.m.||Lung Cancer Cohort Consortium (Paul Brennan, Mattias Johansson)|
|3:30 p.m.||Physical Activity Working Group (Steve Moore, Charles Matthews)||PanScan (Rachael Stolzenberg-Solomon, Laufey Amundadottir)|
The 2017 Annual Meeting of the NCI Cohort Consortium was held at the NCI Shady Grove Campus in Rockville, MD on November 13-14. This summary reflects the portions of the meeting involving all participants. The other portions of the meeting were dedicated to multiple, simultaneous working group meetings.
Session I: Opening Plenary Session
Moderator: Anthony Swerdlow
Dr. Kathy Helzlsouer introduced the session moderator, Dr. Anthony Swerdlow, who is the outgoing Chair of the Cohort Consortium Steering Committee. She also thanked three members who are rotating off the Steering Committee and welcomed incoming Steering Committee Chair Dr. Giske Ursin, Chair Elect Dr. Celine Vachon (2019), and four new members, including 2020 Chair Elect Dr. Lynne Wilkens.
Dr. Swerdlow noted that a key purpose of this meeting is strategic planning, beginning with the World Café brainstorming activity. Input gathered during this activity will be used by the Cohort Consortium Steering Committee to develop a strategic plan for the future of the Cohort Consortium.
During the past year, the Steering Committee has been involved in reviewing proposals for several new working groups (WGs), as well as progress reports from several existing WGs. Following a recommendation made during the 2016 Cohort Consortium meeting, the Steering Committee drafted an overview paper describing the Cohort Consortium that will be submitted for publication. The paper will be distributed to the principal investigators (PIs) of all cohort studies for review prior to submission.
The Cohort Consortium now includes 63 cohorts and 42 active WGs/projects. Most WGs will accept new members for future analysis work. Four new WGs are recruiting new members, including the Anthropometrics and Family History of Cancer WGs. Participants interested in joining a WG should contact Nonye Harvey. More information about the WGs is available on the Cohort Consortium website.
Eleven WG sessions were held either immediately prior to or immediately after the main plenary session of the Cohort Consortium meeting.
Opening Remarks and NIH Updates
Debbie Winn and Stephen Chanock
Dr. Debbie Winn, Deputy Director of NCI’s Division of Cancer Control and Population Sciences, provided an update on NIH activities relevant to Cohort Consortium members. The United States Congress passed the 21st Century Cures Act in December 2016. This Act authorized funding, primarily for the NIH, to accelerate progress in cancer research through initiatives such as the Beau Biden Cancer Moonshot initiative. NCI now has active Funding Opportunity Announcements (FOAs) that respond to the 10 areas for research acceleration identified by the Cancer Moonshot Blue Ribbon Panel. NCI expects to release more FOAs responding to the 21st Century Cures Act in the near future. Only investigators in the United States may respond to these FOAs, although most other NCI funding opportunities are open to foreign investigators.
Dr. Norman Sharpless will serve as the new NCI Director. Dr. Doug Lowy will stay on as NCI’s Deputy Director. Dr. Sharpless comes from the University of North Carolina, where he served as the director of the Lineberger Comprehensive Cancer Center. Dr. Stephen Chanock, director of NCI’s Division of Cancer Epidemiology and Genetics, recently met with Dr. Sharpless. Dr. Chanock indicated that Dr. Sharpless is interested in the application of biomarkers and is supportive of both precision medicine and precision prevention. Drs. Winn and Chanock were scheduled to meet with Dr. Sharpless on November 13, 2017, and discuss the Cohort Consortium. Dr. Chanock also announced that the current nominee for U.S. Secretary of Health and Human Services is Mr. Alex Azar, former president of Lilly USA, a graduate of Yale Law School, and a former law clerk for the late Supreme Court Justice Antonin Scalia.
Data sharing is a priority at NCI, and there is an ongoing discussion across NIH regarding when and how data should be shared. Dr. Francis Collins believed that data from epidemiologic and clinical studies should be in databases within 24 hours of collection. Dr. Chanock indicated that researchers should do their best to make data from well-curated data sets available for public use as soon as possible, particularly for meta-analyses. Participants generally agreed that data should be made publicly available when study findings are ready to be submitted for publication. Currently, according to NIH requirements, data must be released when a study is accepted for publication.
The All of Us Study on precision medicine (and prevention) is underway, although it still is in the early phases. The study already has begun some groundbreaking work in the development of IT infrastructure and recruitment. The All of Us Study likely will present many opportunities for Cohort Consortium investigators in the future.
Dr. Walter Willett, Professor of Epidemiology and Nutrition at Harvard T.H. Chan School of Public Health and Professor of Medicine at Harvard Medical School, is among the most cited people in science and has been responsible for much new knowledge about cancer etiology over the past few decades.
Dr. Willett began his keynote address with an overview of important findings over the past few decades regarding factors linked to the development of cancer and cancer survival. These cohort study findings reflect the importance of cancer epidemiology research. The NCI has made major investments in large cohorts and has been the only NIH Institute that has supported substantial methodological research to enhance assessment of diet and physical activity.
Cohort studies have produced evidence demonstrating that approximately half of cancer incidence and mortality is attributable to documented, modifiable factors. Dr. Willett emphasized the importance of cohort studies with null findings that sometimes debunk findings from earlier studies about protective and risk factors for cancer. Cohort studies with null results have led to modifications of dietary guidelines. NCI-funded cohort studies also have developed and tested new methods and measures that could benefit future research across disciplines. For example, a study of cognitive decline and the Mediterranean diet has used a set of six questions that are highly effective in assessing subjective cognitive decline. Dr. Willett is willing to share these questions with other cohort investigators. In addition, many cancer cohort studies have generated important findings relevant to diseases other than cancer, such as cardiovascular disease, diabetes, and neurodegeneration. For example, cohort studies have found that type 2 diabetes is almost entirely due to modifiable risk factors and is therefore largely preventable.
Other types of studies cannot replace the follow-up and pooling of large quantities of data characteristic of cohort studies. Cohort studies are critical to the study of cancer, a disease that develops over decades. Repeated measures are important to assessing the impact of exposures over time and for examining latency. The use of repeated measures of exposure is a growing area of investigation that offers studies much greater power. For example, a long-term study using repeated measures found an inverse relationship between folate intake and colorectal cancer, but these effects took 16 years to appear.
Analysis of specific tumor characteristics is an important research area for cohort studies. For breast cancer, hormone receptor status now can be readily obtained from pathology records. Tissue is increasingly available for studies of tumor characteristics, which will allow further molecular characterization of tumors. For example, methylation analyses can identify specific subsets of colorectal cancers for which folate is likely to be protective; reproducible findings of this type can strengthen evidence for causality. Large numbers of subjects resulting from collaboration among cohorts will be important for these types of analyses.
Future cohort studies should examine additional aspects of early-life exposures, diet, -omics using existing biological samples, microbiome in new samples, and populations outside the United States. The microbiome field now is mature enough to apply to epidemiologic studies. Studies of populations outside the United States are needed to obtain knowledge about a wider range of exposures.
A funding opportunity, PAR-17-233 (Core Infrastructure and Methodological Research for Cancer Epidemiology Cohorts), will maintain existing cancer epidemiology cohorts (CECs) but has limited direct costs. Additional funding might be needed to add younger cohort members because current cohorts are aging. The current budget also limits integration of new research areas into CECs, such as studies of the microbiome and molecular tumor characteristics, particularly if adequate tissue has not already been collected. The Precision Medicine cohort could be used for some studies, but follow-up is uncertain, the waiting period for cancer outcomes is decades long, experienced cohort investigators might not have input on study design, and opportunities for student and postdoctoral investigator training are lacking. Low-cost new cohorts with passive registry follow-up are another option, but repeated measures and certain outcomes would be missed. Other options include leveraging investments in existing cohorts for longer follow-up, data pooling support beyond unfunded mandates, support for repeated exposure assessments, and support for integrating microbiome research and tumor tissue collection. Some opportunities exist for tumor tissue collection but primarily for less common tumors. Investigators need to catalogue tissue that currently is available.
Dr. Willett recommended launching a new extramural cohort to refresh and fill gaps in other cohorts so that early-life exposures can be examined, as well as a wider range of endpoints. Multiple Institutes at NIH could fund this cohort, which should study multiple diseases to be most cost-effective. The cohort could be largely web based and employ new technologies where appropriate to improve efficiency. The cohort should include biomarkers, support data sharing, and possibly extend to populations outside the United States, as well as integrating interventions.
Participants discussed the feasibility of a new cohort that would involve an intra- and extramural partnership. Most participants agreed this was a good idea because cohorts need further development. NCI needs to make a strong case for what has been achieved by the Cohort Consortium. The cohorts have contributed much genomic data. If cohorts cannot be expanded, they should try integrating such activities as tissue collection and analysis into their efforts. Collaboration across cohorts is critical, especially for pooling biospecimen data and developing standard methods for analyzing these data.
Session II: Setting New Directions for the Cohort Consortium
Setting New Directions for the Cohort Consortium
Dr. Susan Gapstur, vice president of epidemiology at the American Cancer Society and PI of the Cancer Prevention Study II, discussed the results of the survey of CEC investigators conducted earlier this year. Three themes came out of this survey: (1) future directions for the Consortium with the greatest potential impact, (2) ways to move scientific activity forward across the Consortium, and (3) ways to improve general Consortium operations and how NCI could support these improvements.
World Café Introduction and Charge
Participants engaged in a World Café activity to discuss the three themes generated by the survey. Ms. Shannon Connolly of the NCI Office of Workforce Planning and Development explained the World Café process. Discussions were conducted at 15 different tables, and each table was assigned one of the three discussion areas. Discussions occurred in three timed rounds so that participants could change tables, allowing everyone an opportunity to discuss each of the three areas. Participants engaged in discussions with different people during each round. Rounds were expected to be shorter as time progressed. Discussions at each table were facilitated and discussion themes documented. Participants were asked not to edit their own ideas or criticize others’ ideas. They also were asked to treat all discussion participants equally because experience in the field would not be relevant to discussions. The goal was for every idea to be considered and captured; consensus was not necessary.
World Café Report Out and Discussion
Three participants were asked to deliver a summary of the themes and common trends heard during discussions of each of the three topics. Ms. Connolly emphasized that all points made during each discussion would be documented.
1. Operations: Communication, Making It Work, Collaboration, and Collegiality
Dr. Stefanie Nelson summarized the discussion of Cohort Consortium operations. With regard to communication, discussants generally agreed that the web portal could be improved to make it more useful to CEC investigators. For example, information on the portal could be made clearer and easier to find. Participants would like to see information on the portal about what other groups are doing, available resources, and publications. They also want information about NCI and NIH policies that affect CEC operations. Discussants suggested posting stories about successes, failures, and lessons learned about what did and did not work. Discussants suggested investigating ways to link with social media platforms to engage the public and recruit both younger cohort members and new investigators.
Participants also discussed how the CECs could work more effectively. Some discussants were interested in cloud-based analysis instead of having to transfer data, which is burdensome. Data use agreements (DUAs) are time consuming, and CEC investigators would like NCI to develop a standard DUA template to use across groups. Discussants noted the difficulty in tracking progress on projects that involve multiple cohorts and suggested creating a Dashboard. The Dashboard could be used to announce when a project is initiated and when it is stopped or delayed. Discussants also requested guidance on when to “sunset” a project. In addition, discussants emphasized the importance of mentorship, which is needed to keep cohorts growing and moving forward. Mentorship is needed both for new investigators and for established investigators who recently joined the Cohort Consortium. They recommended forming a Trainee WG for this purpose, the chair of which should be on the Cohort Consortium Steering Committee.
With regard to collaboration, discussants indicated that they wanted more meetings. They recommended a midyear Consortium meeting, perhaps at some location other than NCI. The national meetings should have longer WG meetings with less overlap across them. Participants also recommended satellite meetings at larger conferences. In addition, discussants recommended more overt progress updates and recognition for the progress that people have achieved. Participants expressed a strong interest in knowing what their colleagues are doing. They believed that while this information may exist somewhere, it should be disseminated widely.
2. Operationalizing Science: Increasing Information Effectively
Dr. Joanne Elena noted that some discussants wanted more meetings and communication, whereas some wanted fewer. Technology might be used to increase desired meetings and other communications without overburdening investigators. Discussants indicated the need to facilitate data sharing, perhaps at a central repository; this would include the use of metadata and codes for data harmonization. They also wanted standardized templates for DUAs and other repeated forms. NCI already has developed some of this infrastructure, but many CEC investigators are unaware of these resources. Improved IT infrastructure would help investigators stay abreast of existing resources so they are not “reinventing the wheel.” Many groups already have harmonized variables that can be accessed by investigators across the Consortium. Improving communication and the dissemination of resources would decrease time spent on administrative tasks. Investigators also would like help from NCI to set up calls, monitor websites, and facilitate data sharing to increase the time investigators spend on research. Discussants want an interactive online place (possibly the portal) where they can post and respond to questions and collaborate with other investigators. They also want more information about WGs, including reports of activities, progress updates, and contact information for each WG. Minutes from WG meetings are being circulated now.
Discussants emphasized the need to assist both new investigators and established investigators who are new to the Cohort Consortium with navigating the system. Consortium directors also could benefit from this type of training.
In addition, discussants wanted support to leverage new technology to facilitate follow-up. Lessons learned and best practices should be shared across groups to increase efficiency and avoid redundancy. For example, NCI might arrange best practices webinars and seminars at professional meetings. Some groups have been conducting best practices seminars, which participants have found useful. Project managers, not just PIs, should be involved in these webinars because they are more familiar with day-to-day operations.
Participants discussed some creative funding mechanisms that could support data analysis, such as a 10 percent full-time equivalent analyst or separate funds for data sharing, supervision of analysis, and so on. Participants also would like a Cohort Consortium Data Coordinating Center. There is a need to define roles and responsibilities related to data to ensure that the same data are not requested for the same kinds of analyses. Investigators within the Consortium should only need to request data for new research questions, and harmonized data should be shared. A tumor biopsy bio base also would be useful. NCI should support the exploration of noncancer cohorts for cancer endpoints and demonstrations of the feasibility of data harmonization.
Participants expressed interest in branding the Cohort Consortium, which would be important because methods, tools, and other resources developed by Cohort Consortium investigators are shared with other researchers.
A participant noted that cohort investigators trying to obtain tumor tissue must contact many cancer hospitals in the United States. Some hospitals, particularly the larger cancer centers, do not want to share tissue because they use it for their own research. Participants would like help from NCI in obtaining access to enough tissue samples (either temporarily or permanently) to perform their research.
With regard to data harmonization, many new biomarker technologies are emerging, which provide an incentive for investigators to pool and share data across different platforms. Studies are needed to examine the feasibility of pooling biomarker data across different platforms.
3. Science Direction
Dr. Somdat Mahabir began his summary of the discussion on science direction by noting that discussants agreed that research on rare, high-mortality cancers was important. The entire Consortium should be leveraged to advance this research area. Discussants also emphasized the need to increase racial/ethnic diversity in cohorts and to collaborate to examine exposures across the life span. Rare exposures need to be included in this research. More work also is needed on early detection methods.
Participants in the science direction discussions also mentioned the need for standardization and calibration of existing data, especially on exposures and outcomes, to facilitate reproducibility. They reiterated the need for more tissue-based assessment, which would involve more timely collection of tumor samples, particularly for newly identified tumor types.
Other scientific areas of interest to discussants included:
- Nested intervention studies within cohorts
- Increased assessment of environmental exposures
- Microbiome studies
- Methodologic studies for reproducibility
- Survivor studies (current cohorts include many cancer survivors)
- Linkages to healthcare system data and more passive cancer data collection
- Studies of medication use among cancer patients and survivors
- Noncancer outcomes
- Risk prediction
- Healthy aging and longevity
Participants suggested obtaining more input from different stakeholders about research directions.
Participants emphasized the importance of measurement standardization/calibration to inform policies and guidelines (e.g., calibrating vitamin D measurement). Standardization and calibration across cohorts also will improve collaboration. Participants also noted the importance of integrating tumor tissue data with epidemiologic, clinical, and pathology data from other sources.
Another priority might be identifying research questions with clinical, public health, and policy implications. Stakeholder input could help identify research gaps that need to be filled to inform new policy and guidelines. Stakeholders should include policymakers who develop the guidelines. Their input is needed to determine what research questions they need answered to develop more useful and complete guidelines.
Participants added that methodologic and statistical biases must be considered as data are merged from different cohorts. Data are collected over broad and varying time periods. The amount of missing data also varies at different levels. Studies are needed that evaluate different approaches to combining data (e.g., when data pooling versus meta-analysis is more appropriate).
A report of World Café results will be drafted and will include additional comments. A webinar is planned for early 2018 and the Steering Committee will develop an action plan and discuss future directions with the Consortium members.
Session III: Methodologic Insights into Cohort Consortium Studies
Facilitator (Day 1): Meir Stampfer
Ghost-Time Bias from Imperfect Mortality Ascertainment in Aging Cohorts
Dr. Eric Jacobs of the American Cancer Society coined the term “ghost-time bias” as a result of a study of social isolation and all-cause mortality in elderly people that was part of the longitudinal Cancer Prevention Study II. Deaths in this study were identified through a National Death Index (NDI) linkage. Social isolation was measured on a four-point scale used in many other studies of social isolation. The study found that, as expected, social isolation was associated with higher mortality for participants in their 60s, 70s, and even early 80s, but, contrary to expectations, was associated with lower mortality for participants 90 or older. The trend toward lower relative risks at higher ages was more marked for people missing a Social Security number (SSN). Investigators believed this trend might be related to an accumulation of missed deaths. Linkages miss about 3 to 7 percent of deaths for people with an SSN and about 12 percent of deaths among those without an SSN. Missed deaths produce “ghost-time bias”: people who have died remain in the data set and continue to contribute follow-up time as if they were alive. Ghost-time bias accelerates with advancing age, following the same pattern for both men and women, although it begins about 2 to 3 years later among women. Ghost-time bias is driven by and influences all-cause mortality. The lower the relative risk (RR) for all-cause mortality, the less ghost-time bias occurs; the more strongly an exposure is associated with all-cause mortality, the more it will be affected. Older age distributions and longer follow-up periods also increase ghost-time bias.
Ghost-time bias can be minimized by striving for maximally complete death ascertainment up front, for example, by ensuring that the complete name and SSN are collected for all study participants. Supplemental methods can be used in addition to NDI linkage to verify the death status of the oldest members of a cohort. If appropriate, data can be censored after a maximum age threshold beyond which ghost-time bias is likely to have a significant impact (e.g., age 90 or 95).
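The mechanism described in this presentation can be illustrated with a small deterministic calculation. The sketch below is not Dr. Jacobs's analysis. It assumes a hypothetical Gompertz-style baseline mortality curve, a hypothetical true hazard ratio of 1.5 for the exposed group at every age, and a fixed fraction of deaths missed by linkage; the function name and all parameter values are illustrative assumptions. Participants whose deaths are missed stay in the denominator as "ghosts," and the observed exposed-to-unexposed mortality ratio is tracked by age.

```python
import math


def observed_mortality_ratio(missed_fraction, true_hazard_ratio=1.5,
                             start_age=65, end_age=100):
    """Return {age: exposed/unexposed ratio of observed annual death rates}.

    All parameter values are hypothetical and chosen only for illustration.
    """
    ratios = {}
    # Per group: [fraction truly alive, fraction dead but still counted as alive]
    state = {"unexposed": [1.0, 0.0], "exposed": [1.0, 0.0]}
    multiplier = {"unexposed": 1.0, "exposed": true_hazard_ratio}
    for age in range(start_age, end_age + 1):
        # Hypothetical baseline hazard that rises exponentially with age.
        baseline_hazard = 0.01 * math.exp(0.1 * (age - start_age))
        observed_rate = {}
        for group, (alive, ghosts) in state.items():
            death_prob = 1.0 - math.exp(-baseline_hazard * multiplier[group])
            deaths = alive * death_prob
            # Only ascertained deaths are counted in the numerator, while the
            # denominator includes ghosts who appear to be alive.
            observed_rate[group] = deaths * (1.0 - missed_fraction) / (alive + ghosts)
            state[group] = [alive - deaths, ghosts + deaths * missed_fraction]
        ratios[age] = observed_rate["exposed"] / observed_rate["unexposed"]
    return ratios


if __name__ == "__main__":
    # 0.05 and 0.12 fall within the missed-death ranges quoted in the talk.
    for missed in (0.0, 0.05, 0.12):
        ratios = observed_mortality_ratio(missed)
        print(missed, [round(ratios[age], 2) for age in (65, 80, 90, 95)])
```

Under these assumptions, the observed ratio starts near the true value and shrinks with age, more quickly when more deaths are missed. That is the pattern that motivates censoring follow-up at a maximum age.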
One participant noted that he had reviewed data for the oldest members of the Nurses’ Health Study and found that ghost-time bias was a widespread problem. He asked participants if they had encountered this problem and what they had done to examine and mitigate it.
Another participant observed ghost-time bias in the Nurses’ Health Study of ovarian cancer, which saw an attenuated RR with longer follow-up. The investigators believed this pattern was due to a declining ability to confirm ovarian cancer. They also thought the trend might be due to competing risk of death from other diseases or poor reporting of the outcome. A participant asked about competing causes of death as a possible alternative explanation for the decreased RR. Dr. Jacobs responded that competing causes of death are linked to all-cause mortality. It would be useful to examine the proportion of bias linked to competing causes of death and missed deaths. Participants wondered whether the same phenomenon would occur with outcomes other than mortality, such as cancer itself, because not all cancers are detected. Ghost-time bias also could occur at younger ages among cohorts of cancer patients who die at younger ages. Participants added that deaths of many immigrants in cohorts could be missed because they often migrate back home when very old or sick. They also noted that recurrence might be affected by ghost-time bias, when it is captured.
Innovative Data Capture for Cancer Surveillance and Cohort Studies
Dr. Lynne Penberthy, Associate Director for NCI’s Surveillance Research Program (SRP), discussed strategic approaches to filling gaps in knowledge through novel methods of data capture in cancer surveillance. These approaches include NCI Surveillance, Epidemiology, and End Results (SEER) program linkages with commercial entities to capture missing data on treatment, genomics, and outcomes; development of novel methods to extract data from unstructured text documents automatically; and potential use of SEER and other surveillance data to support research through new infrastructures.
Currently, most surveillance data are collected from hospital medical records and linkages with population-based death registries. This approach will not provide the clinically relevant data needed for cancer research.
SEER currently covers approximately 30 percent of the U.S. population, and registry coverage is expected to expand with the open recompetition of the SEER registry contracts. At this point, however, SEER lacks detailed longitudinal treatment data and data on outcomes other than survival and cause of death. More comprehensive genomic testing data are needed to improve characterization of cancers at time of diagnosis and recurrence.
SEER staff are developing the following solutions:
- Identifying and prioritizing surveillance and research community needs
- Enhancing efficiency and completeness and expanding clinical data through linkages to capture current and new data items, the development of tools for automation (natural language processing/machine learning), and collaborations with commercial and public data providers
- Improving the ability of the SEER program to support research through real-time case eligibility assessment for cohorts, clinical trials, and other studies; a virtual pooled registry (VPR); and a SEER-linked virtual tissue repository (VTR)
Registries currently lack treatment data on both oral anti-neoplastics and infusion chemotherapy. Collection of population-based treatment data would allow monitoring of compliance, adherence, and quality of care and identification of disparities and adverse events. Oncology practices mostly perform infusions and need to be covered by cancer registries. Pharmacy data need to be captured for information on oral treatments. The SEER program is seeking to work with large U.S. pharmacy chains that have central data repositories. SEER has an agreement with Walgreens and is reaching out to Walmart and Rite Aid. Data from oncology practices and pharmacies have a standardized format and could be automatically added to registry data.
SEER is working to receive claims data from physician practices, which will provide information on treatment over time. Medicare claims data have been linked to SEER since the early 1990s. The NCI Cancer Moonshot sponsored a workshop with six major health insurance companies with the goal of expanding linkage to claims data. The Kentucky and Seattle SEER registries already receive claims data from insurance companies. In addition, SEER has an agreement with Unlimited Systems, which collects claims data for all payers from oncology practices. The Georgia SEER registry already has about 4.5 years of Unlimited claims data. An additional five SEER registries will begin receiving claims data from Unlimited this year.
The SEER program is interested in capturing outcomes other than survival. For example, researchers want data on recurrence. Accurate measurement of recurrence, however, requires multilayered, combined data sources and new methods. Natural Language Processing (NLP) will allow a more comprehensive capture of recurrence data from medical records. SEER is working with the U.S. Department of Energy (DOE) to develop machine-learning algorithms to extract recurrence information from unstructured text documents. SEER also is working with its partners to test methods for capturing patient-generated data from patient portals, direct patient reporting, and other patient-generated data sources. Two studies are underway to test the feasibility of capturing patient-generated data.
Insufficient genomic data are available to characterize cancer. SEER has developed collaborative partnerships with genomics laboratories to help resolve this problem by linking data across all SEER registries for BRCA mutation panel testing. Two laboratories already have developed BRCA panels. SEER also is engaged in discussions with Foundation Medicine, a large genomic testing company, to conduct pilot studies to link panels to SEER data. In addition, SEER is in discussions with other commercial companies to obtain data from multigene panels. Genomic testing data are important because (1) germline testing can be used to individualize treatment, (2) population-level BRCA mutation panels can help identify variance in population subgroups, and (3) real world data that are generated outside the clinical trial setting will expand the capacity to conduct research.
Dr. Penberthy discussed the benefits of the VTR, and many participants expressed interest in this resource. The VTR is population based and provides data from a broad spectrum of health care facilities and pathology laboratories. The VTR provides access to rare cancers and data on exceptional and long-term outcomes. Annotation exists for clinical and demographic data, with the potential for custom annotation. SEER plans to include pathology images in the VTR in the future. The VTR is a renewable resource, with more than 450,000 cases collected each year. Researchers can search the VTR patient data set and then request tissue on selected patients.
SEER funded seven registries to conduct a pilot study of exceptional survivors of pancreatic and node-negative breast cancers using the VTR. The purpose of the pilot study is to understand best practices, estimate the cost of a SEER-wide system, assess specimen availability, and understand differences in human subjects/consent requirements at different registries. When completed, the pilot will provide a well-annotated set of cancers with unusual responses for researchers.
The VPR also generated interest from participants. Dr. Penberthy explained that the VPR was developed because of the lack of a nationwide cancer registry and the limited exchange between registries. The VPR will permit linkages of patients from different types of studies, including cohort studies, to all registries across the United States, while maintaining patient identifiers behind registry firewalls. Approved investigators, however, will be able to access data. The goals for the VPR are to implement automated linkage through an honest broker; create a central Institutional Review Board (IRB) or IRB template; and implement rapid return of patient information on cancers, survival, cause of death, treatment, and other variables from high-quality registry databases.
The VPR is being tested through linkages with the Camp LeJeune Cohort and the Rad Tech study data sets. These pilot studies have shown increased case ascertainment and reduced manual review of data by registries. Data completeness improved for six items. The Camp LeJeune Cohort was linked to 45 cancer registries. The Rad Tech study revealed declining response rates, so SEER is seeking alternate cost-efficient methods to improve response.
Participants asked when the VPR would be ready for use by investigators. SEER will need to fund registries to support this effort. Processes currently are being developed, and SEER is examining ways to recruit and support registries. The goal is to simplify and standardize financing at all sites. A soft launch is planned for early 2018, and a California registry expects to have the VPR operational within that year. NCI support for VPR infrastructure will be needed for the next 7 years.
Cancer registries in several states have volunteered to participate in the VPR. IMS developed LinkPro software to link information for the VPR without a Social Security number (SSN). Other variables will be used to find patients.
The process of making individual-level data available through the VPR will be complicated. SEER is using the Rad Tech pilot to understand IRB requirements. Half of the participating registries will accept a central IRB and most will accept an IRB template. A problem is that many states are increasing requirements for data release, particularly California. A California SEER registry PI is working on Common Rule changes that might force California to adopt a single IRB. NCI is dedicated to creating a single IRB for VPR, but a central IRB is a major culture change for registries.
In California, researchers need to explain clearly what they will do with the data in their data request applications. Agreements have been developed with the North American Association of Central Cancer Registries and SEER for text that can be used in all data requests.
With regard to the NLP project with the DOE, organizations in Belgium, Scandinavia, and the United Kingdom are working on NLP algorithms and are interested in collaboration. The DOE laboratories have worked extensively on machine learning and NLP and have some of the largest computers in the world.
Participants noted that a Cancer Research Network is working with FlatIron and the U.S. Food and Drug Administration (FDA) to determine how best to capture recurrence data from electronic medical records. Dr. Penberthy expressed interest in attending a meeting with FlatIron and the FDA.
A participant emphasized the need to conduct many validation studies to understand problems with data from different new sources such as pharmacies.
Methodologic Insights into Cohort Consortium Studies (continued)
Facilitator (Day 2): Lynne Wilkens
Cohort Consortium Data for the Development, Validation, and Estimation of the Utility of Risk Models for Disease Prevention
Mitchell H. Gail
Dr. Mitchell Gail discussed the use of Cohort Consortium data to develop and test risk models for disease prevention. He is well known for his methodological work and risk modeling. He discussed use of data from multiple cohorts to help develop, validate and assess the utility of risk models for: weighing risks and benefits of an intervention; and estimating the reduction in disease risk from reducing exposure to modifiable risk factors.
Absolute risk is the chance that an individual will develop a disease (breast cancer, in this example) by a certain age. In order to calculate absolute risk, one needs to know: the relative risks (RRs) associated with risk factors, the baseline hazard (age-specific incidence rate with a baseline level of all risk factors), and the hazard of competing mortality from non-breast cancer causes. Cohort data can be used to estimate RR as well as the baseline hazard and hazards of competing risks. Thus, cohort data can be used to calculate absolute risk. Nested case-control studies within a cohort (if it is known who developed the disease and who died from other causes in the cohort), case-cohort studies, and population-based case-control studies (together with registry data) also can be used to estimate RR, baseline hazards, and hazards of competing mortality. Dr. Gail noted that registry data may improve risk calculations as opposed to relying on cohort data alone, because registry disease rates are stable and representative of the general population. To convert registry rates into baseline hazard rates, one multiplies by one minus the attributable risk (AR), which can be estimated from cohort or from case-control data.
Having access to data from multiple cohorts allows one to assess heterogeneity of estimates of RR that may arise from differences in joint distributions of risk factors, factors included in the analysis model, distributions of unmeasured confounders, measurement error, and types/amounts of missing data. Multiple cohort data also allow one to assess heterogeneity in baseline hazard rates that may arise because of differences in methods and completeness of follow-up, factors missing from the model, and variation in (1-AR) from variation in risk factor distributions. Challenges to combining Consortium data include differences in the sets of risk factors measured, different definitions of risk factors, missing data, different surveillance procedures, and the fact that data shared by some studies are limited to summary statistics.
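The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming hazards are constant within one-year age intervals; the function names and every numeric input are hypothetical and were not presented at the meeting.

```python
import math


def baseline_hazards(registry_rates, attributable_risk):
    """Scale registry incidence rates by (1 - AR) to approximate baseline hazards."""
    return [rate * (1.0 - attributable_risk) for rate in registry_rates]


def absolute_risk(baseline, competing, relative_risk, width=1.0):
    """Absolute risk over consecutive age intervals with piecewise-constant hazards.

    baseline:      baseline disease hazard in each interval (per person-year)
    competing:     hazard of death from other causes in each interval
    relative_risk: the individual's RR relative to baseline risk-factor levels
    width:         length of each interval in years
    """
    surviving = 1.0  # probability of being alive and disease-free so far
    risk = 0.0
    for h0, h2 in zip(baseline, competing):
        h1 = h0 * relative_risk
        total = h1 + h2
        if total > 0.0:
            # Chance of leaving the interval by any cause, times the share due to disease.
            risk += surviving * (h1 / total) * (1.0 - math.exp(-total * width))
        surviving *= math.exp(-total * width)
    return risk


# Hypothetical inputs: five one-year intervals, AR = 0.4, individual RR = 2.0.
registry = [0.002, 0.002, 0.003, 0.003, 0.004]
other_death = [0.005, 0.005, 0.006, 0.006, 0.007]
h0 = baseline_hazards(registry, attributable_risk=0.4)
five_year_risk = absolute_risk(h0, other_death, relative_risk=2.0)
print(round(five_year_risk, 4))
```

Because the competing hazard enters the survival term, raising it lowers the absolute risk of the disease even when the RR and baseline hazard are unchanged.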
Multiple cohorts offer significant advantages in model development when data can be combined. For example, the precision of RR estimates is improved, cohort-specific biases can be identified, and outlier cohorts removed. Multiple cohort data also can provide increased precision in estimates of baseline hazards and allow investigators to assess variation when registry data are not used. When registry data are used, this estimation technique allows investigators to assess the impact of variation in (1-AR). Overall, data from multiple cohorts provide a more robust model.
When individual-level data are available, RR can be calculated for multiple cohorts by fitting the same multivariate model to all cohorts and making adaptations for missing covariates. When individual-level data are not available but all cohorts used the same joint risk model, a standard multivariate meta-analysis can be performed. The GMeta method can be used if different joint models are used for different cohorts.
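When only summary statistics are shared, the simplest case of the meta-analysis mentioned above is inverse-variance pooling of one log-RR per cohort. The sketch below is a univariate simplification of the multivariate approach, not the GMeta method, and the cohort estimates in it are hypothetical.

```python
import math


def pool_fixed_effect(log_rrs, std_errors):
    """Inverse-variance weighted average of cohort-specific log relative risks."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_rrs)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


# Hypothetical log-RR estimates and standard errors from three cohorts.
log_rrs = [0.26, 0.18, 0.34]
std_errors = [0.10, 0.08, 0.15]
pooled, pooled_se = pool_fixed_effect(log_rrs, std_errors)
low = math.exp(pooled - 1.96 * pooled_se)
high = math.exp(pooled + 1.96 * pooled_se)
print(f"pooled RR = {math.exp(pooled):.2f} (95% CI {low:.2f} to {high:.2f})")
```

The pooled standard error is always smaller than the smallest cohort-specific standard error, which is the gain in precision from combining cohorts.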
Dr. Gail discussed the iCARE-BPC3 and iCARE-Lit models that use registry and risk factor survey data to obtain baseline hazards. Once a risk model is developed, it is important to validate it in independent cohort data. The first step is to test calibration, namely how well the model predicts the number of outcomes, overall and in subsets. The next step is to determine whether the model discriminates cases from non-cases well in the validation data. Finally, it must be determined whether the RR features of the model are in line with the RRs in the validation cohort. A single validation cohort might be unusual, which can yield a misleading validation result. Using multiple cohorts helps to overcome this problem. Discriminatory accuracy and RR depend on both the models and the distribution of risk factors in the population. Therefore, investigators should assess the consistency of RR across multiple cohorts.
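The first two validation steps can be illustrated with two common summaries: the expected-to-observed (E/O) ratio for calibration and the area under the ROC curve (AUC) for discrimination. The Python below is a minimal sketch; the choice of these two statistics and the small data set are assumptions for illustration, not details from the presentation.

```python
def expected_to_observed(predicted_risks, outcomes):
    """Calibration summary: expected cases (sum of predicted risks) over observed cases."""
    return sum(predicted_risks) / sum(outcomes)


def auc(predicted_risks, outcomes):
    """Probability that a randomly chosen case has a higher predicted risk than a
    randomly chosen non-case, counting ties as one half."""
    cases = [p for p, y in zip(predicted_risks, outcomes) if y == 1]
    noncases = [p for p, y in zip(predicted_risks, outcomes) if y == 0]
    score = 0.0
    for c in cases:
        for n in noncases:
            if c > n:
                score += 1.0
            elif c == n:
                score += 0.5
    return score / (len(cases) * len(noncases))


# Hypothetical validation data: predicted risks and observed outcomes (1 = case).
risks = [0.02, 0.05, 0.10, 0.03, 0.20, 0.08]
observed = [0, 1, 0, 0, 1, 0]
print("E/O:", round(expected_to_observed(risks, observed), 2))
print("AUC:", auc(risks, observed))
```

An E/O ratio near 1 indicates that the model predicts about the right number of cases. Repeating both summaries within each cohort and within subsets shows whether a single validation cohort is unusual.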
Data from multiple cohorts can improve assessments of risks and benefits of an intervention. Previous studies used absolute risk models for the risk of breast cancer to decide who might benefit most from an intervention like tamoxifen to prevent breast cancer. One potential for consortium data is to develop models for other outcomes affected by tamoxifen, such as stroke. The risk-benefit assessment can be improved by having good risk models for the various outcomes affected by the intervention.
Data from multiple cohorts are also useful for assessing the risk reduction from reducing exposure to modifiable risk factors. Such data allow one to assess the variation in distributions of both modifiable and nonmodifiable risk factors, which impacts the estimated reduction in absolute risk from eliminating modifiable risk factors.
Participants asked when they should modify a model if they find differences between observed and expected results when validating the model. Dr. Gail advised that, if the model is reasonably consistent with validation data, one can recommend it, but investigators have an opportunity to use Consortium cohort data for more definitive validation. If the model is severely discordant with the initial validation cohort, it should not be made public until more validation evidence is accumulated from other countervailing cohorts.
Participants also asked how to assess heterogeneity across studies. Dr. Gail responded that variability must be assessed to determine how much certainty to place on a model. Knowing the beta and standard error is not enough. To obtain realistic confidence bands that take into account variation across studies, multiple cohorts are needed to estimate systematic variation across studies reflecting different distributions of risk factors and other factors.
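One standard way to fold between-cohort variation into the uncertainty of a pooled estimate is a random-effects analysis. The sketch below uses the DerSimonian-Laird estimator as an illustrative choice; the presentation did not name a specific method, and the inputs are hypothetical.

```python
import math


def random_effects_pool(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling of cohort-specific estimates.

    Returns the pooled estimate, its standard error, the between-cohort
    variance tau2, and Cochran's Q. Requires at least two cohorts.
    """
    k = len(estimates)
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sum_w
    # Cochran's Q: weighted dispersion of cohort estimates around the fixed-effect mean.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Each cohort's variance is inflated by tau2 before re-weighting.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * b for wi, b in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, tau2, q


# Hypothetical log-RR estimates that disagree more than their standard errors suggest.
pooled, pooled_se, tau2, q = random_effects_pool([0.0, 0.5, 1.0], [0.1, 0.1, 0.1])
print(round(pooled, 3), round(pooled_se, 3), round(tau2, 3), round(q, 1))
```

When the cohorts disagree, tau2 is positive and the pooled standard error is wider than the fixed-effect value, which is the sense in which a single beta and standard error understate the uncertainty.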
The question of handling missing data in the context of multiple cohorts was raised. Different distributions of a variable with a lot of missing data make this already complex process more complex. Models can be developed for missing covariates, but they will depend on certain assumptions, such as the sample being representative of another sample with complete data. Investigators can test the sensitivity of results with different imputation approaches. The BPC3 Working Group is working on the problem of missing data for many factors. It is possible to impute missing components of a linear combination of the risk factors instead of imputing all missing values, thereby ameliorating some of the problems of high dimensionality.
Participants discussed the calibration of models to registry data. National survey data generally are more representative. The data source will depend on the intended application of the model and the target population. If the general U.S. population is the target, investigators will need information on the distribution of risk factors of interest in the U.S. population, and a national survey usually is a good reference. If investigators are interested in the performance of special populations, they need information on the risk distribution in that population. One problem is that even national surveys do not provide information on the joint distribution of all risk factors in a model. Thus, investigators might need to piece together data from different surveys to obtain the needed joint probability distribution.
Session IV: Cohort Consortium Project / Working Group Reports
Three Cohort Consortium Working Groups that oversee long-running studies delivered presentations on their research activities. One of these studies began before the Cohort Consortium existed. Presenters were asked to discuss lessons learned on what works and challenges encountered.
Liver Cancer Pooling Project
Dr. Katherine McGlynn discussed the Liver Cancer Pooling Project, which began as a consortium of U.S. cohorts in 2009. U.S. liver cancer rates have been increasing for both sexes and most racial/ethnic groups since the 1980s. This has been suggested to be due to the increased incidence of infection with the hepatitis C virus (HCV). Investigators wondered whether obesity and diabetes might also play a role in the increasing rates of liver cancer. HCV and hepatitis B viral infection have strong associations with liver cancer, but more attributable risk is due to metabolic disorders. In 2009, the relationship of metabolic disorders to liver cancer in the U.S. was still not clear.
The Liver Cancer Pooling Project combined data from 14 large studies to examine possible causes of liver cancer. Some of these studies collected serum samples as well. The Project has generated a wide range of findings and publications. Diabetes, higher body mass index, and larger waist circumference were found to be associated with risk of hepatocellular carcinoma, the most common type of liver cancer. Among women, reproductive factors and exogenous hormone use were examined in relation to liver cancer risk. In contrast to earlier reports, liver cancer was not associated with oral contraceptive use. Other analyses found that coffee intake was associated with decreased risk of liver cancer in both sexes and, similarly, aspirin use was associated with decreased risk. Other analyses have found that tobacco is associated with liver cancer, as are high levels of alcohol use. Moderate alcohol use, however, appeared to be protective. Papers are under development that report results for intrahepatic cholangiocarcinoma, the second most common type of liver cancer.
A future direction for the Liver Cancer Pooling Project is studies of the metabolome and gut microbiome. Bacterial translocation products could increase inflammation and DNA damage in the liver due to exposure via the portal circulation. The Liver Cancer Pooling Project proposes to collaborate with NCI’s Frederick National Laboratory, a laboratory at Georgia State University, and Metabolon, Inc. to investigate these issues. The Project is seeking more cohorts, particularly those with serum samples, to examine the relation of the metabolome and microbiome to liver cancer.
A newly proposed collaboration with investigators from the Nurses’ Health Study would partner with the Pooling Project of Prospective Studies of Diet and Cancer (DCPP). This collaboration would examine dietary factors, including measures of single nutrients and composite measures that influence insulin and inflammatory response. The study also would examine the association between sleep duration and liver cancer. In the future, Project investigators would like to collaborate on meta-analyses with European and Asian cohorts.
Some historic challenges for the Liver Cancer Pooling Project include the Materials Transfer and Data Transfer agreements process, obtaining serum samples because some studies allow only one assay per proposal, and the fact that some important assays are not hypothesis-testing assays. Data harmonization is another challenge.
Dr. McGlynn welcomes new cohorts in the Liver Cancer Pooling Project. She also welcomes investigators interested in leading studies and writing publications, including new investigators. Analyses would need to be performed in-house because of data use agreements (DUAs). Dr. McGlynn noted that she and her colleagues would consider ways to provide access to Project data for analysis by external researchers.
In response to a question about microbiome studies, Dr. McGlynn responded that the Liver Cancer Pooling Project appreciates that most existing cohorts don’t have stored fecal samples, and thus is focused on conducting studies of metabolomics and bacterial translocation.
In response to a participant question, Dr. McGlynn indicated that China is the only place in the world where liver cancer rates are declining. One participant working with a cohort in Scandinavia expressed interest in participating in the Liver Cancer Pooling Project. Dr. McGlynn is willing to discuss adding a European cohort. Expansion is important because obesity and diabetes rates in most countries are increasing. Cohorts do not need to have serum samples, although serum or plasma samples would be helpful.
In response to another query, Dr. McGlynn indicated that the Project might accept small cohorts representing rare diseases.
Pooling Project of Prospective Studies of Diet and Cancer
Stephanie Smith-Warner and Jeanine Genkinger
Drs. Stephanie Smith-Warner and Jeanine Genkinger discussed the Pooling Project of Prospective Studies of Diet and Cancer (DCPP). This international consortium was established in 1991 to evaluate associations between dietary and anthropometric factors and several cancers. During their presentation, they highlighted their approach, methods, and results from the DCPP. The first DCPP project examined the association between dietary fat and breast cancer and obtained null results.
The DCPP now has >35 cohort studies in the United States, Europe, Asia, and Australia. The DCPP has examined or is currently examining associations with breast, colorectal, kidney, lung, ovarian, pancreatic, prostate, thyroid, and upper aerodigestive tract cancers and non-Hodgkin lymphoma. Data are now being collected from participating cohorts on non-dietary exposures and non-cancer outcomes (e.g., cardiovascular disease). Plans are also being discussed to conduct methodologic studies in the consortium.
For each project, the DCPP requests original data from cohorts rather than data in pre-specified formats. Data are stored at a secure central data repository. Data have been harmonized for about 40 nutrients, the foods on the food frequency questionnaire, and about 40 covariates.
For each new project, investigators are invited to opt in or out of that project. Up to two investigators from each cohort are usually included as co-authors on manuscripts.
Dr. Jeanine Genkinger discussed how the DCPP has been involved in several projects, one of which focused on alcohol intake and breast cancer. The study found that breast cancer risk increased with alcohol intake; this work has been cited over 1,000 times. Recently, the consortium was funded to examine alcohol intake and risk of other cancers in 36 cohorts.
Multiple anthropometry projects also have been conducted or are underway. One study examined the associations between BMI, waist circumference, and pancreatic cancer, including BMI in young adults. Cancer risk and sustained weight loss, weight cycling, and weight gain in adulthood also are being studied. When comparing sustained weight loss to stable weight over time, investigators found that breast cancer risk decreased.
DCPP investigators faced some challenges, primarily related to execution of data use agreements and data harmonization.
Other cohort consortia investigators wanted updated exposure data. It is difficult to convince reviewers this is possible. Participants recommended that DCPP investigators publish a paper demonstrating the feasibility of updating exposure data. Project investigators have drafted text about overcoming the challenges of data harmonization. They can demonstrate that data harmonization in this area is feasible, but complicated and time consuming.
Participants asked about the possibility of the DCPP sharing dietary variables with some Cohort Consortium working groups. This might help other cohorts that have had difficulty harmonizing dietary data. The DCPP has explored this issue with liver cancer and OC3 working groups. Dr. Smith-Warner agreed to collaborate with other cohorts to reduce redundancy in this area. Drs. Smith-Warner and Genkinger indicated that macros for harmonizing data are being developed for the NIH Cohort Metadata Repository.
Biomarkers and Breast Cancer Risk Prediction
Dr. Anne Zeleniuch-Jacquotte discussed biomarkers and breast cancer risk prediction in a cohort of younger women. Investigators decided to develop a breast cancer risk prediction model because guidelines have changed and women are left to decide whether to start screening at age 40 or 50. Information about their individual risk for breast cancer would facilitate this decision.
Investigators are examining biomarkers of breast cancer risk in 10 cohorts. They are focusing on two biomarkers, anti-Müllerian hormone (AMH) and testosterone. They found cancer risk increased with higher AMH levels, but the association was weaker than expected. Testosterone also was associated with cancer risk, but the association was not strong. The inclusion of both hormones modestly improved the Gail risk model. Measurement of these two biomarkers might help younger women decide when to begin breast cancer screening.
Dr. Zeleniuch-Jacquotte noted several challenges associated with multicohort studies. There are many sources of variability in biomarker concentration and heterogeneity in assays. For example, investigators found significant differences in AMH concentrations across cohorts. This problem might be corrected with proper calibration. An AMH assay calibration study demonstrated good correlation.
Risk prediction is another challenge. Investigators used a nested case-control design because of its efficiency in estimating the association and hazard ratio. There is a need to assess not only discriminatory accuracy but also calibration of the risk assessment model. Inverse probability weighting can be used when biomarkers are available for cases and matched controls rather than for the full cohort. However, this is problematic when fine matching is used.
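To illustrate the weighting idea, the sketch below computes, for each cohort member, the probability of ever being sampled into a nested case-control study, assuming m controls are drawn at random and without matching from those at risk at each case's event time. This is one commonly used formulation and is an assumption here; the presentation did not specify how the weights were derived, and the small cohort is hypothetical. Sampled subjects would then be weighted by the inverse of this probability.

```python
def control_sampling_probabilities(entry, exit_time, is_case, m):
    """Probability that each subject is ever included in the nested case-control sample.

    entry, exit_time: start and end of follow-up for each subject
    is_case:          True if follow-up ended with the disease of interest
    m:                number of controls sampled per case (no matching assumed)
    Cases are always included, so their probability is 1.
    """
    n = len(entry)
    case_ids = [j for j in range(n) if is_case[j]]
    probs = []
    for i in range(n):
        if is_case[i]:
            probs.append(1.0)
            continue
        never_sampled = 1.0
        for j in case_ids:
            t = exit_time[j]
            if entry[i] < t <= exit_time[i]:  # subject i is at risk when case j occurs
                at_risk = sum(1 for k in range(n) if entry[k] < t <= exit_time[k])
                # m controls are drawn from the at_risk - 1 subjects other than the case.
                never_sampled *= 1.0 - min(1.0, m / (at_risk - 1))
        probs.append(1.0 - never_sampled)
    return probs


# Hypothetical cohort of four subjects: two cases (at times 1 and 2), one control per case.
probs = control_sampling_probabilities(
    entry=[0, 0, 0, 0], exit_time=[1, 2, 3, 4], is_case=[True, True, False, False], m=1
)
weights = [1.0 / p for p in probs]
print(probs, weights)
```

With fine matching, the pool of eligible controls at each event time shrinks and differs by matching stratum, so these simple probabilities no longer apply, which is consistent with the difficulty noted above.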
Funding always is a challenge because biomarker measurements are expensive. Proposals usually need to include a large number of biomarkers. Dr. Zeleniuch-Jacquotte indicated that she does not plan to keep the working group active because there are no known biomarkers strongly related to risk in younger women.
A Steering Committee member emphasized the need to end working groups when they have accomplished their goals. The Steering Committee will reach out to working group leaders to determine their status.
Session V: Poster Session
Participants took a break to attend the poster sessions, after which poster awards were announced. Dr. Alison Van Dyke's and Emma McGee's presentation, "Highlights of Findings from the Biliary Tract Cancers Pooling Project (BiTCaPP)," won the Best Student/Fellow Award. Dr. James Lacey won the Best Overall Poster Award for his presentation, "Connected to Collected: A Fully Integrated CED Survey Approach."
Session VI: New Concept Proposals and Next Steps
Moderators: Dr. Anthony Swerdlow and Dr. Roger Milne
Participants made the following recommendations for future Cohort Consortium meetings:
- Continue to have working group presentations in the future. The number of presentations was appropriate. Participants want a series of working group reports in future plenary sessions.
- Participants want a few long presentations and many short presentations at meetings. A participant suggested fewer long presentations on completed studies and some short presentations on proposed studies.
- Participants agreed that lessons learned from successful studies were useful. Most participants also would like to hear about lessons learned from unsuccessful studies. Only a few participants wanted to have primarily traditional science presentations.
- Participants suggested that the Steering Committee evaluate which projects have a lot of support and which do not. They should consider how to carry forward the projects with less support.
The World Café discussion generated several ideas that the Steering Committee needs to consider. Dr. Susan Gapstur agreed to lead the development of a communication and strategic plan based on World Café feedback. Participants might eventually be asked to choose the five highest priority areas among the many discussed. Participants suggested a presentation on the strategic plan and the process for developing it at next year’s Cohort Consortium meeting.
New Concept Proposals
Discussion during the World Café frequently focused on the use of tumor tissue. New technology allows simultaneous profiling of multiple factors, but obtaining tumor tissue is a key barrier. Cohorts in the Consortium are not very successful in this area. Part of the problem is that some states allow tissue to be disposed of after a certain period. Large hospitals also are not very willing to share tissue because they want it for their own research. Alternatives should be considered, such as creating tissue microarrays (TMAs) from tissue cores. Participants asked what NCI could do to motivate Cancer Centers to share tissue.
It is especially difficult to obtain consent to obtain tumor blocks for fatal cancers, but cohorts need representative tissue samples for these cancers. Rapid improvements in RNA expression methods suggest that cores might be sufficient for extracting RNA and DNA.
It is not clear if the Cohort Consortium has enough blocks for each tumor across cohort studies. NCI could target money for expanding tumor block collection if the Consortium has insufficient tissue, including collection of less common tumors and subtypes of common tumors. Participants asked that NCI investigate the status of tumor tissue across Consortium cohorts and develop a report.
Many studies outside the United States obtain high proportions of tissue for study subjects. Cohort studies outside the United States could inform tissue collection in this country. SEER access to tissue blocks would be helpful. Consortium investigators need to know the proportion of tissue collection for different studies. Samples have greater translatable value in cohort studies. Participants noted that Kaiser has helped cohorts obtain tissue. They might work with Kaiser to encourage other medical systems to share tissue.
Cohort Consortium investigators also need to consider the best methods for collecting and preserving biospecimens so that future research is not compromised. The Steering Committee might consider the best way to use resources to support biospecimen collection, and possibly reactivate the Tumor Tissue Working Group. NCI staff agreed to investigate reinstating this working group.
If you have questions about the meeting, contact Nonye Harvey, M.P.H. at the National Cancer Institute.