Think Tank on Metabolomics and Prospective Cohorts: How to Leverage Resources



The "Think Tank on Metabolomics and Prospective Cohorts: How to Leverage Resources" identified resources that can be used collaboratively across prospective cohorts; developed strategies to leverage resources for advancing the use of metabolomics in prospective cohort studies; identified the best strategies for performing analyses using metabolomics data across multiple studies; and established a collaborative group, the COnsortium of METabolomics Studies (COMETS), that will identify and tackle research projects that cannot be effectively investigated by one independent group.


Agenda for Tuesday, October 28, 2014

Time Topic
11:00 a.m. - 11:30 a.m. Registration
11:30 a.m. - 11:40 a.m. Welcome and Introductions
Deborah Winn, Ph.D.
National Cancer Institute
11:40 a.m. - 11:55 a.m. Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO)
Steven Moore, Ph.D., M.P.H.
National Cancer Institute
11:55 a.m. - 12:10 p.m. Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study (ATBC)
Demetrius Albanes, M.D.
National Cancer Institute
12:10 p.m. - 12:25 p.m. Cancer Prevention Study-II (CPS-II)
Victoria Stevens, Ph.D.
American Cancer Society
12:25 p.m. - 12:40 p.m. Nurses' Health Study, Health Professionals Follow-up Study, Physicians' Health Study, Women's Health Initiative
Brian Wolpin, M.D., M.P.H.
Dana-Farber Cancer Institute
12:40 p.m. - 12:55 p.m. ColoCare, Women's Health Initiative
Nina Habermann, Ph.D.
German Cancer Research Center
Neli Ulrich, Ph.D.
Huntsman Cancer Institute, University of Utah
12:55 p.m. - 1:10 p.m. Health Professionals Follow-up Study (Prostate Cancer)
Kathryn Wilson, Sc.D.
Harvard School of Public Health
1:10 p.m. - 2:00 p.m. Lunch
2:00 p.m. - 2:15 p.m. Shanghai Women's Health Study, Shanghai Men's Health Study
Xiao-Ou Shu, M.D., Ph.D.
Vanderbilt University
2:15 p.m. - 2:30 p.m. Coronary Artery Disease Screening Trial and The Environmental Determinants of Diabetes in the Young
Tomas Cajka, Ph.D.
University of California, Davis Genome Center
2:30 p.m. - 2:45 p.m. Atherosclerosis Risk in Communities Study
Eric Boerwinkle, Ph.D.
University of Texas Health Science Center at Houston
2:45 p.m. - 3:00 p.m. Framingham Heart Study
Robert Gerszten, M.D.
Massachusetts General Hospital/Harvard Medical School
3:00 p.m. - 3:15 p.m. Catheterization Genetics and Genetics of Early Onset Cardiovascular Disease
Svati Shah, M.D., M.H.S.
Duke Molecular Physiology Institute
3:15 p.m. - 3:30 p.m. Women's Health Initiative (Cardiovascular Disease)
Kathryn Rexrode, M.D., M.P.H.
Brigham and Women's Hospital/Harvard Medical School
3:30 p.m. - 3:45 p.m. Multi-Ethnic Study of Atherosclerosis and COMBI-BIO
David Herrington, M.D.
Wake Forest School of Medicine
3:45 p.m. - 4:00 p.m. Health, Aging, and Body Composition Study
Tamara Harris, M.D., M.S.
National Institute on Aging
4:00 p.m. - 4:25 p.m. Break
4:25 p.m. - 4:40 p.m. TwinsUK
Cristina Menni, Ph.D.
King's College London
4:40 p.m. - 4:55 p.m. Diabetes Prevention Program Outcomes Study
William Knowler, M.D., Ph.D., M.P.H.
National Institute of Diabetes and Digestive and Kidney Diseases
Yong Ma, Ph.D.
George Washington University
4:55 p.m. - 5:10 p.m. University College London, London School, Edinburgh and Bristol Consortium
Juan Pablo Casas, M.D., Ph.D.
London School of Hygiene and Tropical Medicine/University College London
5:10 p.m. - 5:30 p.m. Data Sharing: Definitions and Concepts
Deborah Winn, Ph.D.
National Cancer Institute
5:30 p.m. - 5:40 p.m. Summary
Joshua Sampson, Ph.D.
National Cancer Institute
5:40 p.m. - 6:30 p.m. Discussion
7:30 p.m. Group Dinner

Agenda for Wednesday, October 29, 2014

Time Topic
8:00 a.m. - 8:05 a.m. Introduction
Joshua Sampson, Ph.D.
National Cancer Institute
8:05 a.m. - 8:25 a.m. LC-MS-Based Metabolic Profiling of Human Cohorts
Clary Clish, Ph.D.
Broad Institute of MIT and Harvard
8:25 a.m. - 8:45 a.m. Metabolomics and the Measurement of the Exposome in Epidemiological Studies
Augustin Scalbert, Ph.D.
International Agency for Research on Cancer
8:45 a.m. - 9:05 a.m. Quantitative Serum NMR Metabolomics Platform for Large-Scale Epidemiology and Genetics
Peter Würtz, Ph.D.
University of Oulu
9:05 a.m. - 9:15 a.m. Introduction/Objectives for the Day
Krista Zanetti, Ph.D., R.D., M.P.H.
National Cancer Institute
9:15 a.m. - 10:00 a.m. Breakout Groups
10:00 a.m. - 10:10 a.m. Break
10:10 a.m. - 10:40 a.m. Reports from Breakout Groups
10:40 a.m. - 11:00 a.m. Identify Key Collaboration Objectives
11:00 a.m. - 12:00 p.m. Discussion of Research Priorities: Identify Short- and Long-Term Goals
12:00 p.m. - 12:45 p.m. Lunch
12:45 p.m. - 1:00 p.m. Consortia Structures
Krista Zanetti, Ph.D., R.D., M.P.H.
National Cancer Institute
1:00 p.m. - 1:45 p.m. Discussion of Collaboration Structure
1:45 p.m. - 2:00 p.m. Next Steps and Wrap-up

Meeting Summary

Welcome and Introductions

Speaker: Deborah Winn, Ph.D., Deputy Director, Division of Cancer Control and Population Sciences, National Cancer Institute (NCI)

Dr. Winn welcomed the participants to this discussion on using prospective cohort studies to understand the development of disease. The meeting resulted from a recognition of both the need to work together to address challenges in the metabolomics field and of the many opportunities for collaboration in studying cancer and other diseases. She noted that Dr. Krista Zanetti designed the agenda to allow participants to learn about each other's metabolomics research. Following the speakers' presentations, the group discussed research priorities and identified opportunities for collaboration.

Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO)

Speaker: Steven Moore, Ph.D., Division of Cancer Epidemiology and Genetics, NCI

Dr. Moore discussed the PLCO, a multicenter trial designed to test the efficacy of screening methods. The trial enrolled 155,000 men and women aged 55–74 years, 77,000 of whom were randomized to a screening arm. Biospecimens included serum, heparin-treated plasma, and EDTA-treated plasma. Metabolomic studies were conducted in three nested case-control studies for colorectal, lower esophageal, and breast cancer. PLCO data of interest included outcomes (e.g., deaths, heart disease, diabetes, cancer) and exposures (e.g., smoking, alcohol, diet). Results showed that colorectal cancer was not associated with any endogenous metabolites, but smoking was strongly associated with hydroxycotinine, which reflects how tobacco is metabolized. The analysis identified 46 metabolites associated with diet and 36 metabolites associated with body mass index (BMI).

For purposes of collaboration, Dr. Moore's research interests include physical activity, obesity, and cancer associations; biological mediators of these associations; and modeling determinants of weight gain and life expectancy. Large sample sizes are needed to understand the mediators between biomarkers and health outcomes. Research topics for collaborative groups could include mortality, genome-wide association studies (GWAS), and weight gain.

Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study (ATBC)

Speaker: Demetrius Albanes, M.D., Division of Cancer Epidemiology and Genetics, NCI

Dr. Albanes described the ATBC study conducted in southwestern Finland. The trial started as a vitamin supplement trial in which 29,133 men were randomized to supplementation with alpha-tocopherol (vitamin E, 50 mg/day) or beta-carotene (20 mg/day), or both vitamins (vs. placebo) for 5–8 years. The study found a higher incidence of lung cancer in the men receiving beta-carotene supplementation. In contrast, vitamin E supplementation reduced prostate cancer incidence and mortality by 32–40 percent. The study continued as a large prospective cohort, based on the baseline questionnaires and biospecimens (e.g., fasting serum, germline DNA, toenails), follow-up endpoints (e.g., cancers, mortality), and other data. Two metabolite profiling studies have been completed and published to date. The first studied the response to the trial supplements, using baseline and on-trial fasting serum from a total of 200 men, 50 from each of the four intervention groups. With respect to beta-carotene supplementation, xenobiotic metabolites were overrepresented among the top signals. The second, a study of the serum metabolite profile of prostate cancer risk in 74 cases and 74 controls with 20 years of follow-up, found inverse associations for several lipids as well as the TCA cycle metabolite alpha-ketoglutarate, suggesting lipid and energy dysregulation is involved in the etiology of prostate cancer.

The research questions of interest include identifying biomarkers of cancer risk, discovering early detection biomarkers, and metabolic profiling of nutritional status and nutrition-related exposures. The group as a whole will need to address issues related to platform comparability, the roles of fasting status and sample storage, and planning pooled/meta-analyzed cancer-specific studies that can evaluate risk differences by cancer stage and aggressiveness and prospectively represent the evolving cancer metabolome.

Cancer Prevention Study-II (CPS-II)

Speaker: Victoria Stevens, Ph.D., American Cancer Society

The Cancer Prevention Studies (CPS) are a group of large cohort studies conducted by the American Cancer Society (ACS). CPS-I, the Hammond-Horn study, was instrumental in establishing the link between smoking and lung cancer. CPS-II began in 1982 with 1.2 million Americans. The cohort consisted of mostly white middle- to upper-middle-class Americans who were followed for cancer mortality. A nutrition subcohort was introduced in 1992 and followed for cancer incidence. Individuals in this cohort completed questionnaires about cancer diagnosis, exposures, cardiovascular disease (CVD), and other outcomes. From 1998 to 2001, blood was collected from 40,000 members of this cohort. When blood could not be collected, buccal cells were used. Information about BMI and waist circumference also was collected. ACS is particularly interested in smoking and lung cancer, physical activity, and obesity. In coming years, ACS will focus on issues of energy balance and cancer in the elderly population.

Leveraging Metabolomics to Understand Early Pancreatic Cancer

Speaker: Brian Wolpin, M.D., M.P.H., Dana-Farber Cancer Institute, Harvard Medical School

Dr. Wolpin described data on pancreatic cancer generated by a collaboration of four cohort studies: the Health Professionals Follow-up Study (HPFS), the Nurses' Health Study (NHS), the Physicians' Health Study (PHS), and the Women's Health Initiative-Observational Study (WHI-OS). Pancreatic cancer is the fourth leading cause of cancer death, and by 2020 is expected to be the second leading cause. Many patients present with late-stage pancreatic cancer, after it has spread. Thus, the need to develop screening and prevention strategies is urgent. Pancreatic cancer is known to be accompanied by altered systemic metabolism. Biospecimens were collected from individuals before they presented with pancreatic cancer. Blood was collected for profiling of 150 non-lipid metabolites and approximately 100 lipid metabolites. The three metabolites found to be associated with pancreatic cancer were the branched chain amino acids (BCAAs) isoleucine, leucine, and valine. This association remained even after controlling for BMI. To understand the mechanisms that account for this observation, the team used mouse models of pancreatic cancer and determined that the elevated BCAAs were the result of tissue breakdown. Elevated BCAAs were observed before the mice developed glucose intolerance. Metabolomics could help investigators understand why pancreatic cancer patients develop insulin resistance.

Shanghai Women's Health Study, Shanghai Men's Health Study

Speaker: Xiao-Ou Shu, M.D., Ph.D., Professor of Epidemiology and Associate Director for Global Health, Vanderbilt University

The Shanghai Women's Health Study (SWHS) was initiated in 1996 as a collaboration between Vanderbilt University, the Shanghai Cancer Institute, and NCI. The initial aims were to recruit 75,000 Chinese women for long-term epidemiologic research on cancer and other noncommunicable diseases. The study participants provided biologic samples and completed questionnaires. The Shanghai Men's Health Study (SMHS), led by Vanderbilt University and the Shanghai Cancer Institute, was initiated in 2002 and recruited 61,491 men; approximately 90 percent of participants provided biological samples. Metabolomic studies of the cohorts identified 14 metabolites that were associated with prevalent diabetes and 15 metabolites associated with incident diabetes, including several metabolites reported previously in the literature and some potential novel biomarkers. These results, coupled with data on the intrapersonal variation of metabolites from these two cohorts, suggest that metabolomics is a sensitive and robust tool for epidemiologic studies, and that a metabolomic profile can provide valuable information for disease etiology and progression. The research team for these studies is interested in biomarkers of exposures, diet, and physical activity; discovering novel biomarkers for early cancer diagnosis, physical fitness, and biologic age; and understanding the biologic mechanisms of cancers that are related to metabolic conditions.

Health Professionals Follow-up Study (HPFS): Prostate Cancer Metabolomics Project

Speaker: Kathryn Wilson, Sc.D., Harvard School of Public Health

Dr. Wilson discussed a nested case-control study of advanced prostate cancer that was part of the HPFS (a study of more than 50,000 men). The tumor repository includes tumor tissue from 3,000 men and blood samples from 18,000 men. Metabolomics data were collected at the Clish laboratory at the Broad Institute of MIT and Harvard. Four platforms were used to measure different metabolites (e.g., amino acids, lipids, carbohydrates and purine/pyrimidine derivatives, and fatty acids). Pilot studies showed that assays were reproducible even in blinded conditions, that more than 75 percent of metabolites were stable during delayed processing, and that 90 percent were reproducible over 1–2 years. The biorepository developed specific guidelines for processing samples. The aims of the project were to: use tumor messenger RNA (mRNA) profiling data to define the metabolic pathways that are altered in lethal prostate cancer, examine whether higher levels of specific circulating metabolites are associated with an increased risk of lethal disease, and determine the metabolomic profile associated with obesity by comparing extreme groups of BMI. Other questions of interest included using metabolomics to identify biomarkers of diet and other lifestyle factors, and to identify prediagnosis blood biomarkers and tumor biomarkers.

Coronary Artery Disease Screening Trial and the Environmental Determinants of Diabetes in the Young (TEDDY)

Speaker: Tomas Cajka, Ph.D., University of California, Davis, Genome Center

Dr. Cajka introduced two of his current projects: the TEDDY project (~13,000 samples), which seeks to understand the environmental determinants of type 1 diabetes in the young, and the National Institutes of Health (NIH)-P20 project (~1,700 samples), which seeks to understand the metabolic pathways associated with atherosclerosis and CVD risk factors. Sample preparation for metabolomic analysis included a combined extraction of amphiphilic and lipophilic metabolites. Only 20–30 microliters of plasma are needed to conduct a complete metabolomic analysis using four different platforms: GC-MS for primary metabolites, LC-MS for comprehensive lipid analysis in positive and negative ion modes, and HILIC-MS for polar and non-GC-amenable metabolites. These platforms increase metabolite coverage but also increase the number of reported unknowns. Multiple internal standards are used to normalize the data. Lipids can be identified by comparing MS/MS spectra to the freely available MS/MS library LipidBlast. Untargeted data processing found unexpected compounds associated with drugs and food. The metabolomic analysis also includes data normalization using locally estimated scatterplot smoothing (LOESS). The group is interested in large-scale metabolomics studies as well as visualizing and interpreting data using network mapping.
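
The summary does not say how LOESS was applied. One common use in metabolomics is correcting signal drift over an analytical run, and the sketch below illustrates that idea under stated assumptions: a pooled quality-control (QC) sample is injected at intervals, a local linear fit with tricube weights tracks the QC signal against injection order, and every sample is rescaled by the fitted trend. The function names, the `frac` parameter, and the synthetic data are all hypothetical.

```python
from statistics import median


def tricube(u):
    """Tricube kernel: weight for a scaled distance u (zero once |u| >= 1)."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_fit(x, y, x0, frac=0.75):
    """Locally weighted linear estimate of y at x0 from the points (x, y)."""
    n = len(x)
    k = min(n, max(2, round(frac * n)))  # neighbours used in the local fit
    # Bandwidth: distance to the k-th nearest point, slightly widened so that
    # point keeps a small positive weight; 1.0 if that distance is zero.
    bandwidth = sorted(abs(xi - x0) for xi in x)[k - 1] * 1.001 or 1.0
    w = [tricube((xi - x0) / bandwidth) for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 1e-12 else 0.0  # fall back to a local mean
    return my + slope * (x0 - mx)


def drift_correct(order, intensity, is_qc, frac=0.75):
    """Rescale intensities by the LOESS trend fitted to the QC injections."""
    qc_x = [o for o, q in zip(order, is_qc) if q]
    qc_y = [v for v, q in zip(intensity, is_qc) if q]
    target = median(qc_y)  # level that the QC signal is pulled back to
    return [v * target / loess_fit(qc_x, qc_y, o, frac)
            for o, v in zip(order, intensity)]


# Synthetic run (assumption): 13 injections, a QC every third injection,
# and a 2 percent signal increase per injection.
order = list(range(13))
is_qc = [i % 3 == 0 for i in order]
raw = [(100.0 if q else 50.0) * (1 + 0.02 * i) for i, q in zip(order, is_qc)]
corrected = drift_correct(order, raw, is_qc)
```

After correction, the QC injections sit at a common level and the study samples keep their ratio to the QC signal. Real pipelines would typically fit each metabolite separately and choose the smoothing span by cross-validation.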

Atherosclerosis Risk in Communities Study (ARIC)

Speaker: Eric Boerwinkle, Ph.D., University of Texas Health Science Center at Houston

Dr. Boerwinkle introduced the ARIC study, a 27-year longitudinal study of 15,000 individuals of European American and African American descent. Information about cancer, diabetes, and CVD incidence was recorded, and serum was collected for an untargeted metabolomic analysis. Among all of the compounds analyzed, 204 metabolites were measured reliably. The metabolites were found to be useful for assessing kidney function. The kidneys control the filtration, reabsorption, secretion, and excretion of metabolites. The metabolite 5-oxoproline was found to be positively associated with chronic kidney disease. The research group also conducted a GWAS and found that a single missense mutation in an enzyme accounted for the relationship between N-acetylornithine, a metabolite, and chronic kidney disease.

ColoCare, Women's Health Initiative

Speakers: Nina Habermann, Ph.D., German Cancer Research Center (DKFZ), and Neli Ulrich, Ph.D., Huntsman Cancer Institute, University of Utah

Dr. Habermann introduced the ColoCare Consortium (PI: Ulrich), an international cohort study of women and men newly diagnosed with colorectal cancer (CRC) of all stages. Patients were approached at surgery for consent to collect their tumor tissue, adipose tissue, and blood and urine samples for metabolomic analyses. Patients also were asked to complete questionnaires about symptoms, quality of life, and current health habits. Finally, outcomes such as recurrence, survival, quality of life, and treatment toxicity were recorded. Metabolomic assays have been performed in urine and adipose tissue as well as some blood samples. Funding has been secured for additional metabolomics assays.

Dr. Ulrich presented remotely from Germany on a nested case-control study of the WHI-OS in which 988 women ages 55–79 with CRC and matched controls were enrolled. The investigators tested the hypothesis that metabolomic signatures can be detected in plasma as CRC develops and can serve as biomarkers for CRC. A second hypothesis was that the different stages of CRC could be discriminated by specific patterns in the plasma metabolome.

Framingham Heart Study (FHS)

Speaker: Robert Gerszten, M.D., Massachusetts General Hospital and Harvard Medical School

Dr. Gerszten introduced metabolite profiling work in the context of the FHS. The FHS is composed of three cohorts: the original cohort (followed from 1948 to the present), the offspring (followed from 1972 to the present), and the third generation (followed from 2002 to the present). Each cohort is composed of approximately 5,000 individuals. Metabolomic profiling was conducted in approximately 2,600 Framingham offspring, focused on cardiometabolic traits. Amino acids, sugars, purines and pyrimidines, and nonpolar compounds were measured. The investigators detected many metabolites with positive or negative associations with phenotypes. A GWAS of metabolites found novel associations between loci and metabolites that previously had been associated with disease. Research interests and future collaborative possibilities include method sharing (liquid chromatography-mass spectrometry [LC-MS] methods), reagent sharing, primary data sharing, and testing for causal relationships using model systems (e.g., worms, zebrafish, mice).

Catheterization Genetics (CATHGEN) and Genetics of Early Onset Cardiovascular Disease (GENECARD)

Speaker: Svati Shah, M.D., Duke Molecular Physiology Institute

The Duke CATHGEN biorepository includes samples from 9,323 patients who underwent cardiac catheterization at Duke University between 2001 and 2011. Samples are linked to a large clinical database, and the data set includes more than 600 clinical variables such as demographics, coronary angiographic variables, and comorbidities. Metabolomic analysis would allow the investigators to understand the mechanisms that underlie disease phenotypes. The investigators found that acylcarnitines and other metabolites are predictive of incident cardiovascular events. A GWAS screen identified six genes, all of which pointed to endoplasmic reticulum (ER) stress, that were associated with disease. Genome methylation profiling also shows associations with ER stress. The Duke biorepositories could be useful in the burgeoning field of cardio-oncology.

Women's Health Initiative (WHI) (CVD)

Speaker: Kathryn Rexrode, M.D., M.P.H., Brigham and Women's Hospital, Harvard Medical School

CVD is the leading cause of death in U.S. women, and many risk factors are associated with altered metabolism. Metabolite profiling techniques could help to identify new markers that are associated with CVD and point to new prevention or therapy options. The specific aims of this study were: to identify metabolomic profiles associated with incident coronary heart disease (CHD), to compare changes in metabolomic profiles from baseline to year one in the hormone therapy intervention groups, and to explore the relationship between metabolomic profiles and other clinical parameters.

A subset of the WHI cohort was chosen to study the metabolomics of CHD. Women without prior cancer or CVD were randomized to estrogen and progesterone treatment or estrogen alone. In total, there were 1,190 cases and 1,190 matched controls. Metabolomic analysis will be performed in the Clish laboratory at the Broad Institute. The effect of hormone therapy on metabolomic profiles will be compared with placebo, and the degree to which changes mediate the observed risk of CHD with combined hormone therapy will be tested. Key areas of interest for this research group include metabolomic profiles associated with CHD in women, effects of lifestyle on metabolomic profiles, genomic associations with metabolomic profiles, and methodological approaches to the analysis of metabolomic data.

Multi-Ethnic Study of Atherosclerosis (MESA) and COMBI-BIO

Speaker: David Herrington, M.D., Wake Forest School of Medicine

COMBI-BIO is an international group of researchers whose goals are to: use metabolic phenotyping to discover novel biomarkers for subclinical atherosclerosis, use multiple platforms to investigate underlying biochemical pathways to advance understanding of the etiology of atherosclerosis, and develop prognostic combinatorial biomarkers to improve prediction and patient management of subclinical atherosclerosis. Contributing cohorts include MESA, the London Life Sciences Prospective Population Study, and the Rotterdam Heart Study. MESA collects unique cardiac imaging data, other "omics" data, and dietary data, among others. Dr. Herrington discussed the technology platforms used to conduct the untargeted metabolomic studies. Nuclear magnetic resonance (NMR) and MS data were normalized and adjusted for confounders. The preliminary results were summarized as a network map of relationships of interest. The speaker emphasized that the traits of interest should not be modeled linearly. Questions of interest for the collaborative group include standardization of sample preparation and assay protocols, metabolite and pathway mapping, metabolic flux in human disease, and intellectual property issues with respect to interaction with industry.

Health, Aging, and Body Composition Study (Health ABC)

Speakers: Tamara Harris, M.D., M.S., National Institute on Aging, and Rachel Murphy, Ph.D., DSM Nutrition

The Health ABC study was initiated in 1996 to enhance understanding of the connection between body weight and major health problems, especially in old age. Change in body composition is a common pathway by which weight-related health conditions contribute to the risk of disease and disability. The investigators conducted repeated phenotyping for weight-related health conditions such as CVD, osteoarthritis, osteoporosis, depression, diabetes, and cancer. Body composition also was measured by anthropometry, dual energy X-ray absorptiometry, and computerized tomography (CT). One of the main goals of the study was to link dietary data to metabolomics. The study found that visceral fat differs by both gender and race across BMI groups. Metabolomic studies are under way. Research interests for potential collaboration include cross-sectional associations with CT-measured body composition, diet, and metabolic markers; longitudinal measures, including changes in weight, mortality, and function; and outcomes such as disability, mortality, and incident diseases. Metabolic differences between ethnicities are another research area of interest.


TwinsUK

Speaker: Cristina Menni, Ph.D., King's College London

The TwinsUK registry includes 13,000 twins recruited since 1992. Eighty-five percent of the recruits are females aged 18–85. Every twin completes yearly questionnaires about his/her medical history, and a subset is invited to the hospital for sample collection. Twins share the same DNA and are a natural control for age or cohort, maternal, and multigenerational effects. Furthermore, their lifestyle and environments often are similar. The aims of this study are to estimate the genetic contribution to common conditions and traits in adults, especially with respect to diseases of aging. Metabolomics may reveal novel pathways of aging and age-associated disease. For example, a study found 42 metabolites that are associated with impaired fasting glucose levels. Blood pressure also was linked to certain alleles and metabolites. Using metabolomic profiling and genetics allowed the investigators to identify novel disease-related pathways with the potential for clinical translation, early diagnosis, and treatment stratification. Ongoing work will investigate the link between metabolites and drug response, metabolites and diet, and the role of the microbiome.

Diabetes Prevention Program Outcomes Study (DPPOS)

Speaker: William Knowler, M.D., Dr.P.H., National Institute of Diabetes and Digestive and Kidney Diseases

Dr. Knowler introduced the Diabetes Prevention Program (DPP), a multicenter randomized clinical trial in the United States. The study was intended to investigate whether type 2 diabetes can be prevented or delayed by treating modifiable risk factors. Participants (N = 3,234) were randomized to intensive lifestyle intervention, metformin, or placebo. Diabetes incidence was highest in the placebo group; metformin reduced the risk of diabetes by 31 percent, and lifestyle intervention reduced the risk by 58 percent. Two metabolomic DPP substudies were conducted: the first investigated whether certain metabolites were predictors of diabetes, and the second aimed to identify new metabolites predictive of cancer and CVD. For the diabetes study, investigators identified cases of diabetes diagnosed before the end of DPP (July 2001) and then matched controls. The cancer/CVD study also was a nested case-control study and used cancer cases that were diagnosed through July 2014. The investigators obtained 306 cases and matched controls. Outcomes of interest in the DPP study include diabetes, CVD, cancer, health care utilization and costs, and physical and cognitive function.

University College London, London School, Edinburgh and Bristol Consortium (UCLEB)

Speaker: Juan Pablo Casas, M.D., Ph.D., University College London

Dr. Casas introduced the UCLEB consortium, which included eight studies. Outcome phenotypes (e.g., CHD, cancer, diabetes, Alzheimer disease) were recorded, and NMR metabolomics were conducted to look for associations with health outcomes. New methods in metabolomics allowed agnostic discovery of associations. The data were analyzed through differential network analysis: rather than describing the relationship between a single variable and an outcome, the network describes the relationships between all variables. The group is interested in collaborating with others because the magnitude of the associations with disease events is often small to moderate, so detecting them requires large sample sizes. In addition, robust description of dose-response associations or any potential effect modifications will require several thousand events.
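
To give a rough sense of why small associations call for pooled cohorts, the sketch below uses a standard approximation for the number of events needed in a Cox regression with a standardized continuous exposure: events ≈ (z_alpha + z_beta)^2 / (ln HR)^2. This formula and all of the inputs are assumptions chosen for illustration, not calculations presented at the meeting. The `n_tests` argument applies a Bonferroni correction for testing many metabolites, and the approximation ignores correlation with other covariates.

```python
from math import ceil, log
from statistics import NormalDist


def events_needed(hazard_ratio_per_sd, alpha=0.05, power=0.80, n_tests=1):
    """Approximate events needed to detect a hazard ratio per 1-SD increase.

    Uses events = (z_alpha + z_beta)^2 / ln(HR)^2 with a two-sided test and
    a Bonferroni-adjusted significance level alpha / n_tests.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - (alpha / n_tests) / 2)
    z_beta = std_normal.inv_cdf(power)
    return ceil((z_alpha + z_beta) ** 2 / log(hazard_ratio_per_sd) ** 2)


# Hypothetical scenarios: a moderate and a small effect, then the small
# effect again when 500 metabolites are tested at once.
moderate = events_needed(1.5)
small = events_needed(1.1)
small_many_tests = events_needed(1.1, n_tests=500)
```

Under these assumptions a hazard ratio of 1.5 per standard deviation needs fewer than a hundred events, a hazard ratio of 1.1 needs several hundred, and the same small effect tested across 500 metabolites needs a few thousand.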

Data Sharing: Definitions and Concepts

Speaker: Deborah Winn, Ph.D., NCI

Dr. Winn opened by discussing the relationships among the key parties to a data-sharing agreement. These parties include the scientist who generated the data and shares it, the person who receives the data and uses it, the facilitator between these two parties, and the study participants from whom the data were collected. Data Access Committees determine whether the terms and agreements of a data-sharing plan are consistent with the scientist's intentions and with the informed consent that the study participants signed. Data Access Committees also assess the scientific merit of the plan proposed by the data user. Investigators can specify how they would like to be acknowledged when others use their data; for example, they could be offered the opportunity to participate in the project and potential co-authorship.


Participants discussed data access agreements in the context of the database of Genotypes and Phenotypes (dbGaP) that was developed by NIH. It was not clear how much metabolomic data is included in dbGaP. Dr. Winn noted that metabolomic data that were generated using NIH funds could be shared through dbGaP. Dr. Zanetti noted that NIH has a sharing policy for GWAS and exome sequencing data, but does not have a sharing policy for metabolomic data. Such a policy is scheduled to be in place in January 2015. The NIH Common Fund (CF) also requires investigators supported by a CF mechanism to deposit their data in a common database. However, there is no data use agreement about how to share and use these data.

A participant noted that data sharing should be considered "a carrot rather than a stick." Data sharing has spurred the science and moved it forward quickly. He noted that many investigators feel that dbGaP is cumbersome and asked whether there might be a better way to share data.

The group discussed the point that although funding bodies acknowledge the cost of generating a data set and conducting an analysis, they often forget that the time and effort involved in setting up collaborations also have a price. A related analytical issue is that it takes time and effort to harmonize the output of different studies. The existing platforms currently are "far from harmonized."

A participant noted the importance of meta-analyses that keep investigators engaged in the collaboration. Sharing approaches and interacting with other investigators is an advantage of meta-analyses that does not necessarily come from sharing data through databases.

Summary—Metabolomics Collaboration: Bigger Is Better

Speaker: Joshua Sampson, Ph.D., Biostatistics Branch, NCI

Dr. Sampson discussed the studies presented during the Metabolomics Think Tank and noted the high potential for collaboration among studies that already have amassed metabolomic data. The collection of studies represents a rich data set that includes more than 50,000 individuals and thousands of health outcomes such as death, cardiovascular events, diabetes, and cancer. The large number of cases provides good statistical power to conduct analyses. Dr. Sampson emphasized that this would be a true collaboration—no single study would contribute a disproportionate amount of data—and that he envisioned focusing on five areas: (1) meta-analysis, (2) replication, (3) detailed analyses, (4) interpretation, and (5) sharing tools.

Meta-analyses can be used to identify new metabolites associated with health outcomes. The large combined data set offers a quick and inexpensive means of replicating findings across studies. Detailed analyses that require a larger number of individuals will benefit from pooling the data. For example, understanding of what mediates the connection between a risk factor and an outcome can be improved with larger sample sizes. Interpretation also benefits from the data from multiple studies: one study might find that a metabolite is associated with a particular disease, but data from other studies may show that the metabolite is also associated with genes, behaviors, or diet.
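As an illustration of the meta-analysis step, the following minimal Python sketch pools per-study effect estimates for a single metabolite by inverse-variance weighting. The fixed-effect model, the function name, and the numeric estimates are assumptions made for illustration; the Think Tank did not specify a statistical method.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Pool per-study estimates with inverse-variance weights (fixed-effect model).

    Returns the pooled estimate and its standard error.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical log odds ratios for one metabolite from three cohorts.
betas = [0.20, 0.10, 0.30]
ses = [0.10, 0.10, 0.20]

pooled, pooled_se = fixed_effect_meta(betas, ses)
z_score = pooled / pooled_se
print(f"pooled estimate {pooled:.3f}, SE {pooled_se:.3f}, z {z_score:.2f}")
```

Because each study contributes only a summary estimate and a standard error, an approach of this kind does not require individual-level data to leave the contributing cohort. A random-effects model may be more appropriate when platforms or populations differ substantially.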

Dr. Sampson raised the issue of harmonizing data from different platforms and using a common nomenclature. The Human Metabolome Database (HMDB) has nonredundant metabolite IDs that could be used throughout the community. Harmonizing data across platforms is a challenge, even when standards are used, because the methods differ. Dr. Sampson noted that it would be useful to be able to access meta-data (e.g., how the data were collected) in a common database where the data themselves are shared.

One participant noted that a challenge will be to determine how much phenotype data can be uploaded, because there may be restrictions about putting such data into an open database.

A participant remarked that it would be useful to know which metabolites were measured in the different studies. In addition, it would be useful to know which metabolites have a standard associated with them, and which metabolites do not. The database should include the raw peak data. Absolute quantification, not just relative concentrations, would be extremely useful. Standards for 600 known metabolites are available for purchase at a reasonable price and could be a useful tool to help harmonize data across studies.
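A tabulation of which metabolites were measured in which studies could be built along the following lines. This is a minimal Python sketch under stated assumptions: the study names and metabolite identifiers are hypothetical placeholders, and the sketch presumes that each study has already mapped its metabolites to a shared identifier scheme.

```python
from collections import defaultdict


def build_catalog(study_panels):
    """Map each metabolite ID to the sorted list of studies that measured it."""
    catalog = defaultdict(set)
    for study, metabolite_ids in study_panels.items():
        for metabolite_id in metabolite_ids:
            catalog[metabolite_id].add(study)
    return {mid: sorted(studies) for mid, studies in sorted(catalog.items())}


def measured_in_all(catalog, n_studies):
    """Return the metabolite IDs that every study measured."""
    return [mid for mid, studies in catalog.items() if len(studies) == n_studies]


# Hypothetical panels; the IDs are placeholders, not real database identifiers.
panels = {
    "study_1": ["M001", "M002", "M003"],
    "study_2": ["M002", "M003", "M004"],
    "study_3": ["M003", "M002"],
}

catalog = build_catalog(panels)
common = measured_in_all(catalog, len(panels))
print(catalog)
print(common)
```

The same structure could carry an extra field per metabolite recording whether a reference standard exists, which would address the second point raised above.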

It was recommended that the group not set an ambitious goal of tackling all metabolomics of health and disease, but rather focus on a few specific problems that will launch the collaboration.

A participant reflected that although it is important to discuss the logistics of a collaboration, this is a human resource-limited problem. He invited the group to engage more people in using these data and generating new knowledge. There are too few postdoctoral fellows to do all of the work that needs to be done.


LC-MS-Based Metabolic Profiling of Human Cohorts

Speaker: Clary Clish, Ph.D., Broad Institute of MIT and Harvard

Dr. Clish introduced the Broad Institute's Metabolic Profiling Platform. The Clish laboratory group is composed of ten people and affiliated postdocs and fellows. The focus of the group is to develop and apply technologies for the systematic analysis of endogenous metabolites in biologic specimens. The laboratory is equipped with LC-MS instruments and also has the capability to do enzymology. Projects include investigations of diabetes, CVD, renal disease, cancer, and mitochondrial dysfunction. The laboratory seeks to collaborate with research groups in the community.

The differences in physical properties among metabolites present an analytical challenge. To address this challenge, the group uses different extraction methods and four LC-MS methods to profile polar metabolites and lipids. The lab is equipped with both triple quadrupole and high-resolution orbitrap mass spectrometers. The triple quadrupole mass spectrometer is the most sensitive detector and is used to separate, fragment, and then measure specific product ions from the fragmentation process. It is used for targeted profiling of defined sets of metabolites with optimal sensitivity. In contrast to targeted profiling, nontargeted profiling measures all metabolite signals using full-scan MS and enables broad coverage of both known and yet-to-be-identified metabolites. Instruments used for nontargeted profiling tend to be less sensitive than triple quadrupole MS, and the identification of unknown metabolites can be challenging. Dr. Clish's lab is equipped with five orbitrap mass spectrometers for nontargeted methods. Specialized software developed by Nonlinear Dynamics Limited has been used to process up to 2,000 nontargeted datasets at once. Another software package, TraceFinder, developed by Thermo Scientific, is used to conduct targeted processing of hundreds of metabolites of known identity.

Metabolite coverage is dependent on biomass availability, sensitivity and dynamic range of methods, and compound stability. It is challenging to conduct reliable metabolomic analyses in large studies. Quality assessment is important as there are several points of potential failure in LC-MS. Internal standards are used during the first step of sample processing for quality assessment and to standardize measurements within batches. For larger studies, pooled reference samples are analyzed periodically in the analysis queues and are used for additional quality assessment and to standardize data across batches. The Broad methods were assessed for reproducibility in large studies. Additional limitations of LC-MS metabolomics were discussed. Methods are not optimized for any single metabolite. Furthermore, the complexity of nontargeted data sets complicates the alignment of unknown peaks across batches. Stool samples proved to be the most challenging for metabolomic analysis.
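The use of pooled reference samples to standardize data across batches can be sketched as follows. This is a minimal Python illustration for a single metabolite, not the Broad Institute's actual procedure: the function name, the data layout, the intensities, and the choice of median-based scaling are all assumptions.

```python
from statistics import median


def scale_by_pooled_reference(batches):
    """Rescale study samples so pooled-reference medians agree across batches.

    batches maps a batch ID to {"reference": [...], "samples": [...]}, where
    "reference" holds repeated measurements of the same pooled sample.
    """
    ref_medians = {b: median(d["reference"]) for b, d in batches.items()}
    if any(m <= 0 for m in ref_medians.values()):
        raise ValueError("pooled reference median must be positive")

    # Common target: the median of the per-batch reference medians.
    target = median(list(ref_medians.values()))

    scaled = {}
    for batch_id, data in batches.items():
        factor = target / ref_medians[batch_id]
        scaled[batch_id] = [value * factor for value in data["samples"]]
    return scaled


# Hypothetical intensities for one metabolite measured in two batches.
raw = {
    "batch_1": {"reference": [100.0, 102.0, 98.0], "samples": [50.0, 200.0]},
    "batch_2": {"reference": [120.0, 118.0, 122.0], "samples": [60.0, 240.0]},
}

scaled = scale_by_pooled_reference(raw)
print(scaled)
```

In this toy example the second batch reads about 20 percent higher than the first for the same pooled material, and the scaling brings the study samples from both batches onto a common scale. Real data would need this applied per metabolite, and drift within a batch may call for more than a single scaling factor.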

Metabolomics and the Measurement of the Exposome in Epidemiological Studies

Speaker: Augustin Scalbert, Ph.D., International Agency for Research on Cancer (IARC)

Dr. Scalbert discussed the work being done at the IARC on the blood exposome, which is defined as the totality of environmental exposures received by an individual during life. It includes endogenous metabolites, foods, drugs, and pollutants. The investigators also are interested in the food metabolome, specifically in how dietary exposures relate to cancer risk. The food metabolome is complex because compounds are transformed during absorption and metabolism. The food metabolome has been studied in acute intervention studies and also recently in cross-sectional studies. Results obtained in the European Prospective Investigation into Cancer and Nutrition (EPIC) study, a large multicentric cohort study with over 500,000 participants, were presented. Diet was recorded and urine samples were used to assess metabolic profiles using high-resolution MS methods. The EPIC researchers were able to detect an association between a large number of polyphenol metabolites and various dietary exposures. Biomarkers were correlated with both short- and long-term food intake habits. This approach may also be applied to the measurement of environmental pollutants, although their detection using these global analytical approaches is still uncertain because of their low concentrations in biofluids. Dr. Scalbert also presented Exposome-Explorer, a new database on biomarkers of dietary and environmental exposures that have been used in population studies. This database will be a useful tool for identifying panels of biomarkers that can be quantified through targeted metabolomic assays and implemented in future exposome-wide association studies. These analytical approaches should allow measuring a much greater fraction of the exposome than previously achieved. They will complement classical methods of exposure assessment to understand the relationship between exposures and disease risk.

Quantitative Serum NMR Metabolomics Platform for Large-Scale Epidemiology and Genetics

Speaker: Peter Würtz, Ph.D., University of Oulu

Dr. Würtz introduced the NMR platform for metabolite profiling in the context of CVD and diabetes. The dedicated and automated serum NMR metabolomics platform has applications in large cohorts, biobanks, and clinical trials. The platform is used for absolute quantitation of known metabolites and has been used to measure more than 250,000 blood samples so far. The platform has led to a spinoff company, Brainshake Ltd. The platform is affordable at approximately €25 per sample for 10,000 samples, because the process is so highly automated. The platform has been used to discover new biomarkers for type 2 diabetes and CVD. Dr. Würtz noted that this platform could be used to help harmonize MS data from the different studies presented at this meeting. The platform also has applications in genetics: metabolite profiling in the context of population genetics could lead to a better understanding of the genetic basis of metabolism and help assess the causality of the metabolic biomarkers for disease risk. The platform also can be used to perform functional genetics and assess pleiotropy (e.g., explain interactions between genes, and understand the associations between different biomarkers).

Priority Setting


Speaker: Krista Zanetti, Ph.D., M.P.H., R.D., NCI

Dr. Zanetti summarized the main points that emerged from the presentations in the meeting thus far. Bigger is better: more samples provide more statistical power. The consortium could provide many benefits in the areas of meta-analysis, replication, detailed analyses, data interpretation, tool sharing, and pilot studies. To initiate a collaboration, it is necessary to establish uniform standards and identify research priorities. Specifically, the group should focus on research questions that can be investigated successfully in the context of a consortium and that cannot be addressed by an independent Principal Investigator (PI) working alone. These include research interests and bioinformatics methods. Once the objectives and priorities of the consortium have been defined, the group will need to determine what structure will work best to support both short- and long-term goals. To this end, three breakout group sessions were organized to brainstorm: (1) research priorities, (2) tools to facilitate data sharing, and (3) structure of the international metabolomics consortium. After the breakout sessions, the group voted for three items under each breakout session topic. This process resulted in agreement on: (1) short-term research goals, (2) long-term research goals, (3) consortium organizational structure, and (4) concrete next steps to move forward.

Reports From the Breakout Groups

Organizational Structure

Dr. Zanetti summarized the ideas that came out of the breakout sessions about the organizational structure of the consortium. Common themes included:

  • A steering committee is required for a functioning group.
  • Working groups, authorship groups, and subcommittees will carry out different projects.
  • The objectives and principles of collaboration need to be clearly defined.
  • There is a need to support junior investigators and give them leadership roles, including appropriate funding support and the opportunity for first-authorship on publications.
  • The group needs clear ground rules and transparent policies.
  • The group needs a concrete and achievable initial goal to create momentum.
  • Trans-NIH and trans-agency support could help build consortium infrastructure.
  • Multidisciplinary representation and appropriate geographical representation are necessary for a functioning group.
  • The consortium should liaise with other groups such as the CF to avoid "reinventing the wheel."
  • Communication in the group should be active, regular, and facilitated.
  • The group should agree on disseminated or central analyses.
  • Group members should have shared and mutually beneficial interests.
  • The group will require support to harmonize data.


Tools to Facilitate Data Sharing

Dr. Sampson summarized the points discussed in the breakout sessions about the tools that would be useful to facilitate data sharing. They included:

  • Meta-data (including information about how the data were processed) should be presented in a standardized format.
  • Meta-data for sample collection (e.g., fasting status of the study participants) should be included.
  • A catalog of common metabolites would be useful. Endpoints (e.g., death, diabetes, CVD, cancer) should be cataloged.
  • Multivariate statistical methods should be synthesized and compared, and new methods developed as the work progresses. An area of special interest is understanding what mediates the connection between metabolites and the endpoint(s) of interest.
  • A list of practices and protocols that currently are used would be a useful reference.

It was noted that the discussion of tools and methods should come after the group has agreed on specific research priorities.

Research Priorities

Drs. Brandi Heckman and Steve Moore reported back on the research priorities. These included:

  • Reproducibility.
  • Generalizability.
  • Replication.
  • The metabolome as a function of age, race, ethnicity, gender, and BMI. This could be studied both as a science question and as a methods question.
  • As potential long-term goals, a meta-analysis of diabetes, heart disease, or cancer might be considered "low-hanging fruit."
  • As a short-term goal, a systematic review authored by the entire team would be a way to build initial momentum in the collaboration.
  • Developmental screening of biomarkers, clinical development, and translation.
  • Methodological issues involved in pooling and how to address them.
  • Mediation analysis methods.
  • Class analysis vs. individual metabolites to reduce data set dimensionality.
  • Rare diseases.
  • Rare exposures (environmental toxins and pollutants).
  • Rare genetics.
  • Small effect sizes, especially with regard to dietary biomarkers, to identify stronger associations with heart disease.
  • A tabulation of what is measured across platforms and methods would be useful.

The group discussed the fact that, to successfully apply for an NIH grant to support this consortium, proof of principle must come first. The group needs to show that members can productively work together and publish research findings.

All members voted on three items in each list above.

Discussion of Research Priorities: Identify Short- and Long-Term Goals

Dr. Zanetti summarized the top research priorities that emerged from the vote. The following topics received the top votes:

  1. The metabolome as a function of age, race, gender, and BMI
  2. Methodological analysis of pooling
  3. Meta-analysis of diabetes
  4. Mediation analysis
  5. Replication

A participant suggested focusing on the relationship between the metabolome and age. The group discussed whether it would make more sense to examine age, gender, BMI, and race in separate papers or whether they should be grouped under one publication.

Dr. Sampson asked whether group members preferred to conduct meta-analyses or share data. He asked for the perspective of participants who own data. One participant indicated that data sharing comes with a heavy administrative burden, including complex data-sharing agreements. Furthermore, there is a question about whether individual PIs have the authority to share data: data from clinical trials are stored in a coordinating center, and PIs do not always have the option of supplying them.

A participant made the point that the study of metabolomics as a function of age, race, gender, and BMI should focus on areas that cannot be addressed by individual research groups; there must be value added in working in the context of the consortium. The work could focus on small effect sizes that might point to new biologically relevant pathways. This work would be publishable, and the emphasis on small effect sizes would justify the need to work as a consortium.

Participants discussed the fact that many groups have analyzed metabolites in terms of diabetes risk, and a meta-analysis of this research could be achieved more readily as a short-term goal. The group also could conduct a meta-analysis across platforms.

Dr. Moore noted that methodological issues with data pooling would be addressed in the first study. Dr. Sampson noted that mediation analysis could be addressed in a working group, in parallel with the first effort that the group identified as a top priority. Dr. Heckman said that the meta-analysis of diabetes would be an opportunity to develop mediation analysis methods and risk predictors.

After discussion, the group agreed on the following goals:

  • Short-term goal: Examine the metabolome as a function of age, race/ethnicity, gender, and/or BMI. This objective would require data harmonization for a meta-analysis.
  • Intermediate goal: Examine the metabolome as a function of genetics.
  • Long-term goal: Perform a meta-analysis of diabetes.

The following three issues will be addressed in parallel with the goals listed above:

  • Address methodological issues.
  • Develop mediation analysis methods.
  • Examine multivariate methods.

Informatics Efforts

The group identified the following two priorities with respect to informatics methods:

  1. Short-term goal: Develop a catalog of common metabolites and exposures.
  2. In parallel: Establish a collection of metadata about how the data were collected and the reliability of different platforms.

Consortia Structures: What Are Our Options?

Dr. Zanetti discussed possible options for the structure of the metabolomics consortium. The Epidemiology and Genomics Research Program supports approximately 60 consortia. Considerations for successful collaborations include: (1) a clear definition of membership, (2) a steering committee, (3) authorship policy, (4) data-sharing policy, (5) biospecimen-sharing policy, and (6) working groups and subcommittees. Dr. Zanetti provided examples of different consortia and stated that the goal of this discussion was to identify the minimal structure to keep the momentum going for this group. The collaborative structure should allow junior investigators to take leadership roles that result in first-author publications.

A participant noted that projects such as this that require significant involvement need to be sustained with funding. Dr. Zanetti explained that the initial effort must be done on a volunteer basis that will pave the way for an application for program funding. Dr. Moore suggested focusing on a small, well-defined initial project to provide proof of principle.

Dr. Zanetti stated that the effort must be led by members of the group who are willing to form a steering committee. Without leadership, the consortium will not succeed. A participant suggested that the NCI group composed of Dr. Sampson and Dr. Moore lead the effort; another participant countered that extramural investigators also should participate actively in the leadership, because a top-down model of leadership coming from NCI is unlikely to get as much buy-in as one that comes from the community.

Dr. Zanetti asked what working groups the participants envisioned. Suggested topics included: sex differences, methods, mediation (statistical analysis), diabetes, aging, cancer, genetics, and CVD. A participant suggested that, instead of developing these topical working groups, the group should focus on creating an action group committed to achieving the top priority.

Dr. Heckman recommended that, to get the action group moving, one representative from each study be nominated to serve as liaison to the leadership of their group and explain the goals of the consortium to gain buy-in and champion the consortium's proposed initial project. Many investigators present were not in a position to make executive decisions about data sharing without consulting other members of their studies. Dr. Moore suggested that the executive or steering committee be led by senior investigators, and the working or interest groups could be led by junior investigators.

One participant suggested attempting to conduct a collaborative study before putting in place the organizational structures, but Dr. Zanetti said that this would be difficult from an administrative standpoint. Dr. Heckman concurred that the group would be more likely to be successful if an agreement was reached beforehand and if buy-in from the lead PIs was obtained formally from the beginning.

The group agreed on the following three most important considerations to establish a well-structured and highly functioning international metabolomics consortium:

  • Establish a steering committee that includes a representative from each cohort study.
  • Establish a core project/action group (i.e., executive committee), with members to be determined by the steering committee.
  • Establish interest/working groups to be determined by the steering committee. Suggested topics include: mediation or other statistics, methodology, aging, cancer, CVD, genetics, sex differences, diabetes.

Action Items and Adjournment

With respect to action items, Dr. Zanetti suggested that one representative from each cohort be nominated as a member of an initial committee that will meet by teleconference in November 2014 to identify the action working group. Dr. Sampson added that he would send an email to invite interested parties to participate in a statistical mediation discussion group.

Dr. Zanetti thanked Drs. Sampson and Moore for initiating the Think Tank and all of the participants for attending the meeting. She then adjourned the meeting.