Novel Approaches and Challenges to Data Harmonization: Maximizing the Use of Multi-level Data in Collaborative Studies

The Novel Approaches and Challenges to Data Harmonization Workshop, sponsored by the Epidemiology and Genomics Research Program (EGRP), will be held on October 6-7, 2014, at the Neuroscience Center Building in Rockville, Maryland.


The breadth and scope of studies utilizing an epidemiologic framework have grown significantly during the last few decades. This trend is mainly due to the increasing complexity of the research questions being asked and the consequent need to coalesce data across many studies. In fact, the case has been made that "integrative epidemiology," the integration and analysis of heterogeneous and multi-layered data sets, may be the key to advancing the practice of epidemiology in the twenty-first century.

The current climate of scarce resources necessitates assembling these expansive and heterogeneous data sets through extensive multi-disciplinary collaborations or consortia. Throughout the last decade, consortia-based research has grown exponentially, has contributed to a better understanding of the complex etiology of cancer, and has provided fundamental insights into key environmental, lifestyle, clinical, and genetic determinants of these diseases and their outcomes. Using existing epidemiology data sets and complementing them with newly acquired genomic, clinical, and other types of data can empower researchers to address important health issues more expeditiously and more cost-effectively than through the creation of new large research infrastructures. Data harmonization, or the process of assessing compatibility of data accrued from independent sources, is an essential step to support these integrative analyses.

The goals of the Novel Approaches and Challenges to Data Harmonization Workshop are:

  1. To explore the theory and practice of multi-level data harmonization;
  2. To consider available tools and new approaches;
  3. To review representative case-studies; and
  4. To develop recommendations for best practices.


Neuroscience Center Building, Conference Room A1-A2
6001 Executive Blvd, Rockville, Maryland

Day 1 - Monday, October 6, 2014

Time Topic
9:00 a.m. - 10:20 a.m.

Consortia in Epidemiology and the Need for Data Harmonization
Daniela Seminara, Ph.D., M.P.H.
Senior Scientist and Consortia Scientific Coordinator
Epidemiology and Genomics Research Program (EGRP), Division of Cancer Control and Population Sciences (DCCPS), National Cancer Institute (NCI)

Setup of Data Harmonization and Cross-Study Data Collection
Carolyn Hutter, Ph.D.
Program Director, Division of Genomic Medicine
National Human Genome Research Institute (NHGRI)

The Harmonization Process: The Datashaper Experience
Isabel Fortier, Ph.D.
Research Institute of the McGill University Health Centre

Data Harmonization for Signature Projects in the Cohort Consortium
Michelle Brotzman, M.P.H.
Study Manager

10:20 a.m. - 10:35 a.m. Break
Session I: Epidemiology Data Harmonization
Moderator: Sara Olson, Ph.D., Memorial Sloan Kettering Cancer Center
10:35 a.m. - 11:15 a.m.

Challenging Variables (such as age at menopause, use of HRT, allergies, diet, and other examples)
Susan Slager, Ph.D.
Professor of Biostatistics
Mayo Clinic

Combining Variables from Case-Control and Cohort Studies
Wendy Setiawan, Ph.D.
Assistant Professor of Preventive Medicine
University of Southern California, Los Angeles

11:15 a.m. - 11:40 a.m. Discussion
11:40 a.m. - 12:25 p.m. Lunch
Session II: Clinical and Outcome Data Harmonization
Moderators: Jonine Bernstein, Ph.D., Memorial Sloan Kettering Cancer Center and Lindsay Morton, Ph.D., Division of Cancer Epidemiology and Genetics (DCEG), NCI
12:25 p.m. - 2:10 p.m.

Harmonizing Tumor Subtypes
Peggy Porter, M.D.
Member, Human Biology Division and Public Health Sciences Division
Fred Hutchinson Cancer Research Center

Harmonizing Treatment Data from Cancer Survivor Studies
Lawrence Kushi, Sc.D.
Director of Scientific Policy
Kaiser Permanente Northern California Division of Research

Combining Data Designed to Address a Specific Question with Those Looking Ad-Hoc for an Outcome (e.g., case-control studies looking at survival or second breast cancer primaries in existing studies)
Lindsay Morton, Ph.D.
Investigator, Radiation Epidemiology Branch

Harmonizing Non-Cancer Disease Outcome
Leslie Lange, Ph.D.
Research Associate Professor
University of North Carolina School of Medicine

2:10 p.m. - 2:25 p.m. Break
Session III: Biomarkers, Genomics, and GxE Data Harmonization
Moderators: Ulrike Peters, University of Washington School of Public Health and Gabriel Lai, Ph.D., EGRP, DCCPS, NCI
2:25 p.m. - 3:25 p.m.

Combining Biomarkers Other Than Genotypes
Anne Zeleniuch-Jacquotte, M.D.
Professor, Departments of Population Health and Environmental Medicine
NYU Langone Medical Center

Harmonizing Genomic Data
Christopher Amos, Ph.D.
Associate Director for Population Sciences, Norris Cotton Cancer Center
Professor, Geisel School of Medicine at Dartmouth

GxE Harmonization
Peter Kraft, Ph.D.
Professor of Epidemiology
Harvard University School of Public Health

3:25 p.m. - 3:50 p.m. Discussion

Day 2 - Tuesday, October 7, 2014

Time Topic
Session IV: Issues Related to Analysis
Moderator: John Ioannidis, M.D., D.Sc., Stanford School of Medicine
9:00 a.m. - 10:00 a.m.

Meta-Analysis: Mega-Analysis and Pooling
Ken Rice, Ph.D.
Associate Professor
University of Washington

Outliers and Approaches in the Presence of Biological Heterogeneity
Nilanjan Chatterjee, Ph.D.
Biostatistics Branch Chief and Senior Investigator

Data Harmonization Across Large Consortia: Analytic Challenges
Donna Spiegelman, Sc.D.
Professor of Epidemiologic Methods
Harvard University School of Public Health

10:00 a.m. - 10:25 a.m. Discussion
10:25 a.m. - 10:40 a.m. Break
Breakout Groups
10:40 a.m. - 12:15 p.m. Harmonization with an Eye to the Future
  • Epidemiology
  • Clinical
  • Outcome
  • Biomarkers
  • Analysis
12:15 p.m. - 1:00 p.m. Lunch with Keynote Speaker John Ioannidis
Multi-Level Data Integration
1:00 p.m. - 2:30 p.m. Breakout Groups Reports and Recommendations
Moderator: Jonine Bernstein, Ph.D., Memorial Sloan Kettering Cancer Center
2:30 p.m. - 2:45 p.m.

Concluding Remarks and Charge to the Group
Sara Olson, Ph.D.
Associate Attending Epidemiologist
Memorial Sloan Kettering Cancer Center

2:45 p.m. Meeting Adjourned


A limited number of seats is available for in-person attendance. Once the room reaches capacity, additional registrants will be placed on a wait list for in-person attendance, and a limited number of seats may become available to them.

Both days of the Workshop will be broadcast via webinar, with the exception of the breakout sessions on Day 2. There is no limit on the number of individuals who may participate in the webinar.

The NCI Workshop on Broadening Epidemiologic Data Sharing will be held on October 8, 2014. Learn more on the meeting web page.

Planning Committee

  • Sara Olson, Ph.D., Memorial Sloan Kettering Cancer Center
  • Jonine Bernstein, Ph.D., Memorial Sloan Kettering Cancer Center
  • Ulrike Peters, University of Washington School of Public Health, Department of Epidemiology
  • Carolyn Hutter, Ph.D., M.S., National Human Genome Research Institute, Division of Genomic Medicine
  • Daniela Seminara, Ph.D., M.P.H., National Cancer Institute (NCI), Division of Cancer Control and Population Sciences, EGRP
  • Lindsay Morton, Ph.D., NCI, Division of Cancer Epidemiology and Genetics
  • Gabriel Lai, Ph.D., NCI, DCCPS, EGRP
