Integrating Host and Tumor Genome Data Holds Promise for Understanding Cancer Risk

The information on this page is archived and provided for reference purposes only.

Cancer is a disease of two genomes: the host, or germline, genome and the tumor, or somatic, genome. Extensive data have been generated from both. Most studies of cancer risk focus on the germline genome, using genome-wide association studies (GWAS) to identify potential cancer risk variants. More recently, whole genome and whole exome approaches have also been used, and hundreds of variants associated with cancer risk have been identified to date. However, the variants identified thus far do not fully explain the heritability of cancer. On the somatic side, The Cancer Genome Atlas (TCGA) program and the International Cancer Genome Consortium have generated comprehensive molecular profiles of more than 50 tumor types, representing 25,000 tumor genomes with matched normal genome data. This wealth of data is being extensively mined to fully understand the driver events responsible for tumor growth.

From the point of view of better understanding the genetic/heritable component of cancer risk, integrating somatic and germline data has been useful for 1) aiding the functional analysis of variants identified in genetic association studies, and 2) using somatic molecular data to re-assess our understanding of cancer types.

Figure 1. Germline and somatic studies of cancer risk inform each other and both benefit from the integration of the two sources.

Functional Analysis

Most cancer risk variants identified by genetic association studies are located in unannotated or noncoding regions of the genome, which makes understanding their function a challenge. Using TCGA gene expression data, investigators have compared expression patterns in tumor versus normal tissue (which provides information about the germline genome) and have detected changes in gene expression in some genomic regions where risk variants are located. The investigators hypothesize that some variants may cause changes in gene expression, i.e., that they function as expression quantitative trait loci (eQTLs), thus providing a link between germline risk variants and events that occur in the tumor itself. These efforts have identified genes differentially regulated by risk variants, several of which were not previously known to be involved in cancer. Similar comparisons have found that risk-associated genotypes identified by GWAS may affect methylation patterns that also alter gene expression.
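
To make the eQTL idea concrete, the following is a minimal sketch, not the method used in the studies described here. It assumes each sample has a risk-allele dosage (0, 1, or 2 copies) and a normalized expression value for a nearby gene, and it fits a simple least-squares line of expression on dosage. The function name `eqtl_slope` and the toy data are hypothetical; a real analysis would also adjust for covariates and correct for multiple testing.

```python
from statistics import mean


def eqtl_slope(dosages, expression):
    """Return (slope, Pearson r) for expression regressed on allele dosage."""
    if len(dosages) != len(expression) or len(dosages) < 3:
        raise ValueError("need at least 3 matched samples")
    mean_x, mean_y = mean(dosages), mean(expression)
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in genotype or expression")
    slope = sxy / sxx                # expression change per risk-allele copy
    r = sxy / (sxx * syy) ** 0.5     # strength of the linear association
    return slope, r


# Hypothetical toy data: six samples, two per genotype class
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
slope, r = eqtl_slope(dosages, expression)
print(f"slope per allele = {slope:.2f}, r = {r:.3f}")
```

A slope clearly different from zero would be consistent with the variant acting as an eQTL for that gene, although on its own it does not establish causation.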

Molecular Subtypes

Tumor molecular data can define tumor type more precisely than histological categories. Incorporating these data into studies of genetic risk may help reduce phenotypic heterogeneity among tumors currently defined as a single histological type, and thus may facilitate discovery of germline variants that increase risk for a specific cancer subtype. For example, a meta-analysis of estrogen receptor-negative breast tumors has identified germline variants associated with risk for this specific breast cancer subtype. Emerging data from genomic profiling of tumor tissue have revealed that certain types of breast and ovarian cancer may be more similar at the molecular level than previously thought, and this information may help clarify why certain genetic risk variants associate with several different cancer types. Given the relatively recent generation of complete tumor profiles, incorporation of tumor data into association studies is in its early stages. As this research progresses, it will be interesting to see whether specific germline variants associate with specific tumor subtypes, and whether working with a more homogeneous tumor phenotype, as defined by molecular profiling data, will increase the power of association studies to identify rarer, novel variants.
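
The argument about phenotypic heterogeneity can be illustrated with a small numerical sketch. All counts below are hypothetical and chosen only for illustration: the variant is assumed to raise risk for one molecular subtype and to have no effect on another. Pooling the two subtypes under one histological label dilutes the observed odds ratio.

```python
def odds_ratio(case_carriers, case_noncarriers,
               control_carriers, control_noncarriers):
    """Odds ratio for carrying the variant, cases versus controls."""
    return (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers
    )


# Hypothetical counts of (carriers, non-carriers)
controls = (200, 800)
subtype_a = (200, 400)    # subtype assumed to be associated with the variant
subtype_b = (280, 1120)   # subtype assumed to have no association

or_a = odds_ratio(*subtype_a, *controls)
or_b = odds_ratio(*subtype_b, *controls)

# Treating both subtypes as one tumor type simply adds the case counts
pooled = (subtype_a[0] + subtype_b[0], subtype_a[1] + subtype_b[1])
or_pooled = odds_ratio(*pooled, *controls)

print(f"subtype A: {or_a:.2f}, subtype B: {or_b:.2f}, pooled: {or_pooled:.2f}")
```

In this toy setting the pooled odds ratio falls between the two subtype-specific values, so an effect that is clear within one subtype would be harder to detect in the combined case group.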


Integrating germline and somatic genomic data will involve working with large and complex data types, including GWAS and whole genome and whole exome sequencing data; it will also require studies with large numbers of samples and the participation of investigators with a broad range of expertise. The National Institutes of Health Big Data to Knowledge (BD2K) initiative may help address issues related to complex data analysis. NCI's Cohort Consortium and GAME-ON initiative offer examples of productive collaboration involving large numbers of investigators with expertise in a range of fields, including genetic epidemiology, molecular biology, and clinical sciences. The Database of Genotypes and Phenotypes (dbGaP) provides controlled access to existing datasets that contain both germline and somatic data. For further information on tools and other resources available for cancer genomic research, please visit EGRP's Genomic Resources for Cancer Epidemiology web page.

We Want to Hear From You

The Epidemiology and Genomics Research Program (EGRP) in NCI’s Division of Cancer Control and Population Sciences is interested in hearing about opportunities and challenges faced by investigators undertaking research that integrates data from the germline and tumor genomes, any resources that may be particularly helpful, and ways in which EGRP could facilitate this research.

Stefanie Nelson, Ph.D., is a Program Director in EGRP’s Host Susceptibility Factors Branch. She is responsible for developing and managing a portfolio of grants that focuses on host factors affecting cancer risk.


  • ROLANDO BAYONA - March 25, 2015 at 6:41 PM (UTC -4)

    I'm not really a scientist; I'm just an MD and epidemiologist. I am currently the national referent for childhood cancer and congenital diseases at the "Instituto Nacional de Salud" of Colombia. My line of work is the notification of cases, their compilation and classification, and sometimes analysis or the presentation of results so that the health ministry can make decisions about opportunities in treatment, inequity, and risk factors. Dr. Stefanie Nelson, there is no genetic cancer surveillance line in my country, and I don't speak English, but if you let me, I'll try to start building a specific line on this topic for Colombia.

    Thank you for letting me study your interesting lines of work. I'll try to do my best and will probably connect with real Colombian experts to build a prospective experience that progressively includes items on genetics and cancer, exposures, occupational risk, and other development items.

    If you think this has even a minimal chance, please let me know.


    MD, Epidemiologist
    National Referent for cancer in children under 18 and congenital disorders
    Instituto Nacional de Salud
    Colombia, South America

  • ROLANDO BAYONA - March 25, 2015 at 6:44 PM (UTC -4)

    OK I understand, thank you a lot.
