Understanding the Function of Cancer Risk Variants: Challenges and Opportunities

The information on this page is archived and provided for reference purposes only.

During the past decade, more than 1,700 genome-wide association studies (GWAS) have identified more than 100,000 risk variants associated with various traits and complex common diseases, including hundreds of genetic variants that have small effects on cancer risk. Most of these risk variants, which are typically single nucleotide polymorphisms (SNPs), are located in regions of the genome whose function remains largely unknown. Because each associated region may contain many correlated variants that could each plausibly be causal, identifying the true causal variant can be challenging. Although technologies and resources have been developed to help researchers navigate these challenges, limited data and insufficient reproducibility remain ongoing problems.

Technological Advances

Word cloud featuring various analytical techniques for functional genomics studies created using Tagul.com

Gene expression studies are among the most commonly used methods for identifying the true causal variant within a genomic region and understanding variant function. Expression quantitative trait locus (eQTL) mapping allows researchers to test thousands of genes simultaneously, linking under- or overexpression of each gene to the genotype at a particular variant.
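At its core, an eQTL test asks whether a gene's expression level changes with the number of copies of an allele a person carries. The minimal sketch below illustrates that idea with a simple linear regression of expression on genotype dosage. It assumes genotypes are coded as 0, 1, or 2 copies of the alternative allele and that expression values are already normalized; the function names `eqtl_test` and `scan_genes` are hypothetical, and real eQTL pipelines also adjust for covariates such as ancestry, batch, and hidden factors and correct for multiple testing.

```python
"""Minimal single-variant eQTL sketch: regress expression on genotype dosage.

Assumptions (illustrative only): dosages are 0, 1, or 2 copies of the
alternative allele; expression values are already normalized; no covariates.
"""
import math
from statistics import mean


def eqtl_test(dosages, expression):
    """Return (slope, t_statistic) from an ordinary least-squares fit."""
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched samples and at least 3 of them")
    x_bar = mean(dosages)
    y_bar = mean(expression)
    sxx = sum((x - x_bar) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosages, expression))
    slope = sxy / sxx  # change in expression per additional allele copy
    intercept = y_bar - slope * x_bar
    residual_ss = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(dosages, expression)
    )
    # Standard error of the slope with n - 2 degrees of freedom
    se = math.sqrt(residual_ss / (n - 2) / sxx)
    if se == 0:
        # Perfect fit: report an infinite statistic in the slope's direction
        t_stat = math.copysign(math.inf, slope) if slope else 0.0
    else:
        t_stat = slope / se
    return slope, t_stat


def scan_genes(dosages, expression_by_gene):
    """Test one variant against many genes; returns {gene: (slope, t)}."""
    return {
        gene: eqtl_test(dosages, values)
        for gene, values in expression_by_gene.items()
    }
```

For example, with hypothetical dosages `[0, 0, 1, 1, 2, 2]` and expression values `[1.0, 1.2, 2.0, 2.1, 3.0, 2.9]`, `eqtl_test` returns a slope of about 0.925, meaning expression rises with each copy of the alternative allele. A positive slope corresponds to overexpression and a negative slope to underexpression associated with that allele.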

Another approach to understanding the impact of these variants on cancer is to focus on their potential regulatory roles. Projects like the Encyclopedia of DNA Elements (ENCODE) are dedicated to identifying regulatory elements in the genome, which can help clarify the function of these risk SNPs.
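A basic first step in using such annotations is checking whether a risk SNP falls inside an annotated regulatory element. The sketch below shows one way to do this with a binary search over sorted intervals. It assumes elements are half-open, 0-based intervals on a single chromosome (the convention used by BED files) that do not overlap one another, and that SNP positions use the same coordinates; the element labels and the `annotate_snps` function are hypothetical and do not come from any specific ENCODE file.

```python
"""Sketch: flag risk SNPs that fall inside annotated regulatory elements.

Assumptions (illustrative only): one chromosome; elements are half-open,
0-based (start, end, label) intervals that do not overlap each other;
SNP positions use the same 0-based coordinates.
"""
from bisect import bisect_right


def annotate_snps(snp_positions, elements):
    """Map each SNP position to the label of the element containing it, or None."""
    elements = sorted(elements)
    starts = [start for start, _, _ in elements]
    hits = {}
    for pos in snp_positions:
        # Index of the last element whose start is <= pos
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < elements[i][1]:
            hits[pos] = elements[i][2]
        else:
            hits[pos] = None
    return hits


elements = [(100, 200, "enhancer_1"), (500, 650, "promoter_2")]
print(annotate_snps([150, 200, 499, 500, 700], elements))
```

With the hypothetical intervals above, position 150 maps to `enhancer_1` and position 500 to `promoter_2`, while 200 falls just outside the first interval because the end coordinate is exclusive. Sorting once and searching with `bisect_right` keeps each lookup logarithmic in the number of elements, which matters when annotations contain hundreds of thousands of regions.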

Technologies also have evolved to support other high-throughput interrogation of the mechanisms involved in gene regulation and to identify epigenomic signatures. Chromatin immunoprecipitation sequencing (ChIP-seq), DNase I hypersensitive sites sequencing (DNase-seq), and Formaldehyde-Assisted Isolation of Regulatory Elements sequencing (FAIRE-seq) have been useful in establishing the impact of risk variants on transcription factor binding and chromatin structure. Genome editing tools such as Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) technology are widely used in cancer biology to directly observe the impact of variants in tumor cells or animal models.
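One common way sequencing assays such as ChIP-seq or DNase-seq can reveal a variant's effect on binding or chromatin structure is allelic imbalance: at a site where a sample carries one copy of each allele, reads from an unbiased assay should split roughly 50/50, so a strong skew suggests the variant alters binding or accessibility. The sketch below illustrates this idea with an exact two-sided binomial test. It is offered as an illustration, not as the method used by any particular study, and it ignores known complications such as read-mapping bias toward the reference allele and overdispersion; `binomial_two_sided_p` and `allelic_imbalance` are hypothetical names.

```python
"""Sketch: test a heterozygous SNP for allele-specific sequencing signal.

Assumption (illustrative only): under no allelic effect, each read carries
the reference allele with probability 0.5. Mapping bias and overdispersion
are ignored.
"""
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided binomial test p-value for k successes in n trials, p = 0.5."""
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    if n == 0:
        return 1.0
    tail = min(k, n - k)
    # P(X <= tail) under Binomial(n, 0.5); symmetry lets us double one tail
    lower = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * lower)


def allelic_imbalance(ref_reads, alt_reads):
    """Return (fraction of reads carrying the reference allele, p-value)."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return ref_reads / total, binomial_two_sided_p(ref_reads, total)


print(allelic_imbalance(30, 10))
```

Because the null distribution with p = 0.5 is symmetric, doubling the smaller tail gives the exact two-sided p-value without summing both tails separately. With hypothetical counts of 30 reference and 10 alternative reads, the reference fraction is 0.75 and the p-value is well below 0.01, which would flag the site for follow-up.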

Complex Challenges

Although technology is rapidly advancing the field of functional genomics, many challenges remain for researchers attempting to characterize the functions of cancer risk SNPs. In gene expression studies, results may be difficult to replicate and may vary by tissue type. This can make it difficult to determine whether a particular SNP upregulates or downregulates expression of a gene, either alone or in concert with other SNPs, in a specific tumor type.

Validating previous findings also can be difficult as researchers attempt to move discoveries made in cell culture to in vivo models. Established tumor cell lines may contain different complements of mutations than those observed in human tumors. Although they are extensively used in cancer research, mouse models often do not faithfully recapitulate human disease.

Additionally, cancer is a dynamic disease, and mutations present at disease initiation may not be detected at later stages and vice versa. This can make it difficult to compare findings from cell lines to results from primary tumors. More research is needed to determine how to best translate functional findings between various models.

Beyond Current Resources

Although many resources are available for tissue-specific expression studies, the available data are often limited. For example, the ENCODE project contains data on only a limited number of cancer cell types. The Roadmap Epigenomics Consortium has characterized the epigenomes of more than 346 cell types and tissues and hosts more than 10,000 epigenomic datasets, but data from additional tumor types would be useful. The Cancer Genome Atlas (TCGA) contains expression data for numerous tumor types and matched normal tissue, but it provides limited germline and epidemiological/clinical data. Integration of these resources is needed to overcome these barriers.

We would like to hear from you!

Please comment below and let us know which challenges and opportunities you have encountered in determining the function of cancer risk variants, and which of them should be addressed. Also, please visit the Epidemiology and Genomic Research Program's (EGRP) Genomic Resources for Cancer Epidemiology web page for a continuously updated list of tools and data resources for functional analysis.


Sharna Tingle, M.P.H., is a Program Analyst in the Genomic Epidemiology Branch of EGRP in NCI’s Division of Cancer Control and Population Sciences. She supports NCI’s implementation of the NIH Genomic Data Sharing Policy and is interested in public health genomics, bioethics, and health policy.


Stefanie Nelson, Ph.D., is a Program Director in EGRP’s Genomic Epidemiology Branch. She is responsible for developing and managing a portfolio of grants that focuses on host factors affecting cancer risk.

