Genomic Resources for Cancer Epidemiology

Note: This web page provides links to research resources that may be of interest to genetic epidemiologists conducting cancer research, but is not exhaustive. Within each section, the resources are listed in alphabetical order.

If you have suggestions for additional resources to add, please contact nciepimatters@mail.nih.gov.


Data Resources, Genotyping and Sequencing Centers, and NCI- and NIH-Sponsored Networks and Programs

  • Databases and Catalogues of Genetic Variation
    • 1000 Genomes Project
      The goal of the 1000 Genomes Project is to provide a comprehensive resource on human genetic variation. The Project is sequencing the genomes of approximately 2,500 samples at 4x coverage to provide data on genetic variants with frequencies of at least 1% in the populations studied.
    • Database of Genomic Structural Variation (dbVar)
      dbVar is the NCBI central repository for structural variation. Structural variation is generally defined as any region of DNA involved in inversions and balanced translocations, insertions and deletions, or copy number variation.
    • Database of Single Nucleotide Polymorphisms (dbSNP)
      dbSNP is the NCBI central repository for single-base nucleotide substitutions (single nucleotide polymorphisms, or SNPs) and short deletion and insertion polymorphisms.
    • Encyclopedia of DNA Elements (ENCODE) data
      ENCODE provides a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control the cells and circumstances in which a gene is active.
    • International HapMap Project
      The HapMap is a catalog of common genetic variants that occur in human beings. It describes what these variants are, where they occur in our DNA, and how they are distributed among people within populations and among populations in different parts of the world.
    • National Heart, Lung, and Blood Institute (NHLBI) Exome Variant Server (EVS)
      The goal of the NHLBI GO Exome Sequencing Project (ESP) is to discover novel genes and mechanisms contributing to heart, lung, and blood disorders by pioneering the application of exome sequencing across diverse, richly phenotyped populations and to share these datasets and findings with the scientific community. The current EVS data release represents all variants identified from exome sequencing of 6,503 ESP samples.
    • SNP500Cancer
      SNP500Cancer provides a central resource for sequence verification of SNPs in genetic regions of importance to molecular epidemiology studies in cancer.
  • Genomic Datasets for Cancer Research: Datasets and Access Policy
    • Data Access Request Process
      This page contains instructions for submitting a Data Access Request for dataset(s) under the purview of the NCI's Extramural Data Access Committee.
    • Database of Genotypes and Phenotypes (dbGaP)
      dbGaP was developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype. Such studies include GWAS, medical sequencing, and molecular diagnostic assays, as well as studies of associations between genotype and non-clinical traits. dbGaP provides two levels of access, open and controlled, to allow broad release of non-sensitive data while providing oversight and investigator accountability for sensitive datasets involving personal health information.
    • Genomic Datasets for Cancer Research
      This page provides information on a variety of datasets from genome-wide association studies (GWAS) of cancer and other genotype-phenotype studies, including sequencing and molecular diagnostic assays. These data are available to approved investigators through the National Cancer Institute (NCI)'s Extramural Data Access Committee (DAC).
    • GWAS Policy Home Page
      In January 2008, the National Institutes of Health (NIH) implemented a policy for the sharing of data obtained in NIH-supported or NIH-conducted GWAS. The purpose of the policy is to foster science for the benefit of the public through the creation of a centralized NIH GWAS data repository. This website supports the GWAS policy's implementation.
    • Notice on Development of Data Sharing Policy for Sequence and Related Genomic Data
      This notice details NIH plans to: 1) update data sharing policies for NIH-supported research involving sequence and related genomic data; 2) encourage investigators and IRBs to consider the potential for broad sharing of these genomic data in developing informed consent processes and documents for such studies; and 3) communicate the agency's intent to develop a policy pertaining to the deposition of these large datasets into centralized databases.
  • Genotyping and Sequencing Centers
    • Cancer Genomics Research Laboratory (CGR)
      NCI established the CGR to investigate the contribution of germline genetic variation to cancer susceptibility and outcomes. Working in concert with epidemiologists, biostatisticians and basic research scientists in the intramural research program, the CGR has developed the capacity to conduct genome-wide association studies and next-generation sequencing to identify the heritable determinants of various forms of cancer.
    • Center for Inherited Disease Research (CIDR)
      CIDR provides high-quality next-generation sequencing and genotyping services to investigators working to discover genes that contribute to common diseases.
    • Mendelian Genome Centers
      These Centers, funded by the National Human Genome Research Institute (NHGRI), apply next-generation sequencing and computational approaches to discover the genes and variants that underlie Mendelian conditions, including certain forms of cancer.
    • National Human Genome Research Institute (NHGRI) Large Scale Sequencing Program
      NHGRI funds large-scale genome sequencing capacity at several centers located in the U.S. This program undertakes sequencing projects to provide critical genomic information that can be of significant value to the scientific community in areas of very broad scientific interest.
  • Literature and Knowledge Base Resources
    • Cancer Genome-Wide Association and Meta Analyses database (Cancer GAMAdb)
      Cancer GAMAdb provides a continually updated database containing key descriptive characteristics of each genetic association extracted from published GWAS and meta-analyses relevant to cancer risk.
    • Cancer Genomic Evidence-Based Medicine Knowledge Base (CancerGEM KB)
      CancerGEM KB is a resource for researchers, public health professionals, policy makers, and health care providers who are interested in the use of genomic information in cancer care and prevention.
    • GeneReviews
      GeneReviews are overviews providing expert-authored, peer-reviewed, current disease descriptions that apply genetic testing to the diagnosis, management, and genetic counseling of patients and families with specific inherited conditions.
    • HuGE Navigator
      The Navigator is an integrated, searchable knowledge base of genetic associations and human genome epidemiology.
    • Pharmacogenomic Resources
      This page provides links to pharmacogenomics collaborative opportunities, consortia, and networks; databases related to pharmacogenomics research; knowledge synthesis resources; reports; and toolkits.
    • SEQanswers
      SEQanswers was founded to be an information resource and user-driven community focused on all aspects of next-generation genomics. The site aims to be a central location for discussion of and education about next-generation sequencing technology, and to cater to everyone, regardless of scientific background or knowledge.
  • NCI/NIH Sponsored Networks and Programs
    • Cancer Genetic Markers of Susceptibility (CGEMS)
      CGEMS was launched to identify common inherited genetic variations associated with risk for breast and prostate cancer. It involves genome-wide association studies (GWAS) for a number of cancers, and more recently, exposures and survival. The raw genotype data from each of the CGEMS projects will be available for download to accredited investigators, upon approval of a Data Access Request.
    • Environmental Polymorphism Registry (EPR)
      The EPR is a long-term research project to collect and store DNA from up to 20,000 North Carolinians in a biobank. The DNA samples are available to scientists to study variations in genes (known as polymorphisms) that might be linked to common diseases such as diabetes, heart disease, cancer, asthma and others. While many types of genes are studied as part of the EPR, the focus is on a category known as environmental response genes.
    • Genes, Environment and Health Initiative (GEI)
      The GEI is an NIH-wide initiative that aims to accelerate understanding of genetic and environmental contributions to health and disease. There are two components to GEI: genetics and exposure biology. The genetics component includes a genome-wide association program called GENEVA (Gene Environment Association Studies).
    • Genetic Associations and Mechanisms in Oncology (GAME-ON)
      GAME-ON comprises five NCI-sponsored cooperative agreements for transdisciplinary research projects addressing two overall goals: 1) to pursue promising scientific leads from previously generated GWAS of cancer; and 2) to coordinate and accelerate integrative post-GWAS discovery research, which could provide the basis for expediting clinical translation and public health dissemination of the findings.

Analytical Tools and Statistical Software

  • Analysis Tools
    • Alphabetical List of Genetic Analysis Software
      Curated at Rockefeller University, a list of computer software on the following topics: genetic linkage analysis for human pedigree data, QTL analysis for animal/plant breeding data, genetic marker ordering, genetic association analysis, haplotype construction, pedigree drawing, and population genetics.
    • Broad Institute Software Tools
      Scientists in the Broad community have developed many critical software tools for the analysis of increasingly large genome-related datasets, and they make these tools openly available to the scientific community. Includes GATK and Haploview.
    • Genetic Simulation Resources (GSR)
      This web tool provides a catalogue of existing computer simulation programs that simulate genetic data of the human genome for studies in population and evolutionary genetics, genetic epidemiology, and other relevant application areas. It contains computer programs that generate samples by simulating evolutionary processes backward (coalescent) or forward in time, resampling empirical data, or using other novel methods. It is intended to help investigators select the most appropriate genetic simulation tools for specific genetic epidemiology questions.
    • Genome Variation Server (GVS)
      GVS provides information on allele frequencies, linkage disequilibrium, tagSNP selection and SNP summaries. Fed by a local database, GVS enables rapid access to human genotype data found in dbSNP, and provides tools for analysis of genotype data.
    • NGSpeAnalysis
      This pipeline uses the Burrows-Wheeler Aligner (BWA), Genome Analysis Toolkit (GATK), Picard, ANNOVAR, and BEDTools to carry out analysis from the alignment of paired-end short reads generated by next-generation sequencing machines through to high-quality variant and genotype calling.
    • PLINK
      PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
    • SEQanswers Software List
      A dynamic and comprehensive table of next-generation sequence analysis software compiled on the SEQanswers website. It includes programs that recalibrate the quality scores produced by next-generation sequencing base callers (ShortRead, SHREC, BING, GATK) and algorithms for DNA sequence alignment (BWA, MAQ, BFAST, SOAP, etc.).
    • University of Michigan Software Tools
      Scientists at the University of Michigan have developed software tools for statistical genetics analysis, and they make these tools openly available to the scientific community. Includes LocusZoom, MACH and the CaTS Power Calculator.
    • VarScan
      This statistical package can be used to detect germline variants, somatic mutations, and copy number variations for next-generation sequencing platforms.
  • Genome Browsers and Map Viewers
    • Ensembl
      Ensembl is a joint project between the European Bioinformatics Institute and the Wellcome Trust Sanger Institute to develop a software system that produces and maintains automatic annotation on selected eukaryotic genomes.
    • Integrative Genomics Viewer
      The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.
    • National Center for Biotechnology Information (NCBI) Human Genome Resources
      NCBI's website strives to offer an integrated, one-stop, genomic information resource for data emerging from the Human Genome Project and other sequencing projects worldwide.
    • NCBI Map Viewer
      The Map Viewer provides graphical displays of features on the human reference genome sequence assembly maintained by the Genome Reference Consortium and on the alternate HuRef genome assembly, as well as cytogenetic, genetic, physical, and radiation hybrid maps.
    • The University of California, Santa Cruz (UCSC) Genome Browser
      The UCSC Genome Browser contains the reference sequence and working draft assemblies for a large collection of genomes. This interactive website offers access to genome sequence data integrated with aligned annotations.
      • UCSC Cancer Genomics Browser
        This browser allows researchers to investigate cancer genomics data and its corresponding clinical information. The browser can be used to view biological pathways, chromosomal locations, and gene expression data. Statistical analysis can be performed on subsets of the data.
  • Toolkits for Harmonizing or Generating Standardized Measures for Phenotypes and Exposures

Interpretative Tools for Genomic Data

  • Biological Pathway Analysis Programs and Databases
    • Ariadne Pathway Studio
      This pathway analysis software may be used to interpret gene expression and other high-throughput data and is a useful resource for building, expanding, and analyzing pathways. Investigators may also use MedScan as a data mining tool to extract relevant information from publications. Pathways can be used in publications.
    • HotNet
      HotNet, an algorithm created by the Raphael Lab in the Department of Computer Science at Brown University, can be used with MATLAB and Python statistical packages to find significantly altered subnetworks in large gene interaction networks. Visualizing HotNet output requires Cytoscape.
    • Ingenuity Pathway Analysis
      This program allows researchers to analyze gene expression, RNA-Seq, microRNA, qPCR, proteomics, metabolomics, and genotyping data through the identification of relevant pathways, relationships, mechanisms, and functions. The program can be used for large-scale genomic data.
    • MuSiC
      This statistical package uses multiple tools to find gene alterations and relationships in cancer. Investigators can compare their mutations with mutations found in COSMIC and OMIM, compare their data with clinical data, and use Pathscan to find altered pathways in cancer.
    • Netpath
      This database, created in collaboration with the Johns Hopkins University Pandey Lab and the Institute of Bioinformatics, is a useful resource for curated signal transduction pathways in humans. Netpath provides information on 10 immune pathways and 10 cancer pathways. Each pathway includes information on protein-protein interactions, enzyme catalysis, protein translocation, and gene regulation. All pathways are available for batch download.
    • Regulome Explorer
      Regulome Explorer is a web-based tool that allows researchers to use TCGA data for cancer comparisons, random forest regression, and individual genome aberrations. Investigators can use Pubcrawl for data mining and for building gene networks. These networks can be based on literature distances in MEDLINE or on protein domain interactions.
  • Cancer Genome and Somatic Mutation Information
    • cBio Cancer Genomics Portal
      The cBio Cancer Genomics Portal, developed by the Computational Biology Center at Memorial Sloan-Kettering Cancer Center, provides visualization, analysis and download of subsets of large-scale cancer genomics data sets.
    • Catalogue of Somatic Mutations in Cancer (COSMIC)
      The COSMIC database is designed to store and display somatic mutation information and related details and contains information relating to human cancers.
    • The Cancer Genome Atlas (TCGA)
      TCGA is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing. TCGA data are available to the research community for use in developing better ways of diagnosing, treating, and preventing cancer.
  • Catalogues and Databases of Relationships Between Genotypes and Phenotypes
    • ClinVar
      This is a freely accessible, public archive of reports of the relationships among human variations and phenotypes along with supporting evidence.
    • Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources (DECIPHER)
      DECIPHER is an interactive web-based database which incorporates a suite of tools designed to aid the interpretation of submicroscopic chromosomal imbalance. This database collects clinical information about chromosomal microdeletions/duplications/insertions, translocations and inversions.
    • GeneNetwork
      GeneNetwork, from the University of Tennessee, is a group of linked datasets and tools used to study complex networks of genes, molecules, and higher-order gene function and phenotypes.
    • Human Gene Mutation Database (HGMD)
      This database provides a comprehensive core collection of germline mutations in nuclear genes that underlie or are associated with human inherited disease.
    • NHGRI Catalog of Published Genome-Wide Association Studies
      This resource provides information on SNP-trait associations abstracted from GWAS publications.
    • Online Mendelian Inheritance in Man (OMIM)
      OMIM is a comprehensive, authoritative, and timely compendium of human genes and genetic phenotypes. The full-text, referenced overviews in OMIM contain information on all known Mendelian disorders and over 12,000 genes. OMIM focuses on the relationship between phenotype and genotype. It is updated daily, and the entries contain copious links to other genetics resources.
    • Phenotype-Genotype Integrator (PheGenI)
      PheGenI merges NHGRI GWAS catalog data with several databases housed at the NCBI, including Gene, dbGaP, OMIM, Genotype-Tissue Expression (GTEx), and the Database of Single Nucleotide Polymorphisms (dbSNP).
    • wikiGWA
      wikiGWA is a Wikipedia-style platform for researchers to share their GWA findings.
  • Tools for Predicting Impact of Amino Acid Substitutions
    • Polymorphism Phenotyping (PolyPhen-2)
      This tool predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations.
    • PMut
      This software, aimed at the annotation and prediction of whether a mutation is pathological, formulates predictions with neural networks, using internal databases, secondary structure prediction and sequence conservation.
    • The Sorting Tolerant From Intolerant (SIFT) Algorithm
      This tool predicts whether an amino acid substitution affects protein function based on the degree of conservation of amino acid residues in sequence alignments derived from closely related species.
    • Variant Effect Predictor
      This system (formerly known as the SNP Effect Predictor) categorizes Ensembl genomic variants in known transcripts by their potential effect.
