What are genomic summary results (GSR)?
Genomic summary results (GSR) are summary genomic data generated from primary analyses in genomic research studies of many individuals (also referred to as “aggregate genomic data” or “genomic summary statistics”). GSR include data calculated from a study sample, such as genotype counts, allele frequencies, effect size estimates and their standard errors, and p-values. Many research and clinical questions can be addressed using summary information without requiring individual-level data. Because sharing summary statistics is easier and more efficient than sharing individual-level genetic data, there has been a proliferation of analytical methods that use summary statistics.
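To make two of the statistics listed above concrete, the Python sketch below computes an allele frequency from genotype counts and a two-sided p-value from an effect size estimate and its standard error. It assumes a biallelic variant and a Wald-type test with a normal approximation; individual studies may use other models and tests. The function names and example numbers are illustrative only.

```python
import math


def allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Alternate-allele frequency from biallelic genotype counts (assumed biallelic)."""
    n_individuals = n_hom_ref + n_het + n_hom_alt
    # Each individual carries two alleles: heterozygotes contribute one
    # alternate allele and alternate homozygotes contribute two.
    return (n_het + 2 * n_hom_alt) / (2 * n_individuals)


def wald_p_value(beta: float, se: float) -> float:
    """Two-sided p-value for an effect size estimate and its standard error."""
    z = beta / se
    # Assumes z is standard normal under the null: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    print(allele_frequency(360, 480, 160))    # 0.4
    print(round(wald_p_value(0.1, 0.05), 4))  # 0.0455
```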
The following references from NIH provide additional information about GSR:
- What are Genomic Summary Results and How Can GSR Inform Research and Clinical Care?
- National Human Genome Research Institute Genomic Summary Results Update FAQs
- Summary from the National Human Genome Research Institute Workshop on Sharing Aggregate Genomic Data (May 2016)
NIH policy for sharing GSR
In 2016, the National Human Genome Research Institute (NHGRI) held a workshop to discuss the benefits and risks of sharing GSR. The workshop led to the November 2018 notice in the NIH Guide for Grants and Contracts, NOT-OD-19-023, which updated how NIH manages GSR under the Genomic Data Sharing (GDS) Policy. Previously, all GSR were placed under controlled access in NIH-designated repositories; the update allows GSR for most datasets to be made available through unrestricted access, except for datasets considered “sensitive” because of privacy risks.
Where can I find cancer GSR data?
The following list includes databases and websites where cancer GSR data may be found and accessed:
- dbGaP
  - Tutorial for finding GSR in dbGaP [PDF - 2.21 MB]
  - Obtaining access to controlled data (for GSR data that are not freely available and require Authorized Access approval)
- NHGRI-EBI GWAS Catalog
  - Below are some tips for searching the GWAS Catalog:
    - Use the study accession (GCST#) for a given GSR dataset if you have it.
    - Enter the GCST# into the GWAS Catalog search field.
    - Click the GCST# directory link, then click the “.tsv” file to download it.
    - Open the downloaded .tsv (tab-separated values) file from your downloads folder; a plain-text editor such as Notepad is a good option.
- The following are some examples of other databases and websites with cancer GSR:
  - NCI Division of Cancer Epidemiology and Genetics GWAS Explorer
  - Breast Cancer Association Consortium
  - Prostate Cancer Association Group to Investigate Cancer Associated Alterations in the Genome
  - Ovarian Cancer Association Consortium
  - Integrative Analysis of Lung Cancer Etiology and Risk
  - GWAS results from the UK Biobank
  - Pan-cancer summary statistics from meta-analyses of UK Biobank and Kaiser GERA cohorts
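The GWAS Catalog download-and-open steps above can also be scripted once a file is downloaded. The Python sketch below reads a tab-separated GSR file and keeps rows whose p-value falls below a threshold. It assumes a column named `p_value`, as used in the GWAS Catalog summary statistics format; column names vary between files, so check the header of the file you download. The function names are hypothetical, and the 5e-8 default is the conventional genome-wide significance threshold rather than a value taken from this page.

```python
import csv
import gzip
from typing import Iterable

# Assumption: the p-value column is named "p_value" (GWAS Catalog summary
# statistics format). Check the header of your own file and adjust.
P_VALUE_COLUMN = "p_value"


def filter_significant(lines: Iterable[str], threshold: float = 5e-8) -> list:
    """Return rows (as dicts) of tab-separated GSR text with p < threshold."""
    hits = []
    for row in csv.DictReader(lines, delimiter="\t"):
        try:
            p = float(row.get(P_VALUE_COLUMN))
        except (TypeError, ValueError):
            # Missing or non-numeric p-values (e.g. "NA") are skipped.
            continue
        if p < threshold:
            hits.append(row)
    return hits


def read_significant_variants(path: str, threshold: float = 5e-8) -> list:
    """Open a downloaded .tsv (or gzip-compressed .tsv.gz) file and filter it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return filter_significant(handle, threshold)
```

For example, `read_significant_variants("downloaded_file.tsv")` (a placeholder file name) returns the rows passing the threshold, each as a dictionary keyed by the file's column headers.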
If you have questions or know of other databases or websites with cancer GSR data that could be added to the list above, please contact Dr. Danielle Carrick by emailing Danielle.Carrick@nih.gov.
Where can I deposit GSR data?
- When deciding where to deposit your GSR data to make it available to the public, some questions you might ask are:
  - Is the GSR data sensitive (i.e., does it require controlled access because of potential privacy risks), and/or is it associated with a dataset already in dbGaP? If so, dbGaP is a good option. More information about privacy risks is available on NHGRI’s website.
  - Is the GSR data non-sensitive and either associated with a specific publication or unpublished? If so, the GWAS Catalog could be a good option. Learn how to submit summary statistics to the GWAS Catalog.
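The two deposit questions above can be read as a simple decision rule. The Python sketch below encodes that rule for illustration only; the function name and its boolean inputs are hypothetical, and it is not an NIH tool. Borderline cases should be checked against NIH and repository guidance.

```python
def suggest_repository(is_sensitive: bool, linked_to_dbgap_study: bool) -> str:
    """Hypothetical helper mirroring the two deposit questions above.

    A simplification for illustration; not an official NIH decision tool.
    """
    # Sensitive GSR, or GSR tied to a dataset already in dbGaP: dbGaP.
    if is_sensitive or linked_to_dbgap_study:
        return "dbGaP"
    # Non-sensitive GSR, published or unpublished: the GWAS Catalog.
    return "GWAS Catalog"
```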
If I get individual-level data from dbGaP and generate GSR, can I share the newly generated GSR data?
The answer depends on whether the individual-level dataset you requested is designated as “sensitive” or “non-sensitive,” which determines whether public posting of GSR is “restricted” or “allowed.” This information can be found on the dataset’s dbGaP public study page, in the “Authorized Access” section, which describes the terms of access for secondary users.
- If the data you requested is non-sensitive, public posting of genomic summary results is “allowed.”
  You can confirm this in the Authorized Access section of the dbGaP study page: if secondary users may share the GSR they generate, it will say “Public Posting of Genomic Summary Results: Allowed.”
  For non-sensitive datasets, requester-generated GSR can be shared or posted. If requesters wish to post GSR more broadly than publication in the scientific literature (where GSR serve as an intrinsic piece of evidence supporting a study’s conclusions), they can indicate plans to generate and disseminate GSR in their research use statement, and a Data Access Committee (DAC) may approve this. Requesters do not need to specify which GSR they plan to generate and disseminate.
- If the data you requested is sensitive, public posting of genomic summary results is “restricted.”
  You can confirm this in the Authorized Access section of the dbGaP study page: if public posting of secondary-user GSR is restricted, it will say “Public Posting of Genomic Summary Results: Not Allowed.”
  For datasets designated as sensitive, DACs will not approve research use statements that indicate plans to disseminate GSR more broadly than publication within the scientific literature to support a study’s conclusions.
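The check described in this section amounts to finding one label on the dbGaP study page. As an illustration, the Python sketch below searches text you have copied from a study page's Authorized Access section for the label quoted above. The helper name is hypothetical, and it assumes the label wording matches what is quoted on this page; it does not query dbGaP, and the study page itself (and the DAC) remains the authority.

```python
import re
from typing import Optional

# Assumption: the label wording matches the text quoted on this page; the
# real dbGaP page layout may differ.
GSR_LABEL = re.compile(
    r"Public Posting of Genomic Summary Results:\s*(Allowed|Not Allowed)",
    re.IGNORECASE,
)


def gsr_public_posting_allowed(authorized_access_text: str) -> Optional[bool]:
    """Hypothetical helper: True if "Allowed", False if "Not Allowed", None if absent."""
    match = GSR_LABEL.search(authorized_access_text)
    if match is None:
        return None  # Label not found; check the study page manually.
    return match.group(1).lower() == "allowed"
```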