Guidelines for Data Sharing, Analysis, and Publication
The OncoArray members from the Genetic Associations and Mechanisms in Oncology (GAME-ON) Initiative are committed to making the NIH-funded fraction of the OncoArray data available through the database of Genotypes and Phenotypes (dbGaP) in a timely manner. The OncoArray Consortium is concerned that the results of this large-scale experiment, involving dozens of studies, be presented in the most accurate and complete manner to ensure that any clinical use of the results is based on all the evidence, not a small and perhaps unrepresentative fraction. Thus, the OncoArray Network has planned to analyze all the data in a rapid and comprehensive fashion, with the data embargoed under NIH guidelines to prevent publication of potentially inaccurate results. The OncoArray Steering Committee can alter this policy at any time if a credible case is made for breaking the embargo in the interests of patients or their health advisors.
Key Principles Regarding Sharing of Phenotypic Information, and Nominated Single Nucleotide Polymorphisms (SNPs)
- Each OncoArray Network member has control over its own phenotype data.
- Use of data within the OncoArray Network is subject to rules agreed to by the Consortium.
- SNP "ownership" is study-based and dependent on the SNPs nominated by and allocated to each study at the time of the OncoArray design and for the duration of the specified embargo periods described below. SNPs in the common region and the GWAS backbone can be evaluated by all network members.
- SNPs were nominated for inclusion on the OncoArray by many investigators, following multiple analyses within each consortium, together with a GWAS backbone and additional SNPs of common interest (for example, SNPs relevant to pharmacogenetics). The actual SNPs that have been placed on the array often deviate from those that were nominated, so defining SNP ownership is somewhat imprecise. However, the Informatics Working Group will, to the best of its ability, strive to identify the sources of SNP selections. Members of the OncoArray Network should recognize the efforts of other investigators when developing manuscripts that include SNPs contributed by others, specifically during the embargo period (see section 2 below, Estimated Project Timelines). It is also suggested that consortia develop proposals describing planned analyses of data derived from the OncoArray, to streamline work and avoid overlaps within consortia. Sharing these proposals among consortia will in turn facilitate cross-cancer analyses.
- It is expected that studies based on the same phenotype will usually publish jointly as a consortium. Within each consortium, the rules for use of the data and the rights of the SNP nominators to the OncoArray are for each consortium to decide.
- During the period of the embargo, specific consortium members described above will control the timing of publication of data for SNPs they nominated. However, they do not have any ownership of the genotype data for those SNPs for samples from other studies.
Each group has control of all OncoArray SNP data obtained by testing the samples it provided. A proposed timeline is indicated below, which includes dates for completion of genotyping for all samples in all of the studies along with quality control steps and the embargo periods.
Data Deposition: Data from the OncoArray Network for samples genotyped at the Center for Inherited Disease Research (CIDR) will be deposited into dbGaP once all quality control has been completed. Data will be accessible to scientists who apply at the time of data deposition but will be embargoed for one year after the data are deposited. Data from other NIH-funded projects will also need to be deposited in the same timeframe (see section 5 below, Data Analysis and Publication of Data Related to Samples Belonging to the Network).
Publications based on genotype data from SNPs not nominated by a consortium are also embargoed for the same period.
1.1 Additional Purchases
Additional investigators may join the Network subject to approval of the OncoArray Network Steering Committee. In doing so, they commit to sharing data as appropriate for combined analyses and to abide by the Consortium publication rules. These investigators will get the unannotated list, but with information indicating which consortium nominated which SNP.
The OncoArray will be available to investigators outside of the OncoArray Network through Illumina once the OncoArray Network has completed its purchases. No annotation will be available to these groups.
Estimated Project Timelines
- July 31, 2014: Projected completion of main Consortium genotyping at CIDR and other genotyping sites, and receipt of genotype data by each study/consortium.
- January 31, 2015: Projected completion of genotyping data quality control and deposition of NIH-funded genotyping data in dbGaP.
- July 31, 2015: End of the 6-month period during which data are available only to the OncoArray Consortium. Outside groups can then apply for deposited data.
- January 31, 2016: Embargoes end on publications involving SNPs nominated by other investigators and on publications by outside groups using deposited data.
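The milestones above fall at six-month intervals from the projected completion of genotyping. As a minimal sketch, assuming Python and a hypothetical helper `add_months` (neither is part of these guidelines), the dates could be derived from the starting date so that a slip in genotyping shifts the later milestones consistently:

```python
from datetime import date
import calendar

def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the target month."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

# Projected completion of main Consortium genotyping, as stated in the timeline.
GENOTYPING_COMPLETE = date(2014, 7, 31)

# Milestone names are illustrative labels, not official terms.
milestones = {
    "genotyping_complete": GENOTYPING_COMPLETE,
    "qc_complete_and_dbgap_deposit": add_months(GENOTYPING_COMPLETE, 6),
    "outside_groups_may_apply": add_months(GENOTYPING_COMPLETE, 12),
    "publication_embargoes_end": add_months(GENOTYPING_COMPLETE, 18),
}

for name, when in milestones.items():
    print(f"{name}: {when.isoformat()}")
```

The clamping to month end matters here because every milestone falls on the 31st; a shifted start date could otherwise land on a day that does not exist in the target month.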
Each study is responsible for the initial management, before submission to dbGaP, of data generated outside of CIDR for its own samples.
The study-specific genotype dataset will be distributed to the data analyzing centers designated by each study.
Final genotype calling and quality control steps that merge data from CIDR with data generated by other genotyping centers will be managed by each study.
The rules governing the distribution of study-specific data to individual investigators within that study are the responsibility of each study.
Data Deposition for NCI/NIH funded Studies
It is a requirement that all data generated using NCI/NIH funds be deposited in dbGaP. These data should be deposited after completion of quality control, together with basic phenotype data. To expedite the uploading of data, sites should begin uploading phenotype data to dbGaP once the data certifications have been completed, and before genotyping is finished. These data will be available on a restricted access basis to all OncoArray Network investigators.
Proposed phenotype data:
- Case-control status and index age
- Basic BRCA1/2 mutation status will be supplied by CIMBA
Additional data of interest (where compatible with consents and approved by local IRBs):
- Smoking behavior, when available (for lung cancer: current, former, never, age at initiation, number of years smoked, packs/day)
- Family history of the cancer at the same site in a first-degree relative
- Case subtype information (histopathological subtypes, ER status for breast cancer, etc.)
Summary data on the study (case/control selection, the source of the samples, the DNA processing protocol, any amplification procedures used, etc.) should also be provided.
Data Analysis and Publication of Data Related to Samples Belonging to the Network
Each OncoArray Network participating study is free to analyze genotype data for their own samples for any SNPs at any time. SNP "nominators" have control over the timing of publication of results relating to their SNPs (through the embargo) but have no control over the timing of data analysis by other consortia. Rules of use from participating groups need to be considered with respect to analyses.
5.1 SNPs that are nominated by only one study participating in the Network
The SNP nominator has exclusive rights to submit for publication data for their own SNPs with their own phenotype data for one year after completion of the genotyping quality control.
For one year from the date of genotyping quality control completion by NCI/NIH-funded studies, other studies are embargoed from submitting papers on their own phenotype data related to SNPs owned by another study.
After the one-year embargo, the other studies will be free to publish data relating to those SNPs and their own phenotypes. There is no obligation to acknowledge the SNP nominator with authorship, but there is an expectation that consortia contributing to the array should at least be acknowledged on publications and it is preferable to include coauthorship from contributing investigators. All funding sources relevant to each genotype must be indicated in subsequent papers. Other studies can negotiate with the SNP nominator to publish within the one-year embargo. An offer of authorship may be appropriate.
5.2 SNPs that are nominated by more than one study participating in the Network
The joint nominators will have joint rights over the publication of these data for one year after completion of the genotyping quality control. Either can publish the data for their phenotype – the agreement of the other is not required.
There is a one-year embargo on other studies submitting papers that include data related to these SNPs for their phenotype.
After the one-year embargo, the other studies will be free to submit data relating to those SNPs and their own phenotypes. There is no obligation to acknowledge the SNP nominators with authorship.
Other studies can negotiate with the SNP nominators to publish within the one-year embargo. An offer of authorship may be appropriate.
5.3 SNPs that are jointly nominated by all OncoArray Network participating studies or a third party
Each study is free to analyze and publish data for their phenotype at any time.
Some SNPs in this category were nominated by third parties (for example, unpublished SNPs associated with other phenotypes). Appropriate authorship for the nominating parties should be given, where possible.
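The rules in 5.1 to 5.3 reduce to one check per SNP: who nominated it, and whether the one-year embargo has elapsed. The following is a rough sketch only, in Python, with hypothetical names (`embargo_end`, `may_submit`, `open_to_all`, `permission_granted`) that do not appear in these guidelines. It assumes the embargo is lifted on the anniversary date itself, and it does not model the expectations described in 5.4.

```python
from datetime import date
from typing import AbstractSet

def embargo_end(qc_complete: date) -> date:
    """One year after genotyping quality control completion."""
    try:
        return qc_complete.replace(year=qc_complete.year + 1)
    except ValueError:
        # Feb 29 has no anniversary in a non-leap year; fall back to Feb 28.
        return qc_complete.replace(year=qc_complete.year + 1, day=28)

def may_submit(
    study: str,
    nominators: AbstractSet[str],
    qc_complete: date,
    today: date,
    *,
    open_to_all: bool = False,         # 5.3: nominated by all studies or a third party
    permission_granted: bool = False,  # negotiated with the nominator(s)
) -> bool:
    """Return True if `study` may submit a paper on this SNP with its own phenotype data."""
    if open_to_all:
        return True   # 5.3: free to publish at any time
    if study in nominators:
        return True   # 5.1 / 5.2: sole or joint nominators hold publication rights
    if permission_granted:
        return True   # other studies may negotiate to publish within the embargo
    # Assumption: the embargo is treated as over on the anniversary date itself.
    return today >= embargo_end(qc_complete)

qc = date(2015, 1, 31)
print(may_submit("study_B", {"study_A"}, qc, date(2015, 6, 1)))   # False: still embargoed
print(may_submit("study_B", {"study_A"}, qc, date(2016, 1, 31)))  # True: embargo elapsed
```

Joint nomination (5.2) needs no special case: either nominator passes the membership test, which matches the rule that the agreement of the other is not required.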
5.4 Additional considerations
There are some situations in which a study might reasonably expect to publish data on SNPs that they did not nominate, within the embargo period. For example:
5.4.1 Mapping of known loci: Where a study nominated a set of SNPs for fine mapping of a known locus, it is possible that there are other SNPs in the region owned by other studies. While permission to publish on those SNPs before the embargo ends is required, there is an expectation that permission will be granted for all SNPs across the particular fine-mapping region.
5.4.2 Mapping of new loci identified by a study: A study may have identified a new hit or near hit for a SNP that it owns, and there may be other SNPs in the same region, for example, because another study has fine-mapped the region or because another study has nominated a nearby SNP for other reasons. While permission to publish on those SNPs before the embargo ends is required, there should be an expectation that permission will be granted for SNPs across the particular region. The initial hit should be close to genome-wide significance (P < 10⁻⁶).
5.4.3 Identification of new loci not nominated by the studies: If a study finds a hit for a SNP that it did not nominate, it can negotiate with the original nominator to publish results from that SNP within the one-year embargo, but there should not be the expectation that such permission would be automatically granted. It is anticipated that only a limited number of investigators involved in the original nomination of a SNP would participate in cross-consortium analyses and be recognized in subsequent papers.
5.4.4 Imputation: It is expected that the OncoArray will be imputed to provide estimated genotypes for additional variants on reference panels (e.g., 1000 Genomes Project). It is recognized that each group will use all SNPs on the array, not just the SNPs they nominate, and may publish on the results of the imputed SNPs. However, the embargo still applies to SNPs nominated by investigators from another group, so 5.4.3 still applies.