Preparing Data Sharing Plans for Grant Applications

Overview

All resource sharing plans (including data sharing plans) submitted in the grant application may be considered as part of the funding decision. NCI may request detailed information in resource sharing plans prior to making a funding decision for a grant application.

When NCI approves grant applications for funding, NCI's acceptance of the Principal Investigator's (PI) submitted or negotiated sharing plan will be incorporated as a term and condition of the award.

NCI’s Epidemiology and Genomics Research Program (EGRP) has developed tips (see section on Key Required Elements below) and templates for investigators to use as a guide when preparing their data sharing plans to ensure compliance with the NIH Data Sharing Policy (2003) or the NIH Genomic Data Sharing Policy (2015). These templates are not required forms. PIs are encouraged to contact their assigned EGRP program director (PD) to obtain one of these templates. Those who do not already have an assigned PD are invited to review the EGRP staff list to identify PDs with related scientific responsibilities.

In addition, specific Funding Opportunity Announcements (FOAs) may have additional data sharing requirements that should be addressed in the data sharing plan. You are encouraged to reach out to the FOA’s scientific/research point of contact for any questions.

Whether a grant application falls under one or multiple data sharing policies or requirements, PIs will need to submit ONLY ONE data sharing plan that a) addresses the required elements of all applicable policies; and b) addresses different provisions that apply to the same data.

When multiple policies or requirements apply, the first step is to determine what data falls under which policy/requirement and if there is any overlap. See the Special Considerations section below.

For additional considerations for grants that fall under both NIH Data Sharing Policy (2003) and Genomic Data Sharing Policy (2015), please contact your EGRP program director.

Key Required Elements of Data Sharing Plans

When preparing a data sharing plan, investigators ARE REQUIRED to address the following questions. Example language that addresses each key required element of a data sharing plan appears under each question.

What data will be shared?

We, the investigators, will share all research data required to reproduce the specific aims of the study:

  • Data collected/generated by study. Demographic and other measures derived from public data sources with no restrictions on data sharing, as well as clinical, behavioral, and patient-reported outcomes data on all participants. All de-identified data will be made available.
  • Pre-existing data to be used by project. Genome-wide association study (GWAS) data from previous studies that have the appropriate consent for sharing.
  • Aggregate/summary data. Aggregate data will be made available only as summary statistics.
  • Information necessary to interpret the submitted data. Study methods and data collection, protocols, questionnaires, manuals of procedures and operations, consents, and data dictionaries will be made available.

We, the investigators, will not be able to share data that fall under the following restrictions:

  • Data derived from restricted-access data sources and those governed by data use agreements that do not allow secondary release of these data (e.g., state registry-linked data, time series business data). Because these data may not be shared through collaborations, we will work with investigators who wish to use the data and give them guidance on how to obtain the data directly from the source.

Where will the data to be shared be located?

We, the investigators, will make the data available through a locally maintained project database and, when possible, through NIH-supported databases.

We will submit all genomic data to the following NIH-designated repositories:

  • Database of Genotypes and Phenotypes (dbGaP)

Who will have access to the data and how will it be located and accessed?

We, the investigators, plan to make all data that result from this study easily available to the broad research community (not limited to our collaborators), using established public repositories whenever possible.

When will the data be shared (includes submission and release timeline)?

Please note that the timeline for sharing data under the GDS policy differs from the 2003 data sharing policy. The GDS policy requires that data be submitted to an NIH-designated repository once it has been cleaned (i.e., the analytic dataset is finalized). The repository will release the data for access six months after cleaning was completed, or at the time of publication, whichever comes first. The 2003 data sharing policy requires that the data be shared at the time that the primary manuscript is accepted for publication or before the end of the project period, whichever comes first. In addition, FOAs may also have specific data sharing requirements (e.g., multiple data submission timelines).

  1. For genomic data, we will follow timelines consistent with the Supplemental Information to the NIH GDS Policy.

    Data Submission: We, the investigators, will submit the data described above immediately after the genomic data have been cleaned, i.e., once the QA/QC of the raw data is complete. Data submission is expected approximately _______.

    Data Release: We, the investigators, understand that once data submission has been initiated (i.e., when QA/QC is complete; timeline is based on when submission is expected under the policy), the data will be released either a) after six months, or b) at the time of publication, whichever point comes first. The data will be available for secondary research access without restrictions on publication (i.e., there will be no publication embargo).
  2. For phased data collection or waves of data collection (data cleaning must be completed before data are released): As data are being collected continuously, we, the investigators, will clean the research dataset and make it available for sharing at the end of each period of data collection. We will then make the final combined dataset available (i.e., data collected over all years of follow-up) before the end of the current funding period, or no later than the acceptance for publication of the main findings from the final dataset.
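
The "whichever comes first" timelines described above can be worked out mechanically when drafting the timeline section of a plan. The following is a minimal Python sketch, not an official NIH or NCI tool. The function names are hypothetical, and treating "six months" as six calendar months (clamped to the end of shorter months) is an assumption made here for illustration.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    Assumption: if the target month is shorter, clamp to its last day.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def gds_release_date(qc_complete: date,
                     publication: Optional[date] = None) -> date:
    """GDS policy: release six months after QA/QC is complete,
    or at the time of publication, whichever comes first."""
    six_months_later = add_months(qc_complete, 6)
    if publication is None:
        return six_months_later
    return min(six_months_later, publication)


def policy_2003_sharing_deadline(project_end: date,
                                 manuscript_accepted: Optional[date] = None) -> date:
    """2003 policy: share when the primary manuscript is accepted or
    before the end of the project period, whichever comes first."""
    if manuscript_accepted is None:
        return project_end
    return min(manuscript_accepted, project_end)


if __name__ == "__main__":
    # Hypothetical dates, for illustration only.
    print(gds_release_date(date(2024, 1, 15), publication=date(2024, 5, 1)))
    print(policy_2003_sharing_deadline(date(2025, 6, 30), date(2025, 9, 1)))
```

A sketch like this is only a planning aid; the dates written into the plan should still follow the applicable policy text and any FOA-specific submission timelines.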

How will researchers locate and access the data?

We, the investigators, will create a website that is discoverable through internet searches and includes information on the study, available data types, publications, and ancillary projects. Access to the data will be password protected. Researchers outside the investigative team will need to submit a web-based application to access the data. Access will be granted to requesting researchers once the data use request is approved. A collaboration or co-authorship with the primary investigators will not be a condition for access to these data.

For data (e.g., genomic data) submitted to NIH-designated repositories, access will be governed by the NIH GDS Policy, i.e., controlled access will be obtained through dbGaP under the terms of the dbGaP Data Use Certification. Limitations on uses of the data will be based on the informed consent of participants, as described in the Institutional Certification.

Special Considerations for Grant Applications that Fall Under the 2003 Data Sharing Policy and the Genomic Data Sharing Policy

Investigators with grant applications that fall under both the 2003 Data Sharing Policy and the Genomic Data Sharing Policy should consider all data that will be used in the grant:

  • Will the genomic data be existing or newly generated?
  • How will non-genomic data (e.g., epidemiologic/clinical data) be used in analysis? (This applies regardless of whether the data are existing or newly collected by the grant.)

Determine which data falls under what policy, and where there is overlap.

  • Typically, all data that falls under the Genomic Data Sharing Policy also falls under the 2003 Policy.
  • Some data may fall under the 2003 Policy but not under the Genomic Data Sharing Policy.

Generally, when each policy has differing provisions that apply to the same data (e.g., timeline, required repository), the stricter/more specific policy must be followed. Only one data sharing plan will need to be submitted addressing the different provisions in the relevant sections.

This graphic is intended to help investigators whose grant applications propose data collection that may fall under multiple NIH data sharing policies determine which data fall under which policy, and where there is overlap. Data that fall under the 2003 data sharing policy (existing genomic data, epidemiologic or clinical data not used in genomic analyses) are made available via a local project database or submitted to an NIH repository (or both). Data availability follows 2003 policy timelines and requirements. Data that fall under the genomic data sharing (GDS) policy (newly generated genomic data, epidemiologic and clinical data used to interpret genomic data) must be submitted to NIH repositories/dbGaP. Additionally, these data can be made available via a local project database. Data availability and submission follow GDS policy timelines.
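
The sorting step described above can be sketched as a small decision rule. This is a hedged illustration in Python, not an official classification tool: the `Dataset` fields and function names are hypothetical, and the rule assumes every dataset listed is final research data (and so falls under the 2003 Policy). Edge cases should be discussed with the EGRP program director.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Dataset:
    name: str
    is_genomic: bool
    newly_generated: bool
    # For non-genomic (epi/clinical) data: needed to interpret the genomic data?
    used_to_interpret_genomic: bool = False


def applicable_policies(ds: Dataset) -> Set[str]:
    """Return the policies that typically apply to a dataset.

    Assumption: all datasets are final research data, so the 2003 Policy
    always applies; GDS is added for newly generated genomic data and for
    phenotype data used to interpret the genomic data.
    """
    policies = {"2003"}
    if ds.is_genomic and ds.newly_generated:
        policies.add("GDS")
    elif not ds.is_genomic and ds.used_to_interpret_genomic:
        policies.add("GDS")
    return policies


def repository_options(ds: Dataset) -> List[str]:
    """GDS data must go to an NIH repository (a local copy is optional);
    2003-only data may go to a local database, an NIH repository, or both."""
    if "GDS" in applicable_policies(ds):
        return ["NIH repository (required)", "local/project database (optional)"]
    return ["local/project database", "NIH repository"]


if __name__ == "__main__":
    # Hypothetical datasets, for illustration only.
    for ds in [
        Dataset("new GWAS genotypes", is_genomic=True, newly_generated=True),
        Dataset("prior GWAS data", is_genomic=True, newly_generated=False),
        Dataset("baseline questionnaire", is_genomic=False,
                newly_generated=False, used_to_interpret_genomic=True),
    ]:
        print(ds.name, sorted(applicable_policies(ds)), repository_options(ds))
```

Listing each dataset this way before writing the plan makes it easier to see where both policies apply to the same data and the stricter provision has to be followed.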

Tips for writing a resource sharing plan addressing both policies

If all data will be submitted to an NIH repository, start by developing a plan that describes compliance with the Genomic Data Sharing Policy.

  • If all data fall under the Genomic Data Sharing Policy, this plan is sufficient.
  • If data falling solely under the 2003 Policy are also optionally submitted to an NIH repository, clearly explain whether the timeline for submission will differ from that for the Genomic Data Sharing Policy data.

If some of the data will be submitted to an NIH repository (required for Genomic Data Sharing Policy data) and some will be made available in a local/project database (available only for 2003 Policy data), then the plan should clearly explain the following:

  • Both an NIH repository and a local/project database will be used; 
  • What data will be available in each, and whether specific data will be available in both; and
  • The timeline for submission/availability for both repositories/databases.

For data shared in a local/project database, clearly address how the data will be made available and the terms for secondary data sharing.

Comparison of Expectations for Data Sharing under 2003 Policy and Genomic Data Sharing Policy

(Note: On January 25, 2023, the new NIH Data Management and Sharing policy will come into effect for any NIH-funded research. This new policy will replace the 2003 NIH Data Sharing Policy.)

Data type

  2003 Policy: Final research data (i.e., data used to address specific aims). This includes existing data such as previously generated genomic data and previously collected epi/clinical data.

  Genomic Data Sharing Policy:
  • Genomic data: All genomic data generated by the grant; previously generated genomic data is not expected to be submitted
  • Phenotype (e.g., epi/clinical data): Data pertinent to the interpretation of the expected genomic data (i.e., data needed to reproduce primary analysis regardless of how phenotype data collection was funded, or whether it is existing data)

Data repository

  2003 Policy:
  • NIH-supported repository (e.g., dbGaP) or
  • Local/project database

  Genomic Data Sharing Policy: dbGaP and other approved NIH repositories (e.g., Genomic Data Commons)

Data availability timeline

  2003 Policy: PI should make data available to secondary researchers at whichever time comes first (regardless of whether data is shared locally or at NIH):
  • Primary manuscript is accepted for publication or
  • Before the end of the project funding

  Genomic Data Sharing Policy:
  Data submission: PI should submit data when the QA/QC is complete (i.e., data are clean and analytic dataset is finalized)
  Data release: dbGaP/NIH will release the data at whichever time comes first:
  • Six months after QA/QC is complete (i.e., when data submission begins) or
  • At the time of publication

Terms for secondary data sharing

  2003 Policy:
  • Local institution requirements for sharing data outside of institution apply, e.g., Data Use Agreement or other agreements, consistent with applicable laws/regulations
  • Cannot require the requestor(s) to establish a collaboration or co-authorship as a condition of accessing the data
  • Cannot place limits on research questions asked, as long as research is consistent with informed consent
  • Cannot limit research-based overlap with PI’s analyses

  Genomic Data Sharing Policy:
  • Secondary research use is governed by Genomic Data Sharing Policy access requirements / dbGaP Data Use Certification

Consent and IRB approval requirements

  2003 Policy:
  • For dbGaP sharing, Genomic Data Sharing Policy expectations apply
  • For local sharing, local IRB standards for consent and data sharing apply, as consistent with applicable laws/regulations

  Genomic Data Sharing Policy:
  • Genomic Data Sharing Policy expects informed consent for future research uses and broad sharing of data for samples collected on or after January 25, 2015
  • Consistency with consent, IRB review of risks, and other elements assured by the Institutional Certification

Additional Resources