How Should New Technologies Be Integrated into Cancer Epidemiology?

The information on this page is archived and provided for reference purposes only.

Trends in 21st Century Epidemiology: From Scientific Discoveries to Population Health Impact

The Epidemiology and Genomics Research Program (EGRP) has initiated a strategic planning effort to develop scientific priorities for cancer epidemiology research in the next decade in the midst of a period of great scientific opportunity and resource constraints. EGRP would like to engage the research community and other stakeholders in a planning effort that will include a workshop in December 2012 to help shape new foci for cancer epidemiology research.

EGRP Invites Your Feedback

To facilitate this process, we invite the research community to join in an ongoing Web-based conversation to develop priorities and the next generation of high-impact studies.

This week, we address the issue of new technologies in use for cancer epidemiology research. Tools of molecular biology, genomics, and other high-throughput “omic” technologies are increasingly integrated into epidemiologic investigations along with advances in bioinformatics and technology. With these opportunities, however, comes the major challenge of dealing with the data deluge and uncovering true causal relationships from the millions and millions of observations that can create "background noise."

Word Cloud Illustrating the Meaning of Technologies

We would like to get your feedback on the following fundamental questions:

  • Which technologies do you feel are ready for "prime time" in epidemiologic research and for what purpose?
  • What criteria would you use to determine when emerging technologies should be integrated into epidemiologic research?

Please use the comment section below to share your perspectives.

We encourage you to be as specific as possible. You can use or be inspired by the NCI Provocative Questions exercise. Comments provided through our blog will be used to shape the workshop discussion in December. Ultimately, we will all benefit from a vibrant dialogue in helping shape the future of cancer epidemiology in the next decade.

Comments are also still welcome in response to last month's question:

EGRP’s Workshop Science Advisory Group


  • Yun-Ling Zheng - July 25, 2012 at 10:58 AM (UTC -4)

    Compelling evidence indicates that telomeres play a critical role in cancer and other aging-associated diseases. Methods that can assess telomere health/function, not just telomere length, are needed to understand the relationship between telomere health and aging-associated diseases.

  • Elinor Schoenfeld - July 25, 2012 at 11:09 AM (UTC -4)

    The epidemiology community, in partnership with the biomedical informatics community, should work to evaluate, improve, and where needed develop comprehensive, standardized systems for paperless data collection. Most epidemiologic studies are very labor intensive in terms of data collection and data management. Developing standardized systems for electronic data capture will ease the burden on researchers and participants while potentially decreasing study costs and turnaround time to data analysis.

  • Bin Zheng - August 25, 2012 at 3:29 PM (UTC -4)

    The efficacy of the current uniform, population-based cancer screening paradigm is very controversial. To help establish optimal personalized cancer prevention and screening paradigms, more research effort should be focused on developing near-term risk prediction models. Unlike a fixed lifetime risk, near-term risk can change across different periods of a person's life. Such models should have substantially higher discriminatory power, so that they can be applied to identify the people (a small fraction of the general population) at high risk of developing cancer in the near term (e.g., <5 years). More aggressive screening methods can then be recommended and applied to this group, while for the majority of low-risk people, screening can be conducted at longer intervals until their near-term risk status changes, reducing false-positive detections and health care costs.
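
    As a minimal sketch of the risk-stratified scheduling idea in the comment above, the following Python maps a near-term (5-year) risk estimate to a screening interval. The function name, the 3% threshold, and the 1-year and 3-year intervals are illustrative assumptions only; they do not come from the comment or from any screening guideline.

```python
def screening_interval_years(near_term_risk: float,
                             high_risk_threshold: float = 0.03,
                             short_interval: int = 1,
                             long_interval: int = 3) -> int:
    """Return a screening interval in years for a 5-year risk estimate.

    The threshold and both intervals are hypothetical placeholders.
    """
    if not 0.0 <= near_term_risk <= 1.0:
        raise ValueError("risk must be a probability between 0 and 1")
    # High near-term risk: screen more often.
    if near_term_risk >= high_risk_threshold:
        return short_interval
    # Low near-term risk: lengthen the interval until risk is reassessed.
    return long_interval


# Near-term risk is re-estimated over time, so the interval assigned to
# the same person can change (hypothetical risk values by age).
risk_by_age = {45: 0.004, 50: 0.012, 55: 0.035}
schedule = {age: screening_interval_years(risk) for age, risk in risk_by_age.items()}
print(schedule)  # {45: 3, 50: 3, 55: 1}
```

    In practice the risk estimate would come from a validated prediction model, and the cut-off would be chosen by weighing false positives against missed cancers.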

  • Georgia Tourassi - October 15, 2012 at 10:20 AM (UTC -4)

    Accumulating evidence suggests that information technology can be a powerful addition to epidemiological research, facilitating collection and mining of unprecedented volumes of multi-source, multi-modality data from not only -omics technologies but also from electronic medical records, patient-controlled health records, and the World Wide Web, including social media [1]. There is increasing research activity on the development of informatics tools to harness such high-volume, disparate data sources in a time-efficient way [2]. We can only expect that this trend will escalate as -omics technologies continue to advance, electronic health records continue to expand, and the digital divide among population groups due to demographic and socio-economic factors continues to decrease. Although the scientific community is on board with the need for a paradigm shift, the field is still in its infancy. In my opinion, there is one critical question we need to consider first before we embark on tackling the numerous and significant technological challenges that this paradigm entails. Is the paradigm shift necessary? The underlying premise of such research is that more data is better. But when and how do we know that more data does indeed lead to reliable information and, more importantly, to meaningful knowledge? Answering this question is an extremely challenging task in any data-driven exploration, particularly when it comes to cancer epidemiology research due to the long disease latency. The criteria of meaningful knowledge must always be set based on the state-of-the-art health management practices and outcomes within the context of the specific application domain. However, the quantitative measures of reliable information will be the ones that initially drive the decision on whether a proposed technology is a step in the right direction.
    To set such measures it is important to separate failure due to poor quality of the data sources from failure due to inadequacy of the informatics tools and the underlying knowledge discovery approach. Different performance measures are needed to capture these very different sources of failure. Complex, incomplete, inaccurate, and biased data sources should be handled carefully in data management and data mining [3]. Informatics experts and scientists with expertise in large-scale dynamic data modeling outside the biomedical field could be instrumental in offering novel solutions. However, carefully curated data sets are critical to properly validate the proposed solutions. To assess the knowledge discovery process as a whole, separate benchmarks are needed to measure the reliability and novelty of the discovered knowledge. Ultimately, the threshold of acceptance and readiness for "prime time" will depend on the implications of the new-found knowledge within the allowable margin of error for the specific application domain. We are still in the beginning stages of exploring what is currently feasible and envisioning what could be possible in the future if we develop and properly apply advanced information technology as a time-efficient, cost-effective knowledge discovery technology to support cancer epidemiology research. As more studies emerge, we need to openly communicate and regularly summarize mistakes made, lessons learned, and solutions found to derive a knowledge discovery framework for epidemiological application based on appropriate performance metrics and careful elucidation of special conditions that may lead to failure.


    1. G. Eysenbach, "Infodemiology and Infoveillance: Framework for an Emerging Set of Public Health Informatics Methods to Analyze Search, Communication and Publication Behavior on the Internet," J Med Internet Res 2009; 11(1):e11.
    2. J.F. Pearson, C.A. Brownstein, J.S. Brownstein, "Potential for electronic health records and online social networking to redefine medical research," Clin Chem 2011; 57(2):196-204.
    3. G. Hripcsak, D.J. Albers, "Next-generation phenotyping of electronic health records," J Am Med Inform Assoc 2012 [published online first, September 6, 2012].

  • Thomas A. Sellers - October 16, 2012 at 12:00 PM (UTC -4)

    Use of a Patient Portal as a venue for epidemiologic research on patient cohorts

    Thomas A. Sellers, Jennifer Camps, Paul Jacobsen, Dana E. Rollison

    Recruitment and retention of study participants in longitudinal cohort studies are critical to the validity of study results. Novel, cost-effective means to recruit and obtain self-reported risk factors and outcomes are essential for successful long-term follow-up of patient cohorts. The Moffitt Total Cancer Care protocol seeks to enroll every newly-diagnosed cancer patient in a survivorship cohort that includes self-reported data at baseline, tissues for molecular characterization, integration of medical record information, follow-up for future events and potential recontact for additional studies.

    Critical to the success of the protocol has been the creation of the MyMoffitt Patient Portal, a free, secure, intuitive, web-based medium that supports research and provides value and inducements to the patients for frequent contact. A major goal was to enhance care and improve patient experience by providing easy access to relevant information about their diagnosis and care in a manner that was accurate and medically sound. Features of the portal therefore include the ability to complete patient questionnaires, view upcoming appointments and associated instructions, request prescription renewals, pay bills, and view and request updates to personal information such as mailing address, insurance, and e-mail address. Selected laboratory results can be viewed four days after their completion along with clinical notes, office visit notes, procedure notes and a discharge summary. Full copies of their medical record are easily requested, and assistance is provided to find relevant support groups based on proximity to the patient's current location. Importantly, the portal enables the patients to download copies of the TCC consent and search for clinical trials for which they may be eligible. A video consent deployed through the portal has been developed and is under IRB review.

    The second major goal for the portal was to empower and educate patients with accurate and medically sound information that was most relevant to their specific health situation. This required intelligent search capabilities and tailoring the information returned. Therefore, we entered into a partnership between Moffitt's Information Technology department and Aeturnum, Inc. to search across multiple sources of information both inside and outside the organization. Patients can explore different treatment options and learn about lifestyle changes that may help their condition. Moreover, there is access to a searchable medical dictionary and an online directory of Moffitt physicians.

    Since the portal launched in 2009, more than 29,000 users have created accounts, and monthly logins now number 12,000 and are rising. Of new patients coming to Moffitt, 84% create a login account. Approximately 24,000 patients have completed their clinical intake form via the Portal, providing information on demographics, personal medical and family history, physical activity, diet, smoking, alcohol consumption, and quality of life. These data are integrated with other source systems (e.g., medical record, cancer registry, specimen biobank) in an enterprise-wide data warehouse into a single record per patient. For those who consent, follow-up questionnaires to capture self-reported outcomes, including quality of life, will be pushed to the patients at established intervals using the portal. The feasibility, cost and flexibility make this an attractive technology for epidemiologic research.
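
    To illustrate what combining several source systems into a single record per patient can look like, here is a small Python sketch. It is not a description of the Moffitt data warehouse; the patient IDs, field names, and the `build_patient_records` helper are all hypothetical, and the sketch assumes every source shares one patient identifier.

```python
from collections import defaultdict

# Hypothetical extracts from three source systems, keyed by a shared patient ID.
intake_forms = [{"patient_id": "P001", "smoking": "former", "family_history": True}]
registry = [{"patient_id": "P001", "site": "lung", "stage": "II"}]
biobank = [{"patient_id": "P001", "specimen_count": 2},
           {"patient_id": "P002", "specimen_count": 1}]


def build_patient_records(**sources):
    """Merge rows from each named source into one dict per patient."""
    records = defaultdict(dict)
    for source_name, rows in sources.items():
        for row in rows:
            patient_id = row["patient_id"]
            for field, value in row.items():
                if field != "patient_id":
                    # Prefix each field with its source name so provenance is
                    # kept and fields from different systems cannot collide.
                    records[patient_id][f"{source_name}.{field}"] = value
    return dict(records)


warehouse = build_patient_records(intake=intake_forms,
                                  registry=registry,
                                  biobank=biobank)
print(warehouse["P001"])
```

    Patients who appear in only some systems (here, "P002") still get a record, which keeps missing data visible instead of silently dropping the patient.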

  • Michael Snyder - November 14, 2012 at 2:00 PM (UTC -4)

    Which technologies do you feel are ready for "prime time" in epidemiologic research and for what purpose?

    Our health is a product of our genome and our exposome. Currently, we can determine our genome sequence, but our exposome is less accessible. However, in principle, the impact of our exposome can be ascertained using comprehensive molecular profiling. I believe that detailed longitudinal profiling of subjects for as many molecular components as possible will provide a detailed picture of disease onset, progression and treatment response. Suitable technologies for research include genome sequencing, DNA methylation, transcriptomics, proteomics, metabolomics, immune monitoring, cell-free nucleic acid, and microbiome analyses. Ideally we need better measurements for environmental exposures. When applied to samples collected at reasonable frequency, together their analysis will form a more comprehensive view of health and disease states and the physiological patterns that change during the acquisition of these states.

    What criteria would you use to determine when emerging technologies should be integrated into epidemiologic research?

    We don't know what we don't know. Therefore, for research it is difficult to know which assays will be most fruitful. Scientists tend to follow the assays that lie closest to the disease symptoms and outcome (e.g., metabolites for diabetes), which is fairly biased. By following patterns of results from the different assays, one hopes to be able to see which ones will provide the most value for monitoring disease onset, progression, severity and treatment efficacy. Undoubtedly the biggest factor will be cost.

  • Geoffrey S. Ginsburg - November 16, 2012 at 10:30 AM (UTC -4)

    Technology-driven epidemiology: a paradigm shift

    The last decade has witnessed staggering growth of high-dimensional data generation and information capture about health and disease. At the same time, the Internet has increased the connectivity of populations, and digital medicine applications have increasingly captured relevant phenotypic information from individuals and populations. Although we now measure genetic factors in large human populations, quantitative assessment of human environmental exposures and their impact on disease pathogenesis is lagging. The lack of high-throughput methods of exposure assessment has motivated epidemiologists to rely upon self-reported data to categorize exposures from environmental, endogenous, and dietary sources. Wild (2005) defined the "exposome," representing all environmental exposures from conception onwards (including exposures from diet, lifestyle, and endogenous sources) as a quantity of critical interest to disease etiology. If we expect to succeed in identifying the combined effects of genetic and environmental factors on chronic diseases, we must develop 21st-century tools to characterize exposure levels in human populations.

    Measures of the exposures and their relationship to the etiology and progression of human diseases are now being achieved. The "personal dynamic genome" measured the full complement of molecular responses to the environment in an individual (Chen, 2012) who, over time, traversed from physiologic states of health to states of disease and back again. In other studies the repertoire of exposure response relationships has been systematically quantified and modeled using the human experimental model and measures of the clinical/phenotypic response to specific environmental challenges (physiologic, pharmacologic, toxigenic, pathogenic) while at the same time capturing time series genomic, transcriptomic, proteomic, and microbiome data (Zaas 2009; Huang 2011; David, 2012). Beyond individuals, innovative population registries now capture biological specimens linked to self-reported clinical and demographic information on health and well-being, mental health, socioeconomic status, environment, and lifestyle. Geospatial mapping of a participant's residence allows for environmental information (e.g., proximity to recreational spaces, health care services, or pollutants) to be associated with their incidence of disease (Tenenbaum, 2012). Expansion of these studies through collaborative partnerships regionally, nationally, and globally enables meta-analyses, cross-population studies, and validation of findings in geographically, environmentally, ethnically, and racially diverse populations and the opportunity for combined molecular epidemiology studies involving diverse genetic pools and health care delivery systems.
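
    The geospatial step mentioned above, relating a participant's residence to nearby features, can be sketched with a great-circle distance calculation. The Python below uses the standard haversine formula; the helper names, the coordinates, and the 5 km radius are hypothetical and are not taken from the cited registry.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(residence, site, radius_km):
    """True if `site` lies within `radius_km` of `residence` (both (lat, lon) tuples)."""
    return haversine_km(*residence, *site) <= radius_km


# Hypothetical coordinates for a residence and an emission source.
residence = (35.50, -80.62)
emission_source = (35.52, -80.60)
exposed = within_radius(residence, emission_source, radius_km=5.0)
print(exposed)
```

    A binary proximity flag like this is only a crude exposure proxy; real studies would also need to account for residential history and time spent away from home.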

    Finally, health systems are now being transformed into engines of research contributing to our understanding of the biology and epidemiology of disease (Ginsburg, 2011). The "rapid-learning health care" model integrates data routinely generated through patient care and clinical research and feeds these data into a growing set of coordinated databases. Using information and data capture technologies as well as clinical decision support tools, the system "learns" by routinely and iteratively (i) capturing data systematically; (ii) analyzing collected data both retrospectively and prospectively; (iii) implementing findings into subsequent clinical care; (iv) evaluating resulting clinical outcomes; and (v) generating additional hypotheses for future investigation (Friedman, 2010). Thus, information purposefully obtained in real time in the course of routine clinical practice drives the process of discovery and ensures continuous innovation, quality improvement, and safety.

    The convergence of these data streams from individuals, populations, and health care systems provides an unprecedented opportunity to redefine epidemiologic approaches to health and disease. It is, in fact, creating a new taxonomy of disease and enabling us to develop and practice stratified and precision medicine.

    Wild CP (2005). Cancer Epidemiol Biomarkers Prev. Aug;14(8):1847-50.
    Chen R et al. (2012). Cell. Mar 16;148(6):1293-307.
    Zaas AK et al. (2009). Cell Host Microbe. Sep 17;6(3):207-17.
    Huang et al. (2011). PLoS Genet. Aug;7(8):e1002234.
    David L and Alm E (2012). Personal communication.
    Bhattacharya S et al. (2012). Am J Transl Res. 4(4):458-70.
    Ginsburg GS et al. (2011). Sci Transl Med. Sep 21;3(101):101cm27.
    Friedman CP et al. (2010). Sci Transl Med. Nov 10;2(57):57cm29.

  • Zdenko Herceg - December 12, 2012 at 1:03 PM (UTC -4)

    Although there is a consensus that exposure to environmental factors accounts for over two thirds of cancers, and therefore that the majority of cancers are potentially avoidable, there is a paucity of evidence regarding the critical molecular events that occur in early stages of cancer development or in precursor lesions as well as environmental factors and endogenous cues that trigger these changes. In addition, the challenge posed by numerous sequencing efforts is to identify the deregulated genes/pathways and changes in the genome and epigenome that precede and promote tumour development, and to differentiate functionally important ("drivers") from non-functional "passenger" events. The spectacular advances in epigenomics and the emergence of powerful technologies that allow the analysis of the genome and epigenome with unprecedented resolution in both high-throughput and genome-wide settings have dramatically accelerated investigations in the area of cancer research and molecular epidemiology.

    These advances have opened the exciting possibility of simultaneously identifying multiple changes affecting the genome and epigenome of normal, precursor and cancer cells as well as their link to the environment. Therefore it will now be possible to improve our understanding of the mechanisms underlying carcinogenesis and define which genetic and epigenetic alterations, or combinations thereof, can be interpreted as reliable biomarkers of exposures. By identifying changes in the genome and epigenome (genetic and epigenetic signatures) in tumour cells and surrogate tissues that are associated with specific known and suspected environmental risk factors, it may be possible to identify individuals who are at a particularly high risk, and potentially design an efficient strategy for cancer prevention.
