Epidemiology and Genomics Research Program

Evaluating risk prediction models for use in lung cancer screening across diverse populations around the world

Project Title

Evaluating risk prediction models for use in lung cancer screening across diverse populations around the world

Primary Contact Information

Hilary A Robbins

Scientist

hrobbins827@gmail.com

International Agency for Research on Cancer

Alternate Contact Information

Mattias Johansson

Scientist

johanssonm@iarc.fr

IARC

European Prospective Investigation into Cancer and Nutrition (EPIC)

Paul Brennan (brennan@iarc.fr)

Project Details

Lung

For cohorts that are already participating in the Lung Cancer Cohort Consortium (LC3) as part of our NCI-funded U19 program (INTEGRAL, Project 2), funds for data preparation are already allocated in INTEGRAL.

For cohorts outside of INTEGRAL that would like to participate, we may be able to provide a small amount of funding. To this end, we ask each interested cohort to tell IARC the extent to which financial costs are a barrier to participation.

We are currently preparing an R03 application for this work, but the budget is quite limited.

Screening for lung cancer by low-dose computed tomography (CT) was first shown to reduce lung cancer mortality among heavy smokers in 2011 by the National Lung Screening Trial (NLST). In September 2018, the Dutch-Belgian NELSON trial also announced results, finding an even stronger benefit: the mortality reduction was 24% in men and 53% in women at 9 years of follow-up [2]. With the announcement of NELSON's results, there is enthusiasm throughout Europe and beyond for implementing CT screening.

Secondary analyses of the NLST showed that screening is more effective and efficient if eligibility is based on continuous, individual lung cancer risk (i.e., from a risk prediction model) rather than on categorical guidelines (e.g., US Preventive Services Task Force criteria). Accordingly, individual risk-based eligibility has now been applied in studies including the Manchester Lung Health Checks and the International Lung Screen Trial. The National Comprehensive Cancer Network now allows screening for individuals whose 6-year risk, as calculated by the PLCOm2012 model, is greater than 1.3%.

A major challenge facing risk-based CT screening eligibility is that the most widely accepted risk models were developed and validated in healthy, largely non-Hispanic white, American populations, and their portability outside of that context is largely unknown. The PLCOm2012 model, for example, was developed using data from the U.S. PLCO trial and has thus far been validated in two U.S. cohorts (Figure 1), as well as in Australia and Germany. However, it has not been studied whether the PLCOm2012 model provides accurate risk estimates (i.e., calibration), or how well it distinguishes future lung cancer cases from non-cases (i.e., discrimination), outside of those settings, particularly in U.S. minority populations, in Asia, and broadly throughout Europe.

To better understand how established risk models would perform in selecting subjects for CT screening in diverse populations worldwide, we propose a joint effort to harmonize and analyze data across cohorts participating in the NCI Cohort Consortium, as well as within the Asian Cohort Consortium. The study will be conducted in parallel with a recently initiated project within the Lung Cancer Cohort Consortium (LC3) that aims to identify biomarkers for use in the context of CT screening (20 cohorts participating).

To determine whether established lung cancer risk models can be validly applied in diverse populations worldwide, and to develop new tools for emerging settings in lung CT screening.

1) Evaluate the calibration and discrimination of lung cancer risk prediction models across the geographical settings represented by large prospective cohorts.
2) Evaluate the calibration and discrimination of risk models in important population subgroups, particularly by racial/ethnic group.
3) Describe the population selected by different proposed risk thresholds for screening eligibility across different geographical settings and among important subgroups.
4) Assemble and validate a model to predict lung cancer risk in Asian settings that could be used to select ever-smokers into CT screening across Asia.

We propose to gather and harmonize a limited set of baseline variables that are relevant for lung cancer risk modeling, along with follow-up information, for all participants in participating cohorts. We will apply approximately 10 U.S. and European lung cancer risk prediction models to the harmonized cohort population. Subsequently, for each model, we will quantify calibration using the ratio of expected to observed lung cancer cases, and discrimination using the area under the receiver operating characteristic curve (AUC). We will report these statistics overall and stratified by continent/country (Aim 1) and racial/ethnic group (Aim 2). We will thereby identify any contexts and groups in which current models perform poorly, and in which further model development is therefore necessary to appropriately select smokers for screening.
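As an illustration of the two metrics, here is a minimal Python sketch. It assumes each participant already has a predicted risk from a given model whose time horizon matches the available follow-up (e.g., 6-year risk with 6 years of follow-up). The function names are hypothetical. E/O is the sum of predicted risks divided by the number of observed cases. AUC is estimated from the Mann-Whitney rank statistic, with tied risks counted as one half.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def expected_observed_ratio(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Calibration: expected cases (sum of predicted risks) / observed cases."""
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("E/O is undefined with zero observed cases")
    return sum(risks) / observed


def auc(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Discrimination: Mann-Whitney estimate of the AUC (ties count 1/2)."""
    n_cases = sum(outcomes)
    n_noncases = len(outcomes) - n_cases
    if n_cases == 0 or n_noncases == 0:
        raise ValueError("AUC needs at least one case and one non-case")
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    ranks = [0.0] * len(risks)
    i = 0
    while i < len(order):
        # Find the block of tied risks and give each the average (1-based) rank
        j = i
        while j + 1 < len(order) and risks[order[j + 1]] == risks[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    rank_sum = sum(r for r, y in zip(ranks, outcomes) if y == 1)
    return (rank_sum - n_cases * (n_cases + 1) / 2) / (n_cases * n_noncases)


def evaluate_by_group(records: Iterable[Tuple[str, float, int]]) -> Dict[str, Tuple[float, float]]:
    """Return {group: (E/O, AUC)} from (group, predicted_risk, outcome) records."""
    risks: Dict[str, List[float]] = defaultdict(list)
    outcomes: Dict[str, List[int]] = defaultdict(list)
    for group, risk, outcome in records:
        risks[group].append(risk)
        outcomes[group].append(outcome)
    return {g: (expected_observed_ratio(risks[g], outcomes[g]), auc(risks[g], outcomes[g]))
            for g in risks}
```

An E/O above 1 means the model overpredicts risk in that group, and an E/O below 1 means it underpredicts. This simplified version ignores censoring and competing other-cause deaths, which a full analysis would need to handle.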

For models where risk thresholds to determine screening eligibility have been proposed (e.g., 1.3% 6-year risk by PLCOm2012), we will classify cohort participants as screening eligible (i.e., above the threshold) or ineligible (i.e., below the threshold). Then, for each model/threshold pair, we will calculate the proportion of individuals eligible for screening and describe patterns of eligibility across different geographical settings and groups (Aim 3).
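The eligibility step for a single model/threshold pair can be sketched as follows. This assumes predicted risks have already been computed by the model (the PLCOm2012 equation itself is not reproduced here) and are expressed as proportions, so 1.3% is `0.013`. The helper names are hypothetical. Following the "greater than 1.3%" wording, a risk exactly at the threshold is treated as ineligible.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def eligible(risk: float, threshold: float) -> bool:
    # "Above threshold" means strictly greater; the boundary case is an assumption
    return risk > threshold


def proportion_eligible_by_group(records: Iterable[Tuple[str, float]],
                                 threshold: float) -> Dict[str, float]:
    """Return {group: share of participants above the risk threshold}."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # [n_eligible, n_total]
    for group, risk in records:
        counts[group][1] += 1
        if eligible(risk, threshold):
            counts[group][0] += 1
    return {g: n_elig / n_total for g, (n_elig, n_total) in counts.items()}


# Example: 6-year risks for a few hypothetical participants, threshold 1.3%
props = proportion_eligible_by_group(
    [("US", 0.020), ("US", 0.005), ("US", 0.013), ("Asia", 0.030)], threshold=0.013)
```

Running the same function once per model/threshold pair gives the eligibility patterns to compare across settings and subgroups.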

We anticipate that U.S. and European lung cancer risk models may calibrate or discriminate poorly in Asian settings. Therefore, after restricting to Asian cohorts, we will assemble and validate a new lung cancer risk model (Aim 4). We will divide the cohorts into training (2/3) and validation (1/3) sets, distributing geographical settings approximately equally. We will select variables for the model using a combination of known risk factors (age, sex, smoking) and variable selection procedures such as the least absolute shrinkage and selection operator (lasso). After building the model in the training set, we will estimate calibration and discrimination in the validation set as described above.
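The two modeling steps above can be sketched in Python: first a cohort-level 2/3 versus 1/3 split performed separately within each geographical setting, then a lasso (L1-penalized) logistic regression in which known risk factors are forced in by leaving them unpenalized. This is a sketch under stated assumptions, not the study's implementation. The function names, the proximal-gradient fitting method, the learning rate, and the iteration count are all illustrative choices. A plain logistic model for a binary outcome is also an assumption, since the proposal does not specify the model form.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_cohorts(cohort_setting: Dict[str, str], train_frac: float = 2 / 3,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Assign whole cohorts to training/validation, separately within each setting."""
    rng = random.Random(seed)
    by_setting: Dict[str, List[str]] = defaultdict(list)
    for cohort, setting in sorted(cohort_setting.items()):
        by_setting[setting].append(cohort)
    train: List[str] = []
    valid: List[str] = []
    for setting in sorted(by_setting):
        cohorts = by_setting[setting]
        rng.shuffle(cohorts)
        n_train = round(train_frac * len(cohorts))
        train.extend(cohorts[:n_train])
        valid.extend(cohorts[n_train:])
    return train, valid


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lasso_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   penalized: Sequence[bool], lam: float,
                   lr: float = 0.1, n_iter: int = 2000) -> Tuple[float, List[float]]:
    """L1-penalized logistic regression by proximal gradient descent.

    Columns with penalized[j] == False (e.g., age, sex, smoking) are always kept;
    the intercept is never penalized.
    """
    n, p = len(X), len(X[0])
    b0, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            resid = _sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            g0 += resid
            for j in range(p):
                g[j] += resid * xi[j]
        # Gradient step on the mean log-loss
        b0 -= lr * g0 / n
        for j in range(p):
            w[j] -= lr * g[j] / n
            if penalized[j]:
                # Proximal step for the L1 penalty (soft-thresholding)
                w[j] = math.copysign(max(abs(w[j]) - lr * lam, 0.0), w[j])
    return b0, w
```

Candidate predictors should be standardized before fitting. In practice the penalty `lam` would be chosen by cross-validation within the training cohorts, typically with an established package (e.g., R's glmnet, whose `penalty.factor` argument can leave selected variables unpenalized) rather than a hand-written solver.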

Analysis of many cohorts from different populations and geographical settings is the only approach that will provide a comprehensive answer to the question of lung cancer model validity for selection into CT screening. Previous efforts in single cohorts have been important but cannot provide information beyond a limited context.

There is no specific minimum, but ideally each cohort would have about 30 lung cancer cases or more.

• Incident lung cancer (timing, stage, topography, histology)
• Lung cancer death and other-cause death

• Smoking status, number of years smoked, age at smoking initiation, age at smoking cessation, years since quitting, smoking pack-years, average cigarettes per day, type(s) of tobacco product, any measures of addiction (e.g., time to first cigarette, difficulty not smoking)
• Indoor secondhand smoke exposure, asbestos exposure, other known exposures such as cookstoves

• Age, sex, education, race/ethnicity, body mass index, family history of lung cancer, personal history of cancer, COPD/emphysema, asthma, daily cough, health problems requiring use of special equipment (including a cane, a wheelchair, a special bed, or a special telephone), any liver condition, diabetes, weak or failing kidneys, chronic bronchitis, hypertension, history of stroke, history of heart attack, coronary heart disease, angina pectoris
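When cohorts record only some of the smoking variables listed above, derived variables can be harmonized from the raw fields. The Python sketch below uses the standard definition pack-years = (cigarettes per day / 20) × years smoked. It assumes a hypothetical record layout with smoking status coded as "never", "former", or "current". It also assumes that years smoked equals the span from initiation to cessation (or to baseline for current smokers), which ignores any periods of temporary quitting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmokingRecord:
    age: float                              # age at baseline
    status: str                             # "never", "former", or "current" (assumed coding)
    age_start: Optional[float] = None       # age at smoking initiation
    age_quit: Optional[float] = None        # age at smoking cessation (former smokers)
    cigs_per_day: Optional[float] = None    # average cigarettes per day


@dataclass
class DerivedSmoking:
    years_smoked: float
    years_since_quit: Optional[float]       # None for never-smokers
    pack_years: float


def derive_smoking(r: SmokingRecord) -> DerivedSmoking:
    if r.status == "never":
        return DerivedSmoking(0.0, None, 0.0)
    if r.age_start is None or r.cigs_per_day is None:
        raise ValueError("ever-smokers need age at initiation and cigarettes per day")
    if r.status == "current":
        end, since_quit = r.age, 0.0
    elif r.status == "former":
        if r.age_quit is None:
            raise ValueError("former smokers need age at cessation")
        end, since_quit = r.age_quit, r.age - r.age_quit
    else:
        raise ValueError(f"unknown smoking status: {r.status!r}")
    years = max(end - r.age_start, 0.0)
    # One pack = 20 cigarettes
    return DerivedSmoking(years, since_quit, r.cigs_per_day / 20.0 * years)
```

For example, a 60-year-old former smoker who smoked 20 cigarettes per day from age 20 to 50 has 30 years smoked, 10 years since quitting, and 30 pack-years.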

No