Epidemiology and Genomics Research Program

Dietary Screener Questionnaire in the NHIS CCS 2015: Data Processing and Scoring Procedures

Data Processing

The NCI research team followed several steps to formulate the Dietary Screener Questionnaire (DSQ) scoring algorithms. These steps are described for researchers who are interested in the methodologic process our team used. Researchers do not need to carry out these steps themselves: a SAS program that implements them is publicly available on NCI’s Short Dietary Assessment Instruments webpage, as are the computed dietary variables for the NHIS CCS.

Our steps consisted of:

Converting Frequency Data to Daily Frequency

Reported frequency responses were converted to daily estimates using the procedures described in DSQ in the NHANES 2009-10: Data Processing and Scoring Procedures. The frequency responses for sports drinks/energy drinks and for fruitades were summed to create a fruitades/sports drinks variable parallel to the one in the NHANES DSQ.
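The conversion and summing step can be sketched as follows. This is a minimal illustration, not NCI's SAS program: the response layout (a count plus a reporting period), the function names `to_daily` and `fruitades_sports_drinks`, and the days-per-period constants are all assumptions; the exact conversion rules are those in the NHANES 2009-10 document cited above.

```python
from typing import Optional

# ASSUMPTION: days per reporting period. The NHANES 2009-10 procedures may use
# different constants (for example, a 30.4-day month); check that document.
DAYS_PER_PERIOD = {"day": 1.0, "week": 7.0, "month": 30.0}


def to_daily(times: Optional[float], period: Optional[str]) -> Optional[float]:
    """Convert a reported frequency (times per period) to times per day.

    A "never" response should be passed as times=0. Missing responses return None.
    """
    if times is None or period is None:
        return None
    return times / DAYS_PER_PERIOD[period]


def fruitades_sports_drinks(
    sports_energy: Optional[float], fruitades: Optional[float]
) -> Optional[float]:
    """Sum the two daily frequencies into one fruitades/sports drinks variable.

    ASSUMPTION: if either component is missing, the combined value is missing.
    """
    if sports_energy is None or fruitades is None:
        return None
    return sports_energy + fruitades


# Example: sports/energy drinks 3 times per week, fruitades 2 times per month.
combined = fruitades_sports_drinks(to_daily(3, "week"), to_daily(2, "month"))
print(round(combined, 3))
```

Treating a missing component as making the sum missing is one possible convention; an analysis could instead treat a missing component as zero.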

Identifying Extreme Exposure Values

The definition of extreme values for the NHANES 2009-10 DSQ was used in the NHIS CCS 2015 DSQ. These values, and the number of values identified as extreme in the NHIS CCS 2015 DSQ, are presented below. Values identified as extreme were top-coded to the maximum acceptable value rather than excluded.

NHIS CCS 2015 (N=33,672 Adults 18+)
Food Group                    Maximum Acceptable Daily Frequency Value   Number of Values Identified
Fruit                         8                                          18
Fruit juice                   8                                          8
Salad                         5                                          91
Fried potatoes                5                                          15
Other potatoes                3                                          58
Dried beans                   4                                          43
Other vegetables              5                                          106
Tomato sauce                  2                                          26
Salsa                         3                                          42
Pizza                         2                                          13
Soda                          8                                          33
Sports drinks/Fruit drinks    7                                          26
Cookies, cake, pie            7                                          4
Doughnuts                     5                                          5
Frozen desserts               5                                          13
Sugar/honey in coffee/tea     10                                         37
Candy                         8                                          4
Any milk (not soy)            10                                         16
Cheese                        6                                          59
Any cereal                    7                                          50
Whole grain bread             6                                          74
Cooked whole grains           4                                          32
Popcorn                       3                                          10
Red meat                      6                                          29
Processed meat                4                                          34
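Top-coding caps each daily frequency at the maximum acceptable value in the table above and keeps the respondent in the data. The sketch below shows this for a subset of the food groups and also counts how many values were capped, which corresponds to the "Number of Values Identified" column. The dictionary keys, the per-respondent data layout, and the function name `top_code` are assumptions made for illustration.

```python
# Maximum acceptable daily frequency per food group, taken from the table above
# (only a subset of food groups is included in this sketch).
MAX_DAILY = {
    "Fruit": 8,
    "Fruit juice": 8,
    "Salad": 5,
    "Soda": 8,
    "Sugar/honey in coffee/tea": 10,
    "Processed meat": 4,
}


def top_code(respondents):
    """Cap daily frequencies above the maximum; never drop a respondent.

    respondents: list of dicts mapping food group -> daily frequency (or None).
    Returns (top-coded rows, count of capped values per food group).
    """
    capped_counts = {food: 0 for food in MAX_DAILY}
    rows = []
    for person in respondents:
        row = {}
        for food, freq in person.items():
            limit = MAX_DAILY.get(food)
            if freq is not None and limit is not None and freq > limit:
                row[food] = float(limit)  # extreme value: replace with the maximum
                capped_counts[food] += 1
            else:
                row[food] = freq  # acceptable (or missing) value: keep as reported
        rows.append(row)
    return rows, capped_counts


data = [{"Fruit": 12.0, "Soda": 2.0}, {"Fruit": 3.0, "Soda": 9.5}]
rows, counts = top_code(data)
print(rows)
print(counts)
```

A value equal to the maximum is treated as acceptable; only values strictly above it are capped.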

Classifying Cereal Data

The DSQ includes questions about cereal intake and allows respondents to name up to two cereals that they consume. We classified each reported cereal along four dimensions: density of added sugars, whole grains, fiber, and calcium, as described below.

Processing of the cereal data consisted of:

  1. Identification of distinct cereals.
  2. Determination of cereal categories based on nutrient density. For each nutrient (i.e., added sugars, whole grains, fiber, and calcium), we ordered the listed cereals by density (nutrient/100 grams) and then divided each distribution into tertiles. Note: this density categorization was based on the cereal composition, not on the absolute frequency of reported consumption.
  3. Application of this classification to all listed cereals. Thus, each listed cereal is coded along four attributes: category for added sugars, category for whole grains, category for fiber, and category for calcium.
  4. Weighting of each cereal frequency according to order of report. For respondents who reported two different cereal types, we assumed that the first cereal reported was consumed more frequently than the second. Accordingly, we weighted the first cereal at 0.75 and the second at 0.25. For respondents who reported only one cereal type, no weighting was necessary.
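The tertile split and the order-of-report weighting can be sketched as below. This is a hedged illustration under stated assumptions: `tertile_cutpoints` takes the cutpoints at the 1/3 and 2/3 positions of the sorted densities so that tied values stay in the same tertile (one plausible rule; the text does not specify how ties were handled), and `weighted_category_frequency` assumes the 0.75/0.25 weights are applied to the respondent's total daily cereal frequency and credited to each cereal's tertile. Both function names are hypothetical.

```python
def tertile_cutpoints(densities):
    """Return (c1, c2): density <= c1 is the lowest tertile,
    c1 < density <= c2 the second, and density > c2 the highest.

    ASSUMPTION: cutpoints are the values at the 1/3 and 2/3 positions of the
    sorted list, so ties can make the tertile sizes slightly unequal.
    """
    ordered = sorted(densities)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least three cereals to form tertiles")
    return ordered[n // 3 - 1], ordered[2 * n // 3 - 1]


def weighted_category_frequency(total_freq, first_cat, second_cat=None):
    """Split a respondent's daily cereal frequency across tertiles 1-3.

    ASSUMPTION: with two cereals, the first gets 0.75 and the second 0.25 of
    the frequency; with one cereal, it gets the whole frequency.
    """
    shares = {1: 0.0, 2: 0.0, 3: 0.0}
    if second_cat is None:
        shares[first_cat] += total_freq
    else:
        shares[first_cat] += 0.75 * total_freq
        shares[second_cat] += 0.25 * total_freq
    return shares


print(tertile_cutpoints([5, 1, 3, 2, 6, 4]))
# A respondent eating cereal twice a day: first cereal in the highest
# added-sugars tertile, second cereal in the lowest.
print(weighted_category_frequency(2.0, first_cat=3, second_cat=1))
```

Because the cutpoints are applied by value rather than by rank, each cereal's category depends only on its own density, consistent with the "≤" and ">" limits in the tables below.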

In the 2015 NHIS CCS, respondents gave a total of 600 different responses to the questions about which cereals they consumed. These responses included non-specific responses (e.g., "whatever is available"), general responses (e.g., various healthy cold cereals), and specific cereal brands (e.g., Cap'n Crunch Sprinkled Donut). All responses were coded to specific food codes from the United States Department of Agriculture's (USDA) Food and Nutrient Database for Dietary Studies (FNDDS). Of these 600 responses, 296 were distinct entities (i.e., had different USDA food codes), including 13 cereals added since 2010. These additional cereals were classified by nutrient density, as before, and added to the existing categories used for the 2009-10 DSQ.

Following are the classification criteria for cereals by nutrient density. Note that any given cereal may fall into different tertiles for different nutrients.

Table 1: Classification Criteria for Cereals with Regard to Added Sugars Density

Cereal                           Density (tsp added sugars/100 grams)   No. of cereals
Lowest tertile added sugars      ≤0.71                                  97
Second tertile added sugars      0.72 - 5.49                            98
Highest tertile added sugars     >5.49                                  101

Table 2: Classification Criteria for Cereals with Regard to Whole Grain Density

Cereal                           Density (ounce-equivalents of whole grains/100 grams)   No. of cereals
Lowest tertile whole grain       ≤0.21                                                   99
Second tertile whole grain       0.22 - 1.40                                             100
Highest tertile whole grain      >1.40                                                   97

Table 3: Classification Criteria for Cereals with Regard to Fiber Density

Cereal                           Density (grams of fiber/100 grams)   No. of cereals
Lowest tertile fiber             ≤2.1                                 96
Second tertile fiber             2.2 - 7.3                            103
Highest tertile fiber            >7.3                                 97

Table 4: Classification Criteria for Cereals with Regard to Calcium Density

Cereal                           Density (milligrams of calcium/100 grams)   No. of cereals
Lowest tertile calcium           ≤21                                         97
Second tertile calcium           22 - 100                                    97
Highest tertile calcium          >100                                        102
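Applying the published cutpoints in Tables 1-4 to a cereal's nutrient densities gives its four category codes, which is how a newly reported cereal would be placed into the existing categories. In this sketch the dictionary keys and the function name `classify` are hypothetical, and densities are assumed to be rounded to the same precision as the tables (a value such as 0.715 tsp, which falls between 0.71 and 0.72, would be placed in the second tertile by this rule).

```python
# Upper limits of the lowest and second tertiles, from Tables 1-4.
CUTPOINTS = {
    "added_sugars_tsp": (0.71, 5.49),   # tsp added sugars/100 g
    "whole_grains_oz_eq": (0.21, 1.40), # ounce-equivalents of whole grains/100 g
    "fiber_g": (2.1, 7.3),              # g fiber/100 g
    "calcium_mg": (21, 100),            # mg calcium/100 g
}


def classify(nutrient, density_per_100g):
    """Return the tertile (1 = lowest, 3 = highest) for one nutrient density."""
    low_max, second_max = CUTPOINTS[nutrient]
    if density_per_100g <= low_max:
        return 1
    if density_per_100g <= second_max:
        return 2
    return 3


# Hypothetical cereal: sweetened, refined, moderate fiber, calcium-fortified.
cereal = {
    "added_sugars_tsp": 8.0,
    "whole_grains_oz_eq": 0.1,
    "fiber_g": 3.0,
    "calcium_mg": 150,
}
print({nutrient: classify(nutrient, value) for nutrient, value in cereal.items()})
```

As the text notes, the same cereal can fall into different tertiles for different nutrients.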

For documentation, a data file that includes all cereals reported in the NHIS 2015, their USDA food codes, and their ranking for the four different exposures is found in Table 5 [XLSX - 30 KB].

Scoring Procedures for NHIS CCS 2015

NCI staff developed scoring algorithms to convert screener responses to estimates of individual dietary intake for fruits and vegetables (cup equivalents), dairy (cup equivalents), added sugars (teaspoon equivalents), whole grains (ounce equivalents), fiber (g), and calcium (mg).

The scoring algorithms applied to the NHIS CCS 2015 DSQ data are those developed for the NHANES 2009-10 DSQ. Those scoring algorithms were based on directly modeling the DSQ with usual intake derived from two non-consecutive days of 24-hour recall among the NHANES respondents ages 2 through 69. For those interested only in accessing the NHIS 2015 variables, please visit the Computed Variables page.

For more information about screeners and how they are used, including information about other screeners developed by the Risk Factor Assessment Branch in NCI’s Epidemiology and Genomics Research Program, please visit Overview of Dietary Screeners.

To derive scoring algorithms from the NHANES 24HR and DSQ data, we used a regression model in which the dependent variable was intake of each dietary exposure from the NHANES 24HR data, and the independent variables were the DSQ frequency of intake of each relevant food multiplied by a sex-age-specific portion size for that food estimated from the NHANES 24HRs. We also included indicator variables for age group: Kid (2-11), Teen (12-17), and Adult (18-69). For example, the equation for estimating intake of fruits and vegetables is:

$$E(\text{Fruits and Vegetables}) = b_0 + b_1\,\mathrm{Kid} + b_2\,\mathrm{Teen} + b_3 N_3 P_3 + b_4 N_4 P_4 + \cdots + b_{12} N_{12} P_{12}$$

  • where E is the expected value of intake;
  • b is the regression coefficient for each term;
  • Kid is 1 if the respondent is aged 2-11 and 0 otherwise;
  • Teen is 1 if the respondent is aged 12-17 and 0 otherwise (adults aged 18-69 are the reference group, with Kid = Teen = 0);
  • N_k is the daily frequency of intake of Food Group k (k = 3, ..., 12);
  • P_k is the sex-age-specific portion size of Food Group k.
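Evaluating the fitted equation for one respondent amounts to adding the intercept, the age-group terms, and one frequency-times-portion term per food group. The sketch below is illustrative only: the class name `ScreenerModel`, the food-group keys, and every coefficient and portion value are hypothetical placeholders, not the published NHANES 2009-10 values (those are in the equations and components referenced at the end of this page). The portion sizes are assumed to have already been looked up for the respondent's sex-age group.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ScreenerModel:
    """E(exposure) = b0 + b1*Kid + b2*Teen + sum over k of b_k * N_k * P_k.

    All coefficient values used below are HYPOTHETICAL placeholders.
    """

    b0: float
    b_kid: float
    b_teen: float
    b_food: Dict[str, float]  # food group -> b_k

    def predict(
        self, age: int, daily_freq: Dict[str, float], portion: Dict[str, float]
    ) -> float:
        """Predicted intake for one respondent.

        daily_freq: N_k, daily frequency of each food group from the DSQ.
        portion: P_k, portion size for the respondent's sex-age group.
        """
        kid = 1 if 2 <= age <= 11 else 0
        teen = 1 if 12 <= age <= 17 else 0  # adults: kid = teen = 0
        total = self.b0 + self.b_kid * kid + self.b_teen * teen
        for food, b_k in self.b_food.items():
            total += b_k * daily_freq[food] * portion[food]
        return total


# Hypothetical two-food-group model (the fruit and vegetable model has terms k = 3..12).
model = ScreenerModel(b0=0.5, b_kid=0.1, b_teen=-0.2, b_food={"fruit": 0.8, "salad": 0.6})
freq = {"fruit": 1.5, "salad": 0.5}
portion = {"fruit": 1.0, "salad": 1.2}
print(round(model.predict(40, freq, portion), 2))  # adult respondent
```

For an adult respondent, as in every NHIS CCS case, both indicator terms are zero and the prediction reduces to the intercept plus the food-group terms.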

Dependent variables included intake of fruits and vegetables (cup equivalents), dairy (cup equivalents), added sugars (teaspoon equivalents), whole grains (ounce equivalents), fiber (g), and calcium (mg). The individual foods used as independent variables for each dependent variable are listed in Relationships between Dietary Factors & Food Items.

In the NHIS CCS, only adults are interviewed, so Kid and Teen are 0 for every respondent. The equations developed for NHANES 2009-10 remain valid whether they are applied to a single age group or to all age groups combined. View the equations and components.