Data Processing & Scoring Procedures Using Current Methods (Recommended)


Data Processing

Our NCI research team followed several steps to formulate the DSQ scoring algorithms. These steps are described here for researchers who may be interested in the methodologic process our team used. However, it is not necessary for researchers to follow these steps; SAS programs that integrate these steps are publicly available for external researchers using the DSQ in their own studies. (See Questionnaires and SAS Programs.)

Our steps consisted of:

  1. converting frequency data to daily frequency,
  2. identifying extreme exposure values, and
  3. classifying cereal data.
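To illustrate step 1 only, the Python sketch below converts a reported frequency such as "3 times per week" to times per day. The reporting units and days-per-unit factors (`DAYS_PER_UNIT`, with a month taken as 365.25/12 days) are hypothetical assumptions. The DSQ's actual response categories and conversions are implemented in the NCI SAS programs and are not reproduced here.

```python
# Hypothetical reporting units and the number of days each spans.
# The DSQ's actual response options and conversion factors live in the
# NCI SAS programs; these values are assumptions for illustration only.
DAYS_PER_UNIT = {"day": 1.0, "week": 7.0, "month": 365.25 / 12}


def to_daily_frequency(times: float, unit: str) -> float:
    """Convert a reported 'times per unit' frequency to times per day."""
    if times < 0:
        raise ValueError("frequency cannot be negative")
    if unit not in DAYS_PER_UNIT:
        raise ValueError(f"unknown unit: {unit!r}")
    return times / DAYS_PER_UNIT[unit]


print(round(to_daily_frequency(3, "week"), 4))  # 0.4286
```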


Scoring Procedures

Developing scoring algorithms

We developed scoring algorithms to convert screener responses into estimates of individual dietary intake of fruits and vegetables (cup equivalents), dairy (cup equivalents), added sugars (teaspoon equivalents), whole grains (ounce equivalents), fiber (g), and calcium (mg), using the What We Eat in America 24-hour dietary recall data from NHANES 2009-2010. The equations were estimated with SAS PROC REG.

Estimating mean intakes

To derive scoring algorithms from the NHANES 24-hour dietary recall (24HR) and DSQ data, we used a regression model for each exposure. The dependent variable was dietary intake of the exposure from the NHANES 24HR data. The independent variables were the frequency of intake of the relevant foods from the DSQ, each multiplied by a sex- and age-specific portion size estimated for that food from the NHANES 24HR, plus indicator variables for age group: Kid (2-11) and Teen (12-17), with Adult (18-69) as the reference group. For example, the equation for estimating intake of fruits and vegetables is:

$$E(\text{Fruits and Vegetables}) = b_0 + b_1\,\text{Kid} + b_2\,\text{Teen} + b_3 N_{FG3} P_3 + b_4 N_{FG4} P_4 + \cdots + b_{12} N_{FG12} P_{12}$$

  • Where E is the expected value;
  • b0 is the intercept and b1, ..., b12 are the regression coefficients for the remaining terms;
  • Kid is 1 if the respondent is aged 2-11 and 0 otherwise;
  • Teen is 1 if the respondent is aged 12-17 and 0 otherwise;
  • N is the daily frequency of intake of Food Group k;
  • P is the sex- and age-specific portion size of Food Group k.
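To make the structure of the equation concrete, the Python sketch below evaluates it for one respondent. The function name `predicted_intake` and all numbers in the example are hypothetical. Real coefficients (b) and portion sizes (P) come from the published estimates, and the portion sizes passed in are assumed to be those for the respondent's sex-age group.

```python
from typing import Sequence


def predicted_intake(b0: float, b_kid: float, b_teen: float,
                     b_food: Sequence[float], age: int,
                     daily_freq: Sequence[float], portion: Sequence[float]) -> float:
    """Evaluate E(intake) = b0 + b1*Kid + b2*Teen + sum over k of b_k * N_k * P_k."""
    if not len(b_food) == len(daily_freq) == len(portion):
        raise ValueError("need one coefficient, frequency and portion size per food group")
    if not 2 <= age <= 69:
        raise ValueError("the age groups in the model cover ages 2-69")
    kid = 1 if age <= 11 else 0         # Kid: ages 2-11
    teen = 1 if 12 <= age <= 17 else 0  # Teen: ages 12-17; adults (18-69) are the reference
    food_terms = sum(b * n * p for b, n, p in zip(b_food, daily_freq, portion))
    return b0 + b_kid * kid + b_teen * teen + food_terms


# Hypothetical values for an adult with two food groups.
adult = predicted_intake(0.5, 0.1, -0.2, [0.8, 0.6], age=35,
                         daily_freq=[1.0, 0.5], portion=[1.0, 2.0])
print(round(adult, 2))  # 1.9
```

Here the adult's prediction is 0.5 + 0.8(1.0)(1.0) + 0.6(0.5)(2.0) = 1.9. A respondent aged 2-11 with the same inputs would additionally receive the Kid coefficient b1.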

Dependent variables included intake of fruits and vegetables (cup equivalents), dairy (cup equivalents), added sugars (teaspoon equivalents), whole grains (ounce equivalents), fiber (g), and calcium (mg). Individual foods used as independent variables are found in: Relationships between Dietary Factors & Food Items.
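The coefficients of such an equation can be estimated by ordinary least squares, which is what PROC REG fits. The Python sketch below builds the same kind of design matrix (intercept, Kid, Teen, and one N*P product per food group) from simulated data and recovers the coefficients by solving the normal equations. Everything here is an assumption for illustration: the two food groups, portion sizes that vary only by age group (sex is omitted), the true coefficients, and the noise level. The sketch also ignores NHANES sampling weights and design, and it is not the NCI program.

```python
import random

# Hypothetical portion sizes for two food groups, by age group only (sex omitted).
PORTIONS = {"kid": (0.5, 0.75), "teen": (0.75, 1.0), "adult": (1.0, 1.0)}


def age_group(age):
    return "kid" if age <= 11 else "teen" if age <= 17 else "adult"


def simulate(n, true_b, seed=1):
    """Simulate rows [1, Kid, Teen, N1*P1, N2*P2] and intakes from known coefficients."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        group = age_group(rng.randint(2, 69))
        freqs = [rng.uniform(0, 3) for _ in range(2)]  # daily frequencies N_k
        row = [1.0, float(group == "kid"), float(group == "teen")]
        row += [f * p for f, p in zip(freqs, PORTIONS[group])]
        X.append(row)
        y.append(sum(b * x for b, x in zip(true_b, row)) + rng.gauss(0, 0.3))
    return X, y


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))  # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(p):
            if row != col:
                factor = A[row][col] / A[col][col]
                A[row] = [a - factor * c for a, c in zip(A[row], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


TRUE_B = [0.2, -0.1, 0.05, 0.9, 0.6]  # b0, b1 (Kid), b2 (Teen), b3, b4: hypothetical
X, y = simulate(5000, TRUE_B)
estimates = ols(X, y)
print([round(b, 2) for b in estimates])
```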

Estimating probability of intake above or below specific thresholds

Predicted intakes from the linear regression can be used to estimate mean usual intake in the population. However, because the predicted values have a smaller variance than the true distribution of usual intake, they cannot be used to estimate the proportion of a population with usual intake above or below a specific threshold (prevalence).

We used logistic regression to derive scoring equations to predict the probability of usual intake above or below specific thresholds. Predicted probabilities can be averaged over a population to estimate prevalence in the population. Together, predicted intake and predicted probabilities provide more information about the distribution of usual intake in a population than either alone.
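The averaging step can be sketched as follows. Each person's predicted probability is the inverse logit of a linear predictor, and the mean of those probabilities estimates prevalence. The function names, covariates, and coefficients below are hypothetical; in practice the coefficients would come from the fitted logistic models and the covariate values from each respondent's screener responses.

```python
import math
from typing import Sequence


def inverse_logit(eta: float) -> float:
    """1 / (1 + exp(-eta)), written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def predicted_probability(coefs: Sequence[float], covariates: Sequence[float]) -> float:
    """coefs[0] is the intercept; coefs[1:] pair with the covariates."""
    if len(coefs) != len(covariates) + 1:
        raise ValueError("need an intercept plus one coefficient per covariate")
    eta = coefs[0] + sum(b * x for b, x in zip(coefs[1:], covariates))
    return inverse_logit(eta)


def estimated_prevalence(coefs: Sequence[float],
                         people: Sequence[Sequence[float]]) -> float:
    """Average the predicted probabilities over everyone in the population."""
    probs = [predicted_probability(coefs, person) for person in people]
    return sum(probs) / len(probs)


# Hypothetical model with one covariate; the linear predictors are -1, 0 and 1.
coefs = [-1.0, 0.8]
people = [[0.0], [1.25], [2.5]]
print(round(estimated_prevalence(coefs, people), 3))  # 0.5
```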

Ideally, prevalence would be estimated for thresholds based on dietary recommendations. Unfortunately, these recommended levels often fall in the lower or upper tail of the distribution of actual intake in a population, and prevalence estimates for thresholds in the tails are generally less stable than those for thresholds closer to the middle of the distribution. To avoid this instability, we chose threshold values that approximated the 25th and 75th percentiles of estimated usual intake.
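Choosing such thresholds amounts to reading off the lower and upper quartiles of the usual-intake distribution. The Python sketch below does this with `statistics.quantiles` on a list of usual-intake estimates. The function name and input data are hypothetical, and any rounding of the quartiles to convenient threshold values is left to the analyst.

```python
import statistics
from typing import Sequence, Tuple


def quartile_thresholds(usual_intake: Sequence[float]) -> Tuple[float, float]:
    """Return the 25th and 75th percentiles of a sample of usual-intake estimates."""
    if len(usual_intake) < 2:
        raise ValueError("need at least two values")
    q1, _median, q3 = statistics.quantiles(usual_intake, n=4)  # default 'exclusive' method
    return q1, q3


# Hypothetical usual-intake estimates (e.g., cup equivalents per day).
estimates = [0.4, 0.9, 1.1, 1.3, 1.6, 1.8, 2.2, 2.5, 3.1, 4.0]
low, high = quartile_thresholds(estimates)
print(low, high)
```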

For a fuller description of the statistical methods used, see: Thompson FE et al, Development and Evaluation of the 2009-2010 NHANES Dietary Screener Questionnaire, manuscript in preparation, available by contacting Fran Thompson, Ph.D.

Following are estimates for the two components needed for the scoring algorithms: portion size (Pk) and the regression coefficients (bk).
