# Data Processing & Scoring Procedures Using Current Methods (Recommended)

## Data Processing

Our NCI research team followed several steps to formulate the DSQ scoring algorithms. These steps are described here for researchers who may be interested in the methodologic process our team used. However, it is not necessary for researchers to follow these steps; SAS programs that integrate these steps are publicly available for external researchers using the DSQ in their own studies. (See Questionnaires and SAS Programs.)

Our steps consisted of:

- converting frequency data to daily frequency,
- identifying extreme exposure values, and
- classifying cereal data.
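
The first two steps can be illustrated with a minimal Python sketch. The period names, the 30-day month, the function names, and the cap of 8 times per day are hypothetical placeholders chosen for illustration. They are not the DSQ response categories or the cut-offs used in the NCI SAS programs.

```python
# Assumed number of days spanned by each reporting period (illustrative only).
DAYS_PER_PERIOD = {"day": 1.0, "week": 7.0, "month": 30.0}


def to_daily_frequency(times, period):
    """Convert a 'times per period' response to times per day."""
    return times / DAYS_PER_PERIOD[period]


def flag_extreme(daily_freq, max_daily):
    """Return True when a daily frequency exceeds an assumed plausible maximum."""
    return daily_freq > max_daily


weekly_fruit = to_daily_frequency(3, "week")  # reported 3 times per week
daily_soda = to_daily_frequency(12, "day")    # reported 12 times per day

# Hypothetical cap of 8 times per day for identifying extreme values.
soda_is_extreme = flag_extreme(daily_soda, 8.0)
fruit_is_extreme = flag_extreme(weekly_fruit, 8.0)
```

How a flagged value is then handled (for example, excluded or truncated) is a separate decision that this sketch leaves open.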

## Scoring Procedures

### Developing scoring algorithms

We developed scoring algorithms to convert screener responses to estimates of individual dietary intake for fruits and vegetables (cup equivalents), dairy (cup equivalents), added sugars (teaspoon equivalents), whole grains (ounce equivalents), fiber (g), and calcium (mg) using the What We Eat in America 24-hour dietary recall data from the 2009-2010 NHANES. The equations were estimated in the NHANES 2009-2010 data using SAS PROC REG.

### Estimating mean intakes

To derive scoring algorithms from the NHANES 24HR and DSQ data, we used a regression model in which the dependent variable was dietary intake of each exposure from the NHANES 24HR data, and the independent variables were the frequency of intake of the relevant foods from the DSQ and a sex-age specific portion size estimate for each food from the NHANES 24HR. In addition, we included indicator variables for age group: Kid (2-11), Teen (12-17), and Adult (18-69). For example, the equation for estimating intake of fruits and vegetables is:

$$E(\text{Fruits and Vegetables}) = b_{0} + b_{1}\,\text{Kid} + b_{2}\,\text{Teen} + b_{3} N_{FG3} P_{3} + b_{4} N_{FG4} P_{4} + \dots + b_{12} N_{FG12} P_{12}$$

where:

- E is the expected value;
- b_{k} is the regression coefficient for each term;
- Kid is 1 if a kid and otherwise is 0;
- Teen is 1 if a teen and otherwise is 0;
- N_{FGk} is the daily frequency of intake of Food Group k;
- P_{k} is the sex-age specific portion size of Food Group k.
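
As a rough illustration of how an equation of this form is evaluated for one respondent, the Python sketch below sums the intercept, the age-group terms, and the frequency-by-portion products. The function name, the food names, and every numeric value are placeholders invented for the example. Actual coefficients and portion sizes come from the published tables.

```python
def predicted_intake(b0, b_kid, b_teen, food_coefs, age_group, freqs, portions):
    """Evaluate b0 + b1*Kid + b2*Teen + sum over foods of b_k * N_k * P_k."""
    kid = 1.0 if age_group == "kid" else 0.0
    teen = 1.0 if age_group == "teen" else 0.0
    total = b0 + b_kid * kid + b_teen * teen
    for food, b_k in food_coefs.items():
        total += b_k * freqs[food] * portions[food]
    return total


# Placeholder inputs, not values from the DSQ tables.
coefs = {"fruit": 0.5, "salad": 0.25}    # b_k for each food group
freqs = {"fruit": 2.0, "salad": 1.0}     # N: times per day
portions = {"fruit": 1.0, "salad": 0.5}  # P: portion size for this sex-age group

adult = predicted_intake(0.5, -0.25, -0.125, coefs, "adult", freqs, portions)
kid = predicted_intake(0.5, -0.25, -0.125, coefs, "kid", freqs, portions)
```

With these placeholder inputs, the adult and kid predictions differ only by the Kid coefficient, because Adult is the group for which both indicators are 0.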

Dependent variables included intake of fruits and vegetables (cup equivalents), dairy (cup equivalents), added sugars (teaspoon equivalents), whole grains (ounce equivalents), fiber (g), and calcium (mg). Individual foods used as independent variables are found in: Relationships between Dietary Factors & Food Items.

### Estimating probability of intake above or below specific thresholds

Predicted intakes based on linear regression can be used to estimate mean usual intake in the population. Due to their smaller variance, however, they cannot be used to estimate the proportion of a population with usual intake above or below a specific threshold (prevalence).

We used logistic regression to derive scoring equations to predict the probability of usual intake above or below specific thresholds. Predicted probabilities can be averaged over a population to estimate prevalence in the population. Together, predicted intake and predicted probabilities provide more information about the distribution of usual intake in a population than either alone.
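
The averaging step can be sketched in Python as follows. The linear predictor values and function names are hypothetical, and the sketch uses a simple unweighted mean. An analysis of survey data may call for sample weights, which are not shown here.

```python
import math


def inverse_logit(x):
    """Map a logistic-regression linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def estimated_prevalence(linear_predictors):
    """Average the predicted probabilities across respondents (unweighted)."""
    probs = [inverse_logit(x) for x in linear_predictors]
    return sum(probs) / len(probs)


# Hypothetical linear predictors for three respondents.
prevalence = estimated_prevalence([-1.0, 0.0, 1.0])
```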

Ideally, prevalence would be estimated for thresholds based on dietary recommendations. Unfortunately, these recommended levels are often in the lower or upper tails of the distribution of actual intake in a population, and estimated prevalence for thresholds in the tails is generally less stable than for thresholds closer to the middle of a distribution. To avoid this instability, we chose threshold values that approximated the 25th and 75th percentiles of estimated usual intake.
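
For illustration only, quartile-based thresholds of this kind could be computed from a set of estimated usual intakes as in the Python sketch below. The intake values are made up, and the calculation is unweighted. It is not the procedure used to derive the published thresholds.

```python
import statistics

# Hypothetical estimated usual intakes (cup equivalents per day).
usual_intakes = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]

# quantiles(..., n=4) returns the three cut points that split the data into quarters.
q1, median, q3 = statistics.quantiles(usual_intakes, n=4)

lower_threshold = q1  # approximates the 25th percentile
upper_threshold = q3  # approximates the 75th percentile
```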

For a fuller description of the statistical methods used, see: Thompson FE, Midthune D, Kahle L, Dodd KW. Development and Evaluation of the National Cancer Institute's Dietary Screener Questionnaire Scoring Algorithms. *J Nutr*. 2017 Jun;147(6):1226-1233.

Following are estimates for the two components needed for the scoring algorithms: portion size (P_{k}) and the regression coefficients (b_{k}).

**Estimates of P_{k}:**

The median sex- and age-specific portion sizes for each food were estimated from NHANES 2009-2010 24-hour recalls. For fruit and vegetable variables, the unit was cup equivalents of fruits and vegetables including legumes (Table 1 and Table 2); for fruits, the unit was cup equivalents of fruits (Table 3 and Table 4); for vegetables including legumes, the unit was cup equivalents of vegetables (Table 5 and Table 6); for dairy, the unit was cup equivalents of dairy (Table 7 and Table 8). For added sugars, the unit was teaspoon equivalents of added sugars (Table 9 and Table 10); for whole grains, the unit was grams (Table 11 and Table 12); and for fiber and calcium, the unit was grams (Table 13 and Table 14).
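
A median portion size per sex-age group can be sketched in Python as shown here. The records, group labels, and variable names are hypothetical, and the medians are unweighted. The sketch only illustrates the grouping idea and does not reproduce the published estimates.

```python
import statistics
from collections import defaultdict

# Hypothetical recall records: (sex, age group, reported portion in cup equivalents).
records = [
    ("F", "adult", 0.5), ("F", "adult", 1.0), ("F", "adult", 1.5),
    ("M", "teen", 1.0), ("M", "teen", 2.0),
]

# Collect the reported portions for each sex-age group.
by_group = defaultdict(list)
for sex, age_group, portion in records:
    by_group[(sex, age_group)].append(portion)

# The median of each group serves as that group's portion size estimate.
median_portion = {group: statistics.median(vals) for group, vals in by_group.items()}
```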

For fruits and vegetables, a cup equivalent is defined by the U.S. Department of Agriculture and the U.S. Department of Health and Human Services in the Dietary Guidelines for Americans, 2015 as:

- 1 cup raw or cooked
- 1 cup vegetable or fruit juice
- 2 cups leafy salad greens
- ½ cup dried fruit or vegetable
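
The definitions listed above amount to a simple unit conversion, sketched here in Python. The dictionary keys and the function name are illustrative labels, not identifiers from the NCI programs.

```python
# Cups of each form that count as one cup equivalent, taken from the list above.
CUPS_PER_CUP_EQUIVALENT = {
    "raw_or_cooked": 1.0,
    "juice": 1.0,
    "leafy_greens": 2.0,
    "dried": 0.5,
}


def cup_equivalents(cups, form):
    """Convert a measured volume in cups to cup equivalents."""
    return cups / CUPS_PER_CUP_EQUIVALENT[form]


salad = cup_equivalents(3.0, "leafy_greens")  # 3 cups of leafy salad greens
raisins = cup_equivalents(0.25, "dried")      # 1/4 cup of dried fruit
```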

For dairy, a cup equivalent is defined by the U.S. Department of Agriculture and the U.S. Department of Health and Human Services in the Dietary Guidelines for Americans, 2015 as:

- 1 cup milk, yogurt, or fortified soymilk
- 1 ½ ounces of natural cheese such as cheddar
- 2 ounces of processed cheese

The Dietary Guidelines for Americans, 2015 states that added sugars, as an exposure, do not include naturally occurring sugars such as those found in fruit or milk. Added sugars do include the following when listed as an ingredient: brown sugar, corn sweetener, corn syrup, dextrose, fructose, glucose, high-fructose corn syrup, honey, invert sugar, lactose, malt syrup, maltose, molasses, raw sugar, sucrose, trehalose, and turbinado sugar.

The exposure sugar-sweetened beverages is defined in the Dietary Guidelines for Americans, 2015 as: "Liquids that are sweetened with various forms of sugars that add calories. These beverages include, but are not limited to, soda, fruitades and fruit drinks, and sports and energy drinks." For our analyses, we defined this exposure as including the above types of drinks plus coffees and teas when sweetened with sugar.

For whole grains, an ounce equivalent is defined by the U.S. Department of Agriculture and the U.S. Department of Health and Human Services in the Dietary Guidelines for Americans, 2010 as:

- ½ cup cooked whole grain rice, pasta, or cereal
- 1 ounce dry whole grain pasta or rice
- 1 medium (1 ounce) slice whole grain bread
- 1 ounce ready-to-eat whole grain cereal (about 1 cup of flaked cereals)

**Estimates of regression coefficients:**

Table 20. Estimated Regression Coefficients for Foods Predicting Cup Equivalents of Dairy by Sex

Table 24. Estimated Regression Coefficients for Foods as Predictors of Fiber (g): Males

Table 25. Estimated Regression Coefficients for Foods as Predictors of Fiber (g): Females

Table 26. Estimated Regression Coefficients for Foods as Predictors of Calcium (mg): Males

Table 27. Estimated Regression Coefficients for Foods as Predictors of Calcium (mg): Females