Epidemiology and Genomics Research Program

Diet History Questionnaire: General Coding Rules

Alternate Version Available

You are viewing the web site for the original version of the DHQ. The latest version for the U.S. is the DHQ III; however, DHQ III does not yet have a Canadian version. The Canadian version of DHQ II is still available.

Follow these guidelines when coding information in DHQ data files analyzed by Diet*Calc.

  1. Formatted Questions instruct the respondent to select one choice from a list of possible answers. One character is used to code the response. Typically, this is a digit from 0 to n-1, where n is the number of possible choices. If you have added questions that allow more than 10 responses, you must use letters rather than digits. Make this change by using "A" or "a" as the Start Code in Diet*Calc (General Formats on the Settings menu of Diet*Calc's dictionary editor). The number of valid responses and their meanings are defined in Formats.

    For formatted questions, data dictionaries and codebooks provided by the NCI use "M" to indicate a missing response and "E" to indicate an error (multiple marks when only one mark is appropriate). Diet*Calc does permit other characters to be used as the Missing or Error Codes.

    If a multi-oval question has a partial response, code the ovals as they were answered. For example, if the first 5 digits of the social security number are properly marked (e.g., 12345) but the last 4 are left blank, code the digits in the first 5 places and Ms in the last 4 (the field would be coded as "12345MMMM").

    The following are exceptions to the coding rules for formatted questions:

    • "Are you male or female?" - any characters can be used to code the response to this question. Data dictionaries and codebooks provided by the NCI use "0" for male and "1" for female (to be consistent with other formatted questions, since male is the first response listed on the NCI forms). However, the coding scheme for this question is defined separately to allow you to switch the order of the responses or use other characters (note: M and F can be used only if M is not the missing code). Edit the "Sex" variable in the data dictionary to modify the coding scheme for this question. The response to this question determines the sex-specific nutrient values used in the analysis. A default value is used when this question is skipped or not asked. This default value can be specified when editing the "Sex" variable or by selecting "Sex" from the dictionary editor's Settings menu.
    • Dates - year is coded as printed on the questionnaire. For example, the year field in Today's Date has 5 choices and uses 4-character codes, "2007", "2008", etc., rather than "0", "1", and "2". If the year is missing or in error, fill the entire field with the missing or error character. For example, if M and E are used for missing and error, then "MMMM" and "EEEE" should be used as appropriate. Months are coded with a 2-character code: 01, 02, 03,...,12, MM, EE (if M and E are the missing and error codes).
  2. Filled in vs. left blank - in some cases the respondent is simply asked to mark an oval if it is appropriate. Leaving it blank is a valid answer (not a skipped question). For example, many questions on the DHQ ask the respondent to "mark as many as apply". For each oval, "1" is typically used to indicate "filled in" and "0" is used to code "left blank." However, alternative codes can be used (see General Formats on the Settings menu of Diet*Calc's dictionary editor).
  3. Other Questions are fields not analyzed by Diet*Calc. These should be defined as "Other Question" variables in the QDD. No edit checks are performed on these fields, so any coding scheme can be used to code them.
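
The rules above can be illustrated with a short Python sketch. It is not part of Diet*Calc and does not reproduce its behavior; the function names, the representation of a response as a list of 0-based marked oval positions, the list of printed years after "2008", and the check that rejects a letter code colliding with the missing or error code are all assumptions made for illustration.

```python
from typing import Sequence

MISSING = "M"  # missing code used in NCI-provided dictionaries
ERROR = "E"    # error code: multiple marks where only one is allowed


def code_choice(marks: Sequence[int], n_choices: int, start_code: str = "0") -> str:
    """Code one formatted question from the 0-based positions of its marked ovals."""
    if not marks:
        return MISSING
    if len(marks) > 1:
        return ERROR
    index = marks[0]
    if not 0 <= index < n_choices:
        raise ValueError("mark is outside the list of choices")
    if start_code.isdigit() and n_choices > 10:
        raise ValueError("more than 10 choices require a letter start code")
    code = chr(ord(start_code) + index)
    # Assumption of this sketch: reject a response code that would be
    # indistinguishable from the missing or error code.
    if code in (MISSING, ERROR):
        raise ValueError("response code collides with the missing/error code")
    return code


def code_multi_oval(columns: Sequence[Sequence[int]], n_choices: int = 10) -> str:
    """Code a multi-oval field (e.g. a number) one position at a time."""
    return "".join(code_choice(col, n_choices) for col in columns)


def code_year(marks: Sequence[int], printed_years: Sequence[str]) -> str:
    """Code the year as printed; fill the whole field when missing or in error."""
    if not marks:
        return MISSING * 4
    if len(marks) > 1:
        return ERROR * 4
    return printed_years[marks[0]]


def code_month(marks: Sequence[int]) -> str:
    """Code the month as 01..12, assuming ovals are listed January to December."""
    if not marks:
        return MISSING * 2
    if len(marks) > 1:
        return ERROR * 2
    return f"{marks[0] + 1:02d}"


def code_mark_all(filled: bool) -> str:
    """Code a 'mark as many as apply' oval: blank is a valid answer, not missing."""
    return "1" if filled else "0"


# Partial social security number: first 5 digits marked, last 4 left blank.
ssn_columns = [[1], [2], [3], [4], [5], [], [], [], []]
print(code_multi_oval(ssn_columns))  # 12345MMMM

# Hypothetical year list; only "2007" and "2008" are named in the guidelines.
years = ["2007", "2008", "2009", "2010", "2011"]
print(code_year([1], years), code_month([2]))  # 2008 03
```

The example passes a lowercase start code ("a") when a question has more than 10 choices, because uppercase letter codes would eventually reach "E" and "M". How Diet*Calc itself resolves that overlap is not stated here, so choose the Start Code and the Missing and Error Codes together.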