Epidemiology and Genomics Research Program

Diet History Questionnaire II and Canadian Diet History Questionnaire II: Coding Guidelines

Alternate Version Available

You are viewing the web site for DHQ II. The latest version for the U.S. is the DHQ III; however, DHQ III does not yet have a Canadian version. The Canadian version of DHQ II is still available.

A questionnaire data file is an ASCII text file containing data from completed Diet History Questionnaires. If using paper forms, this file can be created by a scanner or a data entry system. If using DHQ*Web, the questionnaire data file is created automatically. Each line in a questionnaire data file contains the coded responses for one questionnaire. The format of each line of data is specified in the questionnaire's codebook (a readable format) and in the questionnaire data dictionary (used by Diet*Calc). Codebooks and data dictionaries for NCI versions of the questionnaire designed for specific scanners or for data entry are provided in the DHQ Paper-based Forms.
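
As a rough illustration of the one-line-per-questionnaire layout, the sketch below slices a record into named fields. It assumes fixed column positions. The field names and column ranges in `LAYOUT`, and the helper names `split_record` and `read_records`, are hypothetical; real positions must be taken from the codebook for the instrument in use.

```python
# Hypothetical layout: field name -> (start, end) column offsets, 0-based,
# end exclusive. Real positions come from the questionnaire's codebook.
LAYOUT = {
    "respondent_id": (0, 6),
    "sex": (6, 7),
    "year": (7, 11),
    "month": (11, 13),
}


def split_record(line, layout=LAYOUT):
    """Slice one line of a questionnaire data file into coded fields."""
    line = line.rstrip("\r\n")
    width = max(end for _, end in layout.values())
    if len(line) < width:
        raise ValueError(
            f"record has {len(line)} columns, expected at least {width}"
        )
    return {name: line[start:end] for name, (start, end) in layout.items()}


def read_records(path, layout=LAYOUT):
    """Yield one dict of coded fields per non-blank line in an ASCII file."""
    with open(path, encoding="ascii") as handle:
        for line in handle:
            if line.strip():
                yield split_record(line, layout)
```

Under this made-up layout, `split_record("0001230200703")` would return the respondent ID `"000123"`, sex `"0"`, year `"2007"`, and month `"03"` as separate strings.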

Questionnaire data analyzed by Diet*Calc must be coded according to the guidelines described here. These guidelines apply to all NCI versions of the DHQ and to modified versions of the DHQ that you create. Follow these guidelines when developing new questions for the instrument and when configuring your scanner or data entry system.

The codebooks for the NCI versions of the questionnaire were created in word processing software. The same information is stored in the Questionnaire Data Dictionary (QDD) used by Diet*Calc and can be printed from Diet*Calc. However, the current version of the software provides limited detail in this report: it was designed to flag potential problems in the QDD and does not provide the column-by-column coding information a programmer needs to configure a scanner or data entry system. If you have modified the questionnaire, consider creating a codebook specifically for your instrument. The codebooks for the NCI versions of the questionnaire are provided in generic file formats so that they can be edited in any word processing package.

The following sections summarize the coding guidelines that must be used for DHQ data:

General Coding Rules

Follow these guidelines when coding information in DHQ data files analyzed by Diet*Calc.

  1. Formatted Questions instruct the respondent to select one choice from a list of possible answers. One character is used to code the response. Typically, this is a digit, 0 to n-1, where n is the number of possible choices. If you have added questions that allow more than 10 responses, you must use letters rather than digits. Make this change by using "A" or "a" as the Start Code in Diet*Calc (General Formats on the Settings menu of Diet*Calc's dictionary editor). The number of valid responses and their meaning are defined in Formats. For formatted questions, data dictionaries and codebooks provided by the NCI use "M" to indicate a missing response and "E" for an error (multiple marks when only one mark is appropriate). Diet*Calc does permit other characters to be used for Missing or Error Codes. If a multi-oval question has a partial response, code the ovals as they were answered. For example, if the first 5 digits in the social security number are properly marked (e.g., 12345) but the last 4 are left blank, code the digits in the first 5 places and Ms in the last 4 (the field would be coded as "12345MMMM"). The following are exceptions to the coding rules for formatted questions:
    • "Are you male or female?" - any characters can be used to code the response to this question. Data dictionaries and codebooks provided by the NCI use "0" for male and "1" for female (to be consistent with other formatted questions since male is the first response listed on the NCI forms). However, the coding scheme for this question is defined separately to allow you to switch the order of the responses or use other characters (note: M and F can only be used if M is not used as the missing code). Edit the "Sex" variable in the data dictionary to modify the coding scheme for this question. The response to this question determines the sex-specific nutrient values used in the analysis. A default value is used when this question is skipped or not asked. This default value can be specified when editing the "Sex" variable or by selecting "Sex" from the dictionary editor's Settings Menu.
    • Dates - year is coded as printed on the questionnaire. For example, the year field in Today's Date has 5 choices and uses 4-character codes ("2007", "2008", etc.) rather than "0", "1", and "2". If the field is missing or in error, fill the entire field with the missing or error character. For example, if M and E are used for missing and error, use "MMMM" or "EEEE" as appropriate. Months are coded with a 2-character code: 01, 02, 03, ..., 12, MM, EE (if M and E are the missing and error codes).
  2. Filled in vs. left blank - in some cases the respondent is simply asked to mark an oval if it is appropriate. Leaving it blank is a valid answer (not a skipped question). For example, many questions on the DHQ ask the respondent to "mark as many as apply". For each oval, "1" is typically used to indicate "filled in" and "0" is used to code "left blank." However, alternative codes can be used (see General Formats on the Settings menu of Diet*Calc's dictionary editor).
  3. Other Questions are fields not analyzed by Diet*Calc. These should be defined as "Other Question" variables in the QDD. No edit checks are performed on these fields, therefore any coding scheme can be used to code them.
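
The rules above for formatted questions, partial multi-oval responses, and date fields can be sketched in code. This is a minimal illustration, not Diet*Calc's own logic: the function names are hypothetical, the defaults "M" and "E" follow the NCI codebooks described above, and the sketch assumes that codes run consecutively from the Start Code ("0", "A", or "a").

```python
def code_choice(marks, n_choices, start_code="0", missing="M", error="E"):
    """Return the one-character code for a formatted (pick-one) question.

    marks is the list of 0-based positions of the ovals that were filled in.
    """
    if len(marks) == 0:
        return missing  # required question was skipped
    if len(marks) > 1:
        return error  # multiple marks where only one is allowed
    position = marks[0]
    if not 0 <= position < n_choices:
        raise ValueError(f"position {position} is outside 0..{n_choices - 1}")
    if start_code.isdigit() and n_choices > 10:
        raise ValueError("more than 10 choices require a letter Start Code")
    # Assumption: codes run consecutively from the Start Code.
    return chr(ord(start_code) + position)


def code_multi_oval(marks, missing="M"):
    """Code a multi-oval field one character per position; None means blank."""
    return "".join(missing if mark is None else str(mark) for mark in marks)


def code_number(value, width, status="ok", missing="M", error="E"):
    """Code a year or month, or fill the whole field with missing/error."""
    if status == "missing":
        return missing * width
    if status == "error":
        return error * width
    return str(value).zfill(width)
```

For example, `code_multi_oval([1, 2, 3, 4, 5, None, None, None, None])` yields `"12345MMMM"`, and `code_number(3, 2)` yields `"03"`. When letters code the responses, pass symbols for `missing` and `error` so they cannot collide with a response code.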

Format Definitions

Many fields in the Diet History Questionnaire (DHQ) use the same coding scheme or format. A format defines the number of choices for a question and the meaning of each choice. For example, the response choices for "How often did you eat..." questions are coded using one of several formats. The formats are set in the Questionnaire Data Dictionary (QDD). You may modify the existing formats using the dictionary editor in Diet*Calc.

  • Frequency formats are used for questions that ask "How often did you eat/drink...."
  • Size formats are used to code serving size questions, i.e., "When you ate (food), how much did you usually eat?"
  • "Filled in" and "left blank" codes are used when the respondent is asked to mark an oval if appropriate (not filling in the oval is a valid response and is not a skip for this type of question). For example, some DHQ questions provide a list of choices and instruct the respondent to "mark as many as apply."
  • Proportion formats are used to code questions that ask the respondent to specify how often (in fractions) the food was of a specific type. For example, the question "How often were your fruit drinks diet or sugar-free drinks?" has valid responses of "almost never or never", "about ¼ of the time", "about ½ of the time", "about ¾ of the time", and "almost always or always."
  • Duration format is used in supplement questions to indicate length of time, for example, "For how many years have you taken multi-vitamins?"
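
One way to picture a format is as an ordered list of response labels plus a start code. The sketch below models the proportion format quoted above. The `Format` class and its `decode` method are hypothetical and do not reflect how the QDD stores formats internally; the default missing and error characters are the '.' and '*' used in the DHQ II.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Format:
    """Hypothetical stand-in for one format entry in a data dictionary."""

    name: str
    choices: tuple  # response labels in the order printed on the form
    start_code: str = "0"

    def decode(self, code, missing=".", error="*"):
        """Translate a one-character code back into its response label."""
        if len(code) != 1:
            raise ValueError("a format code is exactly one character")
        if code == missing:
            return "missing"
        if code == error:
            return "error"
        # Assumption: codes run consecutively from the start code.
        position = ord(code) - ord(self.start_code)
        if not 0 <= position < len(self.choices):
            raise ValueError(f"{code!r} is not a valid {self.name} code")
        return self.choices[position]


PROPORTION = Format(
    name="proportion",
    choices=(
        "almost never or never",
        "about ¼ of the time",
        "about ½ of the time",
        "about ¾ of the time",
        "almost always or always",
    ),
)
```

With this definition, `PROPORTION.decode("2")` returns "about ½ of the time", and a code outside "0" to "4" raises a `ValueError`.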

Missing and Error Codes

A missing code indicates that the respondent skipped a question when a response was required. An error character indicates that the respondent marked two or more responses to a question where only one answer was appropriate. The following guidelines must be used for coding fields as missing or error.

  1. Letters or symbols (such as '*', '#', or '!' ) must be used as the missing and error characters. If letters are used to code formatted responses, symbols must be used for missing and error. Missing and error characters may never be numeric.
  2. When multiple characters are used to code a single oval, set all characters in the field to the missing character or to the error character when appropriate. For example, the year field in Today's Date uses one oval but is coded with four characters ("2002", "2003", etc.). If all choices are skipped, the entire field should be filled with the missing character ("....", for example).

You may not use the same character to represent both the missing and the error characters. In the DHQ II, '.' and '*' are the missing and error characters, respectively.
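
The constraints on missing and error characters can be checked before a scanner or data entry system is configured. The sketch below is one possible check under the rules stated above; the function names `check_missing_error` and `fill_field` are hypothetical and are not part of Diet*Calc.

```python
def check_missing_error(missing, error, letters_code_responses=False):
    """Return a list of rule violations; an empty list means the pair is valid."""
    problems = []
    for label, char in (("missing", missing), ("error", error)):
        if len(char) != 1:
            problems.append(f"{label} code must be a single character")
        elif char.isdigit():
            problems.append(f"{label} code may not be numeric")
        elif letters_code_responses and char.isalpha():
            problems.append(
                f"{label} code must be a symbol when letters code responses"
            )
    if missing == error:
        problems.append("missing and error codes must differ")
    return problems


def fill_field(width, char):
    """Fill a multi-character field entirely with a missing or error code."""
    return char * width
```

For the DHQ II characters, `check_missing_error(".", "*")` returns an empty list, and `fill_field(4, ".")` produces `"...."` for a skipped year field.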