# Usual Dietary Intakes: SAS Macros for Analysis of a Single Dietary Component

## SAS Macros Version 2.1

Three macros are available to support modeling of a single dietary component (either consumed nearly every day or episodically):

- MIXTRAN Macro: fits a model to obtain parameter estimates and allows for the evaluation of covariate effects.
- DISTRIB Macro: uses parameter estimates from MIXTRAN and a Monte Carlo method to estimate the distribution of usual intake for a food or nutrient.
- INDIVINT Macro*: uses parameter estimates from MIXTRAN or other appropriate model to predict individual food or nutrient intake for use in a disease model.

\* *Note that the INDIVINT macro requires SAS IML. The SAS Institute has reported an error that can occur when running SAS IML in SAS 9.2 TS1M0; the error relates to variables with missing values. Read the SAS Problem Note and get a link to the Hot Fix. The problem is fixed in SAS 9.2 TS2M2, and this error is not encountered in SAS 9.1.3.*

The MIXTRAN macro alone is sufficient for testing covariate effects on intakes of a dietary component. The DISTRIB and INDIVINT macros are generally used in conjunction with the MIXTRAN macro.

Documentation for all three macros is provided in the User's Guide for Analysis of Usual Intakes: For use with versions 2.1 of the Mixtran, Distrib and Indivint SAS macros [PDF - 12.1 MB]. Applications of these macros are described in Tooze et al, 2006 and Kipnis et al, 2009. The current version of the macros is 2.1. Version 1.1 remains available further down this page for analysts who wish to reproduce past results.

**Release notes for the current macros (describing changes and enhancements since version 1.1)**

### MIXTRAN and DISTRIB Release Notes

**MIXTRAN**

- The data are now sorted by subject and sequence number (the parameters "subject" and "repeat"). This is to ensure that the SAS NLMIXED procedure detects repeated records belonging to the same subject.
- For an episodically consumed food (i.e., a correlated or an uncorrelated model), a frequency table is provided that groups subjects by the number of observed 24-hour recalls and by the number of those recalls with consumption greater than 0 for the food being analyzed. If a replicate variable is in use, the percentages shown are weighted; the counts are not.
- In the amount model, there must be at least two subjects with at least two positive recalls. In the unlikely event that this criterion is not met, execution of the MIXTRAN macro is stopped.
- In the uncorrelated and correlated models, there must be at least one subject with two positive recalls; otherwise MIXTRAN is stopped. If there are 10 or fewer subjects with two positive recalls in any of the models, a warning is printed in the log stating that results might be unstable.
- For amount models a consumption value of 0 is changed to half the minimum amount consumed.
- In the parameter "weekend", the user's variable name is no longer changed to "WEEKEND". The parameter name is unchanged, but any two part recall variable can be used. This change is reflected in the DISTRIB macro (see below).
- In the reports the values for "Std Err" and "Prob>|t|" are not reported if a weight variable (i.e. the parameter "replicate_var") is in use.
- The values of the "type" variable in the parameter names have changed. The value 1.bi is now 1.pr and 2.in is now 2.am.
- If the user wants to analyze subgroups in the DISTRIB macro, it is no longer necessary to invoke the SUBGROUP parameter in MIXTRAN as this parameter is now available in the DISTRIB macro itself. However, for compatibility this parameter is still available in MIXTRAN.
- The following minor issues have been corrected: stray comments such as the listing of "eta sans u2" to the log, warnings about attempt to delete files already deleted, and other small matters.

**DISTRIB**

- A 9-point approximation method has replaced the Taylor linearization method in the back-transformation of the amount consumed. In most cases, the new method produces estimates very similar to those from the older Taylor linearization method, but where the Box-Cox parameter is small (i.e., for more extreme transformations) the 9-point approximation method is considered more accurate (Tooze et al., 2010).
- Code for the Monte Carlo simulations has been streamlined.
- If the "weekend" parameter is used, the default weights for 1st day of recall and 2nd day of recall are 4/7 and 3/7 respectively; however, these weights can now be replaced with user defined values. The user can specify the proportion for the second day of recall via a new parameter, "wkend_prop." Either a fraction or decimal number is acceptable. The proportion for the 1st day of recall will then be calculated as (1-"wkend_prop") within the macro. Additional details are provided in the documentation for the "weekend" and "wkend_prop" parameters.
- If the "cutpoints" parameter is used, there is no longer a requirement that at least two cut points be specified.
- The DISTRIB Macro has been restructured. It now contains two sub-macros, one of which calculates the estimated intake values, and the other the percentiles and other descriptive statistics. This means it is possible to create the estimated intake values (the mc_sim file) without necessarily producing percentiles and other descriptive variables (the descript file). The percentiles can be created in subsequent calls to the DISTRIB macro, which will use the estimated intake values created in a previous execution of DISTRIB. This allows percentiles to be calculated for subgroups, while using the same basis of estimated intake. The default remains the calculation of both the mc_sim file and the descript file in the same call to DISTRIB. A new parameter named **call_type** has been added to DISTRIB. The options are:
  - **Full** - Both the mc_sim file and the descript file are produced.
  - **MC** - Only the mc_sim (estimated intakes) file is produced.
  - **PC** - Only the descript file (percentiles etc.) is produced.

- The ability to test estimated intake against a recommended amount of intake has been added. A flag variable is set to 1 if the comparison is true. The proportion of the population or subgroups meeting the requirement will be saved in the descript data set output by the DISTRIB macro. The new parameters introduced for the recommended amount comparison are:
  - **Recamt_co** - The comparison operator. The values permitted are:
    - **LT** - less than
    - **LE** - less than or equal to
    - **GE** - greater than or equal to
    - **GT** - greater than
    - **R** - a range, inclusive of the minimum and maximum values
  - **Recamt** - The name of the variable containing the value for comparison, or the lower end of a range.
  - **Recamnt_hi** - The name of the variable containing the maximum value for the range. This parameter is only used if the value of the parameter **recamt_co** is R.

- The parameter **subgroupda** has been renamed **add_da**. This is to accommodate the fact that data containing subgroup information can now include additional information for the recommended amount.
- The parameter **subgroup** has been added. It is used to obtain percentiles and other descriptive information for the levels of the subgroup variable. It is no longer necessary to include the subgroup variable in the call to MIXTRAN. (Please note that the indicators of the subgroup should have been included in the covariates for the MIXTRAN model.)
- Some minor cleanup of titles and so forth has been completed.
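
The "wkend_prop" weighting and the recommended-amount comparison described in the DISTRIB notes above can be illustrated outside SAS. The Python sketch below is not the macro's code. It assumes that the two weights are applied as a simple weighted average of two intake values, and that the proportion meeting a recommendation is an unweighted mean of the 0/1 flags. All function names are hypothetical; only the operator codes (LT, LE, GE, GT, R) and the default weights (4/7 and 3/7) come from the notes.

```python
import operator
from fractions import Fraction

# Operator codes taken from the release notes; "R" (range) is handled separately.
COMPARISONS = {
    "LT": operator.lt,
    "LE": operator.le,
    "GE": operator.ge,
    "GT": operator.gt,
}


def parse_proportion(text):
    """Parse a proportion written as a fraction ("3/7") or a decimal ("0.4")."""
    value = Fraction(text.strip())
    if not 0 <= value <= 1:
        raise ValueError("proportion must lie between 0 and 1")
    return value


def weighted_intake(first_level, second_level, wkend_prop="3/7"):
    """Combine two intake values (assumption: a simple weighted average).

    The weight for the second level is wkend_prop; the weight for the
    first level is derived as 1 - wkend_prop, as the notes describe.
    """
    w_second = parse_proportion(wkend_prop)
    w_first = 1 - w_second
    return float(w_first) * first_level + float(w_second) * second_level


def meets_recommendation(intake, recamt_co, recamt, recamt_hi=None):
    """Return 1 if the comparison is true, else 0."""
    code = recamt_co.upper()
    if code == "R":
        if recamt_hi is None:
            raise ValueError("a range comparison needs an upper value")
        return int(recamt <= intake <= recamt_hi)  # inclusive at both ends
    return int(COMPARISONS[code](intake, recamt))


def proportion_meeting(intakes, recamt_co, recamt, recamt_hi=None):
    """Unweighted share of intakes whose flag is 1 (assumption: no survey weights)."""
    flags = [meets_recommendation(x, recamt_co, recamt, recamt_hi) for x in intakes]
    return sum(flags) / len(flags)
```

For example, `weighted_intake(7.0, 14.0)` applies the default 4/7 and 3/7 weights, while `weighted_intake(7.0, 14.0, wkend_prop="0.5")` weights the two levels equally. A real analysis would apply survey weights when computing the proportion; the sketch omits them to stay short.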

**Reference:**

Tooze JA, Kipnis V, Buckman DW, Carroll RJ, Freedman LS, Guenther PM, Krebs-Smith SM, Subar AF, Dodd KW. A mixed-effects model approach for estimating the distribution of usual intake of nutrients: The NCI method. *Statist Med.* 2010 Nov 30; 29:2857–68.
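
The MIXTRAN data checks listed earlier in these release notes (the recall frequency table, the minimum number of subjects with two positive recalls, and the replacement of zero amounts) can also be illustrated outside SAS. The following Python sketch is not the macro's code. The function names, the record layout of `(subject, amount)` pairs, and the `min_subjects` argument are hypothetical; the threshold of 10 subjects for the warning and the half-minimum rule come from the notes.

```python
from collections import Counter


def recall_frequency(records):
    """Count subjects by (number of recalls, number of positive recalls).

    records: iterable of (subject_id, amount) pairs, one per 24-hour recall.
    """
    per_subject = {}
    for subject, amount in records:
        n_recalls, n_positive = per_subject.get(subject, (0, 0))
        per_subject[subject] = (n_recalls + 1, n_positive + (1 if amount > 0 else 0))
    return Counter(per_subject.values())


def check_positive_recalls(freq, min_subjects=1, warn_at=10):
    """Return "stop", "warn", or "ok" from the count of subjects with
    two or more positive recalls.

    min_subjects is 1 or 2 depending on the model, per the notes above.
    """
    n = sum(count for (_, n_positive), count in freq.items() if n_positive >= 2)
    if n < min_subjects:
        return "stop"
    if n <= warn_at:
        return "warn"
    return "ok"


def replace_zeros(amounts):
    """Replace zero amounts with half the smallest positive amount."""
    half_min = min(a for a in amounts if a > 0) / 2
    return [a if a > 0 else half_min for a in amounts]
```

With three subjects where only one has two positive recalls, `check_positive_recalls` returns `"warn"` for `min_subjects=1` and `"stop"` for `min_subjects=2`.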

### INDIVINT Release Notes

- The back-transformation calculation uses a 9-point approximation instead of the Taylor series approach used in version 1.1.
- When an amount-only model is specified, an error message is issued in the log file if the 24-hour recalls include any zero values. In this situation, the *.lst file also includes a frequency count of the zero values with an error message as in version 1.1.
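
To make the contrast between the 9-point approximation and the Taylor approach concrete, here is a minimal Python sketch. The release notes do not say which nine points the macros use. The sketch assumes a 9-point Gauss-Hermite rule, a standard way to approximate the expectation of a back-transformed, normally distributed value, and the macros' actual nodes and weights may differ. The second-order Taylor form is also an assumption, shown only for comparison. All function names are hypothetical, and clamping a negative base to zero in the inverse Box-Cox transformation is a simplification.

```python
import math


def hermite_e(n, x):
    """Return (He_n(x), He_{n-1}(x)) for the probabilists' Hermite polynomials."""
    prev, curr = 0.0, 1.0  # He_{-1} is taken as 0, He_0 = 1
    for k in range(n):
        prev, curr = curr, x * curr - k * prev  # He_{k+1} = x*He_k - k*He_{k-1}
    return curr, prev


def nine_point_rule():
    """Nodes and weights so that sum(w * f(x)) approximates E[f(Z)], Z ~ N(0, 1)."""
    n = 9
    # The nine roots of He_9 lie in (-5.25, 5.25) and are more than 0.5 apart,
    # so each 0.5-wide bracket below holds at most one root.
    edges = [-5.25 + 0.5 * k for k in range(22)]
    nodes = []
    for lo, hi in zip(edges, edges[1:]):
        f_lo = hermite_e(n, lo)[0]
        if f_lo * hermite_e(n, hi)[0] > 0:
            continue  # no sign change, so no root in this bracket
        for _ in range(80):  # plain bisection
            mid = (lo + hi) / 2
            f_mid = hermite_e(n, mid)[0]
            if f_lo * f_mid <= 0:
                hi = mid
            else:
                lo, f_lo = mid, f_mid
        nodes.append((lo + hi) / 2)
    # Gauss-Hermite weights, normalized so that they sum to 1.
    weights = [math.factorial(n - 1) / (n * hermite_e(n, x)[1] ** 2) for x in nodes]
    return nodes, weights


NODES, WEIGHTS = nine_point_rule()


def inverse_box_cox(z, lam):
    """Inverse of g(y) = (y**lam - 1) / lam, with lam = 0 meaning log."""
    if lam == 0:
        return math.exp(z)
    return max(lam * z + 1, 0.0) ** (1 / lam)  # clamp: a simplifying assumption


def backtransform_9pt(mean, sd, lam):
    """Approximate E[g^{-1}(mean + sd * Z)] with the 9-point rule."""
    return sum(w * inverse_box_cox(mean + sd * x, lam) for x, w in zip(NODES, WEIGHTS))


def backtransform_taylor(mean, sd, lam):
    """Second-order Taylor approximation around the mean (for comparison)."""
    if lam == 0:
        return math.exp(mean) * (1 + sd ** 2 / 2)
    base = lam * mean + 1
    second_derivative = (1 - lam) * base ** (1 / lam - 2)
    return base ** (1 / lam) + 0.5 * sd ** 2 * second_derivative
```

For the log transformation (`lam = 0`) the exact answer is `exp(mean + sd**2 / 2)`, so `backtransform_9pt(1.0, 0.5, 0)` and `backtransform_taylor(1.0, 0.5, 0)` can each be checked against it. Under these assumptions the quadrature value is much closer to the exact one, and the gap widens as `sd` grows.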

To help analysts get started, NCI has developed sample programs and analytic datasets. These programs employ the various macros in conjunction with preliminary analytic datasets containing data from the National Health and Nutrition Examination Survey (NHANES). The first three examples use a dataset that is based on NHANES 2001-04 data; it includes the addition of balanced repeated replication (BRR) weights, imputed values for some of the MyPyramid equivalents data, some variable names that differ from the names used in the original NHANES file, and some derived variables. The last example uses a dataset based on NHANES 2003-04 data, with a small set of variables used to illustrate the method.

- Sample programs and output:
  - Example 1 [ZIP - 14 KB] illustrates model fitting and estimation of the distribution of usual intake for a dietary component that is consumed nearly daily.
  - Example 2 [ZIP - 18 KB] illustrates model fitting and estimation of the distribution of usual intake for a dietary component that is consumed episodically.
  - Example 3 [ZIP - 23 KB] illustrates time-saving programming techniques and macro features for a more complicated situation.
  - Example 4 [ZIP - 24 KB] illustrates model fitting to evaluate the relationship between a single dietary component and a health parameter.
- Analytic datasets for examples
  - Details of dataset contents for examples 1-3 [TXT - 39 KB]
  - Details of dataset contents for example 4 [PDF - 17 KB]

## SAS Macros Version 1.1

Three macros are available to support modeling of a single dietary component (either consumed nearly every day or episodically):

- MIXTRAN Macro: fits a model to obtain parameter estimates and allows for the evaluation of covariate effects.
- DISTRIB Macro: uses parameter estimates from MIXTRAN and a Monte Carlo method to estimate the distribution of usual intake for a food or nutrient.
- INDIVINT Macro*: uses parameter estimates from MIXTRAN or other appropriate model to predict individual food or nutrient intake for use in a disease model.

\* *Note that the INDIVINT macro requires SAS IML. The SAS Institute has reported an error that can occur when running SAS IML in SAS 9.2 TS1M0; the error relates to variables with missing values. Read the SAS Problem Note and get a link to the Hot Fix. The problem is fixed in SAS 9.2 TS2M2, and this error is not encountered in SAS 9.1.3.*

The MIXTRAN macro alone is sufficient for testing covariate effects on intakes of a dietary component. The DISTRIB and INDIVINT macros are generally used in conjunction with the MIXTRAN macro.

Documentation for all three macros is provided in the User's Guide for Analysis of Usual Intakes: For use with versions 1.1 of the Mixtran, Distrib and Indivint SAS macros [PDF - 98 KB]. Applications of these macros are described in Tooze et al, 2006 and Kipnis et al, 2009.

To help analysts get started, NCI has developed sample programs and analytic datasets. These programs employ the various macros in conjunction with preliminary analytic datasets containing data from the National Health and Nutrition Examination Survey (NHANES). The first three examples use a dataset that is based on NHANES 2001-04 data; it includes the addition of balanced repeated replication (BRR) weights, imputed values for some of the MyPyramid equivalents data, and some variable names that differ from the names used in the original NHANES file. The last example uses a dataset based on NHANES 2003-04 data, with a small set of variables used to illustrate the method.

- Sample programs and output:
  - Example 1 [ZIP] illustrates model fitting and estimation of the distribution of usual intake for a dietary component that is consumed nearly daily.
  - Example 2 [ZIP] illustrates model fitting and estimation of the distribution of usual intake for a dietary component that is consumed episodically.
  - Example 3 [ZIP] illustrates time-saving programming techniques and macro features for a more complicated situation.
  - Example 4 [ZIP] illustrates model fitting to evaluate the relationship between a single dietary component and a health parameter.
- Analytic datasets for examples
  - Details of dataset contents for examples 1-3 [PDF - 29 KB]
  - Details of dataset contents for example 4 [PDF - 17 KB]