The added value of food frequency questionnaire (FFQ) information to estimate the usual food intake based on repeated 24-hour recalls
Archives of Public Health volume 75, Article number: 46 (2017)
Abstract
Background
Statistical methods to model the usual dietary intake of foods in a population generally ignore the additional information on the never-consumers. The objective of this study is to determine the added value of Food Frequency Questionnaire (FFQ) data, which allow the never-consumers to be distinguished from the non-consumers, when modeling the usual intake distribution.
Methods
Three food items with different proportions of never-consumers were selected from the database of the Belgian food consumption survey of 2004 (N = 3200). The usual intake distribution for these food items was modeled with the Statistical Program to Assess Dietary Exposure (SPADE) and modeling parameters were extracted. These parameters were used to simulate (a) a new database with two 24-h recalls per respondent and (b) a “true” usual intake distribution. The usual intake distribution from the new database was obtained by modeling the 24-h recalls with SPADE, once without and once with the inclusion of the FFQ data on the never-consumers. Ratios were calculated for the different percentiles of the usual intake distribution: the modeled usual intake (g/day) (for SPADE both with and without the inclusion of FFQ data on never-consumers) was divided by the corresponding percentile of the simulated “true” usual intake (g/day). The closer the ratio is to one, the better the model reproduces the simulated “true” usual intake distribution.
Results
Inclusion of the FFQ information to identify the never-consumers did not improve the estimation of the higher percentiles of the usual intake distribution. However, taking this FFQ information into account improved the estimation of the lower percentiles of the usual intake distribution, even when the proportion of never-consumers was low.
Conclusions
The inclusion of FFQ information to identify the never-consumers is beneficial when the interest is in the whole usual intake distribution or in the lower percentiles only, no matter how low the proportion of never-consumers for that food item may be. However, when the interest is only in the higher percentiles of the usual intake distribution, inclusion of FFQ information to identify the never-consumers will have no benefit.
Background
Studies comparing dietary and disease patterns in large populations provided evidence for the relation between nutrition and disease incidence. This led to the recognition that an unhealthy diet and lifestyle factors, such as a lack of physical activity, are key risk factors for developing a large variety of chronic conditions, such as cardiovascular diseases, cancer and diabetes [1,2,3]. This illustrates the importance of assessing the prevalence and distribution of food health indicators in the population.
Information on the diet of a population can be obtained from a food consumption survey, in which food and nutrient intake is assessed at the individual level. A large variety of dietary collection methods is available for conducting such surveys. Many of them make use of a (repeated) 24-h recall (24HR), where the respondent is asked to report all the types and amounts of foods consumed during the preceding full day. However, the measurement of the usual food intake is challenging when the number of 24HRs is limited [1, 4,5,6,7,8,9,10].
A first shortcoming is that an individual’s food consumption varies from day to day. In addition, 24HRs suffer from measurement error due to recall bias, the use of standard recipe files, etc. These limitations result in a substantial within-individual variability, which leads to a poor estimate of the usual intake distribution [5,6,7,8,9, 11,12,13]. In practice, the within-individual variability tends to widen the usual intake distribution, which results in an overestimation of the more extreme percentiles [5, 6, 13]. The majority of the statistical methods address this first drawback by integrating out (removing) the within-individual variation from the usual intake distribution during modeling [5,6,7, 14,15,16,17].
The use of a limited number of 24HRs has another drawback: it can become very challenging to capture infrequently consumed foods, which makes it difficult to differentiate the non-consumers from the never-consumers [6, 7, 16, 18, 19]. Non-consumers are participants who sometimes consume a specific food item, but did not consume it on any of the recall days. Never-consumers are participants who never consume a particular food, neither on a recall day nor on any other day [1, 6, 7, 19]. This second drawback, the difficulty of differentiating the never-consumers from the non-consumers, is generally not considered during the modeling of the usual intake distribution. The available information on the never-consumers was also ignored during the analysis of the BNFCS2004 (Belgian National Food Consumption Survey).
A possible solution is to supplement the 24HR data with additional information about the frequency of consumption, such as that collected with a Food Frequency Questionnaire (FFQ). The latter contains more information on long-term dietary behaviour. This approach allows for the identification of never-consumers of a given food in a population, provided that the FFQ contains a frequency category “never” [1, 6, 7, 16, 18, 19].
The objective of this study is to determine, by means of a simulation study, the added value of FFQ information to distinguish the never-consumers from the non-consumers during the modeling of the usual intake distribution. Subsequently, we evaluated whether the added value depends on the proportion of never-consumers. Goedhart et al. [6] also performed a simulation study in which they assessed, amongst other things, the effect of using FFQ information to identify the never-consumers. However, Goedhart et al. [6] used artificial data to assess the effect, whereas in this study the simulation is based on real food items whose intake was assessed in the Belgian population during the BNFCS2004.
Methods
Data of the BNFCS 2004 study
Three thousand two hundred individuals aged 15 years or older participated in the BNFCS2004. The goal of the survey was to describe the usual food consumption in Belgium separately for both genders and for four predefined age groups (15–18 years, 19–59 years, 60–74 years and ≥75 years). The sample size calculation indicated the need for 400 individuals per group. Individuals were selected from the national population register using a multistage sampling procedure [1].
The study design of the BNFCS2004 largely followed the recommendations of the European Food Consumption Survey Method project (EFCOSUM) [4, 10]. Two non-consecutive face-to-face 24HRs and a self-administered FFQ (covering a 12-month period) were used to gather information on food intake. The 24HR was repeated once to obtain more detail on the within-individual variation, and the recall days were spread randomly (across the large group of individuals) over all seasons of the year and all days of the week [1].
EPIC-Soft (European Prospective Investigation into Cancer and Nutrition Software) was used to obtain standardized 24HR interviews [20]; the program was adapted to the Belgian dietary context [1]. The FFQ contained a frequency category “never”, which is essential to make the distinction between (non-)consumers and never-consumers [1]. More detailed information about the study design can be found in De Vriese et al. [1] and on the website of the Scientific Institute of Public Health [2].
Statistical program to assess dietary exposure
The Statistical Program to Assess Dietary Exposure (SPADE), an R package developed at the Dutch National Institute of Public Health, was selected to estimate the usual intake distribution [21, 22], because both R and the SPADE package are freeware. In addition, SPADE allows information on the never-consumers to be included without a large increase in analysis time [21, 22]. For the data simulation we used R version 3.1.1 and SPADE version spade.rivm_v2.32.12.
SPADE provides different modeling options. This study only made use of the SPADE 2-part model, which models episodic (non-daily) intake [21, 22].
The panel on the left in Fig. 1 shows the basic steps of the SPADE 2-part model without inclusion of the never-consumers information: the 24HRs of all respondents are used to model (a) the intake frequency and (b) the intake amount. Combining both results in the usual intake distribution for the whole population [21, 22].
The panel on the right in Fig. 1 presents the basic steps of the SPADE 2-part model with inclusion of the never-consumers information. The never-consumers are assigned a usual intake of zero. The modeled usual intake distribution of the (non-)consumers and the zero intakes of the never-consumers are combined to obtain the global usual intake distribution, which reflects the correct proportion of never-consumers [21, 22].
Figure 2 shows in detail how SPADE models the usual intake distribution. Firstly, the consumption frequency is modeled with a beta-binomial model as a function of age. Secondly, the consumption amount is modeled. The intake amounts are transformed to normality using a Box-Cox transformation. These transformed amounts are then modeled as a function of age by a fractional polynomial regression, and all model parameters are estimated, including the total residual variance. The latter has to be partitioned into the between-individual and the within-individual variance. A Gaussian quadrature back-transformation is subsequently used (a) to integrate out the within-individual variance and (b) to back-transform the resulting shrunken distribution to the original scale [21,22,23].
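As an illustration of the transformation step only, the following Python sketch applies the textbook Box-Cox transformation and its inverse to a single intake amount. It is not SPADE code: the function names and the argument lam (standing in for the Box-Cox parameter) are assumptions made for this example, and neither SPADE's exact parametrization nor its Gaussian quadrature back-transformation is reproduced here.

```python
import math


def box_cox(amount, lam):
    """Transform a positive intake amount with the textbook Box-Cox formula."""
    if amount <= 0:
        raise ValueError("Box-Cox is only defined for positive amounts")
    if lam == 0:
        return math.log(amount)
    return (amount ** lam - 1.0) / lam


def inverse_box_cox(value, lam):
    """Map a transformed value back to the original scale (g/day)."""
    if lam == 0:
        return math.exp(value)
    base = lam * value + 1.0
    if base <= 0:
        raise ValueError("value lies outside the range of the transformation")
    return base ** (1.0 / lam)
```

Note that a plain inverse like this only undoes the transformation; it does not remove the within-individual variance, which is why the text describes a separate integration step.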
In the third step the distributions of the intake frequency and intake amount are combined by a Monte Carlo simulation to obtain the usual intake distribution [21, 22].
More detailed information on the SPADE modeling can be found in Additional file 1.
Selection of the food items
We used the following criteria to select the food items used in the current study: (a) they needed to have different proportions of never-consumers, and (b) even when the proportion of never-consumers was large, the number of participants consuming the food on both recall days had to be sufficiently large to avoid convergence problems in SPADE (convergence problems occur when the available amount of data is insufficient to obtain an adequate model fit) [5, 21, 22].
Data simulation
The simulation of a new database
The simulated BNFCS2004 was generated by simulating two 24HRs and basic FFQ information (only information on never-consumers versus consumers). The simulated BNFCS2004 was limited to individuals aged 15–74 years (n = 2363). The simulation was stratified by the three age groups (15–18 years, 19–59 years and 60–74 years), which allows for more variation of the food consumption as a function of age. The simulation took place in two stages: (a) simulate the consumers only and (b) simulate the never-consumers only (never-consumers are individuals who indicated in the FFQ that they never consumed the food item during the last 12 months).
Simulation of the consumers only – Simulated BNFCS2004 consumers only
For the simulation of the consumers, an approach similar to that of Souverein et al. [9] was used. SPADE can model both the intake frequency and the intake amounts as a function of age [21, 22]. To avoid convergence problems, only the intake amounts were modeled as a function of age during the simulation.
First, all never-consumers were excluded from the original BNFCS2004 database using the FFQ data, resulting in a sub-database with consumers only. Then the SPADE 2-part model without information on the never-consumers was used to obtain the usual intake distribution for consumers only from the original BNFCS2004. During the modeling the following parameters were extracted: the mean usual intake for every age (μ_{age}), the within-individual standard deviation (σ_{w}), the between-individual standard deviation (σ_{b}) and the Box-Cox transformation parameter (λ_{bc}) (Fig. 3 box A).
After the extraction of all required parameters, the simulation could start on the transformed scale. Firstly, the age of each respondent was simulated, assuming that age was uniformly distributed within each of the three age strata. Then each respondent’s mean usual intake was simulated using a normal distribution with the mean equal to the age-dependent mean usual intake and the variance equal to the between-individual variance. Next, two 24HRs were simulated for each respondent, again using a normal distribution, with the mean equal to the individual’s mean usual intake (simulated in the previous step) and the variance equal to the within-individual variance. The within-individual variance was assumed to be equal for each individual. These intakes were then back-transformed to the original scale using λ_{bc}.
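The amount-simulation steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names, the callable mean_for_age (a stand-in for the age-dependent mean usual intake on the transformed scale), the textbook form of the inverse Box-Cox transformation and the choice to set out-of-range values to 0 g/day are all hypothetical.

```python
import math
import random


def inverse_box_cox(value, lam):
    """Back-transform; values below the transform's range become 0 g/day (assumption)."""
    if lam == 0:
        return math.exp(value)
    base = lam * value + 1.0
    return base ** (1.0 / lam) if base > 0 else 0.0


def simulate_two_recalls(n, age_range, mean_for_age, sigma_b, sigma_w, lam, seed=0):
    """Simulate two 24HR amounts per respondent within one age stratum."""
    rng = random.Random(seed)
    respondents = []
    for _ in range(n):
        # Age is assumed uniform within the stratum.
        age = rng.uniform(age_range[0], age_range[1])
        # Individual mean usual intake on the transformed scale (between-individual SD).
        usual = rng.gauss(mean_for_age(age), sigma_b)
        # Two recall days around the individual mean (within-individual SD).
        recalls = [rng.gauss(usual, sigma_w) for _ in range(2)]
        amounts = [inverse_box_cox(r, lam) for r in recalls]
        respondents.append({"age": age, "usual_transformed": usual, "amounts": amounts})
    return respondents
```

For example, simulate_two_recalls(745, (15.0, 18.0), lambda age: 12.0, 1.5, 2.5, 0.5) would generate one stratum with made-up parameter values; in the study the parameters were extracted from the SPADE fit.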
The consumption frequency must also be considered during this simulation. It was simulated using a beta-binomial model, taking into account the mean intake frequency and the correlation of the intake frequencies (Fig. 3 box B). In a final step the distributions of the intake frequency and the intake amount were combined.
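One way to picture the beta-binomial frequency step is the Python sketch below. It assumes a common parametrization in which the beta distribution is defined by a mean consumption probability and an intra-individual correlation rho, with alpha + beta = (1 - rho) / rho; whether SPADE uses exactly this parametrization is not stated in the text, and all names are hypothetical.

```python
import random


def beta_params(mean_freq, rho):
    """Convert a mean probability and a correlation into beta shape parameters."""
    if not (0.0 < mean_freq < 1.0 and 0.0 < rho < 1.0):
        raise ValueError("mean_freq and rho must lie strictly between 0 and 1")
    scale = (1.0 - rho) / rho
    return mean_freq * scale, (1.0 - mean_freq) * scale


def simulate_consumption_days(n, mean_freq, rho, n_days=2, seed=0):
    """Draw each person's consumption probability, then their consumption days."""
    rng = random.Random(seed)
    alpha, beta = beta_params(mean_freq, rho)
    people = []
    for _ in range(n):
        prob = rng.betavariate(alpha, beta)  # person-specific consumption probability
        days = [rng.random() < prob for _ in range(n_days)]  # True = consumption day
        people.append((prob, days))
    return people
```

Combining this with simulated amounts then amounts to keeping the amount on consumption days and recording zero on the other days.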
Simulation of the never-consumers – Simulated BNFCS2004 never-consumers only
The correct number of never-consumers in each age stratum was calculated based on the FFQ data of the original BNFCS2004. For each never-consumer two 24HRs with a consumption equal to zero were generated, resulting in the “simulated BNFCS2004 never-consumers only” (Fig. 3 box C).
The simulation of a “true” usual intake distribution
The simulation of the simulated “true” usual intake distribution was very similar and was based on the methods described by Goedhart et al. [6], Tooze et al. [8] and Souverein et al. [9].
Simulation of the consumers only – Simulated “true” usual intake distribution consumers only
A “true” usual intake distribution was obtained by simulating 15,000 individuals, using steps similar to those described above. Instead of simulating two 24HRs per individual, one thousand 24HRs were simulated for each individual. The median intake over these thousand days can be considered the “true” usual intake on a consumption day for that individual; consequently, almost no within-individual variance was left. Taking into account the intake frequency thus directly results in the simulated “true” usual intake distribution, without the need for additional modeling (Fig. 4, box B).
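The idea of averaging out day-to-day variation over many simulated days can be sketched in Python as follows. This is an illustration under assumptions rather than the authors' procedure: the names are hypothetical, the inverse Box-Cox transformation is the textbook form, and multiplying the consumption-day amount by the individual's consumption probability is one plausible reading of "taking into account the intake frequency".

```python
import math
import random
import statistics


def inverse_box_cox(value, lam):
    """Back-transform; values below the transform's range become 0 g/day (assumption)."""
    if lam == 0:
        return math.exp(value)
    base = lam * value + 1.0
    return base ** (1.0 / lam) if base > 0 else 0.0


def true_usual_intake(usual_transformed, sigma_w, lam, freq, n_days=1000, rng=None):
    """Median amount over many simulated days, scaled by consumption probability."""
    rng = rng or random.Random(0)
    amounts = [
        inverse_box_cox(rng.gauss(usual_transformed, sigma_w), lam)
        for _ in range(n_days)
    ]
    # With many days, the median is almost free of within-individual noise.
    return freq * statistics.median(amounts)
```

Setting sigma_w to zero reproduces the back-transformed individual mean exactly, which is a quick way to check such a sketch.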
Simulation of the never-consumers – Simulated “true” usual intake for never-consumers
The procedure was exactly the same as described for the simulated BNFCS2004 never-consumers only (Fig. 4 box C).
More detailed information on the simulation process can be found in Additional file 1.
Evaluation of the simulation
Two criteria were used to evaluate the simulation. Firstly, the center (mean and median) of the simulated BNFCS2004 and of the simulated “true” usual intake distribution for consumers only should be similar to that of the original BNFCS2004 for consumers only.
Secondly, the within-individual, the between-individual and the total residual variance of the simulated BNFCS2004 should be similar to those obtained in the original BNFCS2004. In the simulated “true” usual intake distribution, however, the variance should be similar to the between-individual variance of the original BNFCS2004.
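To make the second criterion concrete, the Python sketch below estimates the within- and between-individual variance from two recalls per person with a simple method-of-moments (one-way ANOVA) estimator on the transformed scale. This is an assumption chosen for illustration; SPADE estimates these components within its own model, and the function name is hypothetical.

```python
import statistics


def variance_components(recall_pairs):
    """Return (within, between) variance from two transformed recalls per person."""
    if len(recall_pairs) < 2:
        raise ValueError("at least two respondents are needed")
    n = len(recall_pairs)
    # With two replicates, each person's within sum of squares is (a - b)^2 / 2.
    within = sum((a - b) ** 2 / 2.0 for a, b in recall_pairs) / n
    means = [(a + b) / 2.0 for a, b in recall_pairs]
    # The variance of the person means still contains within / 2; subtract it.
    between = max(statistics.variance(means) - within / 2.0, 0.0)
    return within, between
```

For example, the pairs (1, 3) and (5, 7) give a within-individual variance of 2 and a between-individual variance of 7.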
Effect of inclusion of FFQ information during modeling
To assess the usual intake distribution for the different food items for the whole Belgian population (15–74 years), the (non-)consumers and never-consumers were combined within each age stratum and then all age strata were merged. In other words, the simulated BNFCS2004 was obtained by combining the simulated BNFCS2004 consumers only and the simulated BNFCS2004 never-consumers only. Similarly, the simulated “true” usual intake distribution was obtained by merging the simulated “true” usual intake distribution consumers only and the simulated “true” usual intake for never-consumers. Because of the stratified design (by age) of the simulations, normalized survey weights were calculated and used during the analysis with SPADE.
Two versions of the SPADE 2-part model were used to model the simulated BNFCS2004 in order to obtain the usual intake distributions. The first model did not include the information on the never-consumers; in this situation everyone is considered a potential consumer. The second model did consider the information on the never-consumers, which allows the correct proportion of never-consumers in the population to be taken into account [21, 22]. Based on those models, the weighted percentiles of the usual intake distribution were estimated (5, 25, 50, 75 and 95%). The simulated “true” usual intake distribution does not require any modeling: the same percentiles could be determined directly after taking into account the normalized weights.
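As a sketch of how weighted percentiles can be read directly from a simulated distribution, the Python function below returns the smallest intake whose cumulative normalized weight reaches the requested fraction. This is one common definition of a weighted percentile, chosen here as an assumption; the interpolation rule actually used in the study is not described, and the function name is hypothetical.

```python
def weighted_percentile(values, weights, fraction):
    """Smallest value whose cumulative normalized weight is >= fraction."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    total = float(sum(weights))
    pairs = sorted(zip(values, weights))
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight / total  # normalizing makes the weights sum to one
        if cumulative >= fraction - 1e-12:
            return value
    return pairs[-1][0]
```

With intakes [0, 0, 10, 20] and equal weights, the 50th percentile is 0 and the 75th is 10, which shows how a block of zero intakes from never-consumers pulls the lower percentiles down while leaving the upper ones unchanged.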
The difference in the fit of both SPADE models (without and with the inclusion of information on the never-consumers) was evaluated using relative differences. The ratios of the modeled usual intake distributions (obtained from the simulated BNFCS2004) to the simulated “true” usual intake distribution were calculated. That is, for each percentile, the usual intake amount (g/day) obtained by one of the SPADE models was divided by the corresponding usual intake amount (g/day) of the simulated “true” usual intake distribution. The closer the ratio is to one, the better the SPADE 2-part model resembles the simulated “true” usual intake distribution at the given percentile.
The relative differences obtained by both models were also plotted as a function of the percentiles. Two specific outcomes with undefined ratios were handled separately: (a) a ratio of (0 g/day)/(0 g/day), which indicates a perfect fit, was assigned a value of one, and (b) a ratio of (x g/day)/(0 g/day), with x > 0, was assigned an artificial value of 0.6 to make clear in the graphs that the fit was not perfect.
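The ratio and its two special cases can be written down compactly. The Python sketch below follows the rules stated above; the function and argument names are hypothetical.

```python
def percentile_ratio(modeled, true_value, undefined_value=0.6):
    """Ratio of modeled to simulated 'true' usual intake at one percentile (g/day)."""
    if true_value == 0:
        # 0/0 counts as a perfect fit; x/0 gets the artificial plotting value.
        return 1.0 if modeled == 0 else undefined_value
    return modeled / true_value
```

For instance, percentile_ratio(0, 0) is treated as a perfect fit, while a modeled intake of 5 g/day against a true value of 0 g/day is flagged with 0.6.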
Goedhart et al. [6] suggested that three replicate simulations are sufficient to check whether replicates are similar. Therefore a sensitivity analysis was done by repeating the simulation three times (four simulations in total) to evaluate the variability of the simulations.
The procedure described above was performed for three different food items (water, cheese and fat spread) with different proportions of never-consumers (1.9, 6.7 and 31.7%, respectively).
Results
Selected food items
Three food items (water, cheese and fat spread) were selected for the purpose of this study. Their main characteristics can be found in Table 1.
Table 1 shows that all food items fulfill the predefined requirements. Firstly, the proportion of never-consumers and the weighted proportion of never-consumers differ between the selected food items. Secondly, the proportion of daily consumers for all food items is sufficiently large to avoid convergence problems in SPADE [5, 21, 22].
Evaluation of the simulation
In order to double-check the simulation process, the usual intake distributions of the simulated BNFCS2004 and of the simulated “true” usual intake, both for the consumers only, were compared with the results obtained from the original BNFCS2004 for the consumers only. The estimated usual intake distributions for the water, the cheese and the fat spread dataset for the consumers only are shown in Table 2A–C for one of the four simulations.
The usual intake distributions (g/day) for consumers only obtained from the original BNFCS2004, the simulated BNFCS2004 and the simulated “true” usual intake are shown in Table 2A–C. The mean and median of the usual intake distributions are very similar for all three food items. However, the differences in the usual intake distributions for consumers only become larger in the more extreme percentiles. This is probably caused by the difference in the within- and between-individual variance between the original BNFCS2004 and the simulated BNFCS2004. Meanwhile, the between-individual variance is similar in all age strata for the original BNFCS2004 and the simulated “true” usual intake, which is a consequence of the method used to simulate the “true” usual intake distribution.
Effect of the inclusion of FFQ information during modeling
After adding the correct proportion of never-consumers in each of the age strata, the three age strata of the water, cheese and fat spread datasets were combined. The usual intake distribution for the whole population (both (non-)consumers and never-consumers) was obtained by modeling the simulated BNFCS2004 with the SPADE 2-part model, once with and once without the inclusion of the FFQ information on the never-consumers. The resulting usual intake distributions for the whole Belgian population (15–74 years), together with the relative differences compared to the simulated “true” usual intake distribution, are shown in Table 3A–C for water, cheese and fat spread for one of the four simulations.
The usual intake distributions for the whole population obtained after SPADE modeling (with and without FFQ information) are similar to the simulated “true” usual intake distribution for cheese and fat spread. The absolute values are somewhat different for water; however, the relative differences are not that large and are similar to those found in the cheese dataset (Table 3A–C). For all three foods the largest difference in the usual intake distributions between the SPADE model without versus with FFQ information was observed for the lower percentiles. Taking into account the correct proportion of never-consumers resulted in a downward correction of the usual intake at the lower percentiles. In addition, it could be correctly estimated that the proportion of never-consumers was higher than 2.5% for cheese and higher than 25% for fat spread. The influence of the inclusion of the information on the never-consumers on the estimation of the median and the higher percentiles seemed to be limited for all three food items.
The relative differences between the usual intake distributions obtained from the simulated BNFCS2004 (without and with inclusion of FFQ information) and the simulated “true” usual intake distribution for the whole population were plotted as a function of the corresponding percentiles. Figure 5 shows the results of the simulation together with three replicate simulations, to give an idea of the variability of the simulations for the water, cheese and fat spread datasets.
Figure 5 confirms the observations from Table 3A–C. When the FFQ information on never-consumers is used, we found that (a) the proportion of never-consumers can be estimated more accurately and (b) the usual intake at the lower percentiles is corrected downward. The benefits of the inclusion of information on the never-consumers already show up when the proportion of never-consumers is low (e.g. the water dataset). However, these benefits increase as the proportion of never-consumers increases.
To allow a within-simulation comparison of the effect of the inclusion of FFQ information, Additional file 2 contains one figure per simulation.
Discussion
FFQ information can be included in the estimation of the usual intake distribution in two different ways: (a) use of FFQ information as a covariate or (b) use of the basic FFQ information to identify the correct proportion of never-consumers [6]. The main goal when using the FFQ information as a covariate is to improve the estimation of the intake frequency [6, 7, 16, 18, 19], whereas inclusion of the FFQ information to identify the never-consumers allows the correct proportion of never-consumers in the population to be reflected [6, 16, 19]. This study focused on the second option. Goedhart et al. [6] performed a large simulation study and studied, amongst other things, the effect of the use of FFQ information to identify the never-consumers. The current study is somewhat similar, but uses the SPADE method only. In addition, the simulations in this study were based on real food items that were assessed in the Belgian population, whereas in Goedhart et al. [6] the usual intake data were artificial.
Evaluation of the simulation
The mean and the median for the consumers only of the simulated BNFCS2004 and of the simulated “true” usual intake are similar to those in the original BNFCS2004. The within- and between-individual variances differ between the original BNFCS2004 and the simulated BNFCS2004. A possible explanation for this difference is the small number of simulated cases, e.g. for the water dataset 745 adolescents, 807 adults and 766 elderly. However, as expected, the between-individual variance in the original BNFCS2004 and the variance of the simulated “true” usual intake are similar [6, 8]. The SPADE 2-part model estimates both the between- and the within-individual variance; since the aim is to estimate the usual intake distribution of the population, SPADE removes the within-individual variance from the usual intake distribution [21, 23]. As a consequence, the variance of the original BNFCS2004 usual intake distribution is equal to the between-individual variance. Simulating 1000 recall days for each individual in the simulated “true” database corresponds to following these individuals during 2 years and 8 months. When an individual is followed for so many days, the usual intake of that individual is more certain, and (almost) no within-individual variance is left [6, 8]. The variance in the simulated “true” usual intake is indeed nearly equal to the between-individual variance observed in the different age strata in the original BNFCS2004, as shown in Table 2A–C.
Since the interest lies in estimating the usual intake at population level, rather than the individual intake, the goal is to integrate out the within-individual variance from the data, to obtain a usual intake distribution in which only the between-individual variance is considered [5]. This approach assumes that the mean of a sufficient number of 24HRs in one individual yields the “true” usual intake of that individual. This implies the assumption that the 24HR is unbiased at the individual level [7, 8]. However, biomarker studies of dietary intake have shown that self-report instruments are biased [24,25,26,27].
A limitation during modeling was related to the fact that the simulation was performed in the three age strata separately. As a consequence, the whole-population simulated “true” usual intake distribution consists of three fitted models, more precisely one model in each age stratum. At first sight the same happens in the simulated BNFCS2004, but the difference is that the data are remodeled by the SPADE 2-part model after merging the three age strata. At this point only one model is fitted for the complete age range, and this affects the usual intake distribution. Take the water dataset as an example: when working in the separate age strata (three models), water consumption was highest in the adult age group (534 g/day in adolescents, 619 g/day in adults and 499 g/day in the elderly). However, when the SPADE 2-part model (only one model) was applied to the simulated BNFCS2004 data, the intake amount seemed to decrease with age, from 539 g/day in adolescents to 525 g/day in adults and 428 g/day in the elderly. In addition, the adult age group was the most underrepresented and received the highest weight [28]. Together, this resulted in a higher usual intake of water in the simulated “true” usual intake distribution and an underestimation of the intake of water when the SPADE 2-part model was applied to the simulated BNFCS2004.
Another observation is that in the fat spread dataset the SPADE 2-part model without FFQ information on never-consumers (which does not take into account the proportion of never-consumers) predicts zero intakes for both P0.025 and P0.05. There are two possible explanations for those results: (a) because of the larger proportion of days without intake, intake amounts are regularly multiplied by an intake frequency close to zero, and (b) at the same time, fat spreads are consumed in rather small quantities. In this situation the benefit of the inclusion of FFQ information is no longer present in the lowest percentiles (e.g. P0.025 and P0.05 in the fat spread dataset). However, benefits were still present in the percentiles just above (e.g. P0.25 in the fat spread dataset).
Effect of the inclusion of FFQ information during modeling
Inclusion of FFQ information to identify the never-consumers is not beneficial when estimating the higher percentiles of the usual intake distribution. On the other hand, the results indicate that using the FFQ data to identify the never-consumers is crucial when estimating the lower percentiles of the usual intake distribution, even when the proportion of never-consumers is low. For example, a benefit was seen for water, for which only 2% of respondents indicated being a never-consumer. Both results are in accordance with the findings of the simulation study of Goedhart et al. [6].
In practice this means that when the interest is in food safety, the focus is typically on the consumers with the highest intake, as the high consumers are at risk [6, 29]. Since inclusion of FFQ information on the never-consumers does not seem to improve the estimation of the higher percentiles, inclusion of this information will probably have no benefit.
On the other hand, if the interest is in food adequacy, the focus is typically on the individuals with the lowest intake [29]. Since inclusion of information on never-consumers improves the estimation of the lowest percentiles, inclusion of this information will be beneficial. In a national food consumption survey the usual intake distribution of the whole population is measured, and both upper and lower percentiles are of interest [6, 29]. Again, inclusion of the FFQ information on the never-consumers will be beneficial to better estimate the lower percentiles of the usual intake distribution. Finally, the benefit of including FFQ information on never-consumers to estimate the lower percentiles of the usual intake distribution becomes larger as the proportion of never-consumers increases.
Strength and limitations of the study
The simulation was performed in the three age strata separately, with the consequence described above. In addition, this age stratification also limited the number of food items that could be selected, because the number of individuals with consumption on both recall days had to be sufficiently large in all subgroups to avoid convergence problems in SPADE [5, 21, 22]. This problem is not unique to SPADE, however: other statistical modeling methods, such as the ISU (Iowa State University) and the NCI (National Cancer Institute) methods, also require a sufficient number of respondents with at least two positive intake days in order to avoid convergence problems [7, 19, 30]. The decision to perform this simulation in the three separate age strata was made because usual intakes can vary substantially depending on the age of the individuals [7, 8, 31]. This was also shown in the results section, especially for fat spread and water. If all age groups had been simulated at the same time, the differences between the age groups would no longer have been present to the same extent. Since SPADE can take age into account during modeling, part of the age effect would still have been captured [21, 22].
The added value of the current study is that the simulation was performed on the basis of real data, which allows a better evaluation of the effect of the inclusion of information on never-consumers in a real-life situation. In addition, some difficulties were encountered in the current simulation in the translation from theory to practice. Firstly, it is not always easy to make a straightforward link between the FFQ questions and the food items obtained from the 24HR. This also illustrates the importance of constructing the FFQ questions with the planned analyses in mind. Secondly, the use of real data revealed convergence problems when the number of respondents with two positive intakes during the recall days became too low, as was shown in other studies [7, 16, 19, 30]. Such convergence problems occur more often during subgroup analysis, because of the smaller number of observations. These kinds of problems are more difficult to spot when the simulation is purely theoretical.
Conclusions
The inclusion of FFQ information to identify the never-consumers improves the estimation of the usual intake distribution, but only at the lower percentiles. When interest lies in the whole usual intake distribution (lower and upper percentiles) or only in its lower percentiles, inclusion of this FFQ information is beneficial even when the proportion of never-consumers is low. However, when interest lies only in the higher percentiles of the usual intake distribution, inclusion of FFQ information on the never-consumers has no benefit.
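The comparison behind these conclusions divides each percentile of the modeled usual intake by the corresponding percentile of the simulated “true” usual intake, with a ratio close to one indicating a good fit. The sketch below illustrates that calculation under stated assumptions: the linear-interpolation percentile, the function names and the choice to return `None` when the “true” percentile is zero are illustrative and are not taken from SPADE.

```python
def percentile(values, p):
    """Return percentile p (0 to 100) of values using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def percentile_ratios(modeled, true, percentiles=(5, 25, 50, 75, 95)):
    """Ratio of modeled to "true" usual intake (g/day) at each percentile.

    A ratio is None when the "true" percentile is zero, which can happen
    at low percentiles when part of the population never consumes the food.
    """
    ratios = {}
    for p in percentiles:
        true_value = percentile(true, p)
        if true_value > 0:
            ratios[p] = percentile(modeled, p) / true_value
        else:
            ratios[p] = None
    return ratios


if __name__ == "__main__":
    # Toy usual intakes (g/day); the two zeros stand for never-consumers.
    true_intake = [0, 0, 10, 20, 30]
    modeled_intake = [2, 4, 10, 20, 30]
    print(percentile_ratios(modeled_intake, true_intake, (25, 50, 75)))
```

In this toy example the modeled distribution assigns a positive intake to respondents whose “true” usual intake is zero, so the disagreement shows up at the lower percentiles only.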
Abbreviations
24HR: 24-hour recall
BNFCS2004: Belgian National Food Consumption Survey 2004
EFCOSUM: European Food Consumption Survey Method project
EPIC-Soft: European Prospective Investigation into Cancer and Nutrition software
FFQ: Food frequency questionnaire
ISU: Iowa State University method
NCI: National Cancer Institute method
SPADE: Statistical Program for Dietary Exposure
λ_{bc}: Box-Cox transformation parameter
μ: Mean usual intake
σ_{b}: Between-individual standard deviation
σ_{w}: Within-individual standard deviation
References
1. De Vriese S, De Backer G, De Henauw S, Huybrechts I, Kornitzer K, Leveque A, et al. The Belgian food consumption survey: aims, design and methods. Arch Public Health. 2005;63:1–16.
2. WIV-ISP. More information objectives. 2014. https://fcs.wivisp.be/info/SitePages/Objectives.aspx?WikiPageMode=Edit&InitialTabId=Ribbon.EditingTools.CPEditTab&VisibilityContext=WSSWikiPage. Accessed 26 Aug 2016.
3. Ezzati M, Riboli E. Behavioral and dietary risk factors for noncommunicable diseases. N Engl J Med. 2013;369(10):954–64.
4. Brussaard J, Johansson L, Kearney J. Rationale and methods of the EFCOSUM project. Eur J Clin Nutr. 2002;56:S4–7.
5. Dodd KW, Guenther PM, Freedman LS, Subar AF, Kipnis V, Midthune D, et al. Statistical methods for estimating usual intake of nutrients and foods: a review of the theory. J Am Diet Assoc. 2006;106(10):1640–50.
6. Goedhart PW, van der Voet H, Knüppel S, Dekkers ALM, Dodd KW, Boeing H, et al. A comparison by simulation of different methods to estimate the usual intake distribution for episodically consumed foods 2012. Supporting publications 2012: En299. www.efsa.europa.eu/publications. Accessed 26 Aug 2016.
7. Tooze JA, Midthune D, Dodd KW, Freedman LS, Krebs-Smith SM, Subar AF, et al. A new statistical method for estimating the usual intake of episodically consumed foods with application to their distribution. J Am Diet Assoc. 2006;106(10):1575–87.
8. Tooze JA, Kipnis V, Buckman DW, Carroll RJ, Freedman LS, Guenther PM, et al. A mixed-effects model approach for estimating the distribution of usual intake of nutrients: the NCI method. Stat Med. 2010;29(27):2857–68.
9. Souverein OW, Dekkers AL, Geelen A, Haubrock J, de Vries JH, Ocké MC, et al. Comparing four methods to estimate usual intake distributions. Eur J Clin Nutr. 2011;65:S92–S101.
10. Brussaard J, Löwik M, Steingrimsdottir L, Møller A, Kearney J, De Henauw S, et al. A European food consumption survey method - conclusions and recommendations. Eur J Clin Nutr. 2002;56:S89–94.
11. Beaton GH, Milner J, Corey P, McGuire V, Cousins M, Stewart E, et al. Sources of variance in 24-hour dietary recall data: implications for nutrition study design and interpretation. Am J Clin Nutr. 1979;32(12):2546–59.
12. Beaton GH, Milner J, McGuire V, Feather T, Little JA. Source of variance in 24-hour dietary recall data: implications for nutrition study design and interpretation. Carbohydrate sources, vitamins, and minerals. Am J Clin Nutr. 1983;37(6):986–95.
13. Mackerras D, Rutishauser I. 24-hour national dietary survey data: how do we interpret them most effectively? Public Health Nutr. 2005;8(06):657–65.
14. National Research Council, Subcommittee on Criteria for Dietary Evaluation. Nutrient adequacy: assessment using food consumption surveys. Washington, DC: National Academy Press; 1986.
15. Nusser SM, Carriquiry AL, Dodd KW, Fuller WA. A semiparametric transformation approach to estimating usual daily intake distributions. J Am Stat Assoc. 1996;91(436):1440–9.
16. Haubrock J, Nöthlings U, Volatier JL, Dekkers A, Ocké M, Harttig U, et al. Estimating usual food intake distributions by using the multiple source method in the EPIC-Potsdam calibration study. J Nutr. 2011;141(5):914–20.
17. Slob W. Probabilistic dietary exposure assessment taking into account variability in both amount and frequency of consumption. Food Chem Toxicol. 2006;44(7):933–51.
18. Subar AF, Dodd KW, Guenther PM, Kipnis V, Midthune D, McDowell M, et al. The food propensity questionnaire: concept, development, and validation for use as a covariate in a model to estimate usual food intake. J Am Diet Assoc. 2006;106(10):1556–63.
19. Kipnis V, Midthune D, Buckman DW, Dodd KW, Guenther PM, Krebs-Smith SM, et al. Modeling data with excess zeros and measurement error: application to evaluating relationships between episodically consumed foods and health outcomes. Biometrics. 2009;65(4):1003–10.
20. Slimani N, Valsta L. Perspectives of using the EPIC-SOFT programme in the context of pan-European nutritional monitoring surveys: methodological and practical implications. Eur J Clin Nutr. 2002;56:S63–74.
21. Dekkers AL, Verkaik-Kloosterman J, van Rossum CT, Ocké MC. SPADE, a new statistical program to estimate habitual dietary intake from multiple food sources and dietary supplements. J Nutr. 2014;144(12):2083–91.
22. Dekkers AL, Verkaik-Kloosterman J, van Rossum CT, Ocké MC. SPADE: Statistical Program to Assess habitual Dietary Exposure, User’s Manual version 2.0, for SPADE version 3.0; December 2014. Bilthoven: RIVM (National Institute for Public Health and the Environment); 2014.
23. Dekkers ALM, Slob W. Gaussian Quadrature is an efficient method for the back-transformation in estimating the usual intake distribution when assessing dietary exposure. Food Chem Toxicol. 2012;50(10):3853–61.
24. Freedman LS, Midthune D, Carroll RJ, Krebs-Smith S, Subar AF, Troiano RP, et al. Adjustments to improve the estimation of usual dietary intake distributions in the population. J Nutr. 2004;134(7):1836–43.
25. Macdiarmid J, Blundell J. Assessing dietary intake: who, what and why of underreporting. Nutr Res Rev. 1998;11(02):231–53.
26. Subar AF, Kipnis V, Troiano RP, Midthune D, Schoeller DA, Bingham S, et al. Using intake biomarkers to evaluate the extent of dietary misreporting in a large sample of adults: the OPEN study. Am J Epidemiol. 2003;158(1):1–13.
27. Kipnis V, Subar AF, Midthune D, Freedman LS, Ballard-Barbash R, Troiano RP, et al. Structure of dietary measurement error: results of the OPEN biomarker study. Am J Epidemiol. 2003;158(1):14–21.
28. Hahs-Vaughn DL. A primer for using and understanding weights with national datasets. J Exp Educ. 2005;73(3):221–48.
29. De Boer E, Slimani N, van’t Veer P, Boeing H, Feinberg M, Leclercq C, et al. The European food consumption validation project: conclusions and recommendations. Eur J Clin Nutr. 2011;65:S102–S7.
30. Nusser SM, Fuller WA, Guenther PM. Estimating usual dietary intake distributions: adjusting for measurement error and nonnormality in 24-hour food intake data. In: Lyberg L, Biemer P, Collins M, DeLeeuw E, Dippo C, Schwartz N, et al., editors. Survey measurement and process quality. New York: Wiley; 1997. p. 689–709.
31. Waijers P, Dekkers ALM, Boer JMA, Boshuizen HC, van Rossum CTM. The potential of AGE MODE, an age-dependent model, to estimate usual intakes and prevalences of inadequate intakes in a population. J Nutr. 2006;136(11):2916–20.
Acknowledgements
The authors would like to thank Arnold Dekkers PhD, senior statistician at the Dutch National Institute for Public Health and the Environment and one of the developers of SPADE. He was always willing to answer questions, and the discussions with him improved our understanding of the SPADE program.
Funding
The Belgian National Food Consumption Survey 2004 was funded by the Belgian Federal Public Service for Health, Food Chain Safety and Environment and by the Scientific Institute of Public Health.
This study was performed in the context of a master's dissertation at Ghent University and received no funding.
Availability of data and materials
Data presented in this manuscript are available upon request to the corresponding author.
Author information
Contributions
CO developed the concept, executed the simulation study and, based on its results, wrote this manuscript. HVO was promotor of the master's dissertation, gave guidance on conducting the simulation study and critically revised the manuscript. KDR and JT critically revised the manuscript. All authors read and approved the final manuscript.
Ethics declarations
The Belgian Food Consumption Survey 2004 was approved by the medical ethical committee of the Scientific Institute of Public Health, Belgium.
Consent for publication
Not applicable.
Competing interests
HVO is Editor-in-Chief of Archives of Public Health.
All other authors declare they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional files
Additional file 1:
Details on SPADE and the simulation process. (DOCX 253 kb)
Additional file 2:
Relative fit of the SPADE 2-part model without/with inclusion of FFQ information on never-consumers, Belgian National Food Consumption Survey 2004. Legend: Relative differences of the usual intakes as a function of the percentiles, shown for the four replicate simulations separately. “Without FFQ” presents the ratio of the usual intake amount (g/day) obtained with the SPADE 2-part model without FFQ information on never-consumers to the simulated “true” usual intake amount (g/day). “With FFQ” presents the same ratio, but with the inclusion of the FFQ information on never-consumers. The reference line represents a ratio of one, which indicates that the SPADE 2-part model gives exactly the same result as the simulated “true” usual intake distribution. (PDF 1529 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Ost, C., De Ridder, K.A.A., Tafforeau, J. et al. The added value of food frequency questionnaire (FFQ) information to estimate the usual food intake based on repeated 24-hour recalls. Arch Public Health 75, 46 (2017). https://doi.org/10.1186/s1369001702148
Keywords
 Usual intake
 Food frequency questionnaire
 FFQ
 24-hour recall
 Episodically consumed foods
 Statistical modeling methods
 Never-consumers
 SPADE