Combining self-reported and objectively measured survey data to improve hypertension prevalence estimates: Portuguese experience
Archives of Public Health volume 79, Article number: 45 (2021)
Accurate data on hypertension is essential to inform decision-making. Hypertension prevalence may be underestimated by population-based surveys due to misclassification of health status by participants. Therefore, adjustment for misclassification bias is required when relying on self-reports. This study aims to quantify misclassification bias in self-reported hypertension prevalence and prevalence ratios in the Portuguese component of the European Health Interview Survey (INS2014), and illustrate application of multiple imputation (MIME) for bias correction using measured high blood pressure data from the first Portuguese health examination survey (INSEF).
We assumed that objectively measured hypertension status was missing for INS2014 participants (n = 13,937) and imputed it using INSEF (n = 4910) as auxiliary data. Self-reported, objectively measured and MIME-corrected hypertension prevalence and prevalence ratios (PR) by sex, age group and education were estimated. Bias in self-reported and MIME-corrected estimates was computed using objectively measured INSEF data as the gold standard.
Self-reported INS2014 data underestimated hypertension prevalence in all population subgroups, with misclassification bias ranging from 5.2 to 18.6 percentage points (pp). After MIME correction, prevalence estimates increased and became closer to the objectively measured ones, with bias reduced to between 0 and 5.7 pp. Compared to objectively measured INSEF data, self-reported INS2014 data considerably underestimated the prevalence ratio by sex (PR = 0.8, 95CI = [0.7, 0.9] vs. PR = 1.2, 95CI = [1.1, 1.4]). MIME successfully corrected the direction of the association with sex in bivariate (PR = 1.1, 95CI = [1.0, 1.3]) and multivariate analyses (PR = 1.2, 95CI = [1.0, 1.3]). Misclassification bias in hypertension prevalence ratios by education and age group was less pronounced and did not require correction in multivariate analyses.
Our results highlight the importance of misclassification bias analysis in self-reported hypertension. Multiple imputation is a feasible approach to adjust for misclassification bias in prevalence estimates and exposure-outcomes associations in survey data.
Reliable and precise estimates of hypertension prevalence are essential to inform health policies at the global, regional, national, and local levels [1,2,3]. High blood pressure prevalence is one of the harmonized European Core Health Indicators continuously monitored at the European Union (EU) and its member states, using self-reported survey data [1, 2].
There is substantial evidence in epidemiological research on limited validity of self-reports to measure hypertension [4,5,6]. Survey participants could misclassify their health status and report it as healthy due to inaccurate recall or lack of awareness [4,5,6,7,8]. Several studies have shown that, across EU countries, the awareness of hypertension is still far from perfect (ranging between 33.9 and 82.2%) [5,6,7, 9, 10] and may be differential among population subgroups [5,6,7, 9], since some are more prone to reporting errors in surveys.
Incorrect reports of hypertension lead to inaccurate survey inference on prevalence and measures of association, so-called misclassification bias [4,5,6, 8], with implications for public health planning, evaluation of interventions, and research. To reduce misclassification bias, objective measurement of blood pressure through health examination surveys (HES) has thus been recommended. However, objective measurements are more expensive and time-consuming, and represent a higher burden for participants. Surveys with objective measurements are usually implemented on a smaller scale, which limits the precision of estimates, the level of disaggregation and, consequently, subpopulation analysis. Even in high-income countries, frequent HES may not be a feasible substitute for large-scale health interview surveys, and as such, decision makers and researchers still rely heavily on self-reports. A possible way to address this issue would be to keep the high-precision self-reported measures from large-scale surveys but adjust them for potential misclassification bias using more accurate HES data.
Several methods, such as regression calibration, maximum likelihood and Bayesian approaches [12,13,14,15,16], and respective software solutions [14, 15] have been put forward to account for misclassification in this context. However, these are complex and might not be accessible to the average public health researcher. A more feasible alternative is to treat misclassification as a missing data problem and apply multiple imputation techniques for misclassification error correction (MIME) [12, 17,18,19]. MIME is deemed simple to use and does not require strong programming skills, since multiple imputation routines are included in all standard statistical software. Another advantage of MIME is that it can correct bias in both prevalence estimates and exposure-outcome associations in a single step, accounting for differential and non-differential misclassification errors in either the outcome or the exposure, and using internal or external validation data [12, 17,18,19,20].
The National Health Interview Survey (INS) represents a key tool for monitoring trends in hypertension and other cardiovascular disease risk factors in Portugal, providing evidence for public health planning and National Health Programs [3, 22]. Since 2014, INS has been the Portuguese component of the European Health Interview Survey, used for European Core Health Indicators monitoring in the EU. In 2015, Portugal developed its first HES combining self-reported and objectively measured data on hypertension for the same individuals, thus providing an opportunity to investigate the magnitude and direction of misclassification bias in self-reports and the feasibility of statistical adjustments.
Despite extensive methodological research on methods performance in a variety of settings and recently growing attention to misclassification bias in self-reports in epidemiological literature, this issue is still overlooked in public health practice; few studies attempted to assess results robustness in the presence of misclassification or adjust for it [23, 24].
This study aims to illustrate the application of multiple imputation for misclassification bias correction in self-reported hypertension prevalence and prevalence ratios in the INS2014, using data on objectively measured blood pressure from the first Portuguese HES (INSEF).
We used data from two population-based surveys conducted in Portugal in 2014–2015: one with interview (INS2014) and one with examination (INSEF). A detailed description of the surveys' design, sampling, and data collection has been provided elsewhere [25, 26].
Briefly, INSEF is a cross-sectional study conducted in 2015 by the Instituto Nacional de Saúde Doutor Ricardo Jorge, in partnership with the five Regional Health Administrations, the Regional Secretariats for Health of the Autonomous Regions (Azores and Madeira) and the Norwegian Institute of Public Health. INSEF collected objectively measured and self-reported data on a multi-stage probabilistic sample (n = 4911, response rate of 43.9%) of non-institutionalized Portuguese population aged 25–74 years old (yo), through physical examinations and interviews.
INS2014 was developed by Statistics Portugal and Instituto Nacional de Saúde Doutor Ricardo Jorge as an integrated part of the European Health Interview Survey, wave 2. It is a cross-sectional study targeting the non-institutionalized resident population aged 15 years or over. The survey sample (n = 18,204) was selected using a multistage stratified design; the participation rate was 80.8%. INS2014 collected self-reported data on sociodemographic characteristics, health status and its determinants. For this study, the INS2014 sample was restricted to individuals aged 25–74 yo (n = 13,937), i.e., the age group available in both surveys.
Self-reported hypertension prevalence was estimated using the following question of INS2014: “During the past 12 months, have you had any of the following chronic disease or conditions? High blood pressure/Hypertension (Yes/No). Consider disease/conditions even if the symptoms were not present due to medical treatment”. Individuals who answered “yes” were considered hypertensive.
For INSEF, self-reported hypertension prevalence was defined using two questions: i) “Do you have any of the following longstanding diseases or conditions: High blood pressure or hypertension? (Yes/No). Consider longstanding disease/conditions which have lasted, or are expected to last, for 6 months or more”, and if yes, ii) “Were these conditions diagnosed by a medical doctor? (Yes/No)”. Individuals who answered positively to both questions were considered hypertensive.
Prevalence of objectively measured high blood pressure was defined as the proportion of those: i) with systolic blood pressure ≥ 140 mmHg, or ii) with diastolic blood pressure ≥ 90 mmHg, or iii) reporting taking prescribed medication to control blood pressure in the 2 weeks prior to the interview. Blood pressure was measured in a sitting position after 5 min of rest using the automated measurement device OMRON M6. Three sequential blood pressure measurements were taken on the right arm at one-minute intervals. The average of the 2nd and 3rd readings for systolic and diastolic blood pressure was considered. Medication intake was assessed by the questions: “During the past 2 weeks, have you used any medicines that were prescribed for you by a doctor?” and if yes, “Were the medicines for hypertension?”
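The case definition above can be written as a small classification rule. The following is a minimal Python sketch, not the authors' code (the analyses were run in Stata); the function name and argument layout are hypothetical.

```python
from statistics import mean


def measured_hypertension(systolic, diastolic, on_bp_medication):
    """Classify one participant from three sequential readings (mmHg).

    Hypothetical helper: the name and signature are illustrative only.
    """
    if len(systolic) != 3 or len(diastolic) != 3:
        raise ValueError("expected three readings for each pressure")
    # The first reading is discarded; the 2nd and 3rd are averaged.
    sbp = mean(systolic[1:])
    dbp = mean(diastolic[1:])
    # Any one of the three criteria is sufficient.
    return sbp >= 140 or dbp >= 90 or bool(on_bp_medication)


print(measured_hypertension([150, 138, 136], [88, 84, 82], False))  # False
print(measured_hypertension([150, 142, 140], [88, 84, 82], False))  # True
print(measured_hypertension([120, 118, 116], [70, 72, 71], True))   # True
```

In the first call the high initial reading (150 mmHg) does not count, because only the second and third readings enter the average (137 mmHg).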
Both surveys collected data on sex, age group, region of residence, urbanization, education, income, health behaviours and healthcare use, using similar questions (Table S1).
Participants’ characteristics and self-reported hypertension prevalence rates were compared between the two surveys using the chi-square test. An exploratory analysis of missing data was performed, and logistic regression was used to investigate whether the probability of objectively measured hypertension being missing was related to observed data.
We used the logistic regression imputation method for monotone missing data patterns for misclassification correction. We assumed that objectively measured hypertension was missing at random for INS2014 participants and imputed it using INSEF as auxiliary data. A logistic regression model was fitted on the INSEF sample with objectively measured hypertension as the outcome and self-reported hypertension and a set of other covariates as independent variables. Among the covariates available in both surveys, only those that were statistically significant were included in the final imputation model. Model performance was assessed using the area under the receiver operating characteristic curve (AUC) and the Archer-Lemeshow goodness-of-fit test. The fitted model was used to impute “objectively measured” hypertension values in the INS2014 sample. Imputation was based on a set of 100 imputation iterations.
We estimated objectively measured, self-reported and MIME-corrected prevalence of hypertension and respective 95% confidence intervals (95CI) for the overall sample and stratified by sex, age group and educational level. Poisson regression models were fitted to estimate prevalence ratios (PR) of self-reported, objectively measured and MIME-corrected hypertension according to sex, age group and educational level. Poisson regression was chosen because it allows the prevalence ratio to be estimated directly and is recommended as an alternative to logistic regression when the outcome is not rare [28, 29].
Self-reported and MIME-corrected INS2014 prevalence and prevalence ratio estimates were compared in terms of bias, standard error (SE) and mean squared error (MSE). Bias in self-reported and MIME-corrected estimates was computed using objectively measured INSEF data as the gold standard. MSE was estimated as the sum of the variance and the squared bias.
Stata 15.1 was used for data analyses. All analyses presented in the manuscript, including multiple imputation, were performed using sampling weights to account for the complex sample design of the INS2014 and INSEF samples. A significance level of 5% was considered.
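To make the imputation step concrete, here is a deliberately simplified Python sketch. It is not the authors' Stata procedure and rests on several assumptions: self-reported status is the only predictor (so the logistic model reduces to two cell proportions, here with add-half smoothing), sampling weights are ignored, uncertainty in the imputation model is approximated by bootstrapping the validation data, and the 100 imputations are pooled with Rubin's rules. The function name and the toy counts are hypothetical.

```python
import math
import random
from statistics import variance


def mime_prevalence(validation, target_self_report, m=100, seed=2021):
    """Impute measured status in the target survey and pool the prevalence.

    validation: list of (self_report, measured) 0/1 pairs (role of INSEF)
    target_self_report: list of 0/1 self-reports (role of INS2014)
    """
    rng = random.Random(seed)
    n = len(target_self_report)
    estimates, within_vars = [], []
    for _ in range(m):
        # Bootstrap the validation data so that uncertainty in the
        # imputation model carries through to the imputed values.
        boot = [rng.choice(validation) for _ in validation]
        prob = {}
        for s in (0, 1):
            cell = [measured for reported, measured in boot if reported == s]
            # P(measured = 1 | self-report = s), with add-half smoothing.
            prob[s] = (sum(cell) + 0.5) / (len(cell) + 1)
        # Draw an imputed "measured" status for every target participant.
        imputed = [1 if rng.random() < prob[s] else 0 for s in target_self_report]
        q = sum(imputed) / n
        estimates.append(q)
        within_vars.append(q * (1 - q) / n)
    # Rubin's rules: pooled estimate, within- and between-imputation variance.
    pooled = sum(estimates) / m
    within = sum(within_vars) / m
    between = variance(estimates)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Toy data (hypothetical counts, not taken from INSEF or INS2014).
validation = [(1, 1)] * 240 + [(1, 0)] * 20 + [(0, 1)] * 120 + [(0, 0)] * 620
target = [1] * 750 + [0] * 2250
estimate, se = mime_prevalence(validation, target)
print(f"self-reported: {sum(target) / len(target):.3f}")
print(f"MIME-corrected: {estimate:.3f} (SE {se:.3f})")
```

In the toy data many people with measured hypertension do not report it, so the corrected prevalence comes out above the self-reported 25%. A realistic application would add further covariates to the imputation model and respect the survey design.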
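For a single binary covariate, the prevalence ratio from a log-link Poisson model equals the ratio of the two group prevalences, so the point estimate can be checked by hand. The sketch below computes that crude ratio in Python with a log-scale Wald (Katz) interval. This interval is an assumption chosen for illustration: it is not the model-based interval from the paper, and it ignores sampling weights and confounders. The counts are hypothetical.

```python
import math


def prevalence_ratio(cases_exposed, n_exposed, cases_ref, n_ref, z=1.96):
    """Crude prevalence ratio with a log-scale Wald (Katz) interval."""
    p1 = cases_exposed / n_exposed
    p0 = cases_ref / n_ref
    pr = p1 / p0
    # Standard error of log(PR) for two independent binomial samples.
    se_log = math.sqrt((1 - p1) / cases_exposed + (1 - p0) / cases_ref)
    lower = pr * math.exp(-z * se_log)
    upper = pr * math.exp(z * se_log)
    return pr, lower, upper


# Hypothetical counts: 400 of 1000 men and 330 of 1000 women hypertensive.
pr, lower, upper = prevalence_ratio(400, 1000, 330, 1000)
print(f"PR = {pr:.2f} [{lower:.2f}, {upper:.2f}]")
```

A ratio above 1 indicates higher prevalence in the first group than in the reference group.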
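The comparison metrics can be illustrated with a short Python sketch. The overall prevalences (35.3% MIME-corrected, 36.0% objectively measured) and the 11.5 pp self-report bias are quoted in the text; the self-reported value of 24.5% is back-calculated from that bias, and the standard errors are placeholders, since Table S4 is not reproduced here.

```python
def bias_and_mse(estimate, standard_error, reference):
    """Bias against a gold standard and MSE = variance + bias squared."""
    bias = estimate - reference
    mse = standard_error ** 2 + bias ** 2
    return bias, mse


GOLD = 36.0  # objectively measured INSEF prevalence, in percent

# Standard errors below are placeholders, not values from the paper.
b_self, mse_self = bias_and_mse(24.5, 0.6, GOLD)
b_mime, mse_mime = bias_and_mse(35.3, 0.9, GOLD)

print(round(b_self, 1), round(mse_self, 2))  # -11.5 132.61
print(round(b_mime, 1), round(mse_mime, 2))  # -0.7 1.3
```

With a large bias the squared-bias term dominates, so the corrected estimate can have a much lower MSE even if its standard error is larger.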
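As a rough illustration of how sampling weights enter a prevalence estimate, the Python sketch below computes a weighted proportion. It covers the point estimate only: design-based variance estimation under stratification and clustering, as handled by survey routines in statistical software, is not shown. The function name and the toy weights are hypothetical.

```python
def weighted_prevalence(outcomes, weights):
    """Weighted proportion of 0/1 outcomes (point estimate only)."""
    if len(outcomes) != len(weights):
        raise ValueError("outcomes and weights must have the same length")
    total = sum(weights)
    return sum(w * y for y, w in zip(outcomes, weights)) / total


# Toy example: each weight is the number of people a respondent represents.
outcomes = [1, 0, 0, 1]
weights = [100, 300, 400, 200]
print(weighted_prevalence(outcomes, weights))        # 0.3
print(weighted_prevalence(outcomes, [1, 1, 1, 1]))   # 0.5
```

The unweighted proportion is 0.5, but the two hypertensive respondents represent fewer people, so the weighted estimate is lower.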
Participants characteristics and surveys comparability
INS2014 and INSEF respondents were similar regarding sociodemographic characteristics (Table S2), which is consistent with the samples being representative of the Portuguese population. Differences between surveys were observed for three of 11 variables. Notably, a higher proportion of INSEF participants reported having had their blood pressure measured by a health professional in the last 12 months (82.2% vs. 78.1%) and having consumed alcohol in the last 12 months (80.1% vs. 73.1%). In contrast, a lower proportion of INSEF participants reported having consulted a general practitioner in the last 12 months (65.2% vs. 74.9%).
Of the 11 potential covariates, the logistic regression model used for MIME included the six that were statistically significant: self-reported hypertension, sex, age group, region of residence, level of education and practice of physical activity (Table S3). The model showed good fit (Archer-Lemeshow goodness-of-fit test p-value = 0.167) and excellent classification accuracy (AUC = 0.92).
Self-reported prevalence estimates were similar between both surveys, except for the 55–64 age group, where higher estimates (p-value = 0.0288) were obtained for the INSEF sample (Table 1). In both surveys, self-reported prevalence was lower compared to objectively measured; the extent of underestimation varied by population subgroup.
Following MIME correction, prevalence estimates increased substantially and approached their objectively measured INSEF counterparts. Overall, MIME reduced misclassification bias from 11.5 pp to 0.7 pp (Table 2). Bias reduction was observed in all studied population subgroups, yet for the 55–64 yo group the difference between INS2014 and INSEF remained considerable after correction (5.7 pp).
Regarding prevalence estimates precision (Table S4), when comparing SE for MIME-corrected estimates obtained with larger INS2014 sample and objectively measured ones from the smaller INSEF sample, we observed marginal improvements for 6 population subgroups, while for other 4 corrected estimates were less precise. Gains in terms of MSE after MIME correction were observed in all studied subgroups (Table 2).
Prevalence ratios of hypertension according to sex, age group and educational level estimated by Poisson regression are summarized in Table 3.
Comparing men and women based on self-reported INS2014 data, the prevalence ratio was 0.8 [95CI: 0.7, 0.9], indicating lower hypertension prevalence among men. Objectively measured INSEF data pointed in the opposite direction, indicating a 1.2-fold higher prevalence among men (PR = 1.2 [95CI: 1.1, 1.4]) compared to women. MIME correction produced a statistically significant PR estimate of 1.1 [95CI: 1.0, 1.3], closer to the objectively measured one.
INS2014 data indicated a 7.7-fold increase in hypertension prevalence [95CI: 6.6, 9.1] in the 65–74 yo group, compared to the reference group (25–44 yo), while a smaller effect of age was estimated with objectively measured INSEF data (PR = 5.9 [95CI: 4.9, 7.2]). Accounting for outcome misclassification through MIME correction produced a PR estimate of 5.2 [95CI: 4.3, 6.4].
INS2014 estimates of prevalence ratios according to educational level were similar to the INSEF estimates, indicating no need for correction. Prevalence ratios from MIME correction remained close to the original ones (Table 3).
After adjustment for confounding, we observed that misclassification bias required correction in the self-reported prevalence ratio of hypertension by sex but not by age group or educational level. The MIME-corrected PR of 1.2 [95CI: 1.0, 1.3] resulting from multivariate Poisson regression (Table S5) was similar to the objectively measured one in INSEF.
In this study, we successfully applied MIME correction to adjust self-reported hypertension prevalence and prevalence ratio estimates from INS2014, a large-scale population-based study, using external data on objectively measured hypertension from the first Portuguese HES as reference.
Our results showed that self-reported data on hypertension, a European Core Health Indicator collected by the Portuguese component of the European Health Interview Survey wave 2, are subject to differential misclassification which, if ignored, leads to inaccurate inference and misleading scientific conclusions. Notably, for the Portuguese population aged 25–74 years, prevalence was underestimated and the severity of bias varied among subgroups. Misclassification bias in prevalence estimates was larger among men, older age groups, and less educated people. Such differences might be explained by recall bias, confusion between “controlled” disease and “cure”, and differential health-seeking/contact behaviors (e.g. women have more contact with healthcare services during key life events such as pregnancy and thus might be more aware of high blood pressure) [6,7,8]. While an understanding of the reasons behind these differences is beyond the scope of our work, a more thorough investigation could shed further light on which groups are at higher risk of not being correctly diagnosed.
As expected, the MIME approach markedly reduced misclassification bias in overall and strata-specific prevalence estimates. INS2014 MIME-corrected estimates were similar to the objectively measured INSEF prevalence for the overall sample (35.3% vs. 36.0%) and in all population subgroups except the 55–64 yo group. The residual bias in that group is likely related to the magnitude of the initial bias, with more heavily biased estimates being harder to correct. Furthermore, we identified and accounted for bias in exposure-outcome associations. It has been previously shown that in the presence of differential misclassification of the outcome the measures of association may be biased in any direction (away from or towards the null). In our study, compared to objectively measured data, INS2014 self-reported data yielded similar prevalence ratio estimates by education level, but overestimated the association with age (7.7 vs. 5.9) and considerably underestimated the association with sex (0.8 vs. 1.2). In multivariate analysis, only the prevalence ratio by sex required correction (0.8 vs. 1.2). The MIME approach successfully corrected the direction of the association with sex in bivariate and multivariate analysis and also corrected the magnitude of the associations with age, where bias was less pronounced. It should be noted that the smaller bias in multivariate analysis for the self-reported data might be related to bias acting in different directions for the included variables. Overall, these results are in line with previous research that reported comparable performance of MIME for correction of prevalence estimates and associations with both internal and external validation data [19, 20].
The INS2014 sample size was approximately 3-fold that of the INSEF sample, so we expected MIME-corrected estimates to be more precise than those derived from objectively measured INSEF data. However, we achieved little or no gain in the precision of estimates. MIME yielded 12–39% smaller SE for prevalence estimates stratified by sex and age group, whereas for prevalence rates by educational level MIME-corrected SE were 3–11% higher compared to the INSEF sample. Simulation studies indicate that gains in precision and MSE depend both on the relative sample size of the survey and validation datasets and on the quality of the multiple imputation models [12, 17, 20]. As recommended in the literature, we included in the model potential risk factors for hypertension and variables related to the misclassification process in the Portuguese context; our model showed excellent discriminating accuracy (AUC = 0.92). However, we were not able to include other potentially relevant variables (e.g., body mass index) because they were not present in both surveys or were not collected with comparable questions. Inclusion of these additional variables could improve MIME performance. In addition, the sample size ratio between the two surveys was small, which might also explain the low gain in precision. More pronounced improvements in precision have been reported in the USA, with a 17-fold sample size ratio between NHANES and NHIS. Our results were similar to those of Edwards et al., who used data with a 3.4-fold sample size ratio. Although MIME yielded a small gain in precision, we observed considerably lower MSE for corrected estimates than for the original self-reported data, due to bias reduction. Even with little or no gain in precision, bias correction in survey estimates may be important for evidence-based decision-making and public health planning.
This is particularly relevant when bias is large or changes the direction of the associations, as in our case, where self-reported data completely distorted sex differences in hypertension prevalence, indicating a higher burden of disease among women.
Whilst research may be interested solely in prevalence estimates, most frequently the interest is in exposure-outcome associations. MIME has been previously used in different study designs to correct odds ratios, risk ratios, and hazard ratios [15, 17]. In our study, MIME correction was successfully applied to prevalence ratios estimated by Poisson regression. This result corroborates the flexibility of the MIME approach, which is applicable to several model types. Another advantage of this correction method is that it can account for complex sample design. Nowadays, almost all large-scale surveys use complex sample features (stratification, clustering) to reduce data collection cost. Furthermore, multiple imputation might be extended to account simultaneously for misclassification error and missing data. In our study, the proportion of missing data in covariates used for imputation was below 5%, so we excluded item-missing data. Nevertheless, this might be an important aspect to consider in other studies.
Our approach had several limitations. First, we assumed that information on objectively measured hypertension in INS2014 was missing at random (Table S6), and that missingness was not dependent on individuals’ hypertension status. While we cannot formally assess this, possible selection bias related to hypertension status was minimized by the survey design and data collection method, given that INS2014 is representative of the Portuguese population.
Second, we assumed that misclassification error properties and the imputation model are transportable between surveys. Transportability is a critical issue and it arises regardless of the method used to correct estimates for measurement error [12, 20, 31]. When validation data do not correctly reflect the relationship between self-reported and objectively measured status, more severe bias may be introduced by correction. We extensively investigated the comparability of the surveys for a large number of covariates and demonstrated that the proportions of the different categories of participants were similar between them. Although the reference period of the survey questions for self-reported hypertension differed between INS2014 and INSEF (“past 12 months” vs. “Do you have any longstanding…”), self-reported hypertension prevalence and prevalence ratio estimates were also comparable, which is reassuring.
Third, data collection settings may represent an additional source of bias, affecting the transportability of the imputation model. In INSEF, interviews and examinations were conducted by health professionals in primary healthcare facilities, while in INS2014 most participants were interviewed at home (93.1%) and a small proportion participated via a self-administered questionnaire filled in through a web application (6.9%). We were not able to take these differences into account.
Finally, our approach may be applied only if individual-level data on misclassification and good predictors for the multiple imputation model are available. If this is not the case and researchers have no reliable auxiliary data on misclassification error, alternative methods have been proposed.
Although external validation data have been successfully used for correction of self-reported estimates in different contexts [16, 20, 32], it may be more reliable to use internal validation data for MIME. Therefore, we recommend addressing the issue of self-report misclassification by conducting a large-scale survey that incorporates collection of objectively measured blood pressure for a random subset of participants. A combined survey will benefit from a large sample size, precision and representativeness of estimates at the required levels of disaggregation and, at the same time, provide relevant information on the misclassification error associated with self-report, making its analysis and correction possible while reducing transportability issues. It also requires less preparation than two separate surveys, thus increasing efficiency. Selection of predictors for the imputation model should be done for each particular case, as the misclassification error properties observed in the present study may not hold for forthcoming European Health Interview Survey waves and other surveys in Portugal, or for other target populations or settings. National and regional HES carried out in several EU member states in the last decades have demonstrated that the direction and severity of misclassification bias in self-reported hypertension vary across place [5, 6, 9] and time, and thus might seriously compromise comparisons of the European Core Health Indicator between EU member states, regardless of efforts to produce comparable health information. Moreover, reliance on hypertension self-report might affect monitoring of time trends and evaluation of health interventions. Portugal and many other EU countries consider cardiovascular disease a priority for action.
When health programs targeting high blood pressure and cardiovascular disease risk factors are in place, it is reasonable to expect that awareness of risk factors, in particular hypertension, increases over time as programs are implemented, leading to changing misreporting patterns. Therefore, collection of objectively measured blood pressure using standardized measurement protocols should be continued in order to reassess misclassification bias in the European Core Health Indicator and adjust for it in further research. The method proposed here can be easily extended to other health indicators subject to misclassification for which gold-standard measures can feasibly be collected by HES, such as obesity, elevated cholesterol levels, smoking, and anemia, among others.
In conclusion, our results support previous research questioning accuracy of self-reported hypertension to estimate hypertension prevalence and exposure-outcome associations in general population and highlight the importance of bias analysis when using self-reported data on hypertension. MIME approach may be useful to assess the robustness of the research conclusions and correct for bias in this instance.
Availability of data and materials
Access to the micro data for the INS2014, Portuguese component of the European Health Interview Survey 2, can be requested from Eurostat via a research contract. INSEF data may be provided upon reasonable request and with permission of Ethical Committee of the Instituto Nacional de Saúde Dr. Ricardo Jorge.
AUC: Area under receiver operating characteristic curve
HES: Health examination survey
95CI: 95% confidence interval
INS: National Health Interview Survey
INSEF: Portuguese Health Examination Survey
ISCED 2011: 2011 International Standard Classification of Education
MIME: Multiple imputation for misclassification error
MSE: Mean squared error
NHANES: The USA National Health and Nutrition Examination Survey
NHIS: The USA National Health Interview Survey
1. Verschuuren M, Gissler M, Kilpeläinen K, Tuomi-Nikula A, Sihvonen A-P, Thelen J, Gaidelyte R, Ghirini S, Kirsch N, Prochorskas R, Scafato E, Kramers P, Aromaa A. Public health indicators for the EU: the joint action for ECHIM (European Community Health Indicators & Monitoring). Arch Public Health. 2013;71(1):12. https://doi.org/10.1186/0778-7367-71-12.
2. Bogaert P, Van Oyen H. An integrated and sustainable EU health information system: national public health institutes’ needs and possible benefits. Arch Public Health. 2017;75(1):3. https://doi.org/10.1186/s13690-016-0171-7.
3. Direção Geral da Saúde. Plano Nacional De Saúde. Revisão e Extensão a 2020. Lisboa; 2015.
4. Paalanen L, Koponen P, Laatikainen T, Tolonen H. Public health monitoring of hypertension, diabetes and elevated cholesterol: comparison of different data sources. Eur J Pub Health. 2018.
5. Gonçalves VSS, Andrade KRC, Carvalho KMB, Silva MT, Pereira MG, Galvao TF. Accuracy of self-reported hypertension: a systematic review and meta-analysis. J Hypertens. 2018;36(5):970–8 [cited 2019 Jun 11]. Available from: http://www.ncbi.nlm.nih.gov/pubmed/29232280.
6. Tolonen H, Koponen P, Mindell JS, Männistö S, Giampaoli S, Dias CM, et al. Under-estimation of obesity, hypertension and high cholesterol by self-reported data: comparison of self-reported information and objective measures from health examination surveys. Eur J Pub Health. 2014;24(6):941–8. https://doi.org/10.1093/eurpub/cku074.
7. Mills KT, Bundy JD, Kelly TN, Reed JE, Kearney PM, Reynolds K, et al. Global disparities of hypertension prevalence and control. Circulation. 2016;134(6):441–50 [cited 2020 Sep 16]. Available from: https://www.ahajournals.org/doi/10.1161/CIRCULATIONAHA.115.018912.
8. Kislaya I, Tolonen H, Rodrigues AP, Barreto M, Gil AP, Gaio V, Namorado S, Santos AJ, Dias CM, Nunes B. Differential self-report error by socioeconomic status in hypertension and hypercholesterolemia: INSEF 2015 study. Eur J Pub Health. 2019;29(2):273–8. https://doi.org/10.1093/eurpub/cky228.
9. Polonia J, Martins L, Pinto F, Nazare J. Prevalence, awareness, treatment and control of hypertension and salt intake in Portugal. J Hypertens. 2014;32(6):1211–21. https://doi.org/10.1097/HJH.0000000000000162.
10. Rodrigues AP, Gaio V, Kislaya I, Graff-Iversen S, Cordeiro E, Silva AC, Namorado S, Barreto M, Gil AP, Antunes L, Santos A, Miguel JP, Nunes B, Dias CM, INSEF Research group. Sociodemographic disparities in hypertension prevalence: results from the first Portuguese national health examination survey. Rev Port Cardiol. 2019;38(8):547–55. https://doi.org/10.1016/j.repc.2018.10.012.
11. Tolonen H, Koponen P, Aromaa A, Conti S, Liv SG, Mark G, et al. Recommendations for the Health Examination Surveys in Europe. National Public Health Institute, editor: Julkaisija-Utgivare-Publisher; 2008.
12. Ni J, Dasgupta K, Kahn SR, Talbot D, Lefebvre G, Lix LM, Berry G, Burman M, Dimentberg R, Laflamme Y, Cirkovic A, Rahme E. Comparing external and internal validation methods in correcting outcome misclassification bias in logistic regression: a simulation study and application to the case of postsurgical venous thromboembolism following total hip and knee arthroplasty. Pharmacoepidemiol Drug Saf. 2019;28(2):217–26. https://doi.org/10.1002/pds.4693.
13. Lash TL, Fox MP, Maclehose RF, Maldonado G, Mccandless LC, Greenland S. Good practices for quantitative bias analysis. Int J Epidemiol. 2014;43(6):1969–85. https://doi.org/10.1093/ije/dyu149.
14. Keogh RH, Shaw PA, Gustafson P, Carroll RJ, Deffner V, Dodd KW, et al. STRATOS guidance document on measurement error and misclassification of variables in observational epidemiology: part 1—basic theory and simple methods of adjustment. Stat Med. 2020;39(16):2197–231 [cited 2020 Aug 14]. Available from: https://pubmed.ncbi.nlm.nih.gov/32246539/.
15. Shaw PA, Gustafson P, Carroll RJ, Deffner V, Dodd KW, Keogh RH, et al. STRATOS guidance document on measurement error and misclassification of variables in observational epidemiology: part 2—more complex methods of adjustment and advanced topics. Stat Med. 2020;39(16):2232–63 [cited 2020 Aug 14]. Available from: https://pubmed.ncbi.nlm.nih.gov/32246531/.
16. Mentz G, Schulz AJ, Mukherjee B, Ragunathan TE, Perkins DW, Israel BA. Hypertension: development of a prediction model to adjust self-reported hypertension prevalence at the community level. BMC Health Serv Res. 2012;12(1):312. https://doi.org/10.1186/1472-6963-12-312.
17. Edwards JK, Cole SR, Troester MA, Richardson DB. Accounting for misclassified outcomes in binary regression models using multiple imputation with internal validation data. Am J Epidemiol. 2013;177(9):904–12. https://doi.org/10.1093/aje/kws340.
18. Cole SR, Chu H, Greenland S. Multiple-imputation for measurement-error correction. Int J Epidemiol. 2006;35(4):1074–81 [cited 2019 Jul 19]. Available from: http://www.ncbi.nlm.nih.gov/pubmed/16709616.
19. Livingston MD, Cannell B, Muller K, Komro KA. Comparing methods of misclassification correction for studies of adolescent alcohol use. Am J Drug Alcohol Abuse. 2018;44(2):160–6 [cited 2019 Jul 19]. Available from: https://www.tandfonline.com/doi/full/10.1080/00952990.2017.1421212.
20. Schenker N, Raghunathan TE, Bondarenko I. Improving on analyses of self-reported data in a large-scale health survey by using information from an examination-based survey. Stat Med. 2010;29(5):533–45. https://doi.org/10.1002/sim.3809.
21. Dias CM. 25 anos de Inquérito Nacional de Saúde em Portugal. Rev Port Saúde Pública. 2009:51–60.
22. Direção-Geral da Saúde. Programa nacional para as doenças cérebro-cardiovasculares 2017. Lisboa; 2017.
23. Brakenhoff TB, Mitroiu M, Keogh RH, Moons KGM, Groenwold RHH, van Smeden M. Measurement error is often neglected in medical literature: a systematic review. J Clin Epidemiol. 2018;98:89–97 [cited 2020 Sep 4]. Available from: https://pubmed.ncbi.nlm.nih.gov/29522827/.
24. Ranker LR, Petersen JM, Fox MP. Awareness of and potential for dependent error in the observational epidemiologic literature: a review. Ann Epidemiol. 2019;36:15–19.e2 [cited 2020 Sep 4]. Available from: https://pubmed.ncbi.nlm.nih.gov/31402082/.
25. Instituto Nacional de Estatística. Inquérito Nacional de Saúde 2014. Lisboa: INE; 2016. p. 310. Available from: www.ine.pt
Nunes B, Barreto M, Gil AP, Kislaya I, Namorado S, Antunes L, et al. The first Portuguese National Health Examination Survey (2015): design, planning and implementation. J Public Health (Bangkok). 2018; [cited 2018 Sep 30]; Available from: http://www.ncbi.nlm.nih.gov/pubmed/30239797.
Archer KJ, Lemeshow S. Goodness-of-fit test for a logistic regression model fitted using survey sample data. Stata J. 2006;6(1):97–105. https://doi.org/10.1177/1536867X0600600106.
Barros AJ, Hirakata VN. Alternatives for logistic regression in cross-sectional studies: an empirical comparison of models that directly estimate the prevalence ratio. BMC Med Res Methodol. 2003;3(1):21. https://doi.org/10.1186/1471-2288-3-21.
Tamhane AR, Westfall AO, Burkholder GA, Cutter GR. Prevalence odds ratio versus prevalence ratio: choice comes with consequences. Stat Med. 2016;35(30):5730–5 [cited 2021 Feb 10]. Available from: /pmc/articles/PMC5135596/.
StataCorp. Stata statistical software: release 15. College Station: StataCorp LP; 2017.
Lohr SL, Raghunathan TE. Combining survey data with other data sources. Stat Sci. 2017;32(2):293–312. https://doi.org/10.1214/16-STS584.
Drieskens S, Demarest S, Bel S, De Ridder K, Tafforeau J. Correction of self-reported BMI based on objective measurements: a Belgian experience. Arch Public Heal. 2018;76(1):16–9.
The authors are grateful to all the professionals who were involved in the INSEF and INS2014 fieldwork and to all the INSEF and INS2014 participants.
No specific funding was received for this study. The Portuguese National Health Examination Survey 2013–2017 (INSEF) was developed as part of the Pre-defined project of the Public Health Initiatives Program, “Improvement of epidemiological health information to support public health decision and management in Portugal. Towards reduced inequalities, improved health, and bilateral cooperation”, which benefits from a €1,500,000 grant from Iceland, Liechtenstein and Norway through the EEA Grants.
Ethics approval and consent to participate
INSEF was approved by the National Commission for Data Protection, by the Ethical Committee of the Instituto Nacional de Saúde Dr. Ricardo Jorge and by the ethical committees of all project partners. All participants provided written informed consent. INS2014 was developed in accordance with the principles of statistical confidentiality under Regulation (EC) No 223/2009 of the European Parliament and of the Council of 11 March 2009 on European statistics.
Consent for publication
Not applicable; there are no details on individual participants in the manuscript.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table S1. Common variables in INS2014 and INSEF used in the study. Table S2. Sociodemographic characteristics of INSEF and INS2014 participants aged 25–74 years. Table S3. AUC for imputation logistic regression models. Table S4. Standard error (SE) for INS2014 self-reported, MIME-corrected and INSEF objectively measured hypertension prevalence estimates. Table S5. Self-reported, objectively measured and MIME-corrected adjusted prevalence ratios of hypertension according to sex, age group and educational level. Table S6. Coefficients of the logistic regression model for the probability of examination-based hypertension being missing.
Cite this article
Kislaya, I., Leite, A., Perelman, J. et al. Combining self-reported and objectively measured survey data to improve hypertension prevalence estimates: Portuguese experience. Arch Public Health 79, 45 (2021). https://doi.org/10.1186/s13690-021-00562-y
- Misclassification error
- Bias correction
- Multiple imputation