Cross-national comparisons of health indicators require standardized definitions and common data sources

Background Health indicators are used to monitor the health status and determinants of health of the population and population sub-groups, identify existing or emerging health problems which would require prevention and health promotion activities, help to target health care resources in the most adequate way as well as for evaluation of the success of public health actions both at the national and international level. The quality and validity of the health indicator depends both on available data and used indicator definition. In this study we will evaluate existing knowledge about comparability of different data sources for definition of health indicators, compare how selected health indicators presented in different international databases possibly differ, and finally, present the results from a case study from Finland on comparability of health indicators derived from different data sources at national level. Methods For comparisons, four health indicators were selected that were commonly available in international databases and available for the Finnish case study. These were prevalence of obesity, hypertension, diabetes, and asthma in the adult populations. Our evaluation has three parts: 1) a scoping review of the latest literature, 2) comparison of the prevalences presented in different international databases, and 3) a case study using data from Finland. Results Literature shows that comparability of estimated outcomes for health indicators using different data sources such as self-reported questionnaire data from surveys, measured data from surveys or data from administrative health registers, varies between indicators. Also, the case study from Finland showed that diseases which require regular health care visits such as diabetes, comparability is high while for health outcomes which can remain asymptomatic for a long time such as hypertension, comparability is lower. In different international health related databases, country specific results differ due to variations in the used data sources but also due to differences in indicator definitions. Conclusions Reliable comparison of the health indicators over time and between regions within a country or across the countries requires common indicator definitions, similar data sources and standardized data collection methods.

Conclusions: Reliable comparison of the health indicators over time and between regions within a country or across the countries requires common indicator definitions, similar data sources and standardized data collection methods.
Keywords: Health information, Obesity, Hypertension, Diabetes, Asthma, Indicator, Comparability, Standardization Background Health indicators are used to monitor the health status and determinants of health of the population and population sub-groups and identify existing or emerging health problems which would require prevention and health promotion activities. They also help to target health care resources in the most adequate way as well as for evaluation of the success of public health actions both at the national and international level. Health indicators are also used in international comparisons and benchmarking between countries and estimating changes over time within countries.
For example, at the European level, several healthrelated indicators are regularly reported in the Health at a Glance reports [1], and the European Core Health Indictor (ECHI) tool [2], both allowing cross-country comparisons and benchmarking of several indicators. More globally, the WHO Global action plan for prevention and control of noncommunicable diseases (NCDs) in 2013-2020 has explicitly defined targets related to harmful use of alcohol, physical inactivity, tobacco use, obesity, hypertension, and diabetes, and it promotes monitoring of these indicators [3,4]. It has also provided definitions for these indicators and recommendations for data sources to ensure comparability of the results between countries.
For cross-country comparisons, standardized definitions of the used indicators are needed. The European Health Examination Survey (EHES) has provided standardized indicator definitions for example for obesity, elevated blood pressure, elevated cholesterol, and diabetes when data from objective health measurements are available [5]. The ECHI shortlist is a more extensive list of standardized indicator definitions including 88 indicators which are split into five main chapters: demographic and socioeconomic, health status, health determinants, health services and health promotion. For each indicator, a metadata definition is provided in the documentation sheets [6]. The ECHI indicators cover a wide range of health-related topics, thus they are relevant for several policy areas, supporting the health in all policies (HIAP) approach [7]. Data used to calculate health indicators can be obtained from a variety of data sources such as administrative registers, disease registries, health interview or health examination surveys, and cohort or longitudinal studies [8]. These can be national or regional and the updating frequency may vary by country and data source. One of the common data sources, especially in the international comparisons, is self-reported data from health interview surveys. From another angle, information could also be obtained through objective measurements during the health examination survey or from administrative health records, e.g., covering. in-and out-patient health care visits, and prescribed medications.
In the framework of the Joint Action on Health Information (InfAct -Information for Action), the aim is to create and develop a sustainable solid infrastructure on EU health information through improving the availability of comparable, robust, and policy-relevant data on health status, health determinants and health care services, as well as health system performance information [9]. One specific aim relates to the standardization of health information instruments, tools and methods including health indicators and related data sources.
In this study we will evaluate existing knowledge about comparability of different data sources for the definition of health indicators, compare how selected health indicators presented in different international databases possibly differ, and finally, present the results from a case study from Finland on comparability of health indicators derived from different data sources at national level. Based on these findings we will discuss implications of these potential differences for cross-country comparisons, as well as when estimating time trends at national level.

Methods
For comparisons, four health indicators, commonly available in international databases and for the Finnish case study, were selected. These indicators were prevalence of obesity, hypertension, diabetes, and asthma in adult population. Our evaluation has three parts: 1) a scoping review of the latest literature, 2) a comparison of the prevalences presented in different international databases, and 3) a case study using data from Finland.

A scoping review
A scoping review for the comparability of different data sources for estimation prevalence of obesity, hypertension, diabetes, and asthma was conducted focusing to the literature published in English after 2000. For outcomes where limited number of publications were available, also publications from late 1990's were considered. In comparison to systematic review, a scoping review will provide an overview of the available research evidence answering the question on what kind of information has been presented in the literature [10]. For this scoping review, a search was conducted in December 2020-January 2021 using PubMed and the following search terms in different combinations: obesity, hypertension, diabetes, asthma, prevalence differences, different data sources, comparison of different data sources, self-reported, HIS, health examination, health examination survey, medical record, register, administrative data, objective, validation and validity. See more details about the scoping review process in the Appendix 1.

Review of international health related databases
The following international databases were reviewed for information about the definition and prevalence of obesity, hypertension, diabetes, and asthma: the European Common Health Indicator (ECHI) database through the ECHI data tool [11], the WHO Global Observatory data repository [12], the WHO European Health for All database [13], and the OECD Health data [14].

Case study from Finland
As a case study for the comparison between different data sources we used data from the Finnish national health examination survey, the National FinHealth Study conducted in 2017 [15]. The FinHealth 2017 survey covered the general population of Finland aged 18+ years and living in the mainland of Finland (i.e., excluding Åland). Data was collected through questionnaires (self-administered and interviewed), physical measurements and collection of biological samples during the visit to the examination centres located around the country. A total of 10,247 persons were randomly selected from the national population register using stratified one-and two-stage sampling and oversampling for person age 80+ years. Participation rate to the health examination was 58% (n = 5952). The study was approved by the Ethics Committee of Helsinki and Uusimaa hospital district (37/13/03/00/2016 on 2016-03-22) and written informed consent was obtained from all participants. Informed consent included also consent to link survey data to different administrative registers.
Survey data was linked at the individual level to administrative health registers using the unique personal identification code (PIC) provided for all permanent residents in Finland. This PIC is systematically used in different data sources for identification of individuals.
Linked health registers were Care Register for Health Care [16] and Register of Primary Health Care visits [17] covering all visits for public health care sector but excluding visits to the private health care sector. In both registers, each visit is recorded on individual level together with the date and diagnoses for the visit (ICD-10 or ICPC-2 partly for primary health care visits). These two registers also include information about prescribed medications using ATC (Anatomical Therapeutic Chemical) Classification codes. The Care Register for Health Care also includes information about surgical operations using NCSP (Nordic Classification of Surgical Procedures). For record linkage, permission from the register owners was obtained.
Survey weights were used in the calculation of prevalences in order to take into account the study design and to adjust for non-participation. Cohen's kappa coefficient (κ) [18] was used to measure the agreement of survey and register data on the individual level.
For estimation of the prevalences from different data sources, definitions given in the Table 1 were used.

Scoping review
The scoping review focused on differences in the obtained prevalence estimates for obesity, hypertension, diabetes, and asthma when derived from different data sources for same population (Appendix 2). Observed differences between data sources varied by health outcomes (Fig. 1).
For obesity, several studies have demonstrated that prevalence based on self-reported height and weight tends to be lower when compared to prevalence based on measured data, usually from health examination surveys [19][20][21][22][23][24]. When comparing selfreported results with those obtained from medical registers results are inconclusive, i.e. there is indication that self-reported obesity prevalence would be higher [25] and other studies show similar results between these two sources [32]. Measured obesity prevalences tend to be higher when compared to data from medical records [32].
For hypertension, prevalence based on self-reported information tends results lower prevalence of hypertension in comparison to data obtained through objective survey measurements of blood pressure [21,[26][27][28]. When results based on self-reported information are compared to register based information, some studies are reporting lower results [26,[29][30][31][32], and some higher results [33][34][35][36]. For measured hypertension, higher results are often observed in comparison to medical records [29,32].
For diabetes, in some studies, self-reported information has provided lower diabetes prevalence than prevalence based on objective survey measurements [21,31] and also self-reported diabetes prevalence has been higher than what has been obtained from medical records [37].
For asthma, limited number of studies comparing the prevalence among adults through different data sources was available even though there were several studies  focusing on paediatric asthma. Among adults, reported results indicate that prevalence of asthma based on selfreported information is lower in comparison to medical records [31,34,36,38,39].

International health related database
Reviewing four different international databases (ECHI data tool, WHO Global Observatory data repository, WHO Europe Health for All Database, and OECD database) which cover health indicators, we evaluated the definitions used for these indicators, the data sources these indicators were calculated from, and how actual reported prevalences by European Union (EU) Member States (MSs) differed.
Only the ECHI data tool covered all four indicators (obesity, hypertension, diabetes, and asthma), generally results were presented for age group 18+ years, except in the OECD database which presented results for age group 15+ years. Only the ECHI data tool used systematically data from the European Health Interview Survey for all four indicators, i.e. self-reported data. In other databases, data sources varied between indicators but also within indicators between countries (Table 2).
When the actual prevalence estimates at the country level within EU MSs were compared between different international databases, we observed substantial differences (Fig. 2). For comparisons year 2014 was selected since in all databases data for this year was available.
When comparing prevalences presented in different databases (Appendix 2), we observed differences, even though results represent the same age group and year. The OECD data cannot be directly compared with other databases since it covered a different age group and presented prevalence for overweight or obesity instead of obesity only.
For the prevalence of obesity and hypertension, the WHO Global Health Observatory numbers tend to be little higher than those reported in the ECHI data tool which can be explained by different data sources. For diabetes, the difference is more divided.

Case study from Finland
Obesity, and elevated blood pressure or hypertension, were two indicators which had major differences between data sources. For obesity, prevalence ranged from 2.7% in register-based data to 26.1% when survey data (self-reported or measured) was used, and for hypertension from 16.8 to 45.5%. For diabetes and asthma, the obtained prevalences were rather similar between the data sources (Table 3).
Looking at the population level, prevalence of different indicators does provide only one perspective to the comparability of different data sources. It does not tell how much of all the identified cases can be obtained from a specific data source. The agreement of register and survey data in the individual level varied from κ = 0.11 for obesity to κ = 0.83 for diabetes (Table 4).
For obesity, 91% of all cases were identified only by survey data, i.e. those persons did not have any diagnoses in the administrative registers about obesity, 8% were identified as obese by both survey and register, data and only 0.1% had obesity related diagnose but were not identified as obese based on self-reported or measured BMI in the survey.
Also, for hypertension or elevated blood pressure, the majority of the cases were observed in the survey data. Almost 2/3 of all cases were identified only by survey data, 1/3 by both survey and administrative register data, and only few cases (2.0%) were identified only by register data. For diabetes and asthma, for which at the population level prevalence provided rather similar results between different data sources, we can see that for only 59% of asthma and 73% of diabetes cases were identified by both survey data as well as through administrative registers. For diabetes, most of the additional cases (18%) come from survey data while for asthma, 23% of cases come only from the administrative register data.

Discussion
Health indicators are widely used for many purposes from monitoring and allocation of available resources to information-based policy decision making. Therefore, it is important to have reliable, well defined indicators which allow at the national level evaluation of trends and differences between regions and population subgroups. The indicators should allow also cross-country comparisons and benchmarking at the international level.
For reliable health indicators we can define two main components; 1) the data that was used and 2) the definition of the indicator. Both affect the quality of obtained estimates. If your data is biased or incomplete, your indicator estimate will also be biased regardless of the definition you use. On the other hand, if the definition of the indicator is not valid for the phenomena to be measured, high quality data does not help, and your estimates will be biased.
As our study has demonstrated, the same health indicators such as prevalence of obesity, hypertension, diabetes, and asthma, can be defined in several different ways either based on person's own statement of the situation, using objective measurements, or relying on medical records of health care visits or use of prescribed medications.
From the literature and previous studies, we know that self-reported information may suffer from recall bias [40], awareness bias and from nonresponse bias [41]. A part of recall bias is something called a telescoping bias [42], where a person perceives recent events as being more remote than they are and distant events as being more recent than they actually are. Awareness bias is evident when a person has never been diagnosed with the health condition for example hypertension which can be asymptomatic for a long time. Non-response bias is well documented for survey research and unfortunate fact is that survey non-response is usually selective and not at random. This may result that those with health problems are not responding to the survey and obtained prevalence estimates get underestimated.
Information obtained through objective health measurements conducted during the health examination survey do not suffer from recall or awareness bias but are still prone for non-response bias similar to selfreported data. Measurements during the survey visit do not correspond to clinical diagnoses. For example,  diagnoses of hypertension require subsequent measurements over time. However, measured survey data can be used to identify persons with potential risk and need for further screening.
From medical records, we can obtain information about diagnosed and treated cases but again, the asymptomatic cases which are not yet diagnosed remain uncounted for. There may also be differences in the coverage of the target population by different registers which again may cause some coverage bias to the results. A study from the Netherlands demonstrates relatively good comparability of the data between different registers for diabetes [43]. Combining different registers for example increased the prevalence of diabetes by 3%.
In our case study from Finland, the main limitations are that data from the medical reports does not cover data from occupational health care if provided by private companies the data hasn't been transferred to the national records, and problems in data transfer from local patient records to the national registers due to differences in the used electronic systems. Also, not all practitioners in the public health centres or out-patient clinics systematically record the diagnoses for each visit. The coverage of hospital data is known to be better.
As has been demonstrated by our results, none of the data sources is perfect but they all have some limitations. In an ideal case, one could combine several different data sources at the individual level and compose indicators based on this combined information. In the Netherlands, different health institutes have formed a consortium which provides information for the Ministry of Health and they jointly define most relevant data sources for specific indicators to be reported [44].
Different international databases mainly rely on publicly available data from different data sources or data specifically provided for them by national authorities. Therefore, availability and comparability of different indicators varies by age group but also by year for which information is available.
When comparing country level prevalence estimates for obesity, hypertension, and diabetes between different international databases, we observed that results from the ECHI data tool were systematically lower than those obtained from WHO Global Health Observatory Database even though they covered the same age group and represented the same year. This can be partly explained by the differences in used data sources. ECHI information is based on self-reported data from the European Health Interview Survey while the WHO Global Health Observatory uses data from objective measurements. Our scoping review also demonstrated that self-reported information tends to result in lower prevalences than measured data.
For many health indicators, standardized definitions are provided by the ECHI shortlist [6]. International organizations such as European Commission (EC), WHO and OECD have worked together towards common definitions. This is an essential step towards comparable data.
Taking into consideration all the pros and cons of different data sources, and through that their effect on indicator definitions, it is important to know what data is used to build the health indicators you use. This is needed to ensure that when you compare results between countries or over time, you use same data sources, same definitions, and same reference age group. Even when the data sources are similar, there may be methodological differences which limit data comparability, e.g. used sampling frames and response rates in EHIS, using interviews or self-administered mailed questionnaires, as well as differences in patient record systems, clinical and recording practices affecting register data.

Strengths and limitations of the study
Strengths of our study are that we have evaluated possible effects of indicator definitions and used data sources using three different methods: a scoping review, review of existing international databases and conducting a case study on existing data from Finland.
We focused only in four health indicators in this study, and this is obviously also a limitation since some other indicators might have provided different results. Also, our case study covered only one country and results may not be directly generalizable to other countries due to differences in the available data sources. But it was not possible to obtain information for other countries covering all three data sources; self-reported, objective measurements and register data, on same individuals.

Conclusions
Without common definition of the health indicators and standardized methods for data collection, such as provided by ECHI, EHIS or EHES, cross-country comparisons are not meaningful as it is not possible to determine whether observed differences are due to real differences between the countries or to different definitions of the same indicators. It is essential to ensure that similar data sources and standardized data collections are used in these cross-country comparisons. As it is almost impossible to fully standardize data collection, the cross-country comparisons should be made with caution keeping in mind the possible sources of bias.