HIV survey in Mozambique: analysis with simultaneous model in contrast to separate hierarchical models
Archives of Public Health volume 78, Article number: 70 (2020)
Abstract
Background
Analyzing correlated survey responses one at a time is less informative and less useful than modeling them simultaneously. Simultaneous modeling evaluates the responses as a system rather than treating each response as though it arose in isolation.
Methods
This research uses the Mozambique National Survey data to demonstrate the benefits of simultaneous modeling of blood test results, knowledge of HIV/AIDS, and awareness of an HIV/AIDS campaign. The simultaneous model also accounts for the correlation induced by the hierarchical structure of the data collection.
Results
Employment and self-perceived risk of HIV/AIDS have a different impact on blood test results, awareness of an HIV/AIDS campaign, and knowledge of HIV/AIDS when the responses are modeled simultaneously than when they are modeled separately.
Conclusion
Simultaneous modeling of correlated responses improves the reliability of the estimates. More importantly, it provides an opportunity to make cost-saving decisions when designing future surveys and to make better health policies.
Background
It is common in national health research to use survey data to advance health policies. Survey results provide a measure of policy priorities. For example, the Demographic and Health Survey (DHS) is conducted in over 90 nations globally to obtain representative data on population health, nutrition, and HIV/AIDS. Data from these surveys are analyzed to identify trends and to advance global health research agendas and national programs and policies [1, 2].
National health surveys are often used to generate information that is critical for describing national and regional trends and for identifying gaps in knowledge. However, suboptimal analytic practices threaten the evidence base used for programmatic and policy decisions. Although national surveys capture multiple outcomes of interest, these outcomes are often modeled separately, thereby ignoring the correlation and interplay among them. In addition, the hierarchical design of national surveys produces correlated observations, and this correlation is often ignored in the analysis. Such an omission yields incorrect standard errors and, in turn, incorrect conclusions [3, 4]. The problem is best addressed by modeling the responses simultaneously while accounting for the hierarchical structure of the survey data.
Mozambique is an example of a nation in sub-Saharan Africa that is severely impacted by the HIV/AIDS epidemic. The disease has been one of the largest global health priorities of the past two decades, with $562.6 billion spent globally between 2000 and 2015, as reported by the Global Burden of Disease Health Financing Collaborator Network in 2018. Numerous analyses have characterized the HIV/AIDS epidemic and its drivers within and across contexts [5, 6]. In Mozambique, many HIV/AIDS policy decisions are based on the Mozambique National Survey.
Methods
Mozambique survey data
This research uses a nationally representative random sample of edited and cleaned data from the Mozambique health data website. The data represent 270 clusters (primary units) sampled across Mozambique’s 11 provinces and comprise 6232 households (secondary units) eligible for sampling. Men and women aged 15–64 living in these households form the observational level and are eligible to participate by giving blood samples. There are 9311 adult participants. The data can be used to estimate the prevalence of HIV/AIDS in the general population and to determine the impact of associated factors.
Outcomes of interest
This research demonstrates simultaneous modeling with three binary outcome measures: blood test result (positive or negative); knowledge of HIV/AIDS from community sources, a binary composite of participant awareness of HIV/AIDS from five sources (community meetings, school/teachers, conference in hospitals, community health workers, and church or mosque); and awareness of a campaign to combat HIV/AIDS (yes or no).
Covariates
The covariates include the following demographics: the continuous variables age and years of education; the categorical variables gender, religion, marital status, employment in the past 12 months (not working, worked in the past year, currently working), family wealth index measured on a 5-point ordinal scale (poorest, poorer, middle, rich, richest), and self-perceived risk of contracting HIV/AIDS (no risk, small risk, moderate risk, great risk, respondent HIV-infected); and the binary factors electricity in the household (yes/no) and receipt of any support or social assistance (yes/no).
Statistical model
Separate binary model
Binary outcomes are often modeled with a standard logistic regression model, which belongs to the family of generalized linear models and assumes that the observations are independent. When the data are hierarchical, the independence assumption no longer holds, so a generalized linear mixed model is used instead of a generalized linear model. In the generalized linear mixed logistic regression model, one must account for the intraclass correlation at the different levels of the hierarchical structure, most commonly through random effects.
The intraclass correlation coefficient (ICC) indicates how much of the total variation in the probability is accounted for by a given level of the hierarchy. A binary model, however, has no explicit error term at the lowest level of the hierarchy (level 1), so a slight modification is needed to calculate the ICC. This modification assumes that the dichotomous outcome arises from an unknown latent continuous variable whose level-1 residual follows a logistic distribution with a mean of 0 and a variance of \( {\pi}^2/3\approx 3.29 \). Therefore, 3.29 is used as the level-1 error variance in calculating the ICC [7]. These data have three levels, so two random effects are identified: one represents household effects and the other represents cluster effects [4]. The value 3.29 then represents the variance of the residuals at the observational level (resident) [8].
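The ICC calculation described above can be sketched in a few lines. This is a minimal illustration under the latent-variable formulation, not the code used in this research; the function name and the variance values in the usage lines are hypothetical. It reports two quantities commonly used for three-level models: the share of latent variance at the cluster level, and the share at the household-within-cluster level (cluster plus household variance).

```python
import math

# Variance of the standard logistic distribution, used as the level-1
# (resident) residual variance on the latent scale: pi^2 / 3, about 3.29.
LEVEL1_VARIANCE = math.pi ** 2 / 3


def three_level_icc(var_cluster, var_household):
    """Return (cluster ICC, household-within-cluster ICC) for a three-level
    random-intercept logistic model, given the two random-effect variances."""
    total = var_cluster + var_household + LEVEL1_VARIANCE
    # Correlation of latent responses for two residents in the same cluster
    # but in different households.
    icc_cluster = var_cluster / total
    # Correlation of latent responses for two residents in the same household
    # (and therefore in the same cluster).
    icc_household = (var_cluster + var_household) / total
    return icc_cluster, icc_household


# Hypothetical variance estimates, chosen only to illustrate the call.
icc_c, icc_h = three_level_icc(var_cluster=0.5, var_household=0.3)
print(f"cluster ICC = {icc_c:.3f}, household ICC = {icc_h:.3f}")
```

A larger cluster ICC indicates that more of the latent variation lies between clusters, which is the situation in which ignoring the hierarchy distorts the standard errors most.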
Thus, for modeling one binary response, a generalized linear mixed model with the households and the clusters incorporated as random effects, to model the contributions due to households and clusters respectively, is
where p_{ihc} is the probability of a favorable outcome for the i^{th} resident within the h^{th} household within the c^{th} cluster, for residents i = 1, 2, …, n_{hc}, households h = 1, 2, …, n_{c}, and clusters c = 1, …, n; X_{ihc} are the covariates for that resident, household, and cluster, with associated regression coefficients β; the random intercept u_{oc} measures the unobserved variance attributable to the c^{th} cluster; and the random intercept u_{ohc} measures differences attributable to household h within cluster c. These two random effects are assumed to be normally distributed with \( {\mathrm{u}}_{\mathrm{oc}}\sim \mathcal{N}\left(0,{\upsigma^2}_{{\mathrm{u}}_{\mathrm{c}}}\right) \) and \( {\mathrm{u}}_{\mathrm{ohc}}\sim \mathcal{N}\left(0,{\upsigma^2}_{{\mathrm{u}}_{\mathrm{hc}}}\right) \). Further, this research assumes that the covariance of the random effects, \( {\sigma}_{{\mathrm{u}}_{\mathrm{oc}},{\mathrm{u}}_{\mathrm{ohc}}} \), is zero. Household random effects represent differences in the residents’ responses that are attributable to households but are not captured by any of the household-level covariates. Similarly, cluster random effects represent differences in the residents’ responses that are attributable to clusters but are not captured by any of the cluster-level covariates.
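Written out, a logistic model with two random intercepts that is consistent with these definitions could take the following form. This is a sketch assembled from the definitions in the text; the intercept \( \beta_0 \) and the covariate index k = 1, …, K are notational assumptions introduced here for illustration.

```latex
\operatorname{logit}\left(p_{ihc}\right)
  = \log\frac{p_{ihc}}{1-p_{ihc}}
  = \beta_0 + \sum_{k=1}^{K} \beta_k X_{k,ihc} + u_{oc} + u_{ohc},
\qquad
u_{oc}\sim\mathcal{N}\left(0,\sigma^2_{u_c}\right),\quad
u_{ohc}\sim\mathcal{N}\left(0,\sigma^2_{u_{hc}}\right),\quad
\operatorname{Cov}\left(u_{oc},u_{ohc}\right)=0 .
```

Under this form, two residents of the same household share both u_{oc} and u_{ohc}, whereas two residents of the same cluster but different households share only u_{oc}.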
Simultaneous models
In public health research, subjects commonly provide information on a set of health responses along with a set of covariates. The correlation among these responses is informative for public health officials and decision makers: identifying where the responses overlap helps with the distribution of resources and helps avoid duplication. Thus, it is advantageous to model the responses simultaneously.
This research demonstrates the use of three simultaneous binary outcomes Y_{1ihc}, Y_{2ihc}, and Y_{3ihc}, where Y_{qihc} denotes outcome q = 1, 2, 3 for the i^{th} individual in the h^{th} household of the c^{th} cluster, for h = 1, …, n_{c} and c = 1, …, 270. A simultaneous model of these binary outcomes, f(Y_{1hc}, Y_{2hc}, Y_{3hc}), contains a shared parameter that measures the correlation among the outcomes [8]. For q = 1, the response Y_{1ihc} for blood test follows a Bernoulli distribution with mean p_{1hc}, with random effects u_{oc} for clusters and u_{ohc} for households; thus,
Similar models apply to awareness of an HIV/AIDS campaign [2] (q = 2) and knowledge of HIV/AIDS (q = 3). The joint model of these three outcomes has a vector of random effects u that is normally distributed with mean vector 0 and covariance matrix \( {\mathrm{D}}_{\mathrm{H}\cap \mathrm{C}}^{123} \), such that the random effects for the levels in the hierarchy are
In the covariance matrix \( {\mathrm{D}}_{\mathrm{H}\cap \mathrm{C}}^{123} \), the two random effects operating at different levels for the same response are independent, giving the block \( \left(\begin{array}{cc}{\mathrm{d}}_1^{\mathrm{q}}& 0\\ {}0& {\mathrm{d}}_2^{\mathrm{q}}\end{array}\right) \) for q = 1, 2, 3. In this scenario, there is no correlation among the random effects for a given outcome. However, random effects for different responses are correlated when they operate at the same level, but not when they operate at different levels; for example, responses 1 and 3 have the block \( \left(\begin{array}{cc}{\mathrm{d}}_1^{13}& 0\\ {}0& {\mathrm{d}}_2^{13}\end{array}\right) \). If every cross-response covariance is zero (for example, \( {\mathrm{d}}_1^{qp}=0 \)), the responses are uncorrelated and the resulting model is equivalent to modeling the three outcomes separately [9, 10]. A test of whether [\( {\mathrm{d}}_1^{12},{\mathrm{d}}_2^{12},{\mathrm{d}}_1^{13},{\mathrm{d}}_2^{13},{\mathrm{d}}_1^{23},{\mathrm{d}}_2^{23} \)] are simultaneously zero determines whether the simultaneous model is needed or whether separate models suffice.
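The block pattern of this covariance matrix can be made concrete with a short sketch. The code below assembles a 6 × 6 matrix for three responses and two levels from the variances and the same-level cross-response covariances. It is an illustration only: the function name, the ordering of the random effects (two per response), the choice of subscript 1 for the cluster level and subscript 2 for the household level, and all numeric values are assumptions made here, not estimates from the survey.

```python
def build_joint_covariance(variances, covariances):
    """Assemble the joint covariance matrix of the random effects.

    variances:   {q: (d1_q, d2_q)} variances at level 1 and level 2 for
                 response q (responses indexed 0, 1, 2).
    covariances: {(q, p): (d1_qp, d2_qp)} same-level covariances between
                 responses q and p.
    Random effects are ordered as (response 0 level 1, response 0 level 2,
    response 1 level 1, ...). Every cross-level entry stays zero.
    """
    size = 2 * len(variances)
    matrix = [[0.0] * size for _ in range(size)]
    for q, (d1, d2) in variances.items():
        matrix[2 * q][2 * q] = d1          # diagonal block: diag(d1_q, d2_q)
        matrix[2 * q + 1][2 * q + 1] = d2
    for (q, p), level_covs in covariances.items():
        for level, value in enumerate(level_covs):
            row, col = 2 * q + level, 2 * p + level
            matrix[row][col] = value       # off-diagonal block: diag(d1_qp, d2_qp)
            matrix[col][row] = value       # keep the matrix symmetric
    return matrix


# Hypothetical values, for illustration only.
D = build_joint_covariance(
    variances={0: (0.50, 0.30), 1: (0.40, 0.10), 2: (0.45, 0.12)},
    covariances={(0, 1): (0.10, 0.02), (0, 2): (0.08, 0.01), (1, 2): (0.20, 0.05)},
)
for row in D:
    print("  ".join(f"{value:5.2f}" for value in row))
```

Setting every entry of `covariances` to zero leaves a diagonal matrix, which corresponds to fitting the three responses separately.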
Consider the model \( \mathrm{logit}\ \left({\hat{\mathrm{p}}}_{\mathrm{qhc}}\right) \), for q = 1, 2, 3. Let the vector \( {\mathrm{W}}_{\mathrm{Yhc}}^{\mathrm{b}} \) denote the difference between the observed values and the model values, such that
The joint log-likelihood is
which achieves maximum estimator
where
Through a modification of the expectation-maximization (EM) algorithm, the researcher is able to obtain maximum-likelihood estimates of the model parameters when there are unobserved (latent) variables.
The maximum likelihood estimates for the correlated logistic regression model are obtained in this way [11]. The iterations of the EM algorithm converge to the true ML estimates [12]; the algorithm is an iterative way to approximate the maximum of the likelihood function.
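Once the separate and simultaneous models have each been fit by maximum likelihood, the test that the six cross-response covariances are simultaneously zero can be sketched as a likelihood-ratio comparison. The text does not state which test statistic is used, so the likelihood-ratio form, the chi-square reference distribution with six degrees of freedom, the function names, and the log-likelihood values below are all assumptions made for illustration.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df,
    using the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def lrt_cross_covariances(loglik_separate, loglik_joint, n_covariances=6):
    """Likelihood-ratio test of H0: all cross-response covariances are zero.

    loglik_separate: sum of the maximized log-likelihoods of the separate models.
    loglik_joint:    maximized log-likelihood of the simultaneous model.
    """
    statistic = max(2.0 * (loglik_joint - loglik_separate), 0.0)
    return statistic, chi2_sf_even_df(statistic, n_covariances)


# Hypothetical log-likelihoods, for illustration only.
stat, p_value = lrt_cross_covariances(loglik_separate=-12540.0, loglik_joint=-12525.0)
print(f"LR statistic = {stat:.1f}, p-value = {p_value:.2g}")
```

A small p-value would favor the simultaneous model; a large one would suggest that separate models lose little.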
This research presents simultaneous generalized linear mixed models for binary responses (knowledge of HIV/AIDS, awareness of an HIV/AIDS campaign, and blood testing for HIV/AIDS) using shared joint random effects. The survey data, which are collected under a hierarchical structure, are used to demonstrate the advantages of modeling these responses simultaneously. The SAS procedure PROC QLIM fits simultaneous binary models, among other models. This procedure is designed mainly to analyze cross-sectional data.
Results
Of the survey respondents, 58% are female and nearly 70% are married or living with a partner. The average age of the respondents is 31 years, and the average length of education is three years. Of the respondents, 54.08% are Catholic or Muslim. Approximately 25% of households have electricity. Approximately 16% of the respondents did not work in the last 12 months. About 30.7% of the respondents are classified into the richest category, and 12.2% into the poorest category. About 31% of the respondents perceived that they had no risk of contracting HIV/AIDS. The blood tests reveal that 13.4% of respondents are HIV-infected. In addition, 77.2% of the respondents are aware of HIV from community organizations and other institutions (schools, hospitals, religious institutions), and about 55% are aware of a campaign to combat HIV/AIDS. These results are summarized in Table 1.
There are no respondents in the survey who tested positive and did not hear about HIV/AIDS, but heard about the campaign. In addition, there are no respondents who tested negative and did not hear about the disease, but heard about the campaign, as shown in Table 2.
The data are collected in a hierarchical structure: respondents are nested within households, and households are nested within clusters. The correlation due to this structure is modeled by treating households and clusters as random effects. The variances of the random effects at the household level and at the cluster level are shown in Table 3. The estimates suggest that the variance of the cluster random effects is significant for all three responses (blood test, awareness of HIV/AIDS, and awareness of campaign) and too large to ignore in any model [4]. The variance of the household random effects is significant for the blood test but not for awareness of HIV or awareness of campaigns.
Simultaneous hierarchical logistic models
The simultaneous modeling of the three binary outcomes provides an opportunity to address interplay among the responses. The estimates for this simultaneous model of these three binary outcomes (knowledge of HIV/AIDS, awareness of HIV/AIDS campaign, and blood testing for HIV/AIDS) are given in Table 4.
The model shows that having electricity in the house increases the likelihood of hearing about the HIV/AIDS campaign and decreases the HIV-infected rate (p < 0.0074). The wealthiest Mozambicans are more likely to have a positive blood test, knowledge of HIV/AIDS, and awareness of an HIV/AIDS campaign in all models (p < 0.0024). Respondents with more years of education are more likely to be aware of an HIV/AIDS campaign (p < 0.001). Respondents who perceived any risk (small, moderate, great) are more likely to have HIV-infected test results than those perceiving no risk (p < 0.001). Residents who are married or living together are more likely to be HIV-infected (p < 0.001). Males are more likely to hear about an HIV/AIDS campaign (p < 0.001). Support or social assistance is a significant factor only for knowledge of HIV/AIDS (p = 0.039). Marital status has no effect on knowledge but has an impact on blood test and awareness (p < 0.001 and p = 0.034, respectively). Self-perceived risk of HIV/AIDS and greater wealth are significant for all three responses. Similar covariates are significant in modeling awareness of campaigns to combat HIV/AIDS and in modeling awareness of HIV/AIDS (Table 4).
There are marked differences between modeling these responses separately and modeling them simultaneously. The simultaneous model accounts for the other responses in determining the impact of a covariate on a particular response. The separate response models are compared with the simultaneous model in Table 4. The comparison shows p-values of 0.165 vs. 0.007 and 0.074 vs. 0.004 for knowledge of HIV/AIDS and awareness of HIV/AIDS campaigns, respectively. The impact of employment [in the past year] on awareness of HIV/AIDS is affected by the association between knowledge of HIV/AIDS and awareness of HIV/AIDS campaigns: for employment, the separate response model and the simultaneous model for awareness of campaign showed p-values of 0.2368 vs. 0.0267. Likewise, the impact of self-perceived risk [moderate risk] on knowledge of HIV/AIDS is affected by the association between knowledge of HIV/AIDS and awareness of an HIV/AIDS campaign: the single model versus the simultaneous model showed p-values of 0.235 vs. 0.014 for knowledge of HIV/AIDS.
Conclusion
The survey data are correlated because of their hierarchical structure. Statistical methods for the analysis of correlated data have become more accessible as statistical programs have added support for such models, and fitting correlated data with a generalized linear mixed model is now common. However, it is important to note that an analysis of correlated data does not have the same interpretation as an analysis that assumes the data are independent. Models for correlated data with random effects are referred to as subject-specific models.
Modeling simultaneous responses allows researchers to address correlation and explain the interplay among responses. Such information can lead to cost-saving measures in the design of future surveys. The advantage of simultaneous modeling lies in its ability to address one response while controlling for another. It is typical in survey data for respondents to provide responses on a series of outcomes. More importantly, the simultaneous modeling of responses on hierarchical data provides policymakers and researchers with results on which to base the allocation of resources at a time when funding is a scarce commodity.
Researchers are often faced with data that have a complicated structure but often choose to forgo complex models and rely on two-at-a-time modeling, one response and one covariate, with independent observations. There are also multivariable methods [one response and several covariates] based on independent observations. When the observations are not independent, however, a correlated model is necessary to identify the pattern of association. Such a model provides larger standard errors, which affect the significance of the covariates.
The 2018 Mozambique survey data, like most survey data, present simultaneous responses [5, 6]. Modeling simultaneous responses allows for the interpretation of their interplay, which can lead to cost savings in future surveys. This approach is unique in that it simultaneously addresses the factors, the extra variation, and the interplay usually seen in survey data [13].
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Abbreviations
DHS: Demographic and Health Survey
HIV: Human immunodeficiency virus
AIDS: Acquired immunodeficiency syndrome
References
1. Lawn SD, Kranzer K, Wood R. Antiretroviral therapy for control of the HIV-associated tuberculosis epidemic in resource limited settings. Clin Chest Med. 2009;30(4):685–99.
2. Mbachu C, Okoli C, Onwujekwe O, et al. Willingness to pay for antiretroviral drugs among HIV and AIDS clients in south-East Nigeria. Health Expect. 2017;21(1):270–8.
3. Collett D. Modelling Binary Data. New York: Chapman and Hall; 1991.
4. Irimata KM, Wilson JR. Identifying intraclass correlations necessitating hierarchical modeling. J Appl Stat. 2017;16:1–16.
5. Mayer KH, Beyrer C. HIV epidemiology update and transmission factors: risks and risk contexts - 16th Int. AIDS conference epidemiology plenary. Clin Infect Dis. 2007;44(7):981–7.
6. Pradhan NS, Su Y, Fu Y, et al. Analyzing the effectiveness of policy implementation at the local level: a case study of management of the 2009–2010. 2011.
7. Snijders T, Bosker R. Multilevel analysis: an introduction to basic and advanced multilevel modeling. London/Thousand Oaks/New Delhi: SAGE Publications; 1999.
8. Ene M, Leighton EA, Blue GL, et al. Multilevel models for categorical data using SAS® PROC GLIMMIX: the basics. SAS Global Forum 2015 Proceedings; 2015.
9. Neuhaus A, Augustin T, Heumann C, et al. A review on joint models in biometrical research. J Stat Theory Pract. 2009;3. https://doi.org/10.1080/15598608.2009.10411965.
10. Iddi S, Molenberghs G. A joint marginalized multilevel model for longitudinal outcomes. J Appl Stat. 2012;39(11):2413–30.
11. Fang D, Sun R, Wilson JR. Joint modeling of correlated binary outcomes: the case of contraceptive use and HIV knowledge in Bangladesh. PLoS One. 2018;13(1):e0190917.
12. Gueorguieva RV, Agresti A. A correlated probit model for joint modeling of clustered binary and continuous responses. J Am Stat Assoc. 2001;96:1102–12.
13. Meng XL, Rubin DB. Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika. 1993;80(2):267–78.
Acknowledgements
None.
Funding
None.
Author information
Contributions
DF wrote the manuscript and analyzed the data; AL analyzed the data; JW provided advice and editing. The authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
None.
Consent for publication
None.
Competing interests
If you do not have any competing interests, please state “The authors declare that they have no competing interests” in this section.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Fang, D., Lang, A. & Wilson, J.R. HIV survey in Mozambique: analysis with simultaneous model in contrast to separate hierarchical models. Arch Public Health 78, 70 (2020). https://doi.org/10.1186/s13690-020-00453-8
DOI: https://doi.org/10.1186/s13690-020-00453-8
Keywords
 Shared-parameter model
 Correlated
 Intraclass correlation