Modeling overdispersed longitudinal binary data using a combined beta and normal random-effects model

Background In medical and biomedical areas, binary and binomial outcomes are very common. Such data are often collected longitudinally from a given subject repeatedly overtime, which result in clustering of the observations within subjects, leading to correlation, on the one hand. The repeated binary outcomes from a given subject, on the other hand, constitute a binomial outcome, where the prescribed mean-variance relationship is often violated, leading to the so-called overdispersion. Methods Two longitudinal binary data sets, collected in south western Ethiopia: the Jimma infant growth study, where the child’s early growth is studied, and the Jimma longitudinal family survey of youth where the adolescent’s school attendance is studied over time, are considered. A new model which combines both overdispersion, and correlation simultaneously, also known as the combined model is applied. In addition, the commonly used methods for binary and binomial data, such as the simple logistic, which accounts neither for the overdispersion nor the correlation, the beta-binomial model, and the logistic-normal model, which accommodate only for the overdispersion, and correlation, respectively, are also considered for comparison purpose. As an alternative estimation technique, a Bayesian implementation of the combined model is also presented. Results The combined model results in model improvement in fit, and hence the preferred one, based on likelihood comparison, and DIC criterion. Further, the two estimation approaches result in fairly similar parameter estimates and inferences in both of our case studies. Early initiation of breastfeeding has a protective effect against the risk of overweight in late infancy (p = 0.001), while proportion of overweight seems to be invariant among males and females overtime (p = 0.66). Gender is significantly associated with school attendance, where girls have a lower rate of attendance (p = 0.001) as compared to boys. Conclusion We applied a flexible modeling framework to analyze binary and binomial longitudinal data. Instead of accounting for overdispersion, and correlation separately, both can be accommodated simultaneously, by allowing two separate sets of the beta, and the normal random effects at once.


Background
In medical and biomedical areas, binary and binomial outcomes are very common. The generalized linear model family [1][2][3] offers, among others, a suitable modeling framework. Such data are often collected repeatedly in time. Let r ij be a longitudinal binary outcome for subject i at the j th time point, such that each subject has n i measurements. The sum Y i ¼ P n i j¼1 r ij follows a binomial distribution. It is well known that, while i.i.d. Bernoulli variables do not contradict the prescribed mean-variance relation, i.i.d. binomial data can, exhibiting extra variability beyond the binomial model, leading to the so-called overdispersion in the latter, in addition to the correlation emanating from the repeated measures nature. In the past, overdispersion and correlation have been handled separately. To deal with overdispersion, the beta-binomial model is a popular and analytically tractable alternative to the binomial model, which accounts for the overdispersion not accommodated for in the binomial model, thereby allowing for a better fit to the observed data [4,5]. On the other hand, correlation is accommodated for by making use of generalized linear mixed models [6][7][8], which combine the general exponential family models with normally distributed random effects, are attractive for repeated measurements. In this paper, we use a general and flexible framework for such combinations, proposed by Molenberghs et al [9]. These authors focused on likelihood based methods for inference. In this paper, we have tried to present how the new combined model proposed by Molenberghs et al [9], can be implemented in the Bayesian paradigm. In addition, the ability to specify prior distribution will help to incorporate more information in inference, especially for complex models, like the combined model, that attempt to capture overdispersion and clustering using two separate sets of random effects. Further, we considered two real world data sets and analyzed, first in the likelihood context, and then in the Bayesian, which could also be considered as sensitivity analysis.
Two longitudinal binary data sets, collected in south western Ethiopia: the Jimma infant growth study, where the child's early growth is studied, and the Jimma longitudinal family survey of youth where the adolescent's school attendance is studied over time, are considered. One of the key indicators of infant growth is Body Mass Index (BMI). Many studies suggest that Breastfeeding status, and socio-economic condition of the parents, among others, are potential risk factors of BMI [10][11][12]. School attendance among adolescents varies among gender groups in a way that girls are at higher risk of school absenteeism as compared to boys. Moreover, adolescents living in urban areas have have a better school attendance rate, unlike those in the rural setting [13,14].
The paper is organized as follows. Section (Methods) briefly reviews standard methods and presents the model combining the normal and conjugate random effects in Section (Models combining conjugate and normal random effects). Avenues for parameter estimation and ensuing inferences are explored in Section (Estimation), with particular emphasis on so-called partial marginalization and Bayesian estimation. The results for the analysis are presented in Section (Results) followed by discussed in Section (Discussion). Some concluding remarks are taken up in Section (Conclusion).

Methods
In this section, we preset the two data sets from the Jimma case studies and briefly describe conventional models used for analysis. We start this section with presenting the data followed by a review of the generalized linear model; we also lay out the notation for the rest of the paper. Section (Overdispersion models) focuses on overdispersion in the binary and binomial situations. Section (Models with normal random effects) reviews the mixed model methodology for longitudinal data analysis. Finally, in Section (Models combining conjugate and normal random effects), the combined model is presented in which ideas from the mixed model methodology are combined with ideas on overdispersion.
The Jimma case studies Two longitudinal datasets, Jimma Infant Growth Study and Jimma Longitudinal Family Survey of Youth, collected in Southwest Ethiopia are considered.
The Jimma Infant Survival Differential Longitudinal Growth Study is an Ethiopian study, set up to establish risk factors affecting infant survival and to investigate socio-economic, maternal, and infant-rearing factors that contribute most to the child's early survival. Children born in Jimma, Keffa and Illubabor, located in Southwestern Ethiopia were examined for their first year growth characteristics. At baseline, there were a total of 7969 infants enrolled in the study, whereby 4317, 1494, and 2158 were from rural, urban, and semi-urban areas, respectively. The children were followed-up every two months, until the age of one year. Of special interest in this manuscript is the risk factor for overweight in children. Overweight, among infants, is associated with various risk factors. It is of particular interest to identify these risk factors in early life through weight and height measurements, which helps in prevention and treatment of overweight and obesity to reduce incidence of several adulthood diseases [15]. This outcome is defined by dichotomization of the Body Mass Index (BMI), with a BMI over the 85th percentile for his or her age referring to overweight. The 85th percentile for age-and sex-specific BMI classification of overweight is used based on Center for Disease Control (CDC) recommendation [16]. The question of interest is whether the percentage of overweight infants changes over time, and whether the evolution differs for gender, place of residence (rural, urban and semi-urban), as well as breast feeding behavior. Table 1 gives a summary of the percentage of overweight infants as a function of gender, location and follow-up time (age).
The Jimma Longitudinal Family Survey of Youth (JLFSY) is another Ethiopian study where data were collected from households. The study began in 2005, and was repeated in 2007. More than 90% of the study subjects present at baseline were visited and willing to respond in the second round. The study population is representative of the relatively large town of Jimma, the small towns of Yebu, Serbo, and Sheki, and nearby rural areas. The sample includes 3700 households as well as 700 adolescents. In this paper, the outcome of interest is the adolescents' current school attendance coded as 0 (not currently attending) or 1 (currently attending). Current school attendance was 90.2% and 91.1% in the first round survey and 93.5% and 92.8% in the second round for male and 3 female adolescents, respectively. The research question is to examine whether or not the percentage of school attendance depends on adolescents involvement in work to support themselves or their families to earn money, whether they are living in urban towns or rural areas, as well as on gender and age.

Standard generalized linear model
A random variable Y follows an exponential family distribution if the density is of the form for a specific set of unknown parameters η and ϕ, and for known functions ψ(Á) and c(Á, Á). Often, η and ϕ are termed 'natural parameter' (or 'canonical parameter') and 'dispersion parameter, ' respectively. For this family, in general, the mean and variance are related [17].
For binary responses, the model of interest is: Y $ Bernoulli (π). We want to explain variability between outcome values based on covariate values with density function The mean is given by μ = π and the variance, var (μ) = π(1 − π) [1].
When collecting a set of data, let Y 1 ,. . ., Y N be a set of independent binary outcomes, and let x 1 ,. . ., x N represent the corresponding p-dimensional vectors of covariate values. With a logit link function, ln π i 1Àπ i ¼ x ′ i ξ is the logistic regression model with ξ a vector of p fixed, unknown regression coefficients.

Overdispersion models
The standard Bernoulli model assumes that the mean and variance depend on a single parameter. Though a set of i.i.d. Bernoulli data cannot contradict the mean-variance relationship, it may not hold true for data having a hierarchical structure of the form z i successes out of n i trials.
For the Jimma infants study, considering i.i.d. Bernoulli data, the sample average probability of success and the sample variance are 0.150 and 0.128, respectively, indicating that the prescribed mean-variance link is maintained. In contrast, in the binomial setting, taking the hierarchical structure into account, the sample average and the sample variances are 0.141 and 2.107, respectively implying that the sample contradicts the mean-variance relationship for these data.
Similar exploratory analyses on the Jimma Longitudinal Family Survey of Youth were undertaken. For the binomial response, taking the two repeated measurements results in sample average probability of success 0.919 and sample variance 0.168 indicating that the results are in line with the prescribed mean-variance relationship which is known to be always true for the Bernoulli case. This may suggest, at first sight, that these data are not prone to exhibit strong overdispersion, even in the hierarchical binomial setting. In addition to the exploratory analysis, we also made tests for overdispersion. The commonly use approach is to compute the ratio of the residual deviance to the residual degrees of freedom which is approximates the overdispersion parameter (φ ). When the ratio is appreciably larger than 1, overdispersion is said to occur. It is pointed out that this approach could be misleading when n i p i is not sufficiently large, where p i is probability of the success event. This is because it is based on asymptotic theory. As a result, a better approach is based on a quasi-binomial model which allows more dispersion than the binomial model [18]. The approximated overdispersion (φ = 2.37) computed as the ratio of the residual deviance to the residual degrees of freedom in the Binomial, and the one estimated in the quasibinomial model (φ =2.47) for the Jimma Infants Growth data are very similar, both suggesting the presence of strong overdispersion. However, similar analysis for the Jimma Family Survey data does not suggest a considerable overdispersion, with values 0.765 and 1.129, approximated by the ratio of the residual deviance to the residual degrees of freedom in the Binomial, and estimated by the quasibinomial, respectively. An elegant way to account for overdispersion is through the so-called beta-binomial model, in which the Bernoulli model is combined with a beta distribution [17,19].

Models with normal random effects
For non-Gaussian data, the well-known generalized linear mixed model, in which the linear predictor contains random effects in addition to the usual fixed effects, is a common choice [6][7][8]. These random effects are usually assumed to come from a normal distribution. The model can be specified as follows: Let Y ij be the jth outcome measured for subject i = 1,. . ., N , j = 1,. . ., n i and group the n i measurements into a vector Y i . Assume that, in analogy with Section (Standard generalized linear model), conditionally upon q-dimensional random effects b i $ N (0, D), the outcomes Y ij are independent with densities of the form for a known link function η(Á), with x ij and z ij p-dimensional and q-dimensional vectors of known covariate values, with ξ a p-dimensional vector of unknown fixed regression coefficients, and with ϕ a scale (overdispersion) parameter. Finally, let f (b i |D) be the density of the N (0, D) distribution for the random effects b i . Here, the hierarchical approach is needed because we are working with longitudinal data. More precisely, in our model, the natural parameter is written as a linear predictor, a function of both fixed and random effects.

Models combining conjugate and normal random effects
Combining both the overdispersion effects (Section Overdispersion models) and the normal random effects (Section Models with normal random effects) into the generalized linear model framework, produces the following general family [9]: with notation similar to the one used in (3), but now with conditional mean where the random variable Unlike in Section (Models with normal random effects), we now have two different notations, η ij and λ ij , to refer to the linear predictor and/or the natural parameter. The reason is that λ ij encompasses the random variables θ ij , whereas η ij refers to the 'GLMM part' only. A detailed overview of the model can be found in Molenberghs et al [9]. For the case of binary data, we assume that where θ ij $ Beta(α, β). Indeed, this model also intuitively seems useful, as overdispersion and correlation due to the data hierarchy can occur simultaneously. The model is a two-level model with two types of random effects: (a) the b i , to accommodate correlation among repeated measures (and some overdispersion); (b) the θ ij for additional overdispersion. While (a) turns the model into a two-level model, rather than a onelevel one, (b) does not further add a level, because it merely accommodates overdispersion. This is to be compared with a classical generalized linear model, where also overdispersion random effects can be taken into account (e.g., beta in the Bernoulli model to yield the betabinomial; gamma in the Poisson model to yield the negative binomial; etc.), while keeping the so-resulting models remain one-level models.
Further, because the θ ij follow a conjugate distribution, they do not have an impact on the shape of the regression function (like the normal random effects in a linear mixed model), hence there is greatly reduced sensitivity to assumptions about the random effects. This is one of the elegant properties of conjugate random effects.

Estimation
In the likelihood framework, estimation proceeds by integration. The likelihood contribution of subject i is From this, the likelihood is given as: Here, ϑ groups all parameters in the conditional model for Y i . In the binomial case, the expression takes the form: with It is straightforward to obtain the fully marginalized probability by numerically integrating over the normal random effects, and using a tool such as the SAS procedure NLMIXED that allows for normal random effects in arbitrary, user-specified models. More details can be found in [9].
As an alternative estimation method, we turn to the Bayesian paradigm, combined with the popular Markov Chain Monte Carlo (MCMC) technique, making analyses of real-world complex data feasible [15]. In the Bayesian approach, prior distributions are assigned to the parameters and the random effects to adjust for parameter uncertainty. Bayesian inference for estimation of parameter θ is based on the posterior distribution, which is proportional to the likelihood multiplied with the prior distribution.
The Jimma longitudinal studies are characterized by clustering, resulting form the repeated measurements, leading to both correlation and overdispersion. When modeling such data, incorporating prior distribution for model parameters, including that of subject and observation specific random effects, will better handle the underlying uncertainties, instead of assuming that they are fixed. With the same model specification as in the likelihood framework, the parameters ξ, b i , and θ ij are taken to be a priori independent, i.e., p(ϑ, D, ϑ i , Σ i ) = p(ϑ)p(D)p(ϑ i )p(Σ i ) and the following prior distributions are used: ξ $ N(0,10 −6 ), b i $ N(0,τ i ), as also suggested in the literature [15,16], and θ ij $ Beta(α,β), is unimodal and concave, when α > 1, β > 1 [3]. For the hyper parameters τ i , the inverse-Gaussian prior IG(0.001, 0.001), and for α and β, an improper uniform prior is used, as also suggested by Gelman et al [16]. For more information on Bayesian data analysis and MCMC methods see [20,21].
Note that the Beta-binomial distribution is a compound distribution of the binomial and its conjugate beta, which can be used to capture overdispersion in binomial data. Beta-binomial approximates the binomial distribution arbitrarily well when its two non-negative parameters, α and β, determining its shape, are sufficiently larger. If one or both of these parameters are less than 1, then the probability mass function will go to infinity near its boundaries, 0 and 1, and hence not concave. As a result, the mode does not exist, leading to computational problems in MCMC. For this reason, we used the restriction α > 1, β > 1, such that the density is always concave and unimodal whereby it is always finite over the support [0, 1].
Spiegelhalter et al [22] suggest to use the so-called Deviance Information Criterion for model com-parison in Bayesian inference. Assume a probability model P (y|θ). The effective number of param-eters with respect to a model with parameter Θ is given by pD{y, Θ,θ(y)} = E θ|y [−2 log p(y|θ)] + 2 log[p{y| θ (y)}]. We shall usually drop the arguments {y, Θ,θ(y)} from notation. Generally, we takeθ(y) = E(θ|y), the posterior mean of the parameters. For f (y) being a fully specified standardizing term that is a function of the data alone, pD, defined as a 'mean deviance minus the deviance of the means, ' is given by pD According to this criterion, the model with the smallest DIC is to be preferred. pD and DIC are easily computed using 7 the available MCMC output by taking the posterior mean of the deviance to obtain E[D(θ|y)] and the plug-in estimate of the deviance D(E[θ|y]) using the posterior means E[θ|y] of the parameter θ. In nonhierarchical models, pD approximates the effective number of parameters to be estimated. However, for hierarchical models, pD is a measure of model complexity instead of being merely the number of effective parameters to be estimated. For the best model preferred based on DIC, the important risk factors could be identified looking the credible intervals, considering whether zero is in or outside of the credible interval.
We also attempted to fit the beta-binomial marginal density, although it is not one commonly encountered in software packages like WinBugs, where an observation x i contributes a likelihood term L i . We used the socalled zero trick, a Poi(ϕ) observation of zero has likelihood exp(−ϕ), so if our observed data is a set of 0's, and ϕ i is set to − log(L i ), we would obtain the correct likelihood contribution [23]. This zero trick allows for arbitrary sampling distributions and is particularly suitable when, say, dealing with truncated distributions. However, our case studies showed that this method can be very inefficient and give a very high Monte Carlo error.
In terms of parameter interpretation, we would like to refer back to the beneficial properties that come with the conjugacy property. Indeed, because the θ ij follow a conjugate distribution, the interpretation of the parameters is the same as in a classical generalized linear mixed model. Precisely, this means that the effect on the regression parameters only comes from the normal random effects in the linear predictor, a fact well documented. For a review, see, for example, Molenberghs and Verbeke [17].

The jimma infant growth study
We will analyze the binary BMI data. The following model is assumed for the mean structure: Y ij |b i $ Bernoulli(π ij ), for subject i and measurement j, and where G i is a gender indicator, P 1i and P 2i are dummy variables for place of residence corresponding to rural and urban areas and using semi-urban areas as a reference. T ij is the time point at which the j th measurement is taken for the i th subject, which is centered at month six. B ij denotes whether the i th infant is breast fed or not at time j. The random intercept b i $ N (0, D). The Infant Growth dataset is analyzed with a simple logistic model, a beta-binomial model introducing only an overdispersion parameter, a random-effects logistic model that introduces a random-effects term to take the repeated structure of the data into account, and finally the combined model, which allows for both an overdispersion and a random effects term. Parameter estimates are presented in Table 2.
Clearly, the logistic-normal model is an important improvement, in terms of likelihood, relative to both the ordinary logistic model and the beta-binomial. Moreover, considering the combined model, there is a very strong improvement in fit when the beta and normal random effects are simultaneously allowed for. The overdispesion term in the combined model is significant (p < 0.001), implying the presence of considerable extra variability due to the grouped nature of the data, which is beyond what can be accommodated by the commonly used logistic-normal model.
The logistic-normal model ignores the overdispersion that results from the grouped nature of the data. On the other hand, the beta-binomial model accommodates overdispersion which is assumed independent, implying independence between repeated measurements. Again, this is not realistic and therefore the combined model is the more viable candidate, supported further by the aforementioned 9 likelihood comparison.
The combined suggests that the intercept, the time effect, main effects of place of residence and breastfeeding are significant, which is also true for time interaction with rural place of residence and breast feeding. However, main effect and slope of gender were not significant implying that proportion of overweight seems to be invariant among male and female infants over time. Infants living in rural, and urban areas are at lower risk of overweight as compared to those in semi-urban ares with (ξ 3 = −1.058, p = 0.001), and (ξ 4 = −0.689, p = 0.001), respectively. Further, early initiation of breastfeeding has a protective effect against the risk of overweight in late infancy (ξ 9 = −0.167, p = 0.001), as shown in Table 2.

Jimma longitudinal family survey of youth
We will now analyze current school attendance. For the logit, consider the model: Y ij |b i $ Bernoulli(π ij ), with where A ij is the age of the i th subject at the j th visit, G i is the gender of the i th subject. P 1ij and P 2ij denote the two dummy variables for place of residence of the i th subject on the j th visit, which are urban, semi-urban, and rural by taking rural as a reference. W ij indicates whether the i th adolescent is engaged in some work for the family or help support on the j th visit. Finally, R ij is the j th round or measurement occasion of the i th subject, and b i $ N (0, d).
Results from fitting all four models (with/without normal random effect; with/without beta random effect) can be found in Table 3. Likelihood comparison of the betabinomial with the standard logistic model shows no improvement in fit, implying absence of strong evidence for overdispersion. This can be noted from likelihood comparisons of the simple logistic and the beta-binomial on the one hand, as well as the logistic-normal and the combined, on the other. One can easily see, however, that the commonly used logistic-normal and the combined models are significant improvements over the standard logistic model. We further observe, while the logistic-normal model suggests a significant intercept (p = 0.045), that the same does not emerge when the combined model is considered (p = 0.099) implying the beta random effect still has some impact on the p-values. The logistic-normal model is adequate, in this case study, for the combined model where there is no strong evidence of overdispersion, as the overdispersion term is not significant (p = 0.29)for these data with two repeated measurements per subject, as mentioned in the earlier sections. Further extension by adding random slope did not improve the fit of both logistic-normal and the combined models (details not shown).
Adolescents living in urban, and semi-urban areas have higher school attendance than those living in rural areas, with (ξ 2 = 1.098, p = 0.001), and (ξ 3 = 1.092, p = 0.001), respectively. Gender is also significantly associated with school attendance, where female adolescents are lower (ξ 4 = −1.241, p = 0.001). There is evidence that school attendance increases in the second round visit than that of the first (ξ 6 = 0.398, p = 0.010).

Comparison between estimation methods
For comparison with the previously applied estimation method in the likelihood framework, we again apply the same models to the two surveys, but now in the Bayesian framework. After generating 70,000 MCMC samples for the combined, and 50,000 MCMC samples for logisticnormal, beta-binomial, and simple logistic, the first 10,000 samples are discarded and treated as so-called burn-in samples. The remaining samples are used to summarize the posterior estimates. Two distinct chains were used to check sensitivity to the initial values, and convergence was met. Convergence was checked using the Gelman-Rubin diagnostic as well as by visual inspection of the trace and QQ plots [24]. The posterior summaries of logistic, beta-binomial, logistic-normal, and combined models are given in Tables 4 and 5 for the Jimma Infants Growth dataset and the Jimma Longitudinal Family Survey of Youth, respectively. The parameter estimates are fairly similar with what was obtained previously in the likelihood approach in both cases, except for differences in the case of the beta-binomial for the Jimma Infants data in Table 4 when compared with Table 2.
In terms of significance of the parameters, the same conclusion is reached for the two case studies in both approaches, except that the beta-binomial for the intercept and time effects in Jimma infants study shows significance in the likelihood framework as given in Section (The jimma infant growth study), while the same does not emerge from the Bayesian case, as observed from the 95% credible inteval which include zero for these effects. We compared the various models using the DIC criterion. For both studies, there is a significant reduction in the DIC of the logisticnormal and the beta-binomial, as compared to the simple logistic. We observe a rather high degree of model improvement by combining beta and normal random effects simultaneously, to allow for both the overdispersion and the data hierarchy. Moreover, the logistic and the betabinomial ignore the correlation stemming from the data hierarchy on the one hand, and the logistic-normal does not allow for the overdispersion, on the other, which altogether make the combined model the preferred one.
According to Spiegelhalter et al [22], in comparing complex hierarchical models where the number of parameters are not clearly defined, pD is the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of  interest, not only measures the effective number of parameters but also the model complexity. These authors further noted that the contribution pD i of each observation i turned out its leverage, defined as the relative influence that each observation has on its own fitted value. for y i conditionally independent given θ, pD i , shows its interpretation as the difficulty in estimating θ with y i . This shows the connection between the sample size, the parameters to be estimated, and the pD. The Jimma infants (n = 7969) and the Jimma Longitudinal family survey (n = 2100) data have large number of subjects followed longitudinally, where each subject was measured seven and two times, respectively. Due to these reasons, the pD values, as presented in Tables 4 and 5, appeared to be larger as the by-product of the MCMC estimation to obtain leverage of each observation. Unlike the Jimma infants study in Table 4, pD of the combined model for the Jimma Longitudinal Family Survey of Youth in Table 5, (pD = 211.9), is lower than that of the logistic-normal (pD = 241.5). This implies that, for the Jimma Longitudinal Family Survey of Youth, the combined model is less complex to fit than the logistic-normal, although, this is what we don't usually expect, as the combined model seems more complex, since it includes both the beta and the normal random effects, while, the logisticnormal including only the normal random-effects.  However, for these specific data, this resulted likely because there is less conflict between the specific data set, and the prior distributions which could be associated to the conjugacy of the beta random effects, as well as the peculiar data features including number of subjects and repeated measurements per subject. The SAS and WinBugs codes used for analysis of the data sets are given in the Appendix.

Discussion
Analysis of the case studies show that, in the presence of overdispersion, and clustering, the combined model results in improvement in model fit, which is similar to the finding in Molenberghs et al [9].
This study revealed that early breastfeeding lowers the risk of overweight at late infancy. This finding is in line with Bergman et al [10], who showed that breastfed infants had lower BMI after 3 months from birth than bottle-fed infants, though the BMIs at birth were nearly identical in both groups. Owen et al [11], who reviewed sixty-one studies, states that initial breastfeeding protects against obesity in later life, although the precise magnitude of the association remains unclear. Unlike Owen et al [11], the present study showed that, infants in the breastfed group were fatter, at birth, as compared to those who were not breastfed. This is likely because of the unmeasured maternal history, such as maternal BMI, and socio-cultural aspects, which are considered to be the risk factors of overweight in children [25]. In addition, it is a common practice in the study area that mothers provide additional liquid or solid food starting from early infancy, in addition to breastfeeding. This is probably because they believe that a child with more weight is considered as healthy, which is likely to have its own impact on the BMI in the early infancy. In this study, it is also shown that place of residence does not have a long term effect in the risk of overweight, instead it is the mode of feeding which is more important. The baseline differences observed in the risk of overweight among infants living in urban, semi-urban areas might be attributable to other family related factors like, social class, family income, educational level of the parents, and other socio-cultural variables, which are indicated to affect the nutrition of young children and women in Ethiopia [12].
In investigating school attendance among adolescents, this study showed that, girls have a lower rate of current school attendance than boys, which is a common situation in most Sub-Saharan African Countries. According to World Health Organization [13], there is a clear gender gap observed in primary or secondary school enrollment when the Gender Parity Index (GPI), the ratio of female to male enrollment, is considered. Between the years 1999 and 2003, GPI was found to be 0.7, indicating that there were only 7 girls enrolled at primary schools for every 10 boys. This gender gap increases with increasing level of education. This study also showed that adolescents in urban and semi-urban areas have higher rate of than those in the rural areas, which is in line with report of World Bank [14], where it was stated that among children in rural areas with a school in the neighborhood, less than 44% registered for school; in urban areas, the percentage is much higher up to 86%. According to the report, distance to the nearest school, household characteristics, and learning environment were among the possible reasons of the gap in the school attendance.

Conclusion
We have presented a model which integrates normal and beta random effects into a single model, termed the combined model. Our work builds upon that of Molenberghs et al [9], who brought together normal random effects to induce association between repeated binary and binomial data, and a beta-binomial distributed random factor in the log-linear predictor to fine tune the overdispersion. Estimated posterior mean and standard deviation in (1) the logistic model, (2) the beta-binomial model, (3) the logistic-normal model, and (4) the combined model.
Maximum likelihood estimation was considered by integrating over the random effects using the SAS procedure NLMIXED.
Further, Bayesian inference has been applied. Prior information about the parameters induces correlation, which then leads to reduced effective dimensionality although the reduction depends on the available data [22]. Complexity reflects the difficulty in fit and hence it seems reasonable that the measure of complexity may depend on both the prior information concerning the parameters under scrutiny and the specific data that are observed. This can be elucidated from the Jimma Longitudinal Family Survey of Youth result, where the combined model is less complex in fit, which likely results from the conjugacy of the beta random effect and the number of subjects as well as the repeated measurements per subject.
Future studies on early growth of children could benefit from careful measurement of a wider range of potential confounders of overweight.
Further efforts should be made to fill the gap in school attendance among boys and girls, as well as, urban and rural areas by focusing on the potential causes, such as lagging experience in primary schooling, which is then exacerbated by such factors as the practice of early marriage among Ethiopian women, families reluctance to invest in girls education. Situating schools closer to childrens homes in rural areas, and improve the quality of the services is necessary [14]. Longitudinal studies with better number of repeated measurements per subject should be conducted to get better insight on the trends of school enrollment, survival of adolescents.