Skip to main content

Symptoms of anal incontinence and quality of life: a psychometric study of the Norwegian version of the ICIQ-B amongst hospital outpatients



The International Consultation on Incontinence Questionnaire-Bowel (ICIQ-B), a self-report, condition-specific questionnaire designed to assess symptoms of anal incontinence (AI), measures AI’s impact on quality of life (QoL) along with perceived bowel patterns and bowel control amongst individuals with AI. In our study, we aimed to translate the ICIQ-B to Norwegian and investigate the Norwegian version’s psychometric properties.


To establish a relevant, comprehensive, and understandable Norwegian ICIQ-B, cognitive interviews were conducted with 10 patients with AI, and six clinical experts reviewed the translated scale. The Norwegian ICIQ-B’s structural validity, scale reliability, and content validity were tested amongst patients with AI attending hospital outpatient clinics in three regions of Norway (N = 208).


Assessing the Norwegian ICIQ-B’s content validity revealed that the questionnaire was relevant, comprehensive, and understandable. Missing data were infrequent (3.3%), and no floor or ceiling effects emerged. Three-factor and two-factor solution models, both with advantages and disadvantages, were found. The three-factor model offered the most parsimonious solution by covering most of the original scale, albeit with an unacceptably low reliability (α = .37) for the construct of bowel pattern. The two-factor model showed good reliability in terms of internal consistency for the constructs of bowel control (α = .80) and impact on QoL (α = .85) but was less parsimonious due to dismissing seven of the original 17 items and excluding the bowel pattern construct. Test–retest reliability demonstrates good stability for the Norwegian version, with an intra-class correlation coefficient of .90–.95 and weighted kappa of .39–.87 for single items.


Although the Norwegian version of ICIQ-B demonstrates good stability and content validity, the original constructs of bowel pattern and bowel control had to be adapted, whereas the construct of impact on QoL remained unchanged. Further psychometric testing of the Norwegian ICIQ-B’s factor structure is therefore recommended.

Peer Review reports


Anal incontinence (AI) is a debilitating condition that impacts an individual’s self-esteem and quality of life (QoL) and may cause significant secondary morbidity, disability, and economic burden [1]. In contrast to faecal incontinence (FI), defined as “the involuntary loss of liquid or solid stool that is a social or hygienic problem,” AI entails the involuntary loss of not only stool but also flatus from the rectum due to the inability to control bowel movements [2]. Thus, AI ranges from the occasional leakage of stool while passing gas to a complete loss of bowel control [3]. Because AI encompasses the loss of flatus and stool, its estimated pooled prevalence rate amongst home-dwelling adults is 15–17%, whereas FI’s is only 5.9% [4]. A population-based cross-sectional study among Norwegian women aged 30 and older found that 19.1% of the women reported AI, while 3.0% reported FI [5]. No studies have reported AI or FI prevalence among Norwegian home-dwelling men. However, because many patients avoid reporting FI, its prevalence may be underestimated [6, 4]. The highest prevalence is found among older people residing in care homes with an estimated FI prevalence of 42.8% [7].

The aetiology of AI is complex and multifactorial. Continence depends on the interaction between the anal sphincter complex, stool consistency, rectal reservoir function and neurological function. Disease processes or structural defects that alter any of those components can lead to FI [8]. Diarrhoea and altered bowel habits, inflammatory bowel disease, diet intolerance and constipation with paradoxical diarrhoea represent the most frequent independent risk factors for AI [9]. The most common structural causes, however, result from obstetrical injury [10], anorectal surgeries [11] and rectal prolapse [12, 13]. Depending on the presenting circumstances, FI is commonly classified as passive incontinence (i.e. involuntary discharge without any awareness), urge incontinence (i.e. discharge despite active attempts to retain it), and faecal seepage (i.e. leakage of stool with grossly normal continence and evacuation) [14, p. 1585].

Due to AI’s complex aetiology, treatment needs to be tailored to the individual’s circumstances [15]. Although several scoring systems are commonly used to assess AI, no investigative tools specifically link symptoms of AI to QoL [8]. For clinicians as well as researchers, validated questionnaires and scales play an integral role in identifying symptoms of a disease, assessing patients’ QoL, and objectively characterising any phenomenon detected [16]. Amongst such instruments, the International Consultation on Incontinence Questionnaire-Bowel (ICIQ-B) is a self-report, condition-specific questionnaire designed to assess symptoms of AI and its impact on QoL [17, 18]. As part of the International Consultation on Incontinence’s suite of validated questionnaires on incontinence [19], the ICIQ-B includes 21 main items, 17 of which address three scored factors: Bowel Pattern, Bowel Control, and Impact on QoL. In addition, to evaluate important issues from the perspectives of clinicians and patients, the ICIQ-B includes four unscored items: one representing the Bristol Stool Chart of stool consistency [20] and three others respectively concerning strain, worry and the restriction of sexual activities due to AI. Tailored for use by clinicians in both primary and secondary healthcare, the ICIQ-B is designed to screen for AI, obtain a brief yet comprehensive summary of the level, impact, and perceived cause of symptoms of AI and to facilitate better patient–clinician discussions [17, 18]. The ICIQ-B is intended for both clinical assessment and research. The 21 items are therefore divided in two parts; an A-question representing the main issue, accompanied by a B-question “how much does this bother you?” which is particularly important in a clinical perspective. The A-questions are measured on a 5- or 6-point Likert scale, while the B-questions are measured on a scale from (0 not at all) -10 (a great deal). One item, item 3, has a third question, since the main question regarding frequency of opening one’s bowels is further divided into a) usual and b) at worst and c) how much does this bother you? (Additional file 1).

Validated patient-reported outcome measures not only help patients and clinicians to make better decisions but also enable comparisons of providers’ performance to stimulate improvements in services. They are also well-suited for cross-national comparisons of research [21, 22]. To date, the ICIQ-B, originally developed in British English [17, 18], has been translated and validated in Spanish (i.e. in Chile), albeit only regarding content validity based on cognitive interviews [23]. Although an American English online version of the ICIQ-B has been psychometrically evaluated against an American English paper version [24], the extent of testing was limited. Even so, both cited studies involved assessing the test–retest reliability, which proved to be good in both cases [23, 24]. Moreover, the psychometric evaluation conducted in the United States demonstrated the ICIQ-B’s convergent validity and reasonable response to change at follow-up 3 months after the non-surgical treatment of FI, as well as its good internal consistency for the constructs of impact on QoL and bowel control. Meanwhile, having tested the American English version of the ICIQ-B, Markland et al. [24] demonstrated its fair internal consistency for the construct of bowel pattern. However, neither the Spanish nor the American English translation of the ICIQ-B has been assessed for structural validity. Beyond that, a review of QoL measures in relation to FI has shown that the original British English version of the ICIQ-B lacks sufficient structural validity [25]. Thus, because the ICIQ-B’s factor structure seems to be unclear, we evaluated the structural validity, reliability, and content validity of a Norwegian version of the scale.


In our study, we aimed to translate the ICIQ-B to Norwegian and assess the translated scale’s psychometric properties amongst outpatients with AI. The research question was threefold:

  1. 1.

    How well does the original ICIQ-B’s three-factor measurement model fit with the observed data?

  2. 2.

    Does the ICIQ-B demonstrate good reliability in terms of internal consistency and test-retest stability?

  3. 3.

    Does the ICIQ-B demonstrate good content validity in the Norwegian population?

The research question was addressed in accordance with the COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN guidelines) [26, 27], which address evidence related to structural validity, reliability, and content validity, all as central, interrelated properties of a given measurement model. Whereas structural validity (i.e. dimensionality) concerns the homogeneity of items [28]—that is, whether items match their respective constructs—reliability encompasses a scale’s inconsistency and lack of error [28]. By further contrast, content validity explores whether the theoretical content of constructs is adequately represented by questionnaire items in terms of relevance and comprehensiveness [29].


Translation and cultural adaptation

First, the ICIQ-B was translated from British English to Norwegian by a bilingual Norwegian–English translator, followed by a back-translation into English conducted by another bilingual Norwegian–English translator [19]. Second, the back-translation was evaluated by the International Consultation on Incontinence Questionnaire group [30]—that is, the British English instrument’s developers—who provided useful comments regarding possible ambiguities and other flaws that guided minor adjustments to the Norwegian ICIQ-B. Third, the Norwegian version was pilot-tested for comprehensiveness, readability, and equivalence [29] in cognitive interviews with 10 patients with AI living in Norway. Fourth, comments were gathered from six Norwegian bi- or monolingual multidisciplinary clinical experts to further assess comprehensiveness, readability, and equivalence. As minor discrepancies were identified and amended between each step, a comprehensible Norwegian version of the ICIQ-B gradually emerged (see Fig. 1).

Fig. 1
figure 1

Flow chart of the ICIQ-B translation and validation in Norwegian

Participants and sampling procedure

In our study, three samples were recruited. The first was a sample of 10 patients, both men and women, recruited from the outpatient gastrointestinal surgery clinic of St. Olav’s University Hospital in Trondheim to participate in cognitive interviews. Patients with AI were invited to participate in the interview study by a nurse contact who provided them with information about the study, after which patients could contact the researcher directly. Written consent was obtained from the patients before their interviews commenced. Second, a sample of six clinical experts in AI (i.e. colorectal surgeons, stoma nurses and physiotherapists) from the three participating hospitals (i.e. St. Olav’s University Hospital in Trondheim, University Hospital Northern Norway in Tromsø and Akershus University Hospital in Oslo) were recruited to evaluate the Norwegian ICIQ-B’s comprehensiveness, relevance, and wording. The clinical experts were sent the Norwegian and original British English versions of the questionnaire via email, and the research team received their feedback either by email or orally during in-person meetings, depending on each expert’s preference. The cognitive interviews with patients and the evaluation by clinical experts were both part of pilot-testing the translated Norwegian version of the questionnaire and served to establish the foundation for the content validity and cultural equivalence between the Norwegian and British English versions of the ICIQ-B.

Third, to test the psychometric properties of the Norwegian ICIQ-B, patients referred from their general practitioners to outpatient clinics in the three mentioned university hospitals due to AI were recruited to complete a paper-based questionnaire. The three hospitals represent three regions of Norway from south to north. To be included, new patients had to be attending the outpatient clinic due to AI, had to have never received treatment for AI and had to be able to provide their written consent to participate in the study and to complete the questionnaire independently. Patients who participated in the cognitive interviews were not enrolled in that subsequent part of the study. A patient sample 10 times the number of items was needed to be able to perform a factor analysis of the Norwegian ICIQ-B [31]. Because the original questionnaire consists of 21 items, four of which are unscored and were excluded from our analysis, and because the remaining 17 items implied a sample size of approximately 170 patients, we aimed to include at least 200 questionnaire respondents.

Eligible patients were invited to participate by using the hospitals’ routines to summon patients. Along with an invitation to a medical consultation at the hospital, eligible patients received an information sheet about the study together with the questionnaire and a return envelope. Patients who attended the consultation subsequently received another invitation to participate in the study, which both reminded patients who had not yet responded to the questionnaire and served as a retest for those who had already returned their responses. Patients were recruited beginning in 2011 until 200 had been enrolled (i.e. in 2013).


Descriptive statistics and exploratory factor analysis (EFA) were performed with IBM’s SPSS version, while confirmatory factor analysis (CFA) was performed using Stata version 17.0 [32].

Structural validity was assessed of the main A-questions using CFA and EFA (Principal Axis Factoring). In our study, the model fit was assessed by χ2 statistics and two conventional fit indices—the root-mean-square error of approximation (RMSEA) and the standardised root mean square residual (SRMS)—with values less than .05 indicating a good fit and values from .05 to .10 indicating an acceptable fit [33, 34]. Furthermore, the comparative fit index (CFI) and the Tucker–Lewis index (TLI) with acceptable fit set at .95 and good fit at .97 were used [33,34,35,36]. Because skewness and kurtosis were significant, the Satorra–Bentler-corrected χ2 was applied as recommended when analysing non-normal continuous endogenous variables [37]. EFA was performed with oblim rotation, and observations with one or more missing values across the 17 variables included any of the three constructs were deleted. No replacements were made for missing data.

Next, content validity was assessed in three ways. First, cognitive interviews with patients in the target population and reviews of the scale by clinical experts were analysed on a question-by-question basis and any comments entered directly under each item on the scale [38]. Second, floor and ceiling effects were considered problematic if more than 15% of respondents achieved the highest- or lowest-possible score [39, 40]. Third, at the item level, less than 3% missing data was acceptable, whereas more than 15% was not [29].

The reliability of the questionnaire and its subscales were assessed for their internal consistency and stability over time. To assess the internal consistency of the A-items, we used the reliability coefficients of Cronbach’s alpha (α) and composite reliability (ρc), with values ≥.7 considered to be good [29]. Test–retest reliability was evaluated using intra-class correlation coefficients (ICC) to measure the stability of scales over time and weighted kappa values with linear weights for single items [40, 39]. In ICC analysis, a two-way mixed-effect Analysis of Variance (ANOVA) was used because time is a relevant factor in test–retest studies of patient-reported outcome measures. Also, interaction for the absolute agreement between scores was considered the preferred ICC formula [41]. Additionally, measurement error (i.e., standard error of measurement and smallest detectable change) were reported [40].

Ethical considerations

The Regional Committee for Medical and Health Research Ethics reviewed and approved the study (2009/1225), as did the institutional review board at the three university hospital clinics. Each patient was informed about the study and signed a written declaration of consent to participate. Participants were informed that their participation in the study was voluntary and that they could withdraw their consent at any given time and for any or no reason.


During the 2-year period of data collection, 360 invitations for participation were sent to eligible patients. At baseline, 208 Norwegian patients with AI completed the questionnaire (57.8% response rate), 50 of whom completed it again after 1–6 weeks (i.e., retest). Observations with one or more missing values across the 17 variables included in any of the three factors were deleted, which left a sample of 161.

At baseline, most respondents were women (87.3%). The age range was 18–89 years (Mean 59.2, SD = 15.0), as shown in Table 1. Scale scores for the original constructs appear in Table 2.

Table 1 Characteristics of participants
Table 2 The ICIQ-B original questionnaire including 3 factors and 21 items. Scale means and Standard deviation (SD)

Exploratory factor analysis (EFA)

To explain as much of the total variance as possible with as few factors as possible, we subjected the ICIQ-B to EFA. The Kaiser–Meyer–Olkin measure of sampling adequacy, .88, exceeded the recommended value of .60, and Bartlett’s test of sphericity showed statistical significance (p < .0001), which supported the factorability of the correlation matrix. A factor loading of .32 indicates approximately 10% overlapping variance with the factor’s other items; thus, a minimum loading of .32 is considered acceptable [42]. Accordingly, a cross-loading item would load at .32 or higher on two or more factors. When subjecting the ICIQ-B to EFA, we sought the cleanest factor structure. Because the original ICIQ-B contains three factors, we expected a three-dimensional structure with correlated factors.

Five factors with eigenvalues greater than or equal to 1.0 were extracted (see Table 3), with factor loadings of .38–.94. Figure 2 shows the scree-test of the ICIQ-B data, with five factors explaining 68.17% of the variance; Factor 1 explained 38.37%, Factor 2 explained 9.11%, Factor 3 explained 8.21%, Factor 4 explained 6.37%, and Factor 5 explained 6.11%. That EFA-suggested solution revealed five factors with two to five items each. Four of the factors displayed good or acceptable Cronbach’s alpha coefficients between .64 and .85, whereas the other had a poor one (α = .55). Table 3 lists the loadings and variance for that rotated five-factor solution of the ICIQ-B. Commonalities for the 17 items ranged between .25 for Item 7 and .86 for Item 21, for which a value greater than .40 is recommended [43].

Table 3 Principal Axis Factoring with oblim rotation of ICIQ-B. Estimates for factor loadings, extraction sums of squared loadings and Cronbach’s alpha
Fig. 2
figure 2

Scree-plot of the 17 item ICIQ-B. Principal component analysis. N = 161

Confirmatory factor analysis (CFA)

First, we tested the original three-dimensional structure involving 17 items following Cotterill et al. [18]. The model (i.e., Model 1) revealed standardised factor loadings (λ) of .26–.89, with squared multiple correlations (R 2) of .007–.79. The fit was poor: Satorra–Bentler χ2 = 283.339, df = 116, χ2/df = 2.44, p = .0001, RMSEA = .095, p for test of close fit = .0001, CFI = .85, TLI = .83, SRMR = .080. Although the estimated χ2 value was good, the other fit indices indicated misspecification. Reliability assessed with the composite reliability coefficient (ρc) was good for two of the three dimensions (see Table 4). Analysing the residuals and modification indices (MI) revealed no significant residuals, but 10 MIs were greater than 10; the pairs of Items 11 and 13 (MI = 23.01), Items 3 and 4 (MI = 22.67) and Items 8 and 12 (MI = 19.21) had the highest MIs. Examining the factor loading and R 2 values revealed that Item 13—“Do you have bowel accidents when you have no need to open your bowels?”—showed a low loading and R 2 (.09), which suggests less reliability than Item 11—“Are you able to control wind (flatus) escaping from your back passage?” (R 2 = .42). Because the respondents seemed to regard Item 13 as being irrelevant, we dismissed the item and ran the CFA again. That solution (i.e., Model 2), including 16 items, showed an improved but nevertheless poor fit: Satorra–Bentler χ2 = 240.317, df = 101, χ2/df = 2.38, p = .0001, RMSEA = .093, p for test of close fit = .0001, CFI = .88, TLI = .85, SRMR = .078. Model 2 had seven MIs higher than 10, with Items 3 and 4 presenting the highest value. Again, guided by the factor loadings, R 2 values and the nuances of the construct, Item 4—“How often do you open your bowels during the night from going to bed to sleep until you get up in the morning?”—showed exceptionally high modification indices with Item 3—“How many times do you open your bowels in 24 hours?”—thereby signifying that the items shared error variance, which makes sense: opening your bowels at night obviously correlates with opening them in the past 24 h. Considering that information regarding the past 24 h was more inclusive than the frequency of opening one’s bowels at night, we kept Item 3 and excluded Item 4 to achieve a statistically good model fit. That solution, Model 3, included 15 of the original items: Satorra–Bentler χ2 = 204.662, df = 87, χ2/df = 2.38, p = .0001, RMSEA = .092, p for test of close fit = .0001, CFI = .89, TLI = .87, SRMR = .073.

Table 4 Goodness-of-fit measures for ICIQ-B measurement model. Confirmatory Factor Analysis for Model-1 to Model-8

Thus far, we had dismissed Items 13 and 4. Nevertheless, though the χ2/df was good, the fit remained poor, and five MIs greater than 10 were present. Items 8 and 12 had an MI of 19.43; Item 12—“Are you able to control mucus (discharge) leaking from your back passage?”—shared a considerable amount of error variance with Item 8 (i.e., “Do you experience any staining of your underwear or need to wear pads because of your bowels?”). Because controlling mucus leakage and staining one’s underwear due to such leakage obviously correlate strongly, we dismissed Item 12 to achieve a good fit without including correlated error terms. Nevertheless, that solution (i.e., Model 4), including 14 items, still revealed a poor fit (χ2 = 174.77, df = 74, χ2/df = 2.36, p = .0001, RMSEA = .092, p for test of close fit = .0001, CFI = .90, TLI = .88, SRMR = .073). Furthermore, Items 5 and 14 had an MI of 18 and 16, respectively. The theoretical content of Item 14—“Are your bowel accidents or leakages unpredictable?”—concerned bowel leakage and shared error variance with Item 19—“Do your bowels cause you to feel embarrassed?—which is plausible: bowel accidents and leakage would cause embarrassment. Consequently, removing Item 14 improved the fit.

Even with Item 14 removed, Model 5, including 13 items, only marginally improved the fit. Item 5—“Do you have to rush to the toilet when you need to open your bowels?”—seemed to load more strongly on the Bowel Control factor (λ = 0.82). While allowing it to load on the Bowel Control -factor instead of the Bowel Pattern factor in Model 6 (with 13 items), improved the model fit considerably: χ2 = 108,492, df = 62, χ2/df = 1.75, p = .0001, RMSEA = .071, p for test of close fit = .054, CFI = .95, TLI = .94, SRMR = .056.

As a result, the Bowel Pattern factor, including only three items—Item 3 (i.e. “On average how many times do you open your bowels in 24 hours?”), Item 6 (i.e. “Do you use medications such as tablets or liquids to stop your bowels from opening?”) and Item 7 (i.e. “Do you experience pain/soreness around your back passage?”)—had a low composite reliability (ρpattern = .51). However, the reliability was good for the other two factors (ρcontrol = .82 and ρQoL = .89). Examining the theoretical content of the items belonging to the Bowel Pattern factor clarified that they address different aspects, which explains their low internal consistency. Those items include aspects ranging from the frequency of opening one’s bowels to using medication and experiencing pain. Apparently, the items neither shared much variance nor seemed to represent reliable indicators for the same construct.

Therefore, we dismissed the Bowel Pattern factor (i.e. Items 3, 4, 6 and 7) and re-added Item 12 to the Bowel Control factor. The resulting two-factor solution (i.e., Model 7), with 11 items, revealed a nearly acceptable fit to the data: χ2 = 88.725, df = 43, χ2/df = 2.06, p = .0001, RMSEA = .082, p for test of close fit = .003, CFI = .95, TLI = .94, SRMR = .060. For this model termed Model 7, the loadings ranged between .41 and .90, R 2 values were between .17 and .81, and composite reliability (ρc) was .80 and .89 for the Bowel Control and Impact on QoL factors, respectively. After again adapting the model, we generated a two-factor model, Model 8, that included 10 of the 17 original items—Items 12–14 along with the Bowel Pattern factor were dismissed—and showed a good fit: χ2 = 63.443, df = 34, χ2/df = 1.87, p = .0001, RMSEA = .074, p for test of close fit = .026, CFI = .97, TLI = .96, SRMR = .052. Composite reliability was .82 and .89 (see Table 5).

Table 5 ICIQ-B Model-6 and Model-8 (in parentheses): the best fitting three-factor and two-factor measurement models

Thus, Model 8, with two factors (i.e. Bowel Control and Impact on QoL) and 10 items, was less parsimonious but demonstrated the statistically best fit. By comparison, Model 6 also included those factors along with the Bowel Pattern factor, with 13 items, and was therefore the most parsimonious measurement model with a good fit. Models 6 and 8 are illustrated in Figs. 3 and 4, respectively.

Fig. 3
figure 3

The best fitting most parsimonious measurement model of the Norwegian version ICIQ-B scale

Fig. 4
figure 4

The best fitting two-factor solution of the Norwegian version of the ICIQ-B scale

Content validity and scale reliability

Cognitive interviews with patients with AI and evaluations of the Norwegian ICIQ-B version by clinical experts indicated the Norwegian ICIQ-B’s good face and content validity in terms of relevance, comprehensiveness, readability, and equivalence. The overall percentage of missing data at baseline was 3.3%, ranging from 0.5 to 11.1% for single items. Because none of the proposed scales had the lowest- or highest-possible score with more than 15% frequency, no floor or ceiling effects were found in the total score distributions.

Regarding reliability, internal consistency in the proposed factors showed Cronbach’s alphas (α) from .37 to .85, as presented in Table 6. Test–retest stability revealed ICCs between .90 and .94. Concerning the stability of single items, 13 items had weighted kappa values of .61–.80, whereas five had values of .41–.60, two of .81–1.00 and one of .21–.40. The factors’ standard error of measurement error (SEM), an expression of the average measurement error, was estimated to be 0.42–0.73 points, while the smallest detectable change (SDC95), indicating the uncertainty of that average, was 1.16–2.02 points [29]. Although the SEM was 0.73 for the factor Impact on QoL, to be 95% certain that a change beyond the measurement error has occurred, the patient’s score has to change by 2.02 points from test to retest.

Table 6 Weighted Kappa, ICC, Cronbach’s alpha and change score for new subscales, Bristol stool chart and single items in the Norwegian version of ICIQ-B scale


The original ICIQ-B includes 17 items representing three factors (i.e. Bowel Pattern, Bowel Control, and Impact on QoL), along with four unscored items. In our study, we translated the ICIQ-B scale into Norwegian and tested its psychometric properties (i.e. structural validity, reliability, and content validity) amongst adults in Norway.

Structural validity

When evaluating a measurement scale’s structural validity, two aspects are vital: the data’s underlying dimensionality (i.e., not too many or too few factors) and the adequacy of the scale’s individual items [29]. Showing eigenvalues exceeding 1.0, our EFA suggested five factors: two substantial factors with three and five items and three weak factors with three or two items each. The EFA also revealed cross-loadings, and because the original ICIQ-B contains only three factors [18], its dimensionality seemed uncertain. However, because conclusions should not be drawn solely based on EFA, we conducted a CFA, which revealed both a three-factor solution and a two-factor measurement model showing good fit. However, several items seemed to indicate misspecification.

Both reliability and structural validity relate to the adequacy of a scale’s items. Good indicators of a factor show highly significant factor loadings, preferably greater than .70, accompanied by strong squared multiple correlations (R 2), which represent how much variation in an item is explained by the latent construct [44]. In our study, all loadings were significant at the 1% level except Item 7. Regarding Model 6, Table 5 shows that seven factor loadings were excellent (>.70), four were good to fair (.55–.45), and two, for Items 7 and 8, were very low (<.45) and hardly explained any variance in the respective construct [28]. Thus, 11 items were rated as reliable indicators, whereas Items 7 and 8 displayed poor reliability. For Model 6, the factors of Bowel Control and Impact on QoL had good alpha values and composite reliability(ρc), whereas the Bowel Pattern factor demonstrated low internal consistency (ρc = .51) and thus low reliability [45, 33]. Accordingly, the dimensionality seemed imprecise, as further pinpointed by Item 5’s far stronger loading on another factor than originally determined. Allowing Item 5 (i.e. “Do you have to rush to the toilet when you need to open your bowels?”) relate to the Bowel Control factor instead of the Bowel Pattern factor (Model 6) improved the model fit considerably. Based on the low reliability of the Bowel Pattern factor, we tested a two-factor solution excluding that factor. That solution (i.e. Model 8) showed good reliability, with highly significant factor loadings, good reliability coefficients and a nearly acceptable fit. Looking at the two-factor model including Item 5 in the Bowel Control factor, the solution had good reliability and clear dimensionality. However, to achieve a good model fit, some items had to be removed, namely Items 12–14, all of which solicit information about bowel accidents. In our study, respondents seemed to assume those three items sought to assess roughly the same thing, which generated substantial correlated error variance that again hampered the model fit.

In our investigation, the original three-factor structure with 17 items did not fit well with the data. Model 6, including three factors and 13 items, was the most parsimonious model with a good fit, whereas Model 8, including two factors with 10 items, was less parsimonious but demonstrated a statistical better fit. Both models contained identical versions of the factors of Bowel Control and Impact on QoL and differed only considering Model 6’s inclusion of the third factor, Bowel Pattern.

Content validity

To gauge the translated scale’s relevance, comprehensiveness and comprehensibility, cognitive interviews with patients from the target group and evaluations made by a multidisciplinary group of clinical experts deemed that the content and wording of the Norwegian ICIQ-B’s items corresponded well with the constructs intended to be measured—that is, the items captured AI’s complexity [29]. However, the items did not fit well into the three constructs, especially for the construct of bowel pattern, in which the items were overly broad and caused insufficient internal consistency, as also seen in the original British English version, the Spanish version and the American English version [18, 23, 24]. Moreover, the original ICIQ-B includes four unscored items not encompassed within the original dimensionality. The four items (i.e. Items 4 and 12–14) removed from the Norwegian scale, however, could be placed together with those four unscored items, which would support the Norwegian ICIQ-B’s clinical relevance.

The Norwegian ICIQ-B with the adapted factor structure demonstrated promising psychometric properties. The level of missing items in the questionnaire was acceptable, which confirmed that that the items were relevant, straightforward, and meaningful to the respondents. One item had more than 3% missing data, namely Item 18 (i.e. “Do you restrict your sexual activities because of your bowels?”), with 11% missing data. That outcome is unsurprising, because sexuality may be a sensitive topic or even be perceived as irrelevant. The absence of floor and ceiling effects demonstrated that the scale could produce a good distribution of responses to a given item and that scores at the scale’s upper or lower levels show no clustering or skewness. That measurement property is also important regarding the questionnaire’s discriminative power. For example, a maximum score would preclude recognising any potential improvement to the questionnaire following any type of intervention.

Scale reliability

Testing the Norwegian ICIQ-B demonstrated its good reliability in terms of internal consistency and excellent stability. While the Bowel Control factor had an acceptable Cronbach’s alpha, Impact on QoL factor had a good one. However, for the Bowel Pattern factor (α = .37), the reliability coefficient was unacceptably low (>.5) [46]. The poor reliability of the Bowel Pattern factor has previously been identified, including in the initial study by the scale’s developers [24, 18]. Consistent with the American English and Spanish versions of the ICIQ-B and the initial study performed by the developers [24, 23, 18], stability over time was excellent for all three constructs [47]. Furthermore, the Norwegian ICIQ-B demonstrated stability for 13 single items with largely substantial weighted kappa values, two with nearly perfect values, five with moderate values and one with a fair value [48]. The good test–retest reliability of an instrument ensures that measurements obtained are both representative and stable over time [29].


A major strength of our study was the rigorous methodology employed in translating and validating the Norwegian ICIQ-B following COSMIN guidelines [26]. However, some limitations should be noted. First, the sample size of 208 was scaled down to 161 due to missing data. The response rate was nevertheless sufficient to perform the analysis. Second, this study employed a rather wide time frame between test and retest with a risk for recall bias and changes in the respondent’s health status. Finally, it is worth noting that a good model fit does not guarantee that we have obtained ‘the true model’; other alternative models might fit the data equally well as the model found [49].


To determine the psychometric properties of the Norwegian ICIQ-B, we assessed the translated scale’s dimensionality, reliability, and content validity. The dimensionality seemed inaccurate. We were able to present a three-factor and a two-factor solution, both with advantages and disadvantages. The three-factor model represents the most parsimonious solution due to covering most of the original scale, albeit with unacceptably low reliability for the Bowel Pattern factor. The two-factor model demonstrates good reliability but is less parsimonious due to lacking seven of the original 17 items and excluded one of the constructs. For a statistically well-functioning measurement model able to be used in SEM or regression analysis, we consider the two-factor construct to be superior. By contrast, concerning the clinical relevance, breadth and nuances of the theoretical constructs, the three-factor solution consisting of 13 items is superior. In addition, the eight unscored and removed items may be used in a clinical context to provide more information about the patient’s condition. The two factors Bowel Control and Impact on QoL are identical in the two models in terms of included items and psychometric properties, meaning that the models differed only in Model 6’s inclusion of the Bowel Pattern factor, which may be used in a clinical context. Altogether, the Norwegian ICIQ-B has excellent reliability in terms of test–retest stability, good internal consistency for the two-factor model and good content validity.

The results recommend further studies of the Norwegian ICIQ-B’s psychometric properties to gain more in-depth clinical insights into improving the reliability and construct validity of the ICIQ-B as a measure of patient-reported outcomes. After all, a single study does not prove structural validity. On the contrary, structural validation is a continuous process of evaluation, re-evaluation, refinement, and development.

Availability of data and materials

Due to the sensitive nature of the dataset which the analysis is based upon, it is not publicly available. The present Norwegian legislation and the General Data Protection Regulation (GDPR) of the European Union does not allow sensitive data to be made openly accessible. In special cases data is available from the authors on reasonable request.



The International Consultation on Incontinence Questionnaire-Bowel


Anal Incontinence


Quality of Life


Faecal Incontinence


COnsensus-based Standards for the selection of health Measurement Instruments


Exploratory Factor Analysis


Confirmatory factor analysis


Root Mean Square Error of Approximation


Standardized Root Mean Square Residual


Comparative Fit Index


Tucker-Lewis Index


Cronbach’s alpha

ρc :

Composite Reliability


Intraclass Correlation Coefficient


Analysis of Variance


University Hospital of Northern Norway, Tromsø


Akershus University Hospital, Oslo


Standard Deviation


Standard Error of Measurement


Smallest Detectable Change


Squared multiple correlations


Modification indices


Degrees of freedom.


  1. Meyer I, Richter HE. Impact of fecal incontinence and its treatment on quality of life in women. Women’s Health. 2015;11:225–38.

    CAS  Google Scholar 

  2. Norton C, Whitehead WE, Bliss DZ, Harar D, Lang J. Management of fecal incontinence in adults: report from the 4th International Consultation on Incontinence. Neurourol Urodyn. 2010;29(1):199–206.

    Article  CAS  Google Scholar 

  3. Robson KM. Fecal incontinence in adults: Etiology and evaluation. UpToDate September 2020. Accessed 27 Jan 2022.

  4. Sharma A, Yuan L, Marshall RJ, Merrie AE, Bissett IP. Systematic review of the prevalence of faecal incontinence. Br J Surg. 2016;103(12):1589–97.

    Article  CAS  Google Scholar 

  5. Rømmen K, Schei B, Rydning A, Sultan AH, Mørkved S. Prevalence of anal incontinence among Norwegian women: a cross-sectional study. BMJ Open. 2012;2:e001257.

    Article  Google Scholar 

  6. Johanson JF, Lafferty J. Epidemiology of fecal incontinence: the silent affliction. Am J Gastroenterol. 1996;91(1):33–6.

    CAS  Google Scholar 

  7. Musa MK, Saga S, Blekken LE, Harris R, Goodman C, Norton C. The Prevalence, Incidence, and Correlates of Fecal Incontinence Among Older People Residing in Care Homes: A Systematic Review. J Am Med Dir Assoc. 2019;20(8):956–62.

    Article  Google Scholar 

  8. Saldana Ruiz N, Kaiser AM. Fecal incontinence - Challenges and solutions. World J Gastroenterol. 2017;23(1):11–24.

    Article  Google Scholar 

  9. Andrews CN, Bharucha AE. The etiology, assessment, and treatment of fecal incontinence. Nat Clin Pract Gastroenterol Hepatol. 2005;2:516–25.

    Article  Google Scholar 

  10. Shin GH, Toto EL, Schey R. Pregnancy and postpartum bowel changes: constipation and fecal incontinence. Am J Gastroenterol. 2015;110:521–9.

    Article  Google Scholar 

  11. Ternent CA, Fleming F, Welton ML, Buie WD, Steele S, Rafferty J. Clinical Practice Guideline for Ambulatory Anorectal Surgery. Dis Colon Rectum. 2015;58:915–22.

    Article  Google Scholar 

  12. Wallenhorst T, Bouguen G, Brochard C, Cunin D, Desfourneaux V, Ropert A, et al. Long-term impact of full-thickness rectal prolapse treatment on fecal incontinence. Surgery. 2015;158:104–11.

    Article  Google Scholar 

  13. Walma MS, Kornmann VN, Boerma D, de Roos MA, van Westreenen HL. Predictors of fecal incontinence and related quality of life after a total mesorectal excision with primary anastomosis for patients with rectal cancer. Ann Coloproctol. 2015;31:23–8.

    Article  Google Scholar 

  14. Rao SS. Diagnosis and management of fecal incontinence. American College of Gastroenterology Practice Parameters Committee. Am J Gastroenterol. 2004;99:1585–604.

    Article  Google Scholar 

  15. Madoff RD. Surgical treatment options for fecal incontinence. Gastroenterology. 2004;126:48–54.

    Article  Google Scholar 

  16. Wiebe S, Guyatt G, Weaver B, Matijevic S, Sidwell C. Comparative responsiveness of generic and specific quality-of-life instruments. J Clin Epidemiol. 2003;56(1):52–60.

    Article  Google Scholar 

  17. Cotterill N, Norton C, Avery KN, Abrams P, Donovan JL. A patient-centered approach to developing a comprehensive symptom and quality of life assessment of anal incontinence. Dis Colon Rectum. 2008;51(1):82–7.

    Article  Google Scholar 

  18. Cotterill N, Norton C, Avery KNL, Abrams P, Donovan J. Psychometric Evaluation of a New Patient-Completes Questionnaire for Evaluating Anal Incontinence Symptoms and Impact on Quality of Life: The ICIQ-B. Dis Colon Rectum. 2011;54:1235–50.

    Article  Google Scholar 

  19. The International Consultation on Incontinence Questionnaire. Accessed 20 Jan 2022.

  20. Lewis SJ, Heaton KW. Stool form scale as a useful guide to intestinal transit time. Scand J Gastroenterol. 1997;32(9):920–4.

    Article  CAS  Google Scholar 

  21. Black N. Patient reported outcome measures could help transform healthcare. BMJ. 2013;346:167.

    Article  Google Scholar 

  22. Basch E. Patient-Reported Outcomes - Harnessing Patients' Voices to Improve Clinical Care. N Engl J Med. 2017;376(2):105–8.

    Article  Google Scholar 

  23. Sacomori C, Lorca LA, Martinez-Mardones M, Benavente P, Plasser J, Pardoe M. Spanish Translation, Face Validity, and Reliability of the ICIQ-B Questionnaire with Colorectal Cancer Patients. J Coloproctol. 2021;41(4):340–7.

    Article  Google Scholar 

  24. Markland AD, Burgio KL, Beasley TM, David SL, Redden DT, Goode PS. Psychometric evaluation of an online and paper accidental bowel leakage questionnaire: The ICIQ-B questionnaire. Neurourol Urodyn. 2017;36(1):166–70.

    Article  Google Scholar 

  25. Lee JT, Madoff RD, Rockwood T. Quality-of-Life Measures in Fecal Incontinence: Is Validation Valid? Dis Colon & Rectum. 2015;58(3):352–7.

    Article  Google Scholar 

  26. Mokkink LB, Prinsen CAC, Patrick DL, Alonso J, Bouter LM, de Vet HCW, et al. COSMIN Study Design checklist for Patient-reported outcome measurement instruments. COSMIN. 2019; Accessed 7 Jan 2022.

  27. Gagnier JJ, Lai J, Mokkink LB, Terwee CB. COSMIN reporting guideline for studies on measurement properties of patient-reported outcome measures. Qual Life Res. 2021;30(8):2197–218.

    Article  Google Scholar 

  28. Netemeyer RG, Bearden WO, Sharma S. Scaling procedures: issues and applications. Thousand Oaks, California: Sage Publications; 2003.

    Book  Google Scholar 

  29. de Vet HW, Terwee CB, Mokkink LB, Knol DL. Measurement in medicine – Practical guide to biostatistics and epidemiology. Cambridge: Cambridge University Press; 2011.

    Book  Google Scholar 

  30. ICIQ Validation methodology. Accessed 20 Jan 2022.

  31. Pett MA, Lackey NR, Sullivan JJ. Making sense of factor analysis: the use of factor analysis for instrument development in health care research. Thousand Oaks, California: Sage Publications; 2003.

    Book  Google Scholar 

  32. StataCorp. Stata Statistical Software: Release 17. College Station, TX: StataCorp LLC; 2021.

    Google Scholar 

  33. Mehmetoglu M, Jakobsen TG. Applied Statistics using STATA. A guide for the social sciences. Los Angelos: SAGE; 2017.

    Google Scholar 

  34. McCallum RC, Austin JT. Applications of structural equation modeling in psychological research. Annu Rev Psychol. 2000;51:201–26.

    Article  Google Scholar 

  35. Acock AC. Discovering structural equation modeling using Stata. Rev ed. Texas: Stata Press Books; 2013.

    Google Scholar 

  36. Schermelleh-Engel K, Moosbrugger H, Müller H. Evaluating the Fit of Structural Equation Models: Tests of Significance and Descriptive Goodness-of-Fit. Measures. Methods of Psychol Res. 2003;8(2):23–74.

    Google Scholar 

  37. Kline RB. Principles and practice of structural equation modeling. 3rd ed. New York: Guilford Press; 2011.

    Google Scholar 

  38. Willis GB. Cognitive Interviewing - A “How To” Guide. Short course presented at the 1999 Meeting of the American Statistical Association. Accessed 19 Apr 2021.

  39. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010;19(4):539–49.

    Article  Google Scholar 

  40. Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34–42.

    Article  Google Scholar 

  41. Qin S, Nelson L, McLeod L, Eremenco S, Coons SJ. Assessing test-retest reliability of patient-reported outcome measures using intraclass correlation coefficients: recommendations for selecting and documenting the analytical formula. Qual Life Res. 2019;28(4):1029–33.

    Article  Google Scholar 

  42. Tabachnick BG, Fidell LS. Using multivariate statistics. 6th ed. Boston: Pearson Education; 2013.

    Google Scholar 

  43. Osborne JW, Costello AB, Kellow JT. Best Practices in exploratory factor analysis. In: Osborne JW, editor. Best Practices in Quantitative Methods. California: SAGE Publications; 2008. p. 205–13.

    Chapter  Google Scholar 

  44. Raykov T. Estimation of congeneric scale reliability using covariance structure analysis with nonlinear constraints. Br J Math Stat Psychol. 2001;54(2):315–23.

    Article  CAS  Google Scholar 

  45. Hair JJ, Black W, Babin B, Anderson R. Multivariate data analysis. Upper Saddle River: Prentice Hall; 2010.

    Google Scholar 

  46. George D, Mallery P. SPSS for Windows step by step: A simple guide and reference. 11.0 update. 4th ed. Boston: Allyn & Bacon; 2003.

    Google Scholar 

  47. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15(2):155–63.

    Article  Google Scholar 

  48. Landis JR, Koch GC. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–74.

    Article  CAS  Google Scholar 

  49. Bollen KA. Structural equations with latent variables. New York: Wiley; 1989.

    Book  Google Scholar 

Download references


In carrying out this research project, we have received substantial assistance and practical help from health secretaries, nurses and surgeons in the outpatient gastrointestinal surgery clinics of St. Olav’s University hospital, Trondheim; University Hospital Northern Norway, Tromsø; and Akershus University Hospital, Oslo. We sincerely appreciate all the support we received.


Open access funding provided by Norwegian University of Science and Technology

Author information

Authors and Affiliations



SS, AGV and CN developed the study design. SS performed the data collection. SS and GH performed the data analyses and wrote the manuscript. All authors participated in critical revisions of the manuscript for important intellectual content, and all read and approved the final manuscript.

Corresponding author

Correspondence to Susan Saga.

Ethics declarations

Ethics approval and consent to participate

The Regional Committee for Medical and Health Research Ethics approved the research project (2009/1225), as well as institutional review boards at the three participating hospitals. Respondents were anonymous, voluntary, and provided consent for the study.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Saga, S., Vinsnes, A.G., Norton, C. et al. Symptoms of anal incontinence and quality of life: a psychometric study of the Norwegian version of the ICIQ-B amongst hospital outpatients. Arch Public Health 80, 251 (2022).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: