Indicators of guideline-concordant care in lung cancer defined with a modified Delphi method and piloted in a cohort of over 5,800 cases

Background: To identify indicators of guideline-concordant care in lung cancer, to implement such indicators with cancer registry data linked to health databases, and to pilot them in a cohort of patients from the cancer registry of the Milan Province.
Methods: Thirty-four indicators were selected by cancer epidemiologists through review of the main guidelines and were then evaluated by a multidisciplinary panel of clinicians involved in lung cancer care and working on the pathway of lung cancer diagnosis and treatment in the Lombardy region, Italy. Using a modified Delphi method, the panel assessed each indicator for content validity as a quality measure of the care pathway, degree of modifiability by the health professional, and relevance to the health professional. Feasibility was assessed using the cancer registry and the routine health records of the Lombardy region. Feasible indicators were then calculated in the cohort of lung cancer patients diagnosed in 2007–2012 derived from the cancer registry of the Milan Province. Criterion validity was assessed by reviewing the clinical records of a random sample of 114 patients (threshold for acceptable discordance ≤20%). Finally, reliability was evaluated at the provider level.
Results: Thirty-four indicators were proposed for evaluation in the first Delphi round. Of the 22 indicators finally selected, 3 were not feasible because the required information was not available. The remaining 19 were calculated on the pilot cohort. After assessment of criterion validity (3 eliminated), 16 indicators were retained in the final set and evaluated for reliability.
Conclusion: The developed and piloted set of indicators is now available to implement and monitor, over time, quality initiatives for lung cancer care in the studied health system.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13690-021-00528-0.


Background
Lung cancer remains a major killer: it is the leading cause of cancer death for both genders in countries with a high Human Development Index (HDI), with world age-standardized mortality rates of 36.4 per 100,000 for males and 14.6 per 100,000 for females, accounting for approximately 22% of all cancer deaths in 2018 [1]. This is despite the decline in smoking in high-HDI countries, which has led to a reduction in incidence rates [2].
There is accumulating evidence that patients not receiving guideline-concordant treatment have worse outcomes [3,4]. Hence, it is important to define and implement a set of indicators to verify concordance between guidelines and the care delivered in clinical practice. Previous studies defining indicators to evaluate guideline concordance in lung cancer care have been carried out in the Netherlands [5,6] and Ontario [7]. However, to serve the purpose of improving quality of care, the whole process, from definition to operationalization and pilot testing, needs to be carried out within a specific health service [8,9]. Also, indicators have to be timely, reliable and economically computable on a large scale to inform quality improvement activities [10,11]. In the oncologic setting, this is impractical if indicators are computed retrospectively from data abstracted ad hoc from clinical records, whereas it is expected to be feasible using current health databases linked to modern population-based cancer registers [12–14].
The aim of this project was to develop, with a multidisciplinary panel, a set of indicators of adherence to guidelines covering diagnosis, treatment and follow-up in lung cancer care, and to pilot them in a cohort of lung cancer patients from the Cancer Registry of the Milan Province, a territory covered by the Agency for Health Protection of Milan, Italy.

Methods
Identification and selection of the indicators
We first identified relevant evidence-based practice guidelines and potential indicators from the literature and then, using a modified Delphi method with two rounds, drew on clinical experts' knowledge to select the final set of indicators. The whole process is summarized in Supplementary Figure 1. On 7 May 2015 we searched the SAGE clinical guidelines database [15] (Supplementary Material), which collects all published guidelines in English and evaluates them according to the AGREE II instrument [16]. From the retrieved list, we selected guidelines with a score ≥50% for the 'rigour' domain and ≥30% for the 'applicability' domain. Additionally, we used the 2013 Lung Cancer Guidelines of the Italian Association of Medical Oncology (AIOM), which are published in Italian [17]. These guidelines are developed according to the Scottish Intercollegiate Guidelines Network (SIGN) and Grading of Recommendations Assessment, Development and Evaluation (GRADE) methodologies [18,19]. Systematic evaluation and selection of recommendations from the identified guidelines were carried out by one of the authors and reviewed by a senior author. We included all recommendations whose compliance could be evaluated by process or short-term outcome indicators and that could, in principle, be calculated with data available from the cancer register and the current health databases of the Lombardy Region. We excluded all recommendations whose verification required data not available in the health databases (e.g. results of performance status assessment, advice given to patients). To identify previously developed indicators for lung cancer care evaluation, we searched the PubMed database with no language or date restrictions on 20 April 2015 (Supplementary Material, Search strategy). We also searched the National Quality Measures Clearinghouse database on 5 May 2015 [20]. After screening titles and abstracts, we examined the full texts of the relevant publications.
One of the authors extracted all the indicators and matched each with one of the selected recommendations. If more than one indicator matched a recommendation, we combined them into a single indicator. If there were minor differences in time windows or in the definition of the denominator, we retained the version most suitable for the database and health system under scrutiny. If a recommendation had no corresponding published indicator, a new one was proposed. This process was reviewed by other authors. Each indicator in the provisional list was then specified unambiguously and exhaustively in terms of numerator, denominator and data sources, and the list was organized into a questionnaire using the freely available survey tool eSurv (https://esurv.org/). The survey consisted of five sections covering the following dimensions: organization, diagnosis, surgery, medical treatment and follow-up. Each indicator was presented with a title and a description of how it would be calculated from the register and the administrative data. Panelists were then asked to rate, on a seven-point Likert scale, the validity of the indicator as a measure of quality of care, the possibility for the health professional to modify the value of the indicator at the patient level, and the usefulness of the indicator for self-assessment. English translations of the definitions provided to the panel are given in the Supplementary Material. A free-text field allowed panelists to suggest changes to the indicator wording or to propose an alternative indicator. Panel designation is described in the Supplementary Material; participants were included on the basis of their agreement to participate. Results of the first round were evaluated in terms of both the scores obtained and the comments entered in the free-text field.
An indicator was retained if at least 75% of the panel members completing the questionnaire rated its validity five or higher, and at least 50% of them rated its modifiability and utility four or higher. The reduced list of indicators was also updated according to relevant comments in the free-text field. A summary of the first-round results, including scores and proposed modifications, was sent to the clinicians who completed the questionnaire. The second round was conducted as an informal discussion during an in-person meeting with representatives of the panel.
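The first-round retention rule can be written as a small predicate. The sketch below is illustrative only: it assumes each list holds the 1–7 Likert ratings from panel members who completed the questionnaire, reads the modifiability and utility criterion as two separate 50% conditions, and uses made-up ratings; the function names are hypothetical.

```python
from typing import Sequence


def share_at_least(ratings: Sequence[int], cutoff: int) -> float:
    """Fraction of 1-7 Likert ratings that are >= cutoff."""
    if not ratings:
        raise ValueError("no ratings available for this question")
    return sum(r >= cutoff for r in ratings) / len(ratings)


def retain_after_round_one(
    validity: Sequence[int],
    modifiability: Sequence[int],
    utility: Sequence[int],
) -> bool:
    """Hypothetical encoding of the first-round rule: >=75% rate validity >= 5,
    and (assumed separately) >=50% rate modifiability >= 4 and utility >= 4."""
    return (
        share_at_least(validity, 5) >= 0.75
        and share_at_least(modifiability, 4) >= 0.50
        and share_at_least(utility, 4) >= 0.50
    )


# Made-up ratings from eight panel members, for illustration only.
validity = [7, 6, 5, 5, 4, 6, 7, 5]       # 7/8 rate >= 5
modifiability = [4, 3, 5, 2, 6, 4, 3, 2]  # 4/8 rate >= 4
utility = [5, 5, 4, 3, 6, 2, 4, 7]        # 6/8 rate >= 4
print(retain_after_round_one(validity, modifiability, utility))  # True
```

If the panel instead applied the 50% threshold to modifiability and utility jointly, only the last two conditions would need to change.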

Identification of the cohort and pilot calculation of the indicators
We calculated all the finally selected indicators on a pilot cohort derived from the Cancer Registry of the Agency for Health Protection of Milan, Italy, including all patients diagnosed with lung cancer (International Classification of Diseases for Oncology, version 3, first revision (ICD-O-3) topographic codes C33–C34) in the period 2007–2012 and registered with the Lombardy Regional Health Service. A subset of the area covered by the Agency was used, corresponding to fourteen municipalities around Milan with a total population of 1,546,237 inhabitants on 1 January 2013 [21]. The cancer register is accredited at the national level and included in Volume XI of Cancer Incidence in Five Continents [22]. It is semi-automated, using multiple sources of information (i.e. inpatient, histopathology and death certificate databases) and a record linkage algorithm to match all information at the individual level [23]; all cases are reviewed to assign morphology and stage. We calculated a modified version of the Charlson comorbidity index using both inpatient and outpatient databases [24,25]. The date of incidence was defined according to international cancer registration rules [26]. Exclusion criteria were tumors identified through death certificate only and mesenchymal histology. For indicators applying specifically to either small cell (SCLC) or non-small cell lung cancer (NSCLC), patients were classified as SCLC if the ICD-O-3 histology code was 8041–8045 and as NSCLC in all other cases, including nonspecific histology codes and cases without cyto-histological confirmation. To calculate the indicators at the patient level we used the register and all available computerized sources of health information from January 2006 to December 2016, including the outpatient diagnostic and therapeutic procedures, inpatient, prescription and emergency access databases. We derived gender, age and stage at diagnosis from the register. The definition of first therapy is given in the Supplementary Material.
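The histology grouping and the exclusion criteria can be sketched as follows. This is an illustration under stated assumptions, not the registry's actual code: the `Case` record, its fields and the `mesenchymal` flag (assumed to be set upstream, since the exact list of mesenchymal codes is not given here) are hypothetical, and morphology is taken to be the 4-digit ICD-O-3 code without the behaviour digit.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# ICD-O-3 morphology codes 8041-8045 (inclusive) are treated as SCLC.
SCLC_CODES = range(8041, 8046)


@dataclass
class Case:
    case_id: str
    morphology: Optional[int]     # 4-digit ICD-O-3 code; None if unspecified
    death_certificate_only: bool
    mesenchymal: bool             # hypothetical flag derived upstream


def histology_group(case: Case) -> str:
    """SCLC for codes 8041-8045; NSCLC otherwise, including nonspecific
    codes and cases without cyto-histological confirmation."""
    if case.morphology is not None and case.morphology in SCLC_CODES:
        return "SCLC"
    return "NSCLC"


def pilot_cohort(cases: Iterable[Case]) -> List[Case]:
    """Drop death-certificate-only cases and mesenchymal histology."""
    return [c for c in cases if not (c.death_certificate_only or c.mesenchymal)]


cases = [
    Case("a", 8041, False, False),   # small cell carcinoma -> SCLC
    Case("b", 8140, False, False),   # adenocarcinoma -> NSCLC
    Case("c", None, False, False),   # no histology -> NSCLC
    Case("d", 8045, True, False),    # death certificate only -> excluded
    Case("e", 8800, False, True),    # sarcoma, mesenchymal -> excluded
]
cohort = pilot_cohort(cases)
print([(c.case_id, histology_group(c)) for c in cohort])
# [('a', 'SCLC'), ('b', 'NSCLC'), ('c', 'NSCLC')]
```

Treating every non-SCLC code as NSCLC mirrors the rule in the text, so unconfirmed cases fall into the larger group by default.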
For each indicator, we assigned the patient to the hospital where he/she first received the relevant procedure. We assessed the feasibility of each indicator by verifying that the data needed to calculate it were actually available for the whole population in the administrative data. We assessed criterion validity in a random subsample of 114 patients, stratified by main histological type, stage and type of first treatment, verifying that the required code was reported in the administrative databases when the procedure had actually been performed according to the full clinical record; we reported the number and percentage of discordances. An indicator was retained only if discordances were 20% or fewer. We first computed the indicators overall and in relevant subgroups of patients, and then assessed their clinimetric properties. Improvement potential was evaluated from the median value of each indicator across providers. We also computed the number of providers scoring either 0% (floor effect) or 100% (ceiling effect). Finally, we evaluated the reliability of the indicators. Reliability ranges from 0 to 1 and describes how confidently the performance of one provider can be distinguished from that of another, by comparing the signal (variability of the indicator attributable to real differences in performance) with the noise (random variability of the indicator value) [27]. We calculated the reliability of each indicator for each provider, expressed it as a percentage (interpretable as the percentage of variability attributable to real differences in performance), and reported the number of providers with a reliability of 70% or higher, an accepted cut-off for good reliability [28,29].
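The criterion validity check and the floor and ceiling summary can be sketched as below. Assumptions not stated in the text: each validation-sample patient contributes a (clinical record, administrative data) pair of 0/1 values, the discordance percentage is computed over patients whose clinical record shows the procedure, and provider values are percentages. All names and data are hypothetical.

```python
from statistics import median
from typing import Dict, Mapping, Sequence, Tuple

MAX_DISCORDANCE = 0.20  # retain the indicator only if discordance <= 20%


def discordance(pairs: Sequence[Tuple[int, int]]) -> float:
    """pairs: (record, admin) per reviewed patient, each 0 or 1.
    A discordance is record == 1 (procedure done in the time window)
    with admin == 0 (codes not found in the time window)."""
    performed = [admin for record, admin in pairs if record == 1]
    if not performed:
        raise ValueError("no reviewed patient had the procedure")
    return sum(admin == 0 for admin in performed) / len(performed)


def passes_criterion_validity(pairs: Sequence[Tuple[int, int]]) -> bool:
    return discordance(pairs) <= MAX_DISCORDANCE


def provider_profile(values: Mapping[str, float]) -> Dict[str, float]:
    """values: indicator value (%) per provider. The median indicates
    improvement potential; counts at 0% and 100% give floor/ceiling effects."""
    v = list(values.values())
    return {
        "median": median(v),
        "floor": sum(x == 0 for x in v),
        "ceiling": sum(x == 100 for x in v),
    }


# Hypothetical validation data: 11 patients had the procedure, 2 uncoded.
pairs = [(1, 1)] * 9 + [(1, 0)] * 2 + [(0, 0)] * 3
print(round(discordance(pairs), 3), passes_criterion_validity(pairs))  # 0.182 True
print(provider_profile({"A": 0.0, "B": 68.0, "C": 100.0, "D": 92.5}))
# {'median': 80.25, 'floor': 1, 'ceiling': 1}
```

Codes found without a documented procedure (record 0, admin 1) are not counted here, matching the one-directional definition of discordance used for the validity table.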

Statistical analysis
Descriptive statistics, including quartiles, were calculated for the validity, modifiability and utility scores received by each indicator in the first Delphi round. Differences in the distribution of covariates across gender, stage or years were assessed using the Mantel-Haenszel test, analysis of variance or Wilcoxon test, as appropriate. All tests were two-sided with a significance level of 0.05. We calculated each indicator as the proportion of patients who received the procedure in the defined time window among those defined as eligible. Indicators were also calculated stratified by age (≤60, 61–70, >70 years), sex, Charlson index (0, 1–2, ≥3) and stage (I to IV). To assess reliability, we measured noise as the variance of a proportion, and signal as the variance of the random effect of a hierarchical logistic model with the indicator value as the outcome, age, gender, stage and Charlson index as first-level predictors, and provider as the only second-level variable. All analyses were performed with SAS software (v.9.4, SAS Institute, Cary, NC).
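The indicator definition, the age strata and the provider-level reliability can be sketched as follows. The reliability formula used here is the common signal / (signal + noise) form, which is bounded between 0 and 1, with noise taken as the binomial variance of a proportion, p(1 − p)/n. The between-provider signal variance is passed in as a number assumed to be already on the proportion scale; the study estimated it from the random effect of a hierarchical logistic model, a step (including any conversion from the logit scale) that is not reproduced here. Function names and inputs are hypothetical.

```python
from typing import Sequence

GOOD_RELIABILITY = 0.70


def indicator_value(received: Sequence[bool]) -> float:
    """Proportion of eligible patients who received the procedure
    within the defined time window."""
    if not received:
        raise ValueError("no eligible patients")
    return sum(received) / len(received)


def age_band(age: int) -> str:
    """Age strata used for the stratified indicators."""
    if age <= 60:
        return "<=60"
    if age <= 70:
        return "61-70"
    return ">70"


def reliability(p: float, n: int, signal_var: float) -> float:
    """Reliability = signal / (signal + noise), in [0, 1].
    noise: binomial variance of the provider's proportion, p(1-p)/n.
    signal_var: between-provider variance (assumed on the proportion scale)."""
    noise_var = p * (1 - p) / n
    total = signal_var + noise_var
    if total == 0:
        raise ValueError("reliability undefined when signal and noise are both 0")
    return signal_var / total


# Hypothetical provider: 68 of 100 eligible patients treated within 60 days.
p = indicator_value([True] * 68 + [False] * 32)
r = reliability(p, n=100, signal_var=0.01)
print(p, round(100 * r, 1), r >= GOOD_RELIABILITY)  # 0.68 82.1 True
```

With the same proportion and signal variance but only 50 eligible patients, the noise term doubles and reliability falls just below the 70% cut-off, which illustrates why providers with few patients tend to have low reliability.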

Results
Review of the literature and identification of candidate indicators
The search of the SAGE clinical guidelines database retrieved 45 records (Supplementary Figure 1). Ten guidelines did not cover NSCLC or SCLC, 11 had no AGREE II evaluation, and 18 had a rigour score <50% or an applicability score <30%. The six remaining guidelines were used to extract recommendations [30–35]. The literature search for lung cancer indicators retrieved 557 records. Based on title and/or abstract, 532 were excluded as not relevant and 8 because they were guidelines already included in the SAGE database search results. Three manuscripts were excluded after full-text evaluation [36–38]. Indicators were extracted from the 14 remaining papers [7, 9, 39–41] and each was matched with a recommendation. A set of 34 indicators, each assessing a recommendation, was included in the web-based questionnaire for the first round of the Delphi process.

Results of the Delphi process
A total of 225 physicians were invited to participate in the Delphi process. Ninety-five questionnaires were completed at least partially (42% of those invited), and 85 were completed in full. Thirteen indicators did not satisfy the selection criteria and were excluded from the second round. Panel members proposed one additional indicator, evaluating the time from thorax computed tomography (CT) to surgery (O4). Four indicators were partially modified based on the comments received in round 1. The meeting constituting the second round approved the changes in indicator formulation but did not modify the number of indicators. Table 1 reports the 22 indicators finally selected, with their first-round scores.

Pilot cohort
In the 2007–2012 period there were 5860 cases of lung cancer (International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) topographic codes C33–C34) in the resident population of the Milan province. Death certificate only (n = 88) and mesenchymal histology (n = 26) cases were excluded. This left 5746 patients, 91% with NSCLC and 9% with SCLC (Supplementary Figure 2). Seventy-five percent were male and 55% were aged 66–80 years (Table 2). The proportion of females increased significantly over time, from 24% in 2007 to 28% in 2012 (p = 0.009), while there were no significant changes in age distribution (p = 0.8). Overall, 43% of patients had a Charlson comorbidity index greater than zero, with males having more comorbidities than females (mean Charlson index 1.5 vs. 1.2, p < 0.001). No differences in Charlson index distribution were observed by stage (p = 0.5).

Clinimetric properties of the indicators
Indicators were calculated on the pilot cohort and their values are reported in Table 3. Regarding feasibility, the indicator 'palliative care before death' (M3) was not feasible because the required administrative datasets were not available for the entire population (i.e. they were available for some municipalities but not others). The same was true for 'Multidisciplinary evaluation' (O3), as the relevant code was used by only 2 of 92 providers in the analyzed years, and for 'Functional evaluation before surgery' (S3), as the required tests were systematically not billed during the pre-admission consultation for surgery.
Three of the indicators did not meet the criterion validity threshold (i.e. they had more than 20% discordances) and were eliminated (Table 4): 'Treatment with curative intent preceded by PET' (D3), 'SCLC patients fully staged' (D6), and 'Patients with a thorax CT ≤30 days before surgery' (S2). The indicator 'SCLC patients undergoing medical oncologic therapy or radiotherapy' (M2) did not reach high reliability (≥70%) in any of the 56 evaluated providers (Table 5). In addition, indicators O1, O4, D1 and S5 had high reliability in fewer than half of the evaluated providers.
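The indicator counts reported in the Delphi and clinimetric results can be reconciled with simple arithmetic. The sketch below only re-uses numbers and indicator codes stated in the text; the variable names are illustrative.

```python
proposed = 34                          # indicators in the first Delphi round
excluded_round_one = 13                # failed the first-round retention rule
added_by_panel = ["O4"]                # time from thorax CT to surgery
not_feasible = ["M3", "O3", "S3"]      # required data not available
failed_validity = ["D3", "D6", "S2"]   # more than 20% discordances

selected = proposed - excluded_round_one + len(added_by_panel)
calculated = selected - len(not_feasible)
final_set = calculated - len(failed_validity)
print(selected, calculated, final_set)  # 22 19 16
```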
Calculation of the organizational indicators showed that the proportion of patients with rapid access to treatment after first contact or diagnostic procedures did not differ between men and women and was higher in younger patients, in healthier patients and in patients with more advanced stage (O1–O4, Table 3). Half of the hospitals had a value of 68% or lower for the indicator 'First contact to first therapy ≤60 days' (O1, Table 4). Only 4% of stage II–III NSCLC patients received chemoradiation (M1). 'Pain management before death' (M4) had an overall value of 92%, with low improvement potential (median value across providers 98%). Overall, 72% of patients surviving to the end of the second year had a follow-up contact (F1).

Discussion
This article describes the development, based on evidence-based guidelines and expert consensus, of a set of indicators aimed at monitoring the concordance of lung cancer care with guidelines, and its piloting on 5746 lung cancer patients diagnosed over a 6-year period. One of the major strengths of this work is the participation of a large number of clinicians involved in the different steps of lung cancer care, which helped ensure that the identified reference guidelines were endorsed and perceived as relevant. Secondly, identifying the cohort through a cancer register allows the denominators of the indicators to be defined adequately, solving the problem of accurately identifying the target population that has been encountered when using administrative records only [42]. Also, lung cancer guidelines differ by main histological type (SCLC vs. NSCLC) and by tumor stage, and defining the cohort from the cancer register also allows histology to be identified reliably.
Lung tumors still have a poor overall prognosis, which is related to the often advanced stage at diagnosis [43]. However, there is evidence that care not concordant with evidence-based guidelines worsens survival, and that the care actually delivered is often not adherent to those guidelines, especially for advanced stages and SCLC [3,4,6,44,45]. Nadpara et al., examining a large cohort of lung cancer cases from the Surveillance, Epidemiology, and End Results registry linked with the Medicare database, investigated variations in guideline-concordant care among the elderly and found that lower income, increasing age and being non-white reduced the probability of receiving guideline-concordant care [4]. Our results also highlight differences in adherence to standards by age: older patients received curative surgery less frequently, independently of their comorbidity burden.
With the exception of S1 and S6, which are short-term outcome indicators, we chose to focus on process indicators measuring adherence to guidelines. Process indicators have been criticized because they do not directly measure the outcome [46,47], which is the main interest from both a patient-centered and a public health perspective. A small set of outcome indicators has also been developed for lung cancer [48]. However, process indicators have the advantage of being more sensitive to differences in quality of care, as they do not depend heavily on external factors outside the direct control of care providers, as outcome indicators do [46]. Especially in a setting such as non-early-stage lung cancer, where prognosis remains dismal even with the best standard of care, we think that process indicators are fundamental to assess heterogeneity between providers and to implement quality improvement actions [49]. Of note, some of the indicators had low reliability for a large proportion of providers, which calls for caution when interpreting indicator values for those providers.
The use of current administrative databases makes it possible to implement the indicators on a population scale and to repeat the measurements over time at reduced cost. However, it introduces some limitations. First, we were unable to calculate three of the indicators produced by the Delphi consensus process because the required information was not consistently recorded in the necessary databases. We also had to drop three indicators with more than 20% discordance between reviewed clinical records and indicators calculated from the administrative databases. However, this is important information for planning future quality improvement initiatives, as it highlights the need to address consistent use of the codes with the clinicians involved in lung cancer care. Also, when administrative data are used to calculate health indicators and define case-mix variables (e.g. comorbidity index), concerns about data quality are always present [25,50,51], particularly regarding inaccurate coding and coding variability across professionals. Nevertheless, these databases are the largest, most systematic and most temporally continuous source of health information. Moreover, specifically for our study, Lombardy health databases are quality-checked for reimbursement purposes and have been found to be of good quality in several studies [52,53]. It is also important to acknowledge that not all case-mix variables important for fairly comparing providers [48], such as smoking status and performance status, were available. Finally, the second round of the Delphi method was an informal face-to-face consensus process, which could have been biased by strong opinion leaders. However, a high agreement rate had already been obtained after the first round.

Notes to the criterion validity table (Table 4): the indicators for follow-up in years 2, 3 and 4 (F1) for surviving patients were not evaluated, as full clinical records, including outpatient visit reports, were not available for the follow-up period. (a) Procedure performed in the correct time frame according to the clinical record (indicator = 1) but corresponding codes not found in the administrative databases in the correct time frame (indicator = 0). (b) Indicator with a discordance percentage higher than 20%, the a priori acceptable threshold.

Conclusions
The developed and piloted set of indicators is now available to implement and monitor, over time, quality initiatives for lung cancer care in the studied area.
authorship criteria and that no others meeting the criteria have been omitted. The author(s) read and approved the final manuscript.

Funding
This work was supported by the Italian Ministry of Health [RF 2011-02348859 to A.G.R].

Availability of data and materials
The dataset from this study is held securely at the ATS of Milan, Epidemiology Unit. Data sharing agreements prohibit the ATS of Milan from making the dataset publicly available. The full dataset creation plan and underlying analytic code are available from the authors upon request.
Ethics approval and consent to participate
Ethics approval and consent to participate were not required, as this is an observational study based on data routinely collected by the Agency for Health Protection (ATS) of Milan, a public body of the Regional Health Service, Lombardy Region. The institutional functions of the ATS, established by Lombardy Region legislation (R.L. 23/2015), include the governance of the care pathway at the individual level in the regional social and healthcare system and the evaluation of the services provided to, and the outcomes of, patients residing in the covered area. This study is also ethically compliant with the National Law (D.Lgs. 101/2018) and the "General Authorisation to Process Personal Data for Scientific Research Purposes" (n. 8 and 9/2016, referred to in the Data Protection Authority action of 13/12/2018).

Consent for publication
Not applicable.