- Study protocol
- Open Access
- Open Peer Review
The EMPCAN study: protocol of a population-based cohort study on the evolution of the socio-economic position of workers with cancer
Archives of Public Healthvolume 77, Article number: 15 (2019)
The improvements in cancer control led to an increase in the number of cancer survivors, notably, in the working age population (16–64 years). There is a strong need to assess and understand their reintegration on the labour market, which underlines and ensures their social integration and quality of life. The objectives of the EMPCAN study is therefore to measure the scale of return-to-work after cancer and to identify the determining factors, allowing for the implementation of an adequate socio-professional support.
We requested data from the Belgian Cancer Registry and the Crossroad Bank for Social Security. We included all socially insured Belgian workers diagnosed between 2004 and 2011 with colorectal, breast, head & neck, prostate, testis, lung and corpus uteri cancer. The end of (administrative) follow-up was 31st December 2012. We include demographic, health-related and work-related factors in the analysis and observed how these factors interplay to determine the working status. After having solved legal, ethical and technical issues for the coupling, we will perform survival analysis with competing risks using the Fine and Gray model; we will also perform a multistate model using transitions probabilities; and finally, a group-based modeling for longitudinal data using the ‘proc traj’ package in SAS.
The results of the EMPCAN study will allow the provision of an evidence-based support to professional reintegration policies. It will also bring some key features for the prediction of the cancer-related social security needs. Besides the raise of awareness among health professionals and policy makers, this study could lead to a better planning and organization of vocational rehabilitation programs.
Since two decades, improvements in cancer control led to an increased survival and subsequent prevalence of cancer survivors  with differences within and among European Members States . In Belgium, 40% of the new cancer diagnoses are detected in the working age population (16–64) . For this group, the ability to keep or resume work means maintaining the level of the household income, self-esteem, back to “normal”, to feel being cured, preservation of the quality of life, etc. [4,5,6,7,8].
The literature reports several obstacles to the RTW after cancer, directly related to the cancer and its treatments, but also depending on the work environment, personal beliefs and the reevaluation of the importance of working [10, 11]. In most European countries, authorities are reflecting on the legal and practical framework for the RTW after a long sickness absence .
At the moment in Belgium, no population-based study reports on estimates of the size of this phenomenon . Worldwide, there have been some attempts to measure the rate of RTW after cancer [10, 14,15,16], mainly through cross-sectional studies, using samples of survivors from specific groups of cancer patients . These studies report RTW rates varying between 40 to 80% .
The Cancer Centre of Sciensano, in collaboration with the School of Public Health of the Université Libre de Bruxelles designed and conducted the EMPCAN study. The objective of the current manuscript is to describe the design of the EMPCAN study which aims at quantifying the RTW after cancer. More specifically the objectives of EMPCAN are (1) to estimate the RTW rate after cancer diagnosis in Belgium; (2) to explore the differences in the rates and time to RTW according to demographic characteristics, cancer site, stage, treatment and type of employment and (3) to compare the rate of working cancer survivors with the general working population.
Description of the study
The EMPCAN study is a retrospective population-based cohort study, using data coming from three Belgian administrative registers. It includes workers aged 16–64 years, diagnosed in Belgium with breast, colorectal, lung, corpus uteri, prostate, head and neck or testis cancer between 1st January 2004 and 31st December 2011.
In the following, we describe (1) how we have decided on the variables to ask to the registers; (2) which procedures were followed to allow the linkage as well as the inclusion and exclusion criteria; (3) the list of variables included and (4) the type of analysis that we will perform.
Study population and inclusion/exclusion criteria
The first inclusion criteria is the age (16–64 years old at the date of incidence) and the cancer-related information: the date of diagnosis (between 1st January 2004 and 31st December 2011) and the type of cancer (head and neck cancer C00-C14 and C30-C32; lung cancer C34; colorectal cancer C18-C20; breast cancer C50; corpus uteri C54; prostate C61 and testis cancer C62.
The second criteria concern the working status of the patient at the date of incidence and during the three preceding quarters. We excluded those patients who were long-term unemployed, disabled, handicapped or on sick leave at these dates.
The first cancer-related data available for the whole country started from 2004, the year which EMPCAN starts to include patients. As administrative health and social security registration in Belgium can have up to two years delay and that we started to design the EMPCAN study in 2015, we selected patients with a date of diagnosis between 1st January 2004 to 31st December 2011, so that the last who entered have at least 2 years of follow-up.
The Fig. 1 presents the dataflow of the personal data from the providers to the researchers. The BCR started the process, with the selection of patients diagnosed from 1st January 2004 to 31st December 2011 included, with: head and neck cancer (C00-C14 and C30-C32); lung cancer (C34); colorectal cancer (C18-C20); breast cancer (C50); corpus uteri (C54); prostate (C61) and testis cancer (C62). Among them, the BCR only retained those aged 16–64 at the date of incidence. The list of national social security numbers (NSSN) of these patients was sent to the CBSS, which checked whether these patients were professionally active during the three quarters precedingFootnote 1 the month of incidence. In line with privacy protection requirements and to ensure that the workers are unidentifiable, the BCR took a random sample of 75%.
The BCR and the CBSS sent, separately, the extracted data with the corresponding NSSNs to eHealth, via an electronic flow. The public institution eHealth is the only one authorized to proceed the linkage of data based on the NSSN. In our study, the main task of eHealth was to pseudo-anonymize the NSSNs, i.e. to generate new IDs for each NSSN and to keep the pseudo-anonymization key.
The pseudo-NSSNs and their associated data were transmitted by eHealth to HealthData.be which organized (1) the data storage on a secured server, (2) the transcription of the data into SAS data files and (3) a secured virtual private network (VPN) connection to the server for the researchers.
Finally, the researchers accessed the data that was provided through 12 SAS files with a total amount of 52 variables (Tables 1 and 2). Each file contains information related to a unique ID on individual level allowing retrieving data on the individuals in each file, along with the quarter corresponding to the socio-economic position observed (cross-sectional data). Each individual included can be observed for a maximum of 36 quarters (9 years).
Procedures for data collection
Selection of the determinants to investigate
In order to identify the determining factors of RTW after cancer, we conducted a literature review . As the contextual factors (political and socio-economic) may be important in the RTW pathway [17,18,19], we presented the results to a group of field experts from different domains of expertise (oncology, occupational health, social security, academics, etc.) who confirmed and specified the determining role of each factor identified in the literature, into the Belgian context.
This exercise resulted in the identification of four groups of RTW determinants: socio-demographic, disease-related, work-related and person-related. Given the fact that the cohort that will be used will be selected within the cancer registry and that the study will use administrative data, the last domain, the person-related factors, cannot be investigated (Table 1).
After having identified and selected the determinants, we clarified the outcome (or main event of interest). According to the sickness absence literature , different outcomes are used to study the RTW after cancer [21, 22]: the ability to work, the ability to the return-to-work, the work performance, the job retention, the quality of life, etc. [7, 23]. In the EMPCAN study, the primary goal focuses on RTW after cancer. In the Belgian social security system, the working status refers to a socio-economic position, which is the position that a socially insured Belgian resident has on the labour market (Table 2). Hence, four main socio-economic positions are observed (Table 2): occupied (with four underlying classes), job seeker (with four underlying status), inactive (with five subgroups), and the others.Footnote 2 The national social security scheme (SSS, Fig. 2) allows for the combination of up to three socio-economic positions at the same time, for one individual.
Identification of data sources and providers
Based on the list of variables that were identified, we looked into the Belgian healthcare registries to find from where the data could have been retrieved. Three main sources of information or registries were identified: (1) the Belgian Cancer Registry, (2) the Intermutality Agency and (3) Labour Market Datawarehouse.
For population-based cancer-related information, the Belgian Cancer Registry (BCR) was the most appropriate data source. The BCR collects information about all new diagnosed cancers in Belgium. The BCR receives cancer-related information from three sources: (1) the hospitals where (cancer) data managers are trained to retrieve the appropriate data; (2) the health insurance companies and (3) the laboratories performing the microscopic confirmation of malignancy (Fig. 3).
Based on these information, the BCR maps out the nature and extent of cancer in Belgium.
In addition, the cancer treatments were identified as important determining factors, having short, middle and long-term impact on the quality of life and health status of cancer survivors [24,25,26,27,28]. In Belgium, all treatments that are covered by the social security are registered by the health insurances, and transmitted to an overarching institution, the InterMutuality Agency (IMA). When it comes to cancer treatment, the BCR and the IMA have concluded an agreement to allow the BCR to couple all their cases with the treatment information of IMA.
Several social security institutions could provide us with the required data with regard to the socio-economic position and its underlying information. However, the Labour Market Datawarehouse (DWH MT&PS) was created in 1999, within the Crossroad Bank for Social Security (CBSS), with the aim to register and aggregate permanently administrative information of Belgian residents. Since 2003, 16 institutions send quarterly their information to the DWH MT&PS which has now information on 85% of the Belgian population.
As there was no precedent of such data coupling in Belgium, these different steps of coupling and controlling privacy, required more than two years to finally submit our demand to the national Privacy Commission, which provided its positive opinion in January 2016; this authorization started the data flow process among the registers and eHealth and the final files were made available for the researchers in October 2017 (Additional file 1).
Data management and storage
According to the legal prescriptions and recommendations from the national Privacy Commission (based on the European Commission Communication ), the dataset resulting from the coupling of personal data coming from different registries has to be stored on a platform that is separated from those of the primary sources and a remote access has to be organized for the researchers to access the data.
For the EMCPAN study, this technical management (virtual private network connexion) is ensured by the HealthData.be service of Sciensano. In 2006, the national knowledge centre (KCE)  performed an inventory of the healthcare databases in the country, which revealed the existence of almost 150 different databases. In order to stimulate research, HealthData.be has been created, with the aim of standardization, homogenization and centralization of healthcare information, respecting the confidentiality principles for both patients and providers.
An additional requirement, in line with the proportionality principle, had to be foreseen: a small cell analysis will be conducted in order to balance the probability of person identification versus the level of detail of the data for the individuals included in the cohort.
Exploratory data analysis
We will start with an exploratory analysis to understand the inclusion/exclusion flow, to clean the data (identify and exclude repetitions and inconsistencies, Fig. 4) and describe the main characteristics of the individuals included in the EMCPAN study.
The main objective is to explore the possible effect modification of variables (e.g. between cancer, stage and treatment) on the outcome.
The logistic regression is often used for multivariate analysis purpose in epidemiology, to measure the association between event(s) of interest (dependent variables) and determining factors (independent variables). According to the main objectives of the EMPCAN study, we will use the logistic regression considering the (return to the) professional activity as the main event of interest, whether it is part-time or full-time. The determining factors are those presented in Table 1.
To ensure proper estimation of this event, we will have to take into account the probability of occurrence of other events, i.e. death and retirement. We will therefore use the Fine and Gray model  which allows for competing risks analysis and provision of cause-specific hazards.
Unlike in most Western countries, disability in Belgium is not an irreversible status and is therefore not granted until death or retirement. For that reason, disability is not considered as a competing event to work.
Multistate life tables (MSLT) and transitions among socio-economic positions
An important aspect of our data and event of interest is that workers can enter and exit the different socio-economic positions, over time. To capture this reality, the best method is to use a multistate model.
If we consider the socio-economic positions as the states, we would have 14 different states in the model and 39 possible transitions. As our main interest is the working status, we reduced the model by focusing on four states: 100% working, 100% not working, part-time working and death (Fig. 5).
We will model these 9 possible transitions, among the four states and will incorporate covariates to explain (possible) differences among workers with cancer in the course of the follow-up.
Life course perspective
In addition of the transversal perspective and results that the logistic regression offers, we looked for a model allowing a longitudinal approach, emphasizing the employment trajectories and the transitions between the spectrum of the socio-economic positions (Table 2), among the workers with cancer.
For the trajectories, we will conduct a mixture modeling exercise that can be described as:
xi time-stable predictor variables.
Rit time-dependent variables.
Up to the polynomial of order 4 due to software limitations.
An important step of the modeling is the model selection for which two aspects are assessed: the optimal number of components and the appropriate degree of the polynomial modeling each group trajectory. The Bayesian information criterion (BIC) will be used to select the optimal model. It can be used to assess the fit of both nested and unnested models under fairly general circumstances.
The best model will have the maximum BIC i.e. the least negative value.
The SAS procedure Proc Traj command computes the BIC as follows:
where, L is the maximized value of the log likelihood, n is the sample size and k is the number of parameters in the model.
Legal and ethical approval
Legal and technical requirements
The EMPCAN study has been designed by the Cancer Centre of Sciensano in collaboration with the Research Centre for Environmental and Occupational Health of the School of Public Health of the Université Libre de Bruxelles. Both institutions are owners of the results.
As prescribed by the European Commission Communication of 2012 , the exchange of health-related personal data has to be supported and organized by eHealth services. These governmental institutions play the important role of the third trusted party in charge of bringing personal data from different sources together.
Legal and ethical issues for coupling personal data
The coupling of personal data performed in the EMCPAN study has been realized before that the 2018 EU General Data Protection Regulation (GDPR) came into force. The legal framework in which this linkage has been done, relates on four main legal bases presented in the Table 3.
The execution of this linkage has been discussed with the two main data providers (BCR and CBSS) and HealthData.be, first separately. We presented the main objectives of the study, discussed the data required and we asked for the internal legal, ethical and technical requirements that we had to meet (e.g. the presentation of the project to internal boards).
Second, several preparatory meetings took place with all the partners in order to discuss and ensure the technical and ethical feasibility of the study.
The main points under discussion concerned the definition of the cohort (inclusion and exclusion criteria) and the choice of the variables which were the most appropriate regarding the study objectives, i.e. the discussion of the proportionality principle.
For the technical and ethical aspects, we received advices and recommendations of lawyers from the governmental platform eHealth. Their two main concerns were: (1) the fact that the final coupled dataset has to be stored on a completely separated server and that (2) the data holder has to foresee access for the researchers via a virtual private network (VPN) connexion.
Besides this protocol, at least three publication in peer reviewed journals are foreseen. A first paper with the description of the cohort and the results of the logistic regression. A second paper will focus on the lifecourse perspective and the trajectories observed among the workers included in the study. A third paper will concern the multistate model and the transition probabilities among the different socio-economic positions.
Strengths and limits of the EMPCAN study
The main strength of the EMPCAN study is its representativeness. A recent review on quantitative studies on RTW after cancer  reports that only one on the twelve included studies was population-based. The main reason is probably that the coupling of health, socio-economic and administrative data requires many legal and ethical steps. Another one is that most of these data are not available neither accessible at national level.
The main limit of EMPCAN is that it doesn’t include three important aspects to completely (comprehensively) understand and explain the RTW after cancer. First, the patient’s experience and self-assessment of their health status and (new) occupational aspirations are not captured in administrative data. Second, we do not include the employer’s perspective with their ability and willingness to reintegrate workers with cancer. Thirdly, cancer-related information is provided at baseline as the treatments are those provided during the first twelve months following the month of incidence, which limits the possibility to appreciate the health status of the workers on the following years on which he is observed. Similarly, cancer treatment related or preexisting comorbidities could be of high importance in the ability to work but are currently missing in our data.
However, in a previous study, we build a proxy for the health status, using the relative survival corresponding to the cancer site . This exercise could also be performed in EMPCAN, using in addition of the relative survival, the stage at diagnosis and the treatments received.
The EMPCAN will bring important insights on the RTW after cancer and will allow the identification of the determining factors to be considered for the development of occupational rehabilitation interventions. The combination of the results from both cross-sectional and longitudinal approaches will allow a better understanding and explanation of the changes and stability of the socio-economic positions in the years after the cancer diagnosis. This should provide elements for the identification of impediments or facilitators that relate to the social security scheme.
Eventually, the more specific needs and remaining gaps in the knowledge regarding the RTW after a long sickness absence will be identified and suggested as future research perspectives.
The available information at the CBSS corresponds to the socio-economic position(s) of the socially insured individual at the last day of the quarter.
This group mainly include cross-border and international workers, seamen and housewives/men.
Belgian Cancer Registry
Baysian Information Criterion
Crossroad Bank for Social Security
- DWH MT&PS:
Datawarehouse Marché du travail & Protection sociale (Datawarehouse Labour Market & Social Protection)
Funds for Occupational Diseases
- FPP SI:
Federal Public Program Social Inclusion
- FPS SS:
Federal Public Service Social Security
General Data Protection Regulation
Knowledge center of expertise
National Employment Office
National InterMutualist College
National Office for Family Allowances
National office for Social Security
National population Registry
National social security number
Social Security Scheme
Virtual Portal Network
IARC. GLOBOCAN 2012: estimated cancer incidence, mortality and prevalence worldwide in 2012: World Health Organization; 2014. https://gco.iarc.fr/
Coleman MP, Gatta G, Verdecchia A, Esteve J, Sant M, Storm H, et al. EUROCARE-3 summary: cancer survival in Europe at the end of the 20th century. Ann Oncol. 2003;14(Suppl 5):v128–49.
Belgian Cancer Registry. Cancer incidence Projections in Belgium 2015 to 2025. 2017.
Tiedtke C. de RA, Dierckx de CB, Christiaens MR, Donceel P. Experiences and concerns about 'returning to work' for women breast cancer survivors: a literature review. Psychooncology. 2010;19(7):677–83.
Tiedtke C. Dierckx de CB, de RA, Christiaens MR, Donceel P. Breast cancer treatment and work disability: patient perspectives. Breast. 2011;20(6):534–8.
Feuerstein M. Cancer survivorship and work. J Occup Rehabil. 2005 Mar;15(1):1–2.
Mehnert A. Employment and work-related issues in cancer survivors. Crit Rev Oncol Hematol. 2011;77(2):109–30.
Mehnert A, de BA FM. Employment challenges for cancer survivors. Cancer. 2013;119(Suppl 11):2151–9.
Mehnert A, Koch U. Work satisfaction and quality of life in cancer survivors in the first year after oncological rehabilitation. Work. 2013;46(4):407–15.
de Boer AG, Taskila T, Ojajarvi A, van Dijk FJ, Verbeek JH. Cancer survivors and unemployment: a meta-analysis and meta-regression. JAMA. 2009;301(7):753–62.
Kiasuwa MR, Otter R, Mortelmans K, Arbyn M, Van OH, Bouland C, et al. Barriers and opportunities for return-to-work of cancer survivors: time for action-rapid review and expert consultation. Syst Rev. 2016;5(1):35.
OECD. Sickness, Disability and work. Breaking the barriers. 2010.
R.Kiasuwa Mbengi ROEAEGCBeCdB. Utilisation de l'échantillon permanent (eps) pour l'étude du retour au travail après cancer. Défis et opportunités pour la recherche. Rev Med Brux 2018;78–86.
Amir Z, Brocky J. Cancer survivorship and employment: epidemiology. Occup Med (Lond). 2009;59(6):373–7.
Paltrinieri S, Fugazzaro S, Bertozzi L, Bassi MC, Pellegrini M, Vicentini M, Mazzini E, Costi S. Return to work in European Cancer survivors: a systematic review. Support Care Cancer. 2018;26(9):2983–94.
Alexanderson K, Leijon M, Akerlind I, Rydh H, Bjurulf P. Epidemiology of sickness absence in a Swedish county in 1985, 1986 and 1987. A three year longitudinal study with focus on gender, age and occupation. Scand J Soc Med. 1994;22(1):27–34.
Kiasuwa Mbengi et al. How do social security schemes and labor market policies support the return-to-work of cancer survivors? A review article. J Cancer Policy. In press 2018.
Loisel P. A tool for assessing rehabilitation and workplace factors. IEEE Eng Med Biol Mag. 2001;20(6):35–6.
Kiasuwa Mbengi R, Tiraboschi M, de Brouwer C, Bouland C. How do social security schemes and labor market policies support the return-to-work of cancer survivors? A review article. J Cancer Policy. 2018;15:128–33.
Angelique d R. Work Disability Theories: A Taxonomy for Researchers. In: Loisel P, Anema JR, editors. Handbook of Work Disability: Prevention and Management: Springer; 2013. p. 475–99.
Alexanderson K, Hensing G. More and better research needed on sickness absence. Scand J Public Health. 2004;32(5):321–3.
Alexanderson K, Norlund A. Swedish council on technology assessment in health care (SBU). Chapter 1. Aim, background, key concepts, regulations, and current statistics. Scand J Public Health Suppl. 2004;63:12–30.
Feuerstein M, Todd BL, Moskowitz MC, Bruns GL, Stoler MR, Nassif T, et al. Work in cancer survivors: a model for practice and research. J Cancer Surviv. 2010;4(4):415–37.
Balak F, Roelen CA, Koopmans PC, Ten Berge EE, Groothoff JW. Return to work after early-stage breast cancer: a cohort study into the effects of treatment and cancer-related symptoms. J Occup Rehabil. 2008;18(3):267–72.
Cooper AF, Hankins M, Rixon L, Eaton E, Grunfeld EA. Distinct work-related, clinical and psychological factors predict return to work following treatment in four different cancer types. Psychooncology. 2013;22(3):659–67.
Gudbergsson SB, Fossa SD, Borgeraas E, Dahl AA. A comparative study of living conditions in cancer patients who have returned to work after curative treatment. Support Care Cancer. 2006;14(10):1020–9.
Miller KD, Siegel RL, Lin CC, Mariotto AB, Kramer JL, Rowland JH, et al. Cancer treatment and survivorship statistics, 2016. CA Cancer J Clin. 2016;66(4):271–89.
Silver JK, Baima J. Cancer prehabilitation: an opportunity to decrease treatment-related morbidity, increase cancer treatment options, and improve physical and psychological health outcomes. Am J Phys Med Rehabil. 2013;92(8):715–27.
eHealth Action Plan 2012–2020 - Innovative healthcare for the 21st century, European Commission, (2012).
Stefaan Van de Sande DDWNSeal. Inventory of healthcare databases. 2006.
Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999;94(446):496–509.
Kiasuwa Mbengi RL, Nicolaie AM, Goetghebeur E, Otter R, Mortelmans K, Missinnne S, et al. Assessing factors associated with long-term work disability after cancer in Belgium: a population-based cohort study using competing risks analysis with a 7-year follow-up. BMJ Open. 2018;8(2):e014094.
We very much thank Dr. Marc van den Bulcke, head of the Belgian Cancer Centre; Prof. Dr. Herman van Oyen, head of the DO Epidemiology & Public Health; Dr. Elisabet van Eycken and Dr. Harlinde de Schutter, from the Belgian Cancer Registry; Mr. Chris Brijs, from the Crossroad Bank for Social Security, as well as our colleagues Johan van Bussel, Jan Kips and Kris Vranken from HealthData.be, Sciensano for their support in preparing the EMPCAN study.
The EMPCAN study has been funded in the framework of the PhD Thesis of Mrs. Régine Kiasuwa Mbengi, by the National Institute for Health and Disability Insurance.
Availability of data and materials
The EMPCAN database is stored on a server located at the Belgian Ministry of Economy and a Virtual Private Network connexion has been organized for the researchers who can access the data after a double identification check.
Ethics approval and consent to participate
The EMPCAN study received the approval of the Health Section and Social Security Section of the Belgian Privacy Commission on respectively the 15th December 2015 and 12th January 2016.
Consent for publication
All personal data have been completely anonymized before the research could access them. No personal information would be subject to disclosure.
The researchers do not have any competing interest to declare.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
List of files and variables available in the EMPCAN database. (DOCX 18 kb)