Item number | Checklist item | Description | |
---|---|---|---|
1 | Rationale and objective of the study (i.e., research question) | Define the rationale and objective of the study by applying the PICO criteria to research studies focused on population health. | ☐ |
2 | Study design | Select the appropriate study design that could best address the proposed research question. | ☐ |
3 | Linked data sources | Select the required linked data sources to answer the proposed research question. | ☐ |
4 | Study population | ||
4.1 | Define the inclusion and exclusion criteria of the study population by taking into account age, sex and period of data collection. | ☐ | |
4.2 | Sample size | State the significance level (alpha) and the power, based on the defined research question, to calculate the sample size. | ☐ |
5 | Study outcomes | ||
5.1 | Main outcomes | Define the main outcomes by taking into account study population, health condition to be studied, exposure (intervention/risk factors, if relevant) and defined period of study. | ☐ |
5.2 | Level of estimation | Describe the level of estimation of health outcomes at the lowest possible granularity level (i.e., at community, metropolitan, departmental or regional levels). | ☐ |
6 | Data preparation | ||
6.1 | A. Data extraction | Extract the data with the required input variables from the linked data sets into a single file or spreadsheet that can be converted to the format required by the statistical software used for data analysis. | ☐ |
6.2 | Coding of variables | Code the input variables that are common across the linked data sets as continuous, categorical or binary variables for the required data analysis. | ☐ |
B. Data preparation to develop and apply a ML-algorithm | |||
6.3 | Identify and define the target groups for a given defined time window based on the outcome of interest. | ☐ | |
6.4 | Code the input variables that are common across the linked data sets as continuous, categorical or binary variables for a given defined time window. | ☐ | |
6.5 | Split the final data set into 80% training and 20% test data sets. | ☐ | |
7 | Data analysis | ||
7.1 | A. Variable selection | Select variables after the removal of all variables with a variance equal to zero. | ☐ |
7.2 | Estimate the ReliefExp score based on the relevance of each variable to the outcome of interest. | ☐ | |
B. Statistical techniques | |||
7.3 | I. Classical statistical techniques | Select an appropriate statistical technique to address the proposed research question according to the study objectives and the available data. | ☐ |
II. ML-techniques | |||
7.4 | Train various models and compare the performance of each model in terms of the area under the ROC curve (AUC; only for binary classifiers). | ☐ | |
7.5 | Validate the model performance using k-fold cross-validation on the training data set first, and then assess the model performance on the test data set. | ☐ | |
7.6 | Select the final model based on specific performance metrics including sensitivity, specificity, PPV*, NPV*, F1-score and kappa. | ☐ | |
C. Sensitivity/uncertainty analysis | |||
7.7 | Perform a sensitivity analysis to identify the most influential parameters for a given output of a model. | ☐ | |
7.8 | Select an appropriate method to perform the sensitivity analysis. | ☐ | |
7.9 | Calculate the uncertainty in estimates using 95% CI* and describe the source of uncertainty (if relevant). | ☐ | |
D. Potential issues during data analysis | |||
I. Missing data | |||
7.10 | Identify the missing data in the given dataset. | ☐ | |
7.11 | Apply an appropriate technique for the imputation of missing values in the given data set. | ☐ | |
7.12 | II. Imbalanced target group in a given dataset | Apply an appropriate technique to create a balanced data set, using either a down-sampling or an over-sampling approach. | ☐ |
7.13 | III. Bias and variance tradeoff | Find the most generalizable model to keep the balance between bias and variance. | ☐ |
8 | Study limitations | Describe the study limitations related to data sources (i.e., linkage, quality, access and privacy), study design, study population and statistical method used (if relevant). | ☐ |
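
Item 4.2 asks for the significance level and power to be stated before the sample size is calculated. As an illustration only, the sketch below uses the standard normal-approximation formula for comparing two independent proportions with a two-sided test. The function name, the default values (alpha = 0.05, power = 0.80) and the example proportions are assumptions made for this sketch and are not part of the checklist; other study designs need a different formula.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size to detect p1 vs p2 (two-sided test).

    Hypothetical helper: uses the normal approximation
    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    and assumes two independent groups of equal size.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ to compute a sample size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)


# Illustrative values only: 10% vs 15% prevalence of the outcome.
n_per_group = sample_size_two_proportions(0.10, 0.15)
```

Raising the power or lowering alpha increases the required sample size, which is why both values should be fixed from the research question before data extraction.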
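
The performance metrics named in item 7.6 can all be derived from the four cells of a 2x2 confusion matrix obtained on the test data set. The following is a minimal sketch rather than a prescribed implementation: the function name and the example counts are hypothetical, kappa is computed as Cohen's kappa, and every row and column total is assumed to be non-zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute the item 7.6 metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives. Assumes no row or column total is zero.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical counts for illustration, not results from any study.
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=130)
```

Reporting these metrics together matters when the target group is imbalanced (item 7.12), because a model can reach a high specificity and NPV while still missing many true cases.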