Clinical Prediction Models and Machine Learning (WK80)

The purpose of a prediction model is to estimate the probability of a particular outcome as accurately as possible. Prediction models are often developed with clinical practice in mind, and combine information about individual patients to calculate an individual’s probability of illness or recovery. The model can then be presented in the form of a clinical prediction rule. General applicability (i.e. the accuracy of the prediction model when applied to new patients in the future) is another very important aspect.


Date:	6, 7, 8, 11, 12 January 2027	Tuition fee:	€1.750
City:	Amsterdam	Course coordinator:	M.W. (Martijn) Heymans, PhD
Language:	English	Learning method:	Lectures and computer practical
Examination:	Written exam with computer assignments (optional)	Examination dates:	See Exams page
Number of EC:	2	Details:	Contact hours: 24

About the course

Clinical Prediction Models and Machine Learning is a 4-day course. It consists of an intensive programme with partially interactive lectures, combined with computer-based practical work. Examples taken from clinical practice will be used for the computer-based work.

If a course is [Full], you can still register, and we will place you on a waiting list. We will contact you as soon as a place becomes available. You can then decide whether you still want to participate.

Course description

Access to data is becoming ever easier, and data sets are therefore growing ever larger. One problem when developing prediction models in such data sets is the difficulty of selecting the most important predictors from a large number of variables. If this is not done carefully, the quality of the prediction model can be adversely affected. Machine learning methods can be used to develop prediction models in these large data sets. In addition, prediction models may need to be adjusted before they can be applied to new persons. All these issues are frequently overlooked or underestimated by clinicians and researchers.

The aim of the course is to provide better knowledge and understanding of the development of prediction models, in smaller and larger data sets, that are relevant to real-life practice. We will focus on common methods for selecting variables, such as backward selection, but also on more advanced machine learning procedures, such as lasso regression and tree-based methods, as well as their pros and cons. Once a prediction model has been developed, it is important to assess its quality. For example, we will look at whether the predictions of the model are accurate and will consider various ways of measuring performance, using measures of overall quality, discrimination and calibration. The question of applying the model to new (future) patients will also be addressed. An important element of this is investigating whether the performance of the prediction model deteriorates when it is applied to new patients. This component is called the validation of the prediction model. We will cover various techniques for internal and external validation of prediction models, and ways to train and test the model using bootstrapping and cross-validation.
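As an illustration of the performance measures mentioned above, the following is a minimal sketch in plain Python (the course itself uses R). It computes one common measure each for discrimination (the c-statistic, equal to the area under the ROC curve), overall quality (the Brier score) and calibration (calibration-in-the-large, here taken as mean observed outcome minus mean predicted risk). The function names and the toy data are assumptions made for this example and are not taken from the course material.

```python
def c_statistic(y_true, y_prob):
    """Discrimination: share of (event, non-event) pairs in which the event
    has the higher predicted risk; ties count as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in pos:
        for p_non_event in neg:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Overall quality: mean squared difference between outcome and risk."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_in_the_large(y_true, y_prob):
    """Calibration: mean observed outcome minus mean predicted risk."""
    n = len(y_true)
    return sum(y_true) / n - sum(y_prob) / n


# Hypothetical observed outcomes (1 = event) and predicted risks
y_obs = [0, 0, 1, 1, 0, 1]
p_hat = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]

print("c-statistic:", round(c_statistic(y_obs, p_hat), 3))
print("Brier score:", round(brier_score(y_obs, p_hat), 3))
print("calibration-in-the-large:", round(calibration_in_the_large(y_obs, p_hat), 3))
```

A c-statistic of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation; a lower Brier score is better; a calibration-in-the-large value near zero means the average predicted risk matches the observed event rate.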

Day 1

The development and quality of prediction models, including:

The characteristics of a prediction model
The most frequently used methods for selecting variables
The pros and cons of common methods for selecting variables
Sample size recommendations to develop a prediction model
Different measures of quality and how to interpret them (including explained variation, calibration, discrimination, ROC curve)
Introduction to spline regression models
Introduction to R software

Day 2

Introduction to the validation of prediction models

The linear predictor (lp)
Optimism and shrinkage
Adjusting the intercept
The internal and external validation of prediction models
Train and test datasets (bootstrapping and cross-validation)
Adjusting the slope
External validation
Generalizability of prediction models (case-mix, different regression coefficients)
Presentation formats of prediction models
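
For orientation, the Day 2 concepts can be written down for a logistic prediction model as follows. This is a sketch in standard textbook notation, not the course's own formulation; the symbols (the shrinkage factor s and the re-estimated intercept, marked with an asterisk) are assumptions made for this example.

```latex
\begin{align*}
\mathrm{lp}_i &= \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}
  && \text{(linear predictor)} \\
\hat{p}_i &= \frac{1}{1 + e^{-\mathrm{lp}_i}}
  && \text{(predicted probability)} \\
\mathrm{lp}_i^{\text{shrunk}} &= \beta_0^{*} + s\,(\beta_1 x_{i1} + \dots + \beta_k x_{ik}),
  \quad 0 < s \le 1
  && \text{(uniform shrinkage, intercept re-estimated)} \\
\text{optimism} &= \text{performance}_{\text{bootstrap sample}}
  - \text{performance}_{\text{original data}}
  && \text{(model fitted in the bootstrap sample)} \\
\text{corrected performance} &= \text{apparent performance}
  - \overline{\text{optimism}}
  && \text{(averaged over bootstrap samples)}
\end{align*}
```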

Day 3

Updating of prediction models

Reasons for generalizability problems
Updating the intercept and slope
Comparing prediction models
Adding a new variable
Reclassification tables
A prediction model for survival data
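
As a sketch of what updating the intercept can look like for a logistic model, the plain-Python example below (the course itself uses R) estimates a correction a in logit(p) = a + lp, keeping the slope fixed at 1. It uses Newton-Raphson on the logistic log-likelihood with the original linear predictor as an offset. The function names and the toy data are assumptions for illustration only; updating the slope as well would require a second parameter and is not shown.

```python
import math


def expit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def update_intercept(lp, y, tol=1e-10, max_iter=50):
    """Estimate the intercept correction `a` in logit(p) = a + lp.

    Newton-Raphson with lp treated as a fixed offset (slope kept at 1).
    Requires at least one event and one non-event in y.
    """
    a = 0.0
    for _ in range(max_iter):
        p = [expit(a + v) for v in lp]
        score = sum(yi - pi for yi, pi in zip(y, p))  # first derivative
        info = sum(pi * (1.0 - pi) for pi in p)       # minus second derivative
        step = score / info
        a += step
        if abs(step) < tol:
            break
    return a


# Hypothetical linear predictors from an existing model, and the outcomes
# observed in a new cohort where the event is rarer than the model expects
lp_new = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
y_new = [0, 0, 0, 0, 0, 1, 0, 1]

a_hat = update_intercept(lp_new, y_new)
updated = [expit(a_hat + v) for v in lp_new]

print("intercept correction:", round(a_hat, 3))
print("mean updated risk:", round(sum(updated) / len(updated), 3))
```

After the update, the mean predicted risk equals the observed event rate in the new cohort, while the ranking of patients (and therefore discrimination) is unchanged.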

Day 4

Developing prediction models with many variables

Developing prediction models with many variables
Cross-validation
Lasso Regression
Model stability analyses
Tree based methods
Random forest
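
Cross-validation is commonly used with the Day 4 methods, for example to choose the penalty in lasso regression. The following plain-Python sketch (the course itself uses R) shows only the data-splitting step of k-fold cross-validation: every row is used for testing exactly once and for training in the remaining folds. The function names, the fixed seed and the toy sizes are assumptions made for this example.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Split row indices 0..n-1 into k disjoint folds of near-equal size."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # shuffle once, reproducibly
    return [idx[i::k] for i in range(k)]


def train_test_splits(n, k, seed=0):
    """Yield (train, test) index lists, one pair per fold."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Example: 10 patients, 3 folds
splits = list(train_test_splits(n=10, k=3))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} train rows, {len(test)} test rows")
```

In each fold, a model would be fitted on the training rows and evaluated on the test rows, and the performance estimates would then be averaged over the folds.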

The student can identify the strengths and weaknesses of a prediction model developed using regression models and advanced Machine Learning.
The student can analyse, in statistical software, the methods used to assess the quality of a prediction model, such as discrimination, calibration, clinical usefulness, and overall model fit.
The student can interpret the methods used to assess the quality of a prediction model, such as discrimination, calibration, clinical usefulness, and overall model fit.
The student can develop a prediction model using the statistical software program R, using regression models and advanced Machine Learning methods.
The student can interpret the developed prediction model based on regression models and advanced Machine Learning methods.
The student can internally validate a prediction model in R using cross-validation and bootstrapping, and externally validate it using another dataset, including adjustment of the intercept and slope of the prediction model.
The student can interpret the results of internal validation of a prediction model using cross-validation and bootstrapping, as well as external validation by applying an existing prediction model in an external dataset, taking optimism and shrinkage into account.

Target audience

The course is designed for PhD students, practitioners and applied researchers working in the fields of epidemiology, medicine, public health, psychology and human movement sciences.

The course is intended for anyone who wants to know more about prediction models, for example because they want to be able to better assess a research proposal or article, or because they are developing, or want to develop, a prediction model themselves. It is also important to be able to make a proper assessment of the value of a prediction model for practice.

Course prerequisites

Participants are assumed to have prior knowledge of the following concepts:

– Basic statistical tests, such as t-tests and regression analyses.

– Basic SPSS commands. Knowledge of R(Studio) is not a prerequisite.

Course material

The course materials (lectures, assignments, feedback on the assignments, etc.) are available on Canvas, our digital learning environment. The documents will remain available on Canvas for at least one year.

In order to do the computer practicals, you will need:

1. R and RStudio. R can be downloaded for free at https://cran.r-project.org/; RStudio Desktop is available for free from the Posit website.

and

2. SPSS. If you do not have SPSS installed yet, you can purchase it on Surfspot. Another option is to use IBM’s 30-day free trial (see the SPSS Software page on IBM’s website).

Literature

Literature will be provided during the course.

Epidemiology Master’s students are required to take the exam in order to complete this course. For other students (not enrolled in the Epidemiology Master’s programme), the exam is optional and costs €160,- per registration.

You can register for the exam on the Exams page. Registration will close 3 weeks prior to the exam.

Please note that you need to pass the exam in order to receive credits (EC).

A certificate of participation will be granted to all students who have attended at least 80% of the classes. The number of contact hours is mentioned on the certificate.

Only for Dutch medical specialists!

On the final day of the course, you need to sign the attendance list if you wish to obtain KNMG accreditation credits (PE-points).

To qualify for these credits, there is an attendance requirement of 100%.

M.W. (Martijn) Heymans, PhD

Epidemiology and Data Science, Amsterdam UMC

Liza de Groot, MSc

Epidemiology and Data Science, Amsterdam UMC
