Mark van de Wiel

Amsterdam University Medical Centers, location VUmc

Co-data random forest learning for rare tumors

Rare cancers pose a major problem for machine learning algorithms: most genomic studies on rare cancers contain data on a relatively small number of patients and a large number of genomic features (e.g. genes). Such a setting is challenging for machine learners, because they may overfit or fail to find relevant signal. Our aim is to steer the machine learners in the right direction. To do so, we make use of the vast amounts of complementary data (co-data) on the features that are available in online repositories. We propose to build a well-interpretable tree-based learner that unites strong elements of machine learning, statistics and biology: it accounts for complex molecular interactions, while improving predictive performance by estimating feature weights from biological co-data and incorporating these weights in the learner. We focus on prognosis for three rare disease entities of lymphoma using a variety of genomics data. The project is a collaboration between prof. Mark van de Wiel (PI), dr. Thomas Klausch and prof. Daphne de Jong, all at Amsterdam UMC.

Brief summary of progress / results May 2023

First, we published a paper on using learning curves to evaluate predictors. Next, we finished the development of co-BART, an algorithm based on Bayesian additive regression trees that uses co-data to improve predictive performance and feature selection. This algorithm has been successfully applied to the lymphoma setting, where we show that the co-data approach improves prediction of two-year survival from molecular features and classical predictors. Finally, we have started to investigate domain adaptation techniques to improve the generalizability of learners to (somewhat) different patient populations, as this is relevant in medical settings.

Brief summary of progress / results June 2022

This year (2022) we applied our recently developed method, called Learn2Evaluate, to lymphoma data. These data consist of progression-free survival measurements on 220 treated DLBCL patients. Additionally, DNA markers, such as mutations and copy number variations, were measured. Learn2Evaluate shows that these DNA markers are not very predictive of progression-free survival, even when clinical variables are included. However, our method indicates that collecting more samples may strongly boost predictive performance. The new samples will be made available soon by collaborating clinicians.

We also made progress in the development of co-BART, an algorithm that incorporates external information about the DNA markers. Simulations have shown promising results in terms of variable selection and predictive performance compared to the standard BART algorithm.

Brief summary of progress / results year 1

At this moment, the brief summary of progress is only available in Dutch. You can find the Dutch summary here.

Granted applications 2019
