Amsterdam University Medical Centers - location VUmc
Co-data random forest learning for rare tumors
Rare cancers pose a major problem for machine learning algorithms: most genomic studies on rare cancers contain data on a relatively small number of patients and a large number of genomic features (e.g. genes). Such a setting is challenging for machine learners, because they may overfit or fail to find relevant signal. Our aim is to steer the machine learners in the right direction. For that, we make use of the vast amounts of complementary data (co-data) on the features that are available in online repositories. We propose to build a well-interpretable tree-based learner that unites strong elements of machine learning, statistics and biology: it accounts for complex molecular interactions, while improving predictive performance by estimating feature weights using biological co-data and incorporating these weights in the learner. We focus on prognosis for three rare disease entities of lymphoma using a variety of genomics data. The project is a collaboration between prof. Mark van de Wiel (PI), dr. Thomas Klausch and prof. Daphne de Jong, all at Amsterdam UMC.
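To illustrate the idea of incorporating co-data weights in a tree-based learner, the following minimal sketch shows one possible mechanism: features with stronger co-data support are more likely to be offered as split candidates at a tree node. This is not the project's actual method. The function names, the softmax mapping from scores to probabilities, and the example scores are all assumptions made for illustration.

```python
import numpy as np

def codata_sampling_probs(codata_scores, temperature=1.0):
    """Map hypothetical per-feature co-data scores to sampling probabilities.

    A softmax is used here purely as an illustrative choice; a real method
    would estimate these weights from the co-data and the primary data.
    """
    scores = np.asarray(codata_scores, dtype=float)
    z = (scores - scores.max()) / temperature  # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

def sample_split_candidates(probs, mtry, rng):
    """Draw `mtry` distinct candidate features for one tree node."""
    return rng.choice(len(probs), size=mtry, replace=False, p=probs)

rng = np.random.default_rng(0)

# Illustrative scores: the first two features have co-data support,
# the remaining four do not.
codata_scores = np.array([2.0, 2.0, 0.0, 0.0, 0.0, 0.0])
probs = codata_sampling_probs(codata_scores)

# Count how often each feature is offered as a split candidate.
counts = np.zeros(len(probs), dtype=int)
for _ in range(2000):
    counts[sample_split_candidates(probs, mtry=2, rng=rng)] += 1

print(probs.round(3))
print(counts)
```

In this sketch, a standard random forest corresponds to equal scores for all features, so the co-data only shifts which features get the chance to be used and leaves the split criterion itself unchanged.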
Brief summary of progress / results 2022
This year (2022) we applied our recently developed method, called Learn2Evaluate, to lymphoma data. These data consist of progression-free survival measurements on 220 treated DLBCL patients. Additionally, DNA markers, such as mutations and copy number variations, were measured. Learn2Evaluate shows that these DNA markers are not very predictive of progression-free survival, even when clinical variables are included. However, our method indicates that collecting more samples may strongly boost predictive performance. The new samples will be made available soon by collaborating clinicians.
We also made progress in the development of co-BART, an algorithm that incorporates external information about the DNA markers. Simulations have shown promising results in terms of variable selection and predictive performance compared to the standard BART algorithm.
Brief summary of progress / results 2021
At this moment, the brief summary of progress is only available in Dutch. You can find the Dutch summary here.