Artificial Intelligence in Oncology - Supporting scientific research
Amsterdam University Medical Centers - location VUmc
Rare cancers pose a major problem for machine learning algorithms: most genomic studies on rare cancers contain data on a relatively small number of patients and a large number of genomic features (e.g. genes). Such a setting is challenging for machine learners, because they may overfit or fail to find the relevant signal. Our aim is to steer the machine learners in the right direction. To do so, we make use of the vast amounts of complementary data (co-data) on the features that are available in online repositories. We propose to build a well-interpretable tree-based learner that unites strong elements of machine learning, statistics and biology: it accounts for complex molecular interactions, while improving predictive performance by estimating feature weights from biological co-data and incorporating these weights in the learner. We focus on prognosis for three rare disease entities of lymphoma cancer using a variety of genomics data. The project is a collaboration between prof. Mark van de Wiel (PI), dr. Thomas Klausch and prof. Daphne de Jong, all at Amsterdam UMC.
For many medical applications, in particular for rare cancers, data are scarce in terms of the number of patients. Machine learning algorithms are data hungry, so we need to feed them with complementary data to make them predict well. We devised a prediction algorithm that can incorporate such complementary data for a popular class of machine learners: decision trees. We showed that this can improve predictive performance and may yield a better set of selected genomic markers. In addition, we tackled the problem of transferring our prediction models to other domains, e.g. adapting a model trained on data from academic centers so that it can be applied to patients in non-academic hospitals. These methods were successfully applied to several data sets from a particular type of lymphoma cancer. Software implementing our methods is available via public repositories, so that others can freely benefit from these novel methods.
First, we have published a paper on using learning curves to evaluate predictors. Next, we finished the development of co-BART, an algorithm based on Bayesian additive regression trees that uses co-data to improve predictive performance and feature selection. This algorithm has been successfully applied to the lymphoma cancer setting, for which we show that the co-data approach improves prediction of 2-year survival using molecular features and classical predictors. Finally, we have started to investigate domain adaptation techniques to improve the generalizability of learners to (somewhat) different patient populations, as this is relevant in medical settings.
This year (2022) we applied our recently developed method, called Learn2Evaluate, to lymphoma data. These data consist of progression-free survival measurements on 220 treated DLBCL patients. Additionally, DNA markers, such as mutations and copy number variations, were measured. Learn2Evaluate shows that these DNA markers are not very predictive of progression-free survival, even when clinical variables are included. However, our method indicates that collecting more samples may strongly boost the predictive performance. The new samples will be made available soon by collaborating clinicians.
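The general idea of a learning curve can be illustrated with a minimal sketch. It is not the Learn2Evaluate method itself: the simulated data, the toy threshold learner and all function names below are hypothetical and serve only to show how predictive performance is estimated at increasing training-set sizes. A curve that is still rising at the largest size suggests that more samples may help.

```python
import random
import statistics


def simulate_patients(n, effect, rng):
    """Hypothetical data: one marker value and a binary outcome per patient."""
    data = []
    for _ in range(n):
        outcome = rng.random() < 0.5
        marker = rng.gauss(effect if outcome else 0.0, 1.0)
        data.append((marker, outcome))
    return data


def fit_threshold(train):
    """Toy learner (an assumption): threshold halfway between the class means."""
    pos = [x for x, y in train if y]
    neg = [x for x, y in train if not y]
    if not pos or not neg:  # degenerate subsample: fall back to the overall mean
        return statistics.fmean([x for x, _ in train])
    return (statistics.fmean(pos) + statistics.fmean(neg)) / 2


def accuracy(threshold, test):
    """Fraction of held-out patients whose outcome is predicted correctly."""
    return sum((x > threshold) == y for x, y in test) / len(test)


def learning_curve(data, sizes, n_repeats, rng):
    """Mean held-out accuracy for each training subsample size."""
    curve = []
    for size in sizes:
        scores = []
        for _ in range(n_repeats):
            shuffled = rng.sample(data, len(data))  # random permutation
            train, test = shuffled[:size], shuffled[size:]
            scores.append(accuracy(fit_threshold(train), test))
        curve.append((size, statistics.fmean(scores)))
    return curve


rng = random.Random(0)
patients = simulate_patients(220, effect=1.0, rng=rng)
curve = learning_curve(patients, sizes=[20, 50, 100, 150], n_repeats=50, rng=rng)
for size, acc in curve:
    print(f"n_train={size:3d}  mean accuracy={acc:.3f}")
```

Repeated random subsampling is used here so that each point on the curve averages over many train/test splits; with only 220 patients, a single split would give a very noisy estimate.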
We also made progress in the development of co-BART, an algorithm that incorporates external information about the DNA markers. Simulations have shown promising results in terms of variable selection and predictive performance compared to the standard BART algorithm.
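As a rough illustration of how external information could steer a tree-based learner: standard BART by default draws the splitting variable uniformly over the features, whereas a co-data approach can make informative features more likely to be chosen. The sketch below is not the co-BART algorithm; the softmax mapping, the `strength` parameter and the co-data scores are assumptions made purely for illustration.

```python
import math
import random


def split_probabilities(codata_scores, strength=1.0):
    """Softmax over co-data scores (an assumed mapping, not co-BART's).

    strength=0 recovers the uniform prior over splitting variables.
    """
    scaled = [strength * s for s in codata_scores]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def draw_split_features(probabilities, n_splits, rng):
    """Draw the feature index used at each of n_splits tree splits."""
    indices = range(len(probabilities))
    return rng.choices(indices, weights=probabilities, k=n_splits)


# Hypothetical co-data: a higher score means stronger external evidence.
codata = [2.0, 0.0, 0.0, 1.0, 0.0]
uniform = split_probabilities(codata, strength=0.0)
informed = split_probabilities(codata, strength=1.0)

rng = random.Random(1)
draws = draw_split_features(informed, n_splits=1000, rng=rng)
counts = [draws.count(j) for j in range(len(codata))]

print("uniform prior: ", [round(p, 3) for p in uniform])
print("co-data prior: ", [round(p, 3) for p in informed])
print("split counts:  ", counts)
```

In this toy setup, features with supporting co-data are selected for splits more often, which is one way in which external information might aid both variable selection and prediction.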