Artificial Intelligence in Oncology - Supporting scientific research
UMC Utrecht
In short, Willeke Blokx's research entails the following:
'Ambiguous melanocytic lesions are rare and difficult to classify as benign (nevus), intermediate, or malignant (melanoma) without molecular tests, and the classification has important clinical implications. However, these tests are only available in highly specialized centers, are expensive, have long turnaround times, and are not feasible when a specimen contains insufficient tumor tissue. The UMCU Department of Pathology, the largest reference center for melanocytic tumors, combines unique diagnostic expertise and sophisticated molecular tests within this field (dr. Willeke Blokx, dr. Anne Jansen, Prof. Marijke van Dijk, dr. Gerben Breimer). The imaging group of TU Eindhoven (dr. Mitko Veta, prof. Josien Pluim) has ample experience in developing and implementing artificial intelligence (AI) in pathology practice.
For the validation part of the study there is a collaboration with the Department of Pathology of the LUMC (dr. Anne Roos Schrader).
Together, we aim to develop AI models to accurately classify ambiguous melanocytic lesions without the need for molecular testing.'
The pipeline is working, a second dataset has been curated, and the primary dataset will soon be collected. A data transfer agreement (DTA) with Lyon has been signed, and we expect to receive more than 400 additional Spitz lesion cases. These cases will enrich our dataset for training and validating the AI model that predicts the diagnosis and genetic alterations of (difficult-to-diagnose) melanocytic tumors from H&E-stained slides alone, which is the primary aim of this project.
We have already collected the intended number of cases for phase 1 (the training cohort) and phase 2 (the internal validation cohort) of the project. The cases in the first cohort are already fully characterized, with all clinical and molecular data available in an SPSS file. For the phase 2 cases, we intend to import these data in the coming months.
For the AI development, we are currently curating a separate dataset of approximately 28,000 melanocytic lesions from the UMCU, dating from 2013 to 2020.
While assembling our datasets, we are also establishing a streamlined pipeline for large-scale model training and evaluation, tailored to accommodate our extensive collection of melanocytic lesion data.
This pipeline includes software for streaming the image data to the high-performance computing (HPC) cluster, loading images of different file types through a unified interface, preprocessing the images at a lower magnification by segmenting the tissue cross-sections as well as pen markings, and tessellating the tissue regions into image tiles, which serve as the input for the AI model. For the segmentation step, we trained a separate model on manual annotations that were acquired using a custom software tool.
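To make the tessellation step concrete, the following is a minimal sketch and not the project's actual implementation. It assumes that segmentation has already produced two boolean masks at low magnification (one for tissue, one for pen markings) and that a tile is kept when at least a given fraction of it is usable tissue. The function name `tessellate`, the parameter `min_tissue_fraction`, and the toy masks are hypothetical.

```python
import numpy as np


def tessellate(mask, tile_size, min_tissue_fraction=0.5):
    """Return top-left (row, col) coordinates of non-overlapping tiles
    whose usable-tissue fraction is at least min_tissue_fraction."""
    height, width = mask.shape
    coords = []
    for row in range(0, height - tile_size + 1, tile_size):
        for col in range(0, width - tile_size + 1, tile_size):
            tile = mask[row:row + tile_size, col:col + tile_size]
            # The mean of a boolean tile is the fraction of usable pixels.
            if tile.mean() >= min_tissue_fraction:
                coords.append((row, col))
    return coords


# Toy 8x8 masks standing in for low-magnification segmentation output.
tissue_mask = np.zeros((8, 8), dtype=bool)
tissue_mask[0:4, :] = True      # tissue across the top half
tissue_mask[4:8, 4:6] = True    # a narrow strip of tissue bottom right

pen_mask = np.zeros((8, 8), dtype=bool)
pen_mask[0:4, 4:8] = True       # pen marking covering the top-right tile

# Exclude pen markings from the usable tissue area (an assumption here).
usable_mask = tissue_mask & ~pen_mask

tiles = tessellate(usable_mask, tile_size=4)
print(tiles)  # [(0, 0), (4, 4)]
```

In a real pipeline, the coordinates found at low magnification would be scaled to the magnification at which the tiles are read from the whole slide image.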
At the beginning of August 2022, we started curating a second, internal dataset of whole slide images with corresponding clinical reports from cutaneous melanocytic lesions. This dataset consists of approximately 28,000 unique specimens, acquired between January 2013 and December 2020 at the Department of Pathology of the University Medical Center Utrecht. In contrast to the primary dataset, which contains only ambiguous lesions, this collection includes many lesions that are clearly benign or malignant. Whereas the oldest specimens were diagnosed based on hematoxylin and eosin (H&E) stained slides only, immunohistochemical staining and molecular analysis have additionally been used for diagnosis in more recent years.
We have finished the case selection and correction steps for the second dataset, and are currently performing quality assurance on the provided diagnoses. The next curation steps will include the selection of relevant slides and tissue annotation, for which we have already developed custom software tools.
The intended purpose of the second dataset is twofold:
- Firstly, to facilitate the development of a more widely applicable AI model for diagnosis. Such a model could, after careful validation, provide benefit as a tool for triaging or possibly as a second opinion for all melanocytic lesions.
- Secondly, to enhance the training set for the development of an AI model to predict genomic mutations based only on H&E stained slides, i.e., the primary aim of this project. Despite the lack of information about genomic mutations for most of these specimens, the available images alone can add value through a model training procedure known as self-supervised learning. The advantage of this approach was demonstrated, for example, in recent work (1), where an AI model was first trained on a pan-cancer whole slide image dataset to learn a large variety of image patterns, followed by training on a smaller dataset for a specialized task. In effect, we are aiming to build a foundation model for histopathology of melanocytic lesions that will serve as the basis for a wide range of diagnostic and prognostic models (one of the secondary aims of this project).
After the second dataset has been curated, we plan to start the development of separate AI models for prediction of diagnoses and genetic mutations in (ambiguous) melanocytic lesions.
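The idea behind self-supervised learning can be illustrated with a small sketch. The report does not state which self-supervised method is used, so the contrastive objective below (a one-directional InfoNCE-style loss) is only an assumed example of the general technique. The function name `info_nce_loss`, the `temperature` value, and the random vectors standing in for tile embeddings are hypothetical. The loss is low when the embeddings of two augmented views of the same tile are more similar to each other than to the embeddings of other tiles, which lets a model learn image patterns without diagnosis or mutation labels.

```python
import numpy as np


def info_nce_loss(view_a, view_b, temperature=0.1):
    """One-directional InfoNCE loss.

    Row i of view_a and row i of view_b are embeddings of two views of the
    same tile (a positive pair); all other rows of view_b act as negatives.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


# Random vectors stand in for tile embeddings produced by an encoder.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(8, 16))
augmented = embeddings + 0.01 * rng.normal(size=(8, 16))

aligned_loss = info_nce_loss(embeddings, augmented)         # views matched
shuffled_loss = info_nce_loss(embeddings, augmented[::-1])  # views mismatched
print(aligned_loss < shuffled_loss)  # True
```

After pretraining with an objective of this kind on unlabeled images, the encoder would be trained further on the smaller labeled dataset for the specialized task.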