CAFCW24

FNL - Cancer Data Science Initiatives Team

Tracks

Integration of Advanced Computational Approaches in Colorectal Cancer (CRC) Research

Abdelouahab Dehimat

Abstract
Colorectal cancer (CRC) is a major global health concern because of its high rates of illness and death. Recent advances in computational and single-cell technologies have revolutionized our understanding of the molecular and cellular characteristics of CRC. This research highlights the innovative methods and technical capabilities employed in current CRC studies, emphasizing their originality and cross-disciplinary integration. The studies examined utilize cutting-edge single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics, offering unprecedented insights into the tumor microenvironment (TME). These techniques enable the precise identification and characterization of diverse cell populations within CRC tumors, including cancer stem cells (CSCs), immune cells, and fibroblasts.

Utilizing advanced technologies has allowed researchers to uncover unique metabolic, immunophenotypic, and transcriptional characteristics within different cell subtypes, highlighting the complex diversity of the tumor microenvironment (TME). The integration of high-throughput sequencing data with computational models has played a crucial role in these investigations. This has involved creating prognostic risk models based on single-cell data and building gene regulatory networks (GRNs) for CRC. Computational tools such as CIBERSORTx and CMScaller have been employed to measure cell subpopulation fractions and determine consensus molecular subtypes (CMS), respectively. These models have been instrumental in pinpointing key transcription factors like ERG and in clarifying the roles of specific gene regulatory elements in CRC progression. The main research areas involve studying how metabolic reprogramming and immune evasion mechanisms function within the CRC TME. Recent studies have shown how lipid metabolism and immune suppression are controlled in various cell subtypes, suggesting potential therapeutic targets like RPS17. Furthermore, the discovery of epigenetic regulators and chromatin accessibility patterns has provided new insights into CRC subtypes, especially in distinguishing between iCMS and CIMP phenotypes. This research not only highlights the technical strengths of combining single-cell technologies with computational models but also emphasizes the potential for these approaches to lead to personalized treatment strategies. In conclusion, our work underscores the importance of computational biology in advancing CRC research, offering a multi-dimensional understanding that could pave the way for more effective and personalized interventions. By addressing the challenges of data integration and interpretation, this research opens new avenues for therapeutic development and better clinical outcomes for CRC patients.
Presented by
Abdelouahab Dehimat <a-ouahab.dehimat@univ-msila.dz>
Institution
Sciences of Nature and Life Department, Faculty of Sciences, Mohamed BOUDIAF University -PB 166 M'sila 28000, Algeria
Hashtags
#Computational_Biology, #Bioinformatics, #Cancer_Research, #CRC

Alcott: A Convolutional Neural Network to Predict Multimeric Interactions in HIV-1 Neural Infection

Anna Mohanty

Abstract
There are currently no targeted drugs for HIV-1 neural infection, which affects at least half of acutely infected patients. Furthermore, key accessory proteins of HIV-1 immune evasion, Nef and Fyn, share a multimer structure with dragline spider silk. The original aim of this experiment was to determine if the digestive enzymes used by golden orb-weaver spiders to digest their silk could be used to inhibit HIV-1 immune evasion. However, enzymatic pharmaceuticals have low success rates because of low binding affinity to non-specific substrates. Therefore, to determine which digestive enzyme would have the highest binding affinity (K-score) to the target protein, Nef, a convolutional neural network was designed using alignment motifs and K-score values from the NetMHCII2.3 database. First, 10,000 alignment motifs of incidental multimers were used to train a text predictive model to identify which patterns in a large protein alignment vector would yield the highest K-score. Using these identified motifs, a 2D convolutional neural network was trained on 25,000 data points from the same database, specifically the solubility and entropy in relation to binding affinity. The network showed an 83% accuracy rate and was used to identify the N. clavipes homolog of Cathepsin L (CTSL) as the enzyme with the highest predicted affinity to Nef. Gel electrophoresis displayed significant cleavage of Nef by CTSL (specifically maintaining key calcium binding domains necessary for mediated apoptosis). Lastly, a completed drug was designed, incorporating a fibroin-based nanoparticle crystal for targeted/pH-sensitive drug delivery and tenascin C to aid in drug cycling.
Presented by
Anna Mohanty <annamohantyg@gmail.com>
Institution
Department of Biochemistry, Marymount University
Hashtags
#virology #hiv #naturallanguageprocessing

Optimal Prescriptive Treatments for Ovarian Cancer with Genetic Data

Alkiviadis Mertzios, Matea Gjika, Xidan Xu, Samayita Guha, Neelkanth M. Bardhan, Subodha Kumar, Angela Belcher, Georgia Perakis

Abstract
Presented by
Alkiviadis Mertzios <mertzios@mit.edu>
Institution
Massachusetts Institute of Technology
Hashtags

A Novel Microservices Architecture for Digital Twins

Jeremy Balian, Jun Deng, Anvi Sud, Shreya Tiwari, Shivani Maffi, Koninika Ray, Anil Srivastava, Jeevan Saini, Haresh KP, Mariano Vazquez

Abstract
Digital Twins (DTs) have emerged as a transformative technology across various fields, offering significant potential for cancer research through real-time simulation and analysis of biological systems. By creating a virtual representation of a patient's biological data, DTs enable personalized treatment plans and real-time monitoring of disease progression. While microservices architectures (MSA) have been successfully implemented in domains like civil engineering and transport, their application in healthcare, particularly in cancer research, remains relatively unexplored. We present a novel microservices architecture designed specifically for Digital Twins in cancer research, providing real-time updates on patient data. Our architecture leverages TileDB, a scalable, cloud-native storage engine that supports multi-dimensional arrays for efficient storage and management of vast heterogeneous data. The microservices communicate through a message broker, such as Apache Kafka, to ensure reliable data exchange and event-driven processing. The system includes a reinforcement feature that verifies the accuracy of model-generated values, enhancing simulation reliability. The design integrates TileDB Cloud for data storage, ensuring high availability and fault tolerance, and uses the FHIR API to standardize data from various sources, including wearables, EHRs, and patient-generated data. The analytics component employs the TensorFlow Serving API through TileDB for scalable and flexible deployment of machine learning models. This setup enables real-time inference and analysis, providing actionable insights based on standardized data. The system architecture includes microservices for data ingestion, preprocessing, model training, and inference to ensure scalability and maintainability. Our goal is to develop this architecture into open-source software, encouraging collaboration and innovation within the research community. By advancing the capabilities of Digital Twins in cancer research, our system promotes precise, personalized treatments, potentially leading to improved patient outcomes.
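
As a hedged illustration of the event-driven pattern described above, the sketch below shows one hypothetical preprocessing microservice that consumes FHIR-derived observation events from a Kafka topic and republishes a normalized record. Topic names, the message schema, and the preprocess_fhir helper are assumptions for illustration, not the authors' implementation.

import json

from kafka import KafkaConsumer, KafkaProducer  # kafka-python client

consumer = KafkaConsumer(
    "patient-observations",                      # hypothetical ingestion topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    group_id="preprocessing-service",
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda m: json.dumps(m).encode("utf-8"),
)

def preprocess_fhir(observation: dict) -> dict:
    """Hypothetical normalization step: keep only the fields downstream models need."""
    return {
        "patient_id": observation.get("subject", {}).get("reference"),
        "code": observation.get("code", {}).get("text"),
        "value": observation.get("valueQuantity", {}).get("value"),
    }

# Consume FHIR Observation events, normalize them, and emit to an analytics topic.
for message in consumer:
    cleaned = preprocess_fhir(message.value)
    producer.send("preprocessed-observations", cleaned)
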
Presented by
Anvi Sud
Institution
TileDB, Yale University, Open Health Systems Laboratory (OHSL), All India Institute Of Medical Sciences Delhi, Centro Nacional de Supercomputación
Hashtags

Bridging Scales with Healthy AI: Transforming Cancer Treatment through Multiscale Integration of Technology and Biology

Debsindhu Bhowmik‡, Chris Stanley, John Vant, Paul Inman, John Gounley, Anuj Kapadia

Abstract
‘Healthy AI’ embodies a transformative approach that merges technological innovation with biological systems, underscoring the significance of different length scales in advancing cancer treatment and drug design. This integrated framework highlights how AI can seamlessly connect cellular, molecular, and systemic levels, driving progress in personalized medicine and drug discovery.

Cancer treatment necessitates a deep understanding of interactions across various scales. At the cellular scale, Agent-Based Modeling (ABM) has been pivotal in simulating the complex dynamics between cancer cells and the immune system within the tumor microenvironment. ABM offers crucial insights into tumor heterogeneity and immune responses; however, it frequently faces challenges due to high computational costs and time constraints, which limit its practical applicability. To address these challenges, AI-driven techniques, supported by high-performance computing (HPC), provide a transformative solution. Integrating AI with ABM allows the execution of large ensembles of speculative (what-if) simulations, generating comprehensive datasets that capture tumor behavior under diverse conditions. AI models trained on these data can discover hidden characteristics and predict treatment outcomes with high accuracy. For example, AI can examine tumor morphologies from histopathological images to predict responses to various therapies, accelerating the development of personalized treatments and improving patient outcomes. This integration of AI into ABM exemplifies how technology can enhance our understanding and manipulation of biological systems at the cellular level, bridging the gap between complex biological interactions and effective treatment strategies.

Moving to the molecular scale, Healthy AI drives innovation in molecular design, which is essential for drug discovery. Traditional molecular design approaches often rely on rigid, predefined rules that can limit the exploration of novel compounds. To overcome these limitations, we are leveraging language models (LMs) with a critic component and genetic algorithms (GAs) in a unified framework. The critic-guided LMs facilitate the automated generation of molecular structures, while the GAs simulate evolutionary processes to enhance structural diversity and optimize molecular properties. This approach enables the discovery of new molecules with desirable characteristics, demonstrating how AI can advance molecular design and connect it to practical applications in therapy.
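
Purely as an illustration of the genetic-algorithm half of such a framework, the sketch below evolves continuous latent vectors under a placeholder fitness function; the score() function stands in for the LM critic, decoding back to molecules is omitted, and all names and dimensions are assumptions rather than the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, POP_SIZE, N_GENERATIONS = 64, 128, 20

def score(z: np.ndarray) -> float:
    """Placeholder critic: reward latent vectors near an arbitrary target region."""
    return -np.linalg.norm(z - 1.0)

population = rng.normal(size=(POP_SIZE, LATENT_DIM))
for gen in range(N_GENERATIONS):
    fitness = np.array([score(z) for z in population])
    # Truncation selection: keep the top half of the population as parents.
    parents = population[np.argsort(fitness)[-POP_SIZE // 2:]]
    # Crossover: average random pairs of parents; mutation: add Gaussian noise.
    idx_a = rng.integers(len(parents), size=POP_SIZE)
    idx_b = rng.integers(len(parents), size=POP_SIZE)
    population = (parents[idx_a] + parents[idx_b]) / 2.0
    population += rng.normal(scale=0.1, size=population.shape)

print("best fitness after evolution:", max(score(z) for z in population))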

The synergy between AI-driven cancer immunotherapy and advanced molecular scale drug design highlights the multiscale nature of ‘Healthy AI’. By connecting detailed cellular interactions with innovative molecular scale design, this approach accelerates personalized therapy development and drives new discoveries in drug design. This comprehensive strategy illustrates how AI can bridge different scales, from cellular dynamics to molecular innovation, to address the complexities of cancer treatment and beyond.

In summary, our ‘Healthy AI’ platform represents a paradigm shift that integrates technological innovation with the complexities of biological systems across multiple scales. By harnessing AI to model and manipulate interactions from the cellular to the molecular to the systemic level, this approach offers new possibilities for effective cancer therapies and groundbreaking discoveries, ultimately transforming the future of healthcare and scientific research.

Presented by
Debsindhu Bhowmik <bhowmikd@ornl.gov>
Institution
Oak Ridge National Laboratory, Oak Ridge, TN 37830.
Hashtags

Increasing Confidence in AI Models in the Medical Field

Jake Gwinn, Justin M. Wozniak, Thomas Brettin

Abstract
This study performed a statistical and graphical analysis of the results of a large deep learning study run on ALCF Polaris, using the Uno model for cancer drug response prediction (DRP) on a merge of the CCLE, CTRPv2, gCSI, GDSCv1, and GDSCv2 datasets. Some initial results are presented here, although a much larger analysis has been performed. Previous studies [1] have shown that deep learning is a promising avenue for predicting quantities such as AUC, which scores tumor growth via the area under the dose-response curve; lower is better for treating cancer. In our study, Uno was used to predict the AUC of drugs using (drug, cell line) combinations. The model was trained using a leave-one-out (LOO) protocol, defined as training the model on all the data except one drug and then using the model to predict the AUC of that drug. While investigating the results of one such LOO-trained Uno model, we discovered that the underlying data were highly skewed; most of the observations had an AUC much closer to 1. As a result, the model better predicted drugs with AUCs closer to 1 and struggled to predict drugs further from 1 (Figure 1). To demonstrate this, we highlight the 3 highest-error drugs to show the large difference between their predicted and actual AUC. This suggests that most of the egregious errors in the model may be due to the skew of the dataset. To further investigate this effect, we grouped the data by drug and sample and transformed the AUC to be normally distributed using a quantile transformation. We then computed the z-scores and took their absolute value, which serves as our out-of-distribution (OOD) measurement. The accuracy of an observation is defined as the mean absolute AUC error (predicted AUC − actual AUC) for each (drug, sample) pair. We found that beyond an OOD value of about 1, there is a positive, linear relationship between OOD and accuracy (Figure 2). We found a small number of what we call High Error Drugs (HEDs) that were the object of further study. These drugs were typically over-estimated by the model, meaning that drugs that are actually effective in inhibiting cancer (low AUC) were predicted to be less effective. This is an important error to fix, as we do not want to predict that a drug is not useful against a certain cancer when in reality it could help a patient battle that cancer more effectively. Instead of using mean absolute error (MAE) as the training criterion, we propose to train the model using a different loss metric, the F-beta score, and optimize for recall. This may enable us to predict true positives (drugs with low AUC scores) more accurately. In summary, this study performed multiple statistical analyses of a DRP LOO study; in this presentation, we highlight the difficulty of predicting low-AUC drugs using a statistical and visual analysis of the predicted results.
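
A minimal sketch of this OOD measure is shown below, assuming a pandas DataFrame with "drug", "sample", and "auc" columns; the column names and the exact grouping are assumptions, and the study's implementation may differ.

import numpy as np
import pandas as pd
from sklearn.preprocessing import QuantileTransformer

def ood_scores(df: pd.DataFrame) -> pd.Series:
    # Map the skewed AUC values to an approximately normal distribution.
    qt = QuantileTransformer(output_distribution="normal")
    normal_auc = pd.Series(qt.fit_transform(df[["auc"]]).ravel(), index=df.index)
    # Z-score within each (drug, sample) group; groups with a single
    # observation yield NaN and would need separate handling.
    z = normal_auc.groupby([df["drug"], df["sample"]]).transform(
        lambda x: (x - x.mean()) / x.std()
    )
    # The absolute z-score is the out-of-distribution (OOD) measurement.
    return z.abs()
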
Presented by
Jake Gwinn
Institution
Argonne National Laboratory
Hashtags

Use of ATOM Modeling PipeLine (AMPL) and Generalized Generative Molecular Design (GGMD) to Discover Natural Product-Like Compounds to Target Brd4-BD1

Justin Overhulse1, Arjun Parambathu2, Jay Patel2, Kushagra Srivastava2, Leonardo Pierre3, Jiayi Yang4

Abstract
The discovery of inhibitors remains a critical challenge in drug discovery. Traditional methods are often time-consuming and resource intensive. In-silico methods such as docking can also be challenging if a binding pocket is shallow, very dynamic, poorly defined, or contains structural waters that affect ligand binding. In this study we leveraged both the ATOM Modeling PipeLine (AMPL) (https://github.com/ATOMScience-org/AMPL) and the generalized generative molecular design (GGMD) (https://github.com/CBIIT/GGMD/tree/main) pipeline to produce new inhibitors of bromodomain-containing protein 4, bromodomain 1 (Brd4-BD1). Brd4 is a well-studied target against cancer and cardiovascular disease that contains two tandem binding domains, BD1 and bromodomain 2 (BD2), which can be selectively targeted. Several inhibitors developed for Brd4 are pan-BD inhibitors, targeting both BD1 and BD2, and these pan-BD inhibitors have been shown to have severe side effects during clinical trials. Recent studies have shown that BD1 and BD2 have different biological functions and that targeting BD1 specifically can achieve anti-cancer efficacy similar to that of pan-BD inhibitors. Identifying selective BD1 inhibitors could therefore avoid the severe side effects exhibited by pan-BD inhibitors. A limitation of the chemical libraries used for in-silico model training and building is that their compounds are often not biologically relevant. To develop more biologically relevant compounds, the natural product space continues to be an attractive avenue for new scaffold generation and drug discovery. These natural product-like compounds are advantageous when searching for novel scaffolds due to their high structural diversity and varied bioactivities. Advances in the drug discovery field have made it possible to discover and synthesize new natural product-like compounds in less time and at a lower cost. Brd4-BD1 inhibitors were collected from the public database BindingDB.org, curated, and used to train several AMPL models. After hyperparameter optimization, a random forest production model was prepared using extended-connectivity fingerprints, exhibiting an R^2 score of 0.927. This optimized production model was used within GGMD as a scorer to develop new selective inhibitors. An autoencoder (AE) based on the Junction Tree Variational AutoEncoder (JTVAE) (https://github.com/CBIIT/JTVAE) was trained using 220K natural product-like compounds from the COCONUT database. Using a roulette-style selection scheme within GGMD and the natural product-like AE model, new compounds with improved fitness against Brd4-BD1 were generated over 10 epochs. Two RDKit scorers (https://github.com/rdkit/rdkit) were used to calculate the synthetic accessibility (SA_Score) and the natural product likeness score (NP_Score) to ensure that the generated compounds are synthesizable and retain the complexity of natural products. Stereochemistry was removed during training and compound generation to limit the complexity of the data. This project is a use-case example of an AMPL model that predicts Brd4-BD1 inhibition serving as a scorer in the recently developed GGMD pipeline.
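
As a hedged sketch of how a fingerprint-based activity model can serve as a generative-design scorer (roughly analogous to the random forest production model described above, but not the AMPL or GGMD interfaces themselves), one could do the following; the training data loading is left as a placeholder.

import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestRegressor

def ecfp(smiles: str, radius: int = 2, n_bits: int = 2048) -> np.ndarray:
    """Extended-connectivity (Morgan) fingerprint as a dense numpy vector."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    arr = np.zeros((n_bits,))
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

def fit_scorer(train_smiles, train_activity):
    """`train_smiles` and `train_activity` stand in for the curated BindingDB data."""
    X = np.vstack([ecfp(s) for s in train_smiles])
    model = RandomForestRegressor(n_estimators=500, n_jobs=-1)
    model.fit(X, train_activity)
    return model

def score_population(model, smiles_list):
    """Fitness function a generative loop could call on each generated molecule."""
    return model.predict(np.vstack([ecfp(s) for s in smiles_list]))
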
Presented by
Justin Overhulse
Institution
Frederick National Laboratory for Cancer Research1, University of Delaware2, Delaware State University3, University of Southern California4
Hashtags
#Bromodomain #Generativemoleculardesign #AMPL #GMD

Software Application Development for Medical Data Migration and Integration to NIDAP (NIH Integrated Data Analysis Platform)

Zhang LZ, Ning H, Zhuge Y, Cheng J, Li B, Chappidi S, Tasci E, Miller RW, Krauze A

Abstract
In clinical settings, there is a significant need for doctors and researchers to efficiently export medical data for both clinical and research purposes. Taking our Radiation Oncology Branch at the NCI, NIH (National Institutes of Health) as an example, the treatment planning system (the Eclipse Platform by Varian Medical Systems) is currently used for patient data management. However, exporting patient data is a time-consuming and tedious process. Doctors and researchers must open each patient individually, review each plan, and manually export the data one by one. The Visual Scripting module integrated within the Eclipse platform offers robust functionality for patient data management, but its use is limited to individual patient files, with no option for batch processing. To address this need, we aim to develop a software application that can batch process patient data and integrate it into the NIDAP (NIH Integrated Data Analysis Platform) server, where it can be retrieved using text prompts. The software application is being designed as a standalone system and developed in C# using the MVVM (Model-View-ViewModel) architecture, with development planned in five phases:

Phase 1: Establishing a connection between the local clinical client computer and the Varian medical database. This foundational step is critical due to access limitations associated with the Varian database.

Phase 2: Enabling batch export of relevant dose planning metrics such as DVH (dose-volume histogram) data for individual patients without needing to open the patient file in Eclipse. The DVH data for all targets (organs) will be exported and saved to individual Excel worksheets.

Phase 3: Enabling batch export of DVH data for multiple patients. The exported files will include patient IDs as part of their filenames.

Phase 4: Designing and developing a user-friendly interface in C# that allows users to upload patient lists and specify directories for saving exported files.

Phase 5: Integrating the exported data into NIDAP. The files will be uploaded to the NIDAP server, and a pipeline will be developed to facilitate data integration.

This software application will be a powerful tool for both clinical and research purposes, significantly improving the efficiency of medical data management and analysis.

Presented by
Longze Zhang
Institution
National Institutes of Health
Hashtags

Evaluating the Efficacy of Synthetic Pathology Reports

Patrycja Krawczuk, Christopher Stanley, John Gounley, Heidi A. Hanson

Abstract
Research in sensitive domains like medicine is restricted by privacy concerns and regulations. Synthetic data offers a solution to overcome these challenges [1,2]. This work introduces a workflow for generating and evaluating synthetic pathology reports. 100,000 reports associated with 5 primary cancer sites (breast, ovary/fallopian, melanoma, lung and colorectal) and 220 histology types from the NCI SEER Program were used to create the synthetic dataset.
Presented by
Patrycja Krawczuk
Institution
Oak Ridge National Laboratory
Hashtags
#synthetic #data #llama3

Artificial intelligence-powered drug discovery: A case study on LD50 prediction and molecular optimization

Logan Hallee, Nikhil Rao, Nikolaos Rafailidis, Tom Le, Colin Horger, Herman Singh, Naomi Ohashi, Pinyi Lu

Abstract
Background: To accelerate the drug discovery process, automated optimization and design of compounds with desired properties have been a long-time goal. Recent advances in artificial intelligence (AI)-based prediction of chemical properties and learning-based generative design are enabling new approaches to achieve this goal. We present the generalized generative molecular design (GGMD) platform, a scalable framework to optimize multiple parameters simultaneously over large populations of molecules. We applied the GGMD framework in a molecular optimization case study to design molecules with a desirable lethal dose (LD50), an important toxicity property of drugs. The larger the LD50 value, the lower the toxicity.

Methods: The ATOM Modeling PipeLine (AMPL), a data-driven modeling pipeline for drug discovery, was applied to build AI (regression) models to predict LD50. The models were trained, validated, and tested using the LD50 dataset obtained from the Therapeutics Data Commons (TDC). We evaluated random forest, neural network, and XGBoost models over a large range of hyperparameters in terms of mean absolute error (MAE) and coefficient of determination (R2). A production model was created based on the best hyperparameters, evaluated on additional data generated by the EPA Toxicity Estimation Software Tool, the NIH Collaborative Acute Toxicity Modeling Suite, and the National Toxicology Program, and applied as the scoring function in the GGMD framework. The production model was also tested on precisionFDA, a secure, collaborative, high-performance computing platform that builds a community of experts around the analysis of biological datasets in order to advance precision medicine. In our case study, GGMD used a junction tree variational autoencoder mapping structures to latent vectors, along with a genetic algorithm operating on latent vectors, to search a diverse molecular space for molecular optimization toward the design criteria. We applied the GGMD framework to design molecules with a desirable LD50 on the Delaware Advanced Research Workforce and Innovation Network (DARWIN), a big data and high-performance computing system designed to catalyze Delaware research and education.
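
For readers who want to reproduce the data setup, a hedged fingerprint baseline on the TDC LD50 benchmark might look like the sketch below; the dataset name "LD50_Zhu" and the simple random forest are assumptions and do not reproduce the AMPL hyperparameter search or the reported architecture.

import numpy as np
from tdc.single_pred import Tox
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

split = Tox(name="LD50_Zhu").get_split()   # train / valid / test DataFrames

def featurize(smiles_series):
    mats = []
    for s in smiles_series:
        fp = AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, nBits=1024)
        arr = np.zeros((1024,))
        DataStructs.ConvertToNumpyArray(fp, arr)
        mats.append(arr)
    return np.vstack(mats)

model = RandomForestRegressor(n_estimators=300, n_jobs=-1)
model.fit(featurize(split["train"]["Drug"]), split["train"]["Y"])
pred = model.predict(featurize(split["test"]["Drug"]))
print("test MAE:", mean_absolute_error(split["test"]["Y"], pred))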

Results: Among the evaluated AMPL models, a small autoencoder-like neural network (layer sizes: 1972, 66, 1940) performed best at predicting LD50, with a test MAE of 0.587. This performance is comparable to that of the second-best model published on the TDC leaderboard (https://tdcommons.ai/benchmark/admet_group/19ld50/). We were able to apply the GGMD framework to design molecules with reduced toxicity by optimizing molecular structures, while the framework could also be used to design compounds with high toxicity, indicating the dual-use potential of AI-powered drug discovery.

Summary: By using AMPL and GGMD, we created a predictive model for a toxicity proxy, LD50, and generated new molecules with desirable toxicity properties, demonstrating the potential of predictive and generative modeling to increase the throughput of drug discovery. At the same time, we recognized the dangerous potential for misuse of AI-powered drug discovery, such as designing compounds with high toxicity, and the resulting need for efforts to ensure healthy AI development, use, and oversight.
Presented by
Pinyi Lu <pinyi.lu@nih.gov>
Institution
Center for Bioinformatics & Computational Biology, Department of Biomedical Engineering, University of Delaware; National Center for Atmospheric Research; MassMatrix Inc.; Frederick National Laboratory for Cancer Research
Hashtags

Converting multi-omics data into multi-channel images for drug response modeling using convolutional neural networks

Priyanka Vasanthakumari1, Yitan Zhu1, Thomas Brettin2, Oleksandr Narykov1, Alexander Partin1, Maulik Shukla1, Nicholas Chia1, Fangfang Xia1, and Rick L. Stevens2,3

Abstract
The advent of multi-omics has revolutionized cancer research by providing comprehensive insights into the molecular mechanisms underlying cancer progression and treatment responses. We integrate gene expressions, copy number variations, and deleterious mutations to predict drug responses of cancer cell lines using deep learning. To predict the drug response based on multi-omics data, we employed an extended implementation of the Image Generator for Tabular Data (IGTD) [1] algorithm to convert tabular multi-omics data into multi-channel image representations and then utilized convolutional neural networks (CNNs) to model drug response. The IGTD algorithm places similar genes at adjacent pixel positions and dissimilar genes far apart in the generated images. Different weights can be given to different omics data types when calculating the overall gene similarity based on their similarities in individual omics data types. A total of 1936 genes were selected for the analysis, including “landmark” genes that broadly represent cellular transcriptomic changes identified in the LINCS project and cancer-related genes collected from the OncoKB, GDSC, and COSMIC databases. The omics data of these genes were converted into 44×44 three-channel images. To represent drugs, we used 1600 Mordred descriptors, which were converted into 40×40 single-channel images using the IGTD algorithm. Our model consists of two convolutional layer subnetworks that independently encode the multi-omics and drug image data before merging their embeddings to predict drug response. The model was trained and tested on the Cancer Therapeutics Response Portal (CTRP) version 2 dataset to predict drug response measured by the area under the dose-response curve. An ablation study was conducted by varying the weight ratios of gene expressions, mutations, and copy number variations from 0 to 1 in increments of 0.1 for image generation, resulting in 66 different image datasets for model training and evaluation. These image datasets included not only three-channel image datasets but also two-channel or single-channel image datasets, depending on whether the weight ratios of some omics types were 0. For each combination of these ratios, experiments were repeated using 50 different data partitions for training, validation, and testing to ensure robust model evaluation. The results of the ablation study indicated that the prediction performance (measured by R2) was comparable for models trained on similar weight ratios of gene expression and copy number variation across all datasets. However, the prediction performance decreased as the contribution of mutation features increased. The best prediction performance of 0.8167 was obtained with omics images generated by weight ratios of 0.4, 0.5, and 0.1 for gene expression, copy number variation, and mutation, respectively, while the lowest prediction performance of 0.7752 was obtained with omics images generated using only mutation data. We demonstrate that converting tabular multi-omics and drug data into images and utilizing CNNs for response modeling can effectively capture the complex relationships inherent in the data and enable accurate predictions of anti-cancer drug responses.

References: [1] Zhu, Y., Brettin, T., Xia, F., Partin, A., Shukla, M., Yoo, H., Evrard, Y.A., Doroshow, J.H. and Stevens, R.L., 2021. Converting tabular data into images for deep learning with convolutional neural networks. Scientific Reports, 11(1), p.11325.
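
A rough PyTorch sketch of the two-subnetwork design described above is shown below; the layer sizes, pooling choices, and head are assumptions for illustration, not the authors' exact architecture.

import torch
import torch.nn as nn

class ConvBranch(nn.Module):
    def __init__(self, in_channels: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
        )

    def forward(self, x):
        return self.features(x)

class DrugResponseCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.omics_branch = ConvBranch(in_channels=3)   # 44x44 three-channel omics images
        self.drug_branch = ConvBranch(in_channels=1)    # 40x40 single-channel drug images
        self.head = nn.Sequential(
            nn.Linear(2 * 32 * 4 * 4, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, omics_img, drug_img):
        # Encode each modality independently, then merge embeddings to predict AUC.
        merged = torch.cat(
            [self.omics_branch(omics_img), self.drug_branch(drug_img)], dim=1
        )
        return self.head(merged)

model = DrugResponseCNN()
auc_pred = model(torch.randn(8, 3, 44, 44), torch.randn(8, 1, 40, 40))
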
Presented by
Priyanka Vasanthakumari
Institution
1Data Science and Learning Division, Argonne National Laboratory, Lemont, IL; 2Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL; 3Department of Computer Science, The University of Chicago, Chicago, IL
Hashtags

Using a Mechanism-Based Mathematical Model and Multiparametric MRI to Predict Response to Therapy of I-SPY 2 Breast Cancer Patients

Reshmi J. S. Patel, Chengyue Wu, Casey E. Stowers, Rania M. Mohamed, Jingfei Ma, Gaiane M. Rauch, Thomas E. Yankeelov

Abstract
INTRODUCTION Accurate and early prediction of response to neoadjuvant therapy (NAT) is essential to adjust treatment to improve outcomes for locally advanced breast cancer (LABC) patients [1]. We previously developed a mechanism-based mathematical model that captures tumor heterogeneity and makes accurate patient-specific predictions [2]. The model achieved concordance correlation coefficients (CCC) of 0.95 and 0.94 between the observed and predicted changes in total tumor cellularity (ΔTTC) and tumor volume (ΔTV), respectively, in a dataset of 56 triple-negative breast cancer patients [3]. Here, we show this approach is generalizable by applying it to the multi-site, multi-subtype I-SPY 2 trial dataset [4].

METHODS The I-SPY 2 clinical trial for LABC patients acquired dynamic contrast-enhanced (DCE) and diffusion-weighted (DW) magnetic resonance imaging (MRI) scans before (V1), three weeks into (V2), and after (V3) the first NAT course [4]. Our subset of 91 patients includes 42 hormone receptor-positive/human epidermal growth factor receptor 2-negative (HR+/HER2−), 22 HER2+, and 27 triple negative breast cancer patients.

Our mathematical model is a reaction-diffusion partial differential equation solved in space and time via the finite difference method. The rate of change in voxel-wise tumor cellularity, NTC(x̄,t), is a function of cell diffusion, proliferation, and death due to NAT. Diffusion is mechanically coupled to the surrounding tissue [2].
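
For orientation, one common form consistent with the description above (and with the protocol in [2]) is sketched below in LaTeX; the exact treatment-response term used in this study may differ.

\frac{\partial N_{TC}(\bar{x},t)}{\partial t} =
  \nabla \cdot \left( D(\bar{x},t)\, \nabla N_{TC}(\bar{x},t) \right)
  + k(\bar{x})\, N_{TC}(\bar{x},t) \left( 1 - \frac{N_{TC}(\bar{x},t)}{\theta} \right)
  - \alpha\, C_{\mathrm{drug}}(\bar{x},t)\, N_{TC}(\bar{x},t)

where D is the mechanically coupled diffusion coefficient, k(x̄) the spatially resolved proliferation rate, θ the carrying capacity, C_drug the drug concentration estimated from the DCE-MRI contrast agent accumulation, and α the calibrated treatment efficacy.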

NTC(x̄,t) was calculated at each visit from DW-MRI-derived apparent diffusion coefficient maps [2]. We applied clustering algorithms to segment tissues that defined the modeling domain. The initial drug concentration was assumed to be proportional to DCE-MRI-derived contrast agent accumulation. Using a Levenberg-Marquardt nonlinear least-squares optimization method, we calibrated the efficacy and spatially-resolved proliferation rates to the V1 and V2 NTC(x̄,t) data. We ran the calibrated model forward to predict tumor status at V3 [2].

RESULTS For 91 patients, our model achieved CCC values of 0.94 and 0.90 between the observed and predicted V1 to V3 ΔTTC and ΔTV, respectively. The model overestimated tumor volume and underestimated voxel-wise cellularity for a subset of patients with tumor tissue compression from V2 to V3. However, across the cohort, the median of the per-patient median percent difference between the observed and predicted change in voxel-wise NTC(x̄,t) was 0%, indicating high voxel-wise predictive accuracy.

CONCLUSION Our mechanism-based mathematical model calibrated to multiparametric MRI data early in a course of NAT can accurately predict tumor status for LABC patients after the NAT course, which supports the potential for personalizing treatment via mathematical modeling.

REFERENCES [1]. Shien T and Iwata H. Jpn J Clin Oncol. 2020. [2]. Jarrett AM et al. Nat Protoc. 2021. [3]. Wu C et al. Cancer Res. 2022. [4]. Barker AD et al. Clin Pharmacol Ther. 2009.
Presented by
Reshmi J. S. Patel
Institution
The University of Texas at Austin, Department of Biomedical Engineering
Hashtags

Global Explainability of a Deep Abstaining Classifier for Cancer Pathology Reports

Sayera Dhaubhadel, Jamaludin Mohd-Yusof, Trilce Estrada, Benjamin Hamilton McMahon, Tanmoy Bhattacharya

Abstract
Background: The MOSSAIC Information Extraction API is a real-world NLP application of the Deep Abstaining Classifier (DAC) for automated information extraction from cancer pathology reports. The DAC is a novel deep learning architecture which allows the model to ‘abstain’ (or not answer) on those samples which are low confidence, often because of missing or ambiguous information. In previous work, we have shown that the DAC learns patterns within the data that can make prediction unreliable, allowing it to be trained to a specified level of accuracy at the expense of reduced coverage (abstaining on some fraction of the samples, which must then be manually classified).

Objective: To characterize sources of confusion in our real world DAC for automated classification of cancer pathology reports from NCI-SEER registries via global analysis of local explainability results.

Materials and Methods: We use a multitask convolutional neural network (MTCNN) based deep abstaining classifier (DAC) for NLP that is tuned to achieve at least 97% accuracy by identifying and abstaining on confusing samples. We generate local explanations of classification decisions with two methods: local interpretable model-agnostic explanations (LIME) and the gradient • input technique. We then develop a pipeline to extract global explainability from tens of thousands of local explanations and provide global insights into classification decisions for the cancer histology task, which comprises a substantial portion of the misclassifications.
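
A minimal sketch of gradient • input attribution for a generic embedding-based text classifier is shown below; model, embedding_layer, and the forward signature are placeholders, not the MTCNN/DAC implementation.

import torch

def gradient_x_input(model, embedding_layer, token_ids, target_class):
    # Look up token embeddings and track gradients with respect to them.
    embeddings = embedding_layer(token_ids).detach().requires_grad_(True)
    logits = model(embeddings)   # assumes a forward pass that accepts embeddings directly
    logits[0, target_class].backward()
    # Elementwise gradient-times-input, summed over the embedding dimension,
    # yields one attribution score per token.
    return (embeddings.grad * embeddings).sum(dim=-1).squeeze(0)

# Hypothetical usage with a batch of one tokenized report:
# token_ids = torch.tensor([[12, 845, 3021, 77]])
# scores = gradient_x_input(model, model.embedding, token_ids, target_class=2)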

Results: Our DAC obtains ≥ 97% accuracy by identifying both the classes and instances of reports most likely to be correctly classified and abstaining on those samples which are sources of confusion. By comparing several hundred local explanations, we determined that gradient • input produces explanations qualitatively similar to LIME's with significantly improved throughput, enabling an efficient path to global explainability. Application of our global explainability pipeline to the tens of thousands of local explanations from gradient • input allows us to separate classification mismatches into groups we call label noise, conflicting information, and insufficient information. The 97% accuracy requirement of our deep abstaining classifier (DAC) restricted its resolving power to the top four classes of both lung and breast cancers, improving the interpretability of its local explanations. This enabled identification of keywords strongly associated with specific classification/confusion categories.

Discussion and Conclusion: Global analysis of tens of thousands of local explainability results enabled us to obtain global insights into sources of confusion in an MTCNN based DAC. This suggests several specific strategies to iteratively improve our DAC in this complex real-world implementation.

Acknowledgement: This work has been supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of the National Institutes of Health. This work was performed under the auspices of the U.S. Department of Energy by Argonne National Laboratory under Contract DE-AC02-06CH11357, Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, Los Alamos National Laboratory under Contract DE-AC52-06NA25396, and Oak Ridge National Laboratory under Contract DE-AC05-00OR22725.
Presented by
Sayera Dhaubhadel <sayeradbl@lanl.gov>
Institution
Los Alamos National Laboratory, University of New Mexico
Hashtags
#machinelearning #explainability #informationextraction

UNNT: A novel Utility for comparing Neural Net and Tree-based models

Vineeth Gutta, Sunita Chandrasekaran

Abstract
Advancements in data science, machine learning (ML), and artificial intelligence (AI) methods have enabled the extraction of meaningful information from large and complex datasets, assisting in better understanding, diagnosing, and treating cancer. Understanding of drug response in cancer research has been accelerated by ML models that predict the effectiveness of drugs based on specific genomic molecular features. In our study, we explored tree-based models to improve the accuracy of a single drug response model and demonstrate that tree-based models such as XGBoost (eXtreme Gradient Boosting) have advantages over deep learning models, such as a convolutional neural network (CNN), for single drug response problems. However, comparing models is not a trivial task.

In this study, we developed a robust framework called UNNT (A novel Utility for comparing Neural Net and Tree-based models) that trains and compares deep learning methods such as CNNs and tree-based methods such as XGBoost on a user-supplied dataset. We applied this software to a single drug response problem in cancer to identify the best performing ML method based on the National Cancer Institute 60 (NCI60) dataset. In addition, we studied the computational aspects of training each of these models; our results show that neither is clearly superior on both CPUs and GPUs during training. This suggests that when both models have similar error rates for a dataset, the available hardware can determine the choice of model for training.
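
A minimal sketch of the kind of head-to-head comparison UNNT automates is shown below, using placeholder data and a simple scikit-learn neural network in place of the study's CNN; it is illustrative only, not the UNNT implementation.

import time
import numpy as np
import xgboost as xgb
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Placeholder features and labels standing in for a drug response dataset.
X, y = np.random.rand(5000, 200), np.random.rand(5000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

results = {}
for name, model in {
    "xgboost": xgb.XGBRegressor(n_estimators=300, tree_method="hist"),
    "neural_net": MLPRegressor(hidden_layer_sizes=(256, 64), max_iter=200),
}.items():
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    results[name] = {
        "rmse": mean_squared_error(y_te, model.predict(X_te)) ** 0.5,
        "train_seconds": time.perf_counter() - start,
    }
print(results)
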
Presented by
Vineeth Gutta
Institution
University of Delaware
Hashtags


Ultrasound Lesion Segmentation with an Unsupervised Learning Approach

Abdalrahman Alblwi and Kenneth E. Barner

Abstract
The problem of identifying regions of interest in ultrasound images using deep learning remains a challenge, particularly in biomedical segmentation. High-quality size-annotated data are necessary for superior performance, yet data annotation is time-consuming and requires human expertise. Previous research has addressed unsupervised segmentation in various imaging modalities without labels. However, the unique nature of ultrasound images requires further attention and effort to develop truly effective approaches. In this work, we propose a novel technique for unsupervised anomaly segmentation in lesion detection, focusing on identifying abnormal patterns in ultrasound images without the need for labels. We apply Cluster MixUp data augmentation to overcome data constraints and utilize an unsupervised network to generate binary anomaly masks for suspected lesions. Our approach is validated on four key breast ultrasound (BUS) image datasets from diverse populations with varying imaging qualities. Our results show that our method performs effectively in unsupervised segmentation scenarios, indicating its potential for real-world application.
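
As a hedged illustration of the MixUp family of augmentations referenced above (the authors' Cluster MixUp variant adds a clustering step that is not reproduced here), the basic mixing operation on unlabeled image tensors is:

import numpy as np

def mixup_pair(img_a: np.ndarray, img_b: np.ndarray, alpha: float = 0.4) -> np.ndarray:
    """Convex combination of two images with a Beta-distributed mixing weight."""
    lam = np.random.beta(alpha, alpha)
    return lam * img_a + (1.0 - lam) * img_b
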
Presented by
Abdalrahman Alblwi
Institution
University of Delaware
Hashtags
#segmentation #unsupervised_learning #anomaly #data_augmentation

Constraint-Based Hierarchical Loss: A Novel Loss Function for Hierarchical Classification

Abhishek Shivanna, Heidi A. Hanson, Adam Spannaus

Abstract
Presented by
Abhishek Shivanna
Institution
Advanced Computing for Health Sciences, Oak Ridge National Laboratory
Hashtags

Integrated Computing Environment for Next Generation Biology: Cloud-based HPC and Big Data Platform

Ramakrishnan Periyasamy, Sandeep Malviya, Vivek Gavane, Renu Gadhari, Kunal Tembhare, Neeraj Bharti, Palash Pullarwar, Prachi Barkale, Preet Jamsandekar, Pallavi Niturkar, Tina Sharma, Archana Achalere, Sunitha Manjari Kasibhatla, Uddhavesh Sonawane, Rajendra Joshi*

Abstract
Advancements in computational biology, especially with next-generation sequencing (NGS) and molecular dynamics simulations, have led to massive data generation, emphasizing the need for secure storage and analysis on centralized cloud platforms to foster collaboration. Addressing these challenges, the "Integrated Computing Environment" (ICE) has been developed to support next-generation biology on cloud and big data platforms, leveraging Kubernetes for container orchestration. ICE’s microservices architecture ensures scalability, availability, security, and fault tolerance, with a role-based access storage module and support for executing public Docker containers. This comprehensive system supports secure, scalable, and collaborative bioinformatics research. Additionally, a Kubernetes-native application facilitates large-scale comparisons of variant files, while the VCF analysis module enables the identification of unique and common variants across populations. A case study integrating multi-omics data for research, namely breast cancer survival prediction using gene expression, miRNA, DNA methylation, and copy number variation, has been performed on ICE through machine learning (ML) pipelines. The multi-ensemble survival prediction method, based on SVM and PLS algorithms, achieves an AUC of 89% and an accuracy of 82% and is available via the ConnectOME Docker image.
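
As an illustration of the unique/common variant comparison the VCF module performs, a minimal stand-alone sketch (not the ICE implementation; file names are hypothetical) could key variants on (chromosome, position, ref, alt):

def variant_keys(vcf_path: str) -> set:
    """Collect (chrom, pos, ref, alt) keys from a plain-text VCF file."""
    keys = set()
    with open(vcf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            chrom, pos, _, ref, alt = line.split("\t")[:5]
            keys.add((chrom, pos, ref, alt))
    return keys

population_a = variant_keys("population_a.vcf")   # hypothetical file names
population_b = variant_keys("population_b.vcf")
common = population_a & population_b
unique_to_a = population_a - population_b
print(len(common), "shared variants;", len(unique_to_a), "unique to population A")
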
Presented by
Ramakrishnan Periyasamy <rkrishnan@cdac.in>
Institution
Centre for Development of Advanced Computing
Hashtags
#ai, #ice, #container

Machine Learning Surrogate Model for Molecular Pose Optimization in Drug Discovery

Sean Black, Vineeth Gutta, Sunita Chandrasekaran

Abstract
The traditional drug discovery process is slow and expensive, taking approximately 10 years and $2-3 billion per drug. In silico methodologies are becoming more accurate and efficient for screening large databases for new lead compounds, and computational resources are becoming faster and cheaper. Furthermore, generative AI methods are increasingly employed to suggest new molecules, expanding the space of de novo candidates to be evaluated. In oncological applications, validating the binding affinity of potential new drugs with molecular dynamics is an essential step in both efficacy and toxicity evaluation. Unfortunately, such methods are computationally intensive and time consuming, slowing the evaluation process and increasing compute costs. Our method aims to predict the binding free energy of docking poses to reduce the need for computationally expensive Molecular Dynamics (MD) simulations as part of a larger workflow. We have successfully designed a set of novel descriptors and trained a machine learning model to predict the binding free energy of docking poses, allowing for ligand relaxation in the protein pocket. In doing so, we are able to forgo what have historically been multiple molecular dynamics runs, accelerating the evaluation process dramatically. For training, a set of 10,000 molecules provided data on 100 poses for each molecule, totaling nearly 1,000,000 data points. The Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA) was computed for each pose as a measure of binding free energy. The interaction space descriptor employed 600 novel features that capture the relative positional information of the protein-ligand docking pose. We employed the ATOM Modeling PipeLine (AMPL), an open-source machine learning molecular prediction pipeline, to train our model to predict the MMPBSA values. Prior to further hyperparameter optimization and descriptor innovation, our model’s performance has already achieved an R2 of 0.602, where 1 is the best possible score. This model is more than sufficient for the predictive role in the intended molecule evaluation process.
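
The authors' 600 interaction-space features are not described in detail here, so purely as an illustration of one way relative positional information can be encoded, the sketch below builds element-pair distance histograms between ligand and protein atoms; the element list, bin edges, and feature count are assumptions, not the descriptor used in the study.

import numpy as np
from scipy.spatial.distance import cdist

def interaction_descriptor(ligand_xyz, ligand_elems, protein_xyz, protein_elems,
                           elements=("C", "N", "O", "S"),
                           bins=np.arange(0.0, 12.5, 0.5)):
    """Concatenate distance histograms for each (ligand element, protein element) pair."""
    dists = cdist(ligand_xyz, protein_xyz)     # pairwise atom distances in Angstroms
    features = []
    for le in elements:
        for pe in elements:
            # Mask selecting ligand atoms of element `le` against protein atoms of `pe`.
            mask = np.outer(np.asarray(ligand_elems) == le,
                            np.asarray(protein_elems) == pe)
            hist, _ = np.histogram(dists[mask], bins=bins)
            features.append(hist)
    return np.concatenate(features)
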
Presented by
Sean Black, Vineeth Gutta <sean.black2@nih.gov>
Institution
Frederick National Laboratory for Cancer Research, University of Delaware
Hashtags
#ComputationalCancerResearch #Cancer #CancerResearch

Distributed KAN Enhanced Vision Transformers for Brain Tumor Detection

Vijayalakshmi Saravanan, Ephrem A. Yekun, Lakshman Tamil, and Arvind Ramanathan

Abstract
The diagnosis of brain cancer, an extremely aggressive and challenging malignancy, stands to gain substantially from advances in diagnostic technologies. In this context, our research introduces a novel approach to brain tumor detection that leverages the latest innovations in artificial intelligence. We propose a novel framework that integrates Vision Transformers (ViTs) with Kolmogorov-Arnold Networks (KANs), resulting in a sophisticated and highly effective diagnostic tool. Vision Transformers have recently gained acclaim for their exceptional capabilities in handling complex image analysis tasks, owing to their ability to capture complex patterns and features within visual data. Meanwhile, KANs are recognized for their robustness in modeling complex, non-linear relationships, which enhances their utility in various classification tasks. By integrating these two technologies, our approach significantly advances the state of the art in brain tumor detection. The integration of parallelized KANs with Vision Transformers introduces a new level of performance and accuracy, surpassing conventional neural network architectures. Our approach classifies brain tumors into three categories (gliomas, meningiomas, and pituitary tumors) using preprocessed MRI images. The training process is uniquely designed: all layers of the vision transformer model are frozen except for the final layer, which is replaced with a KAN. This modification enhances the model's classification accuracy by leveraging the advanced capabilities of KANs, which use univariate functions parameterized on the network edges instead of traditional linear weights and feature learnable activation functions, making them particularly effective for nuanced classification tasks. To meet the computational demands of processing high-resolution medical images, we employ a parallelized framework. This distributed architecture harnesses technologies such as Apache Spark, Databricks, and Nvidia GPUs, significantly improving both training and inference times. By distributing the workload across multiple nodes and GPUs, our system addresses the practical limitations of implementing AI models in healthcare, ensuring that timely and accurate results are feasible. The integration of KANs with vision transformers, coupled with a parallelized infrastructure, offers a promising solution for enhancing tumor detection accuracy and efficiency. KANs' ability to model complex, non-linear relationships complements the vision transformer's strength in capturing long-range dependencies and global context. This combination pushes the boundaries of medical image analysis, potentially leading to earlier detection, better treatment planning, and improved patient outcomes. Furthermore, our approach addresses key limitations of traditional convolutional neural networks (CNNs) by leveraging the strengths of vision transformers in capturing diverse tumor presentations and KANs in refining classification precision. The parallelization of our approach opens new avenues for large-scale medical image analysis. As healthcare institutions increasingly generate vast volumes of imaging data, the need for efficient processing becomes critical. Our proposed approach accelerates both training and inference, enabling the analysis of larger and more diverse datasets. This capability leads to the development of more robust and generalizable models.
Enhanced efficiency also supports more frequent updates and retraining, allowing the system to rapidly adapt to new data and evolving medical knowledge. As research advances, this approach could serve as a foundation for developing more sophisticated AI systems in medical diagnostics, potentially extending beyond brain tumor detection to other areas of healthcare.
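
A hedged sketch of the transfer-learning setup described above (freeze a pretrained Vision Transformer and replace only its classification head) is shown below. The KANLayer here is a hypothetical placeholder standing in for a real Kolmogorov-Arnold Network implementation, and the timm model name is illustrative.

import timm
import torch.nn as nn

class KANLayer(nn.Module):
    """Placeholder head; a real KAN library with learnable univariate activations
    would be substituted here."""
    def __init__(self, in_features: int, num_classes: int):
        super().__init__()
        self.layer = nn.Sequential(nn.Linear(in_features, 64), nn.GELU(),
                                   nn.Linear(64, num_classes))

    def forward(self, x):
        return self.layer(x)

vit = timm.create_model("vit_base_patch16_224", pretrained=True)
for param in vit.parameters():            # freeze all pretrained transformer layers
    param.requires_grad = False
vit.head = KANLayer(vit.head.in_features, num_classes=3)   # glioma, meningioma, pituitary
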
Presented by
Vijayalakshmi Saravanan
Institution
University of Texas at Tyler, Brookhaven National Laboratory, University of Texas at Dallas, Argonne National Laboratory
Hashtags


The Hallmarks of Predictive Oncology

Akshat Singhal, Xiaoyu Zhao, Patrick Wall, Emily So, Guido Calderini, Alexander Partin, Natasha Koussa, Priyanka Vasanthakumari, Sara Jones, Oleksandr Narykov, Yitan Zhu, Farnoosh Abbas-Aghababazadeh, Sisira Kadambat Nair, Jean-Christophe Bélisle-Pipon, Jason I. Griffiths, Athmeya Jayaram, Barbara A. Parker, Kay T. Yeung, Ryan Weil, Aritro Nath, Benjamin Haibe-Kains, Trey Ideker

Abstract
The rapid evolution of machine learning has led to a proliferation of sophisticated models for predicting therapeutic responses in cancer. While many of these show promise in research, standards for clinical evaluation and adoption are lacking. Here, we propose seven hallmarks by which predictive oncology models can be assessed and compared. These are Data Relevance, Expressive Architecture, Standardized Benchmarking, Generalizability, Interpretability, Accessibility, and Fairness. Considerations for each hallmark are discussed along with an example model scorecard. We encourage the broader community, including researchers, clinicians, and regulators, to engage in shaping these guidelines towards a concise set of standards.
Presented by
Akshat Singhal
Institution
University of California, San Diego, La Jolla, CA, USA; Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada; Simon Fraser University, Burnaby, BC, Canada; Argonne National Laboratory, Lemont, IL, USA; Frederick National Laboratory for Cancer Research, Frederick, MD, USA; The Hastings Center, Garrison, NY, USA; City of Hope National Medical Center, Monrovia, CA, USA
Hashtags

An Ensemble Machine Learning Model Identifies Metabolite Modulators of Epigenetic Drugs

Scott E. Campit, Rupa Bhowmick, Taoan Lu, Aaditi Vivek Saoji, Ran Jin, Madeline R. Shay, Aaron M. Robida, Sriram Chandrasekaran*

Abstract
Introduction: The interplay between metabolites and epigenetics is essential for understanding how environmental factors like diet, stress, and toxin exposure can affect gene expression and potentially lead to various diseases and disorders. Metabolism can alter epigenetic states. Metabolic dysregulations are known to alter acetylation states, contributing to increased cancer risk. Notably, glucose-derived acetyl-CoA correlates with histone acetylation and oncogene activation. In various cancers, loss-of-function mutations in metabolic genes lead to the accumulation of succinate, fumarate, and 2-hydroxyglutarate, which inhibit demethylase enzymes (e.g., TET2) and promote epithelial-to-mesenchymal transition. Thus, understanding the connections between metabolism and gene expression through epigenetic regulation is vital for comprehending how metabolo-epigenetic interactions can influence disease development and therapeutic responses. To this end, we employed a data-driven, systems-pharmacology approach to uncover and characterize metabolo-epigenetic interactions by constructing an ensemble interaction network. Using the network, we prioritized histone post-translational modifications (PTMs), metabolite interactions, and the metabolic dependencies of epigenetic drugs.

Methods: We integrated global chromatin profiles, epigenetic drug sensitivity data, and metabolomics data from over 600 cancer cell lines in the Cancer Cell Line Encyclopedia (CCLE). Using an ensemble of machine learning models, namely the Least Absolute Shrinkage and Selection Operator (LASSO), stepwise regression, and k-Top Scoring Pairs (kTSP), we identified significant histone PTM-metabolite and metabolite-drug interactions. The predicted interactions were validated through experimental analysis, focusing on metabolites with synergistic or antagonistic relationships with chromatin-modifying drugs. We further analyzed the dynamics of metabolite/PTM interactions during the epithelial-mesenchymal transition (EMT).
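
A sketch of one member of such an ensemble is shown below: a LASSO regression selecting metabolite features predictive of a single histone PTM level across cell lines. The input matrices are placeholders for the CCLE-derived data, and this is illustrative rather than the authors' pipeline.

import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

def ptm_metabolite_interactions(metabolite_df: pd.DataFrame, ptm_levels: pd.Series):
    # Standardize metabolite abundances across cell lines before regularized regression.
    X = StandardScaler().fit_transform(metabolite_df.values)
    model = LassoCV(cv=5).fit(X, ptm_levels.values)
    coeffs = pd.Series(model.coef_, index=metabolite_df.columns)
    # Nonzero coefficients are the candidate histone PTM-metabolite interactions.
    return coeffs[coeffs != 0].sort_values(key=np.abs, ascending=False)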

Results: Our analysis uncovered novel metabolic dependencies of various epigenetic drugs. The predictions, validated for five metabolites, revealed synergistic or antagonistic interactions with Vorinostat, a histone deacetylase (HDAC) inhibitor, and GSK-J4, a H3K27me2/3 demethylase inhibitor. Specifically, our systems-pharmacology approach indicated that certain metabolites could influence the efficacy of HDAC inhibitors by modulating metabolic flux, as demonstrated previously through flux balance analysis. This supports the notion that metabolic gene expression is predictive of drug sensitivity. Importantly, our interaction network also offers insights into the mechanistic underpinnings of metabolism and epigenome regulation during cellular processes such as EMT.

Conclusion: Our findings highlight the importance of considering environmental variables, such as metabolite availability or nutrient levels, when assessing the efficacy of epigenetic therapeutics. By unveiling metabolic dependencies, this study advances our understanding of how metabolism influences chromatin-modifying drug efficacy and provides a foundation for developing predictive tools in cancer therapy.
Presented by
Rupa Bhowmick <rbhowmic@umich.edu>
Institution
Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI, 48109
Hashtags
#MachineLearning #ensemblemodeling #metaboloepigenetics #epigeneticmodification #metabolism


Training the MOSSAIC OncID Classifier on Kentucky Data

Isaac Hands, Sally Ellingson, John Gounley, Heidi A. Hanson, Patrycja Krawczuk, Dakota Murdock, Eric Durbin

Abstract
Cancer is a leading cause of mortality, with pathology reports providing essential information for diagnosis and staging. These text documents are reviewed at cancer registries and abstracted into structured datasets through a manual process that is slow and resource-intensive. Natural language processing (NLP) techniques have the potential to enhance data abstraction efficiency and accuracy in cancer registries, allowing greater use of cancer registry data for patient benefit. Due to the considerable volume of clinical text in a patient medical record, cancer registries spend a lot of time determining whether a clinical text document is related to a cancer diagnosis or not. In this study, we explore the application of OncID, a deep learning NLP classifier previously developed as part of the MOSSAIC project using pathology reports from the Seattle SEER registry. Here, we use pathology reports from Kentucky to train a new MOSSAIC classifier, named KY OncID.

The MOSSAIC classifier is a hierarchical self-attention network model created from two software packages. BARDI (Batch-processing Abstraction for Raw Data Integration) is a specialized framework engineered to facilitate the development of reproducible data pre-processing pipelines within machine learning workflows and includes steps to build the vocabulary used in the modeling. The modeling is done with FrESCO (Framework for Exploring Scalable Computational Oncology). We use this framework to make a binary classifier of reportable (positive class) vs non-reportable (negative class) pathology reports.

The Kentucky dataset was created by combining historical pathology reports labeled by an older rule-based classifier with newer pathology reports labeled by the MOSSAIC OncID classifier. For all documents labeled positive by either classifier, a professional Oncology Data Specialist (ODS) manually reviewed the label and further classified the reports into more specific reportable categories. In order to build a binary classifier, we labeled everything manually reviewed by the ODS and determined to be reportable as ‘Cancer’ and everything else as ‘No Cancer’. We took the most recent 10% of documents (n=196,929) as a complete holdout set for testing as it has approximately the same 6:1 ratio of non-reportable to reportable cases that we see in production. The remaining data for training (n=1,772,360) has the opposite balance of classes with approximately 4:1 more reportable cases.

Using the Kentucky data to train the new MOSSAIC classifier and testing on our holdout data resulted in a higher F1 score than the MOSSAIC model trained solely on Seattle data, due to a better false positive rate (FPR). Our new baseline model had an FPR of 0.014 and an F1 score of 0.920, compared to 0.094 and 0.769 for the original model.
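
A minimal sketch of the holdout evaluation above (false positive rate, false negative rate, and F1 on the imbalanced reportable/non-reportable test set) is shown below, given arrays of true and predicted labels where 1 means "Cancer" and 0 means "No Cancer"; the label encoding is an assumption.

from sklearn.metrics import confusion_matrix, f1_score

def registry_metrics(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
        "f1": f1_score(y_true, y_pred),
    }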

Cancer registries optimize for the lowest false negative rate (FNR) while keeping a reasonable FPR and F1 score since it is critical that a SEER cancer registry does not miss cases. We continue to investigate our false negative pathology reports for insights on improving the model for use in production.
Presented by
Isaac Hands
Institution
University of Kentucky Markey Cancer Center
Hashtags

Towards Interactive Analysis of Whole Slide Multiplex Immunofluorescence Images for Biomarker Discovery on the Alps Supercomputer

Lukas Drescher, Ossia Eichhoff, Patrick Turko, James Whipman, Josephine Yates, Tumor Profiler Consortium, Valentina Boeva, Mitchell P. Levesque

Abstract
Therapy-resistant melanoma remains a major clinical challenge despite significant advances in the last decades with new immuno- and targeted therapies. Increasing evidence suggests that cancer-associated fibroblasts (CAFs) play an important role in creating an environment that isolates the tumor from the immune system and promotes local invasion as well as metastasis. To target these steps clinically, there is an urgent need to discover new biomarkers incorporating spatial information.

To study spatial behavior, multiplex immunohistochemistry (IHC) has emerged as a popular approach in recent years, allowing the simultaneous visualization of different markers. In order to gain a holistic understanding of tumor heterogeneity and its interaction with CAFs, however, whole slide image analysis is required, which poses significant computational challenges and requires frequent iteration with expert feedback. Currently, widely used digital pathology tools are often not designed to handle millions of cells and have runtimes that exceed lab requirements, necessitating a new approach.

In this work, we focus on the spatial analysis of the tumor micro-environment of a set of metastatic melanoma patients undergoing immuno-therapy. In a preliminary step, we identify cells with a transcriptional profile compatible with CAFs on a large patient cohort using scRNA-seq and establish marker genes to differentiate them functionally. We then generate hypotheses through cell-cell communication analysis that informs a subsequent spatial analysis, where the role of the CAF phenotypes is analyzed on whole slide multiplexed IHC images spanning the full size of metastasis biopsies.

Building on recent progress in the NVIDIA RAPIDS suite, we construct a platform that enables a near-interactive whole slide spatial analysis experience on the Alps infrastructure, the newly introduced supercomputer at the Swiss National Supercomputing Centre. This involves efficient GPU-based clustering as well as a browser-based visualization that can handle millions of cells, improving by up to two orders of magnitude in analysis speed and visualization scale over baseline community implementations. As a result, phenotypes can be retrieved in an efficient, robust and scale-independent way using ensembles of clusterings.
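
A rough sketch of the GPU-based clustering step is shown below, assuming per-cell marker intensities have already been extracted from the whole-slide images into a table; the file and column names are hypothetical and this is not the authors' pipeline.

import cudf
from cuml.cluster import KMeans

# "cells.parquet" stands in for a table of millions of cells by marker intensities.
cells = cudf.read_parquet("cells.parquet")
marker_columns = [c for c in cells.columns if c.startswith("marker_")]

# GPU-accelerated clustering assigns each cell a candidate phenotype label.
kmeans = KMeans(n_clusters=20, random_state=0)
cells["phenotype"] = kmeans.fit_predict(cells[marker_columns])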

We exemplify this workflow on a lymph node metastasis, where it allows the precise spatial analysis of rare CAF phenotypes, and conclude with an outlook on the scale-out of the platform to a large, multi-centric patient cohort in order to identify predictive spatial biomarkers.
Presented by
Lukas Drescher
Institution
Swiss National Supercomputing Centre (CSCS), ETH Zurich, University Hospital Zurich, University of Zurich
Hashtags