CAFCW24
FNL - Cancer Data Science Initiatives Team
Integration of Advanced Computational Approaches in Colorectal Cancer (CRC) Research
Abdelouahab Dehimat
Utilizing advanced technologies has allowed researchers to uncover unique metabolic, immunophenotypic, and transcriptional characteristics within different cell subtypes, highlighting the complex diversity of the tumor microenvironment (TME). The integration of high-throughput sequencing data with computational models has played a crucial role in these investigations. This has involved creating prognostic risk models based on single-cell data and building gene regulatory networks (GRNs) for CRC. Computational tools such as CIBERSORTx and CMScaller have been employed to measure cell subpopulation fractions and determine consensus molecular subtypes (CMS), respectively. These models have been instrumental in pinpointing key transcription factors like ERG and in clarifying the roles of specific gene regulatory elements in CRC progression. The main research areas involve studying how metabolic reprogramming and immune evasion mechanisms function within the CRC TME. Recent studies have shown how lipid metabolism and immune suppression are controlled in various cell subtypes, suggesting potential therapeutic targets like RPS17. Furthermore, the discovery of epigenetic regulators and chromatin accessibility patterns has provided new insights into CRC subtypes, especially in distinguishing between iCMS and CIMP phenotypes. This research not only highlights the technical strengths of combining single-cell technologies with computational models but also emphasizes the potential for these approaches to lead to personalized treatment strategies. In conclusion, our work underscores the importance of computational biology in advancing CRC research, offering a multi-dimensional understanding that could pave the way for more effective and personalized interventions. By addressing the challenges of data integration and interpretation, this research opens new avenues for therapeutic development and better clinical outcomes for CRC patients.
Alcott: A Convolutional Neural Network to Predict Multimeric Interactions in HIV-1 Neural Infection
Anna Mohanty
Optimal Prescriptive Treatments for Ovarian Cancer with Genetic Data
Alkiviadis Mertzios, Matea Gjika, Xidan Xu, Samayita Guha, Neelkanth M. Bardhan, Subodha Kumar, Angela Belcher, Georgia Perakis
A Novel Microservices Architecture for Digital Twins
Jeremy Balian, Jun Deng, Anvi Sud, Shreya Tiwari, Shivani Maffi, Koninika Ray, Anil Srivastava, Jeevan Saini, Haresh KP, Mariano Vazquez
Bridging Scales with Healthy AI: Transforming Cancer Treatment through Multiscale Integration of Technology and Biology
Debsindhu Bhowmik‡, Chris Stanley, John Vant, Paul Inman, John Gounley, Anuj Kapadia
‘Healthy AI’ embodies a transformative approach that merges technological innovation with biological systems, underscoring the significance of different length scales in advancing cancer treatment and drug design. This integrated framework highlights how AI can seamlessly connect cellular, molecular, and systemic levels, driving progress in personalized medicine and drug discovery.
Cancer treatment necessitates a deep understanding of interactions across various scales. At the cellular scale, Agent-Based Modeling (ABM) has been pivotal in simulating the complex dynamics between cancer cells and the immune system inside the tumor microenvironment. ABM offers crucial insights into tumor heterogeneity and immune responses, but its high computational cost and long run times often limit its applicability. To address these challenges, AI-driven techniques, supported by high-performance computing (HPC), provide a transformative solution. Integrating AI with ABM allows the simulation of hypothetical scenarios, generating comprehensive datasets that capture tumor behavior under a range of conditions. AI models trained on these data can discover hidden patterns and predict treatment outcomes with high accuracy. For example, AI can examine tumor morphologies from histopathological images to predict responses to various therapies, accelerating the development of personalized treatments and improving patient outcomes. This integration of AI into ABM exemplifies how technology can enhance our understanding and manipulation of biological systems at the cellular level, bridging the gap between complex biological interactions and effective treatment strategies.
Moving to the molecular scale, Healthy AI drives innovation in drug design, which is essential for drug discovery. Traditional molecular design approaches often rely on rigid, predefined rules that can limit the exploration of novel compounds. To overcome these limitations, we are leveraging language models (LMs) with a critic component and genetic algorithms (GAs) in a unified framework. LMs with a critic facilitate the automated generation of molecular structures, while GAs simulate evolutionary processes to enhance structural diversity and optimize molecular properties. This approach enables the discovery of new molecules with desirable characteristics, demonstrating how AI can advance molecular design and connect it to practical applications in therapy.
The synergy between AI-driven cancer immunotherapy and advanced molecular-scale drug design highlights the multiscale nature of ‘Healthy AI’. By connecting detailed cellular interactions with innovative molecular-scale design, this approach accelerates personalized therapy development and drives new discoveries in drug design. This comprehensive strategy illustrates how AI can bridge different scales, from cellular dynamics to molecular innovation, to address the complexities of cancer treatment and beyond.
In summary, our ‘Healthy AI’ platform represents a paradigm shift that integrates technological innovation with the complexities of biological systems across multiple scales. By harnessing AI to model and manipulate interactions from the cellular to the molecular to the systemic level, this approach offers new possibilities for effective cancer therapies and groundbreaking discoveries, ultimately transforming the future of healthcare and scientific research.
Increasing Confidence in AI Models in the Medical Field
Jake Gwinn, Justin M. Wozniak, Thomas Brettin
Use of ATOM Modeling PipeLine (AMPL) and Generalized Generative Molecular Design (GGMD) to Discover Natural Product-Like Compounds to Target Brd4-BD1
Justin Overhulse1, Arjun Parambathu2, Jay Patel2, Kushagra Srivastava2, Leonardo Pierre3, Jiayi Yang4
Software Application Development for Medical Data Migration and Integration to NIDAP (NIH Integrated Data Analysis Platform)
Zhang LZ, Ning H, Zhuge Y, Cheng J, Li B, Chappidi S, Tasci E, Miller RW, Krauze A
This software application will be a powerful tool for both clinical and research purposes, significantly improving the efficiency of medical data management and analysis.
Evaluating the Efficacy of Synthetic Pathology Reports
Patrycja Krawczuk, Christopher Stanley, John Gounley, Heidi A. Hanson
Artificial intelligence-powered drug discovery: A case study on LD50 prediction and molecular optimization
Logan Hallee, Nikhil Rao, Nikolaos Rafailidis, Tom Le, Colin Horger, Herman Singh, Naomi Ohashi, Pinyi Lu
Methods: The ATOM Modeling PipeLine (AMPL), a data-driven modeling pipeline for drug discovery, was applied to build AI (regression) models to predict LD50. The models were trained, validated, and tested using the LD50 dataset obtained from the Therapeutic Data Commons (TDC). We evaluated random forest, neural network, and XGBoost models over a large range of hyperparameters in terms of mean absolute error (MAE) and coefficient of determination (R²). A production model was created based on the best hyperparameters and evaluated on additional data generated by the EPA Toxicity Estimation Software Tool, the NIH Collaborative Acute Toxicity Modeling Suite, and the National Toxicology Program; this model was then applied as a scoring function in the Generalized Generative Molecular Design (GGMD) framework. The production model was also tested on precisionFDA, a secure, collaborative, high-performance computing platform that builds a community of experts around the analysis of biological datasets in order to advance precision medicine. In our case study, GGMD used a junction tree variational autoencoder mapping structures to latent vectors, along with a genetic algorithm operating on latent vectors, to search a diverse molecular space for molecular optimization toward the design criteria. We applied the GGMD framework to design molecules with desirable LD50 on the Delaware Advanced Research Workforce and Innovation Network (DARWIN), a big data and high-performance computing system designed to catalyze Delaware research and education.
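The genetic-algorithm search over latent vectors described above can be illustrated with a toy sketch. This is not the GGMD implementation: there is no autoencoder or decoder here, `toy_score` is a hypothetical stand-in for a trained property predictor, and the population size, elitism, crossover, and mutation settings are arbitrary choices made for illustration.

```python
import numpy as np

def toy_score(z):
    """Hypothetical scoring function (higher is better); stands in for a trained predictor."""
    return -np.sum((z - 1.0) ** 2, axis=-1)

def evolve(score, dim=8, pop_size=40, generations=50, elite=4, sigma=0.1, seed=0):
    """Evolve real-valued 'latent' vectors with elitism, uniform crossover, and Gaussian mutation."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, dim))
    history = []                                  # best score seen in each generation
    for _ in range(generations):
        fitness = score(pop)
        order = np.argsort(fitness)[::-1]         # indices from best to worst
        history.append(float(fitness[order[0]]))
        pop = pop[order]
        elites = pop[:elite]                      # carried over unchanged
        parents = pop[: pop_size // 2]            # top half may reproduce
        n_children = pop_size - elite
        pa = parents[rng.integers(0, len(parents), size=n_children)]
        pb = parents[rng.integers(0, len(parents), size=n_children)]
        mask = rng.random((n_children, dim)) < 0.5
        children = np.where(mask, pa, pb)         # uniform crossover
        children = children + rng.normal(scale=sigma, size=children.shape)  # mutation
        pop = np.vstack([elites, children])
    fitness = score(pop)
    history.append(float(fitness.max()))
    return pop[np.argmax(fitness)], history

best, history = evolve(toy_score)
```

Because the elite vectors are copied unchanged into each new generation, the best score in `history` cannot decrease. In a real pipeline each latent vector would be decoded to a molecule before scoring.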
Results: Among the AMPL models evaluated, the small autoencoder-like neural network model (Layer sizes: (1972, 66, 1940)) performed best at predicting LD50, with a test MAE of 0.587. The model performance is comparable to that of the second-best model published in the TDC leaderboard (https://tdcommons.ai/benchmark/admet_group/19ld50/). We were able to apply the GGMD framework to design molecules with reduced toxicity by optimizing molecular structures, while the framework could also be used to design compounds with high toxicity, indicating the dual-use nature of AI-powered drug discovery.
Summary: By using AMPL and GGMD, we created a predictive model for LD50, a proxy for toxicity, and generated new molecules with desirable toxicity properties, which demonstrated the potential of predictive and generative modeling in increasing the throughput of drug discovery. Meanwhile, we recognized the dangerous potential for misuse of AI-powered drug discovery, such as designing compounds with high toxicity, and revealed the need for efforts to ensure healthy AI development, use, and oversight.
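For readers unfamiliar with the two evaluation metrics named in this abstract, the following sketch shows how MAE and R² are commonly computed. The example values are made up for illustration and are not taken from the LD50 dataset or from AMPL output.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical measured and predicted values (illustration only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
mae = mean_absolute_error(y_true, y_pred)   # 0.25
r2 = r_squared(y_true, y_pred)              # 0.9
```

A lower MAE and an R² closer to 1 both indicate better agreement between predictions and measurements.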
Converting multi-omics data into multi-channel images for drug response modeling using convolutional neural networks
Priyanka Vasanthakumari1, Yitan Zhu1, Thomas Brettin2, Oleksandr Narykov1, Alexander Partin1, Maulik Shukla1, Nicholas Chia1, Fangfang Xia1, and Rick L. Stevens2,3
References: Zhu, Y., Brettin, T., Xia, F., Partin, A., Shukla, M., Yoo, H., Evrard, Y.A., Doroshow, J.H. and Stevens, R.L., 2021. Converting tabular data into images for deep learning with convolutional neural networks. Scientific Reports, 11(1), p. 11325.
Using a Mechanism-Based Mathematical Model and Multiparametric MRI to Predict Response to Therapy of I-SPY 2 Breast Cancer Patients
Reshmi J. S. Patel, Chengyue Wu, Casey E. Stowers, Rania M. Mohamed, Jingfei Ma, Gaiane M. Rauch, Thomas E. Yankeelov
METHODS The I-SPY 2 clinical trial for locally advanced breast cancer (LABC) patients acquired dynamic contrast-enhanced (DCE) and diffusion-weighted (DW) magnetic resonance imaging (MRI) scans before (V1), three weeks into (V2), and after (V3) the first neoadjuvant therapy (NAT) course [4]. Our subset of 91 patients includes 42 hormone receptor-positive/human epidermal growth factor receptor 2-negative (HR+/HER2−), 22 HER2+, and 27 triple-negative breast cancer patients.
Our mathematical model is a reaction-diffusion partial differential equation solved in space and time via the finite difference method. The rate of change in voxel-wise tumor cellularity, NTC(x̄,t), is a function of cell diffusion, proliferation, and death due to NAT. Diffusion is mechanically coupled to the surrounding tissue [2].
NTC(x̄,t) was calculated at each visit from DW-MRI-derived apparent diffusion coefficient maps [2]. We applied clustering algorithms to segment tissues that defined the modeling domain. The initial drug concentration was assumed to be proportional to DCE-MRI-derived contrast agent accumulation. Using a Levenberg-Marquardt nonlinear least-squares optimization method, we calibrated the efficacy and spatially-resolved proliferation rates to the V1 and V2 NTC(x̄,t) data. We ran the calibrated model forward to predict tumor status at V3 [2].
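To make the modeling idea above concrete, here is a minimal one-dimensional sketch of a reaction-diffusion equation solved with an explicit finite difference scheme. It is an illustration under stated assumptions, not the authors' model: the logistic proliferation term, the linear drug-induced death term, the zero-flux boundaries, and all parameter values are assumptions, and the mechanical coupling of diffusion to surrounding tissue is omitted.

```python
import numpy as np

def simulate(n0, D=0.01, k=0.05, theta=1.0, alpha=0.0, drug=None,
             dx=1.0, dt=0.1, steps=100):
    """Forward-Euler solution of dN/dt = D*d2N/dx2 + k*N*(1 - N/theta) - alpha*C(t)*N."""
    n = np.asarray(n0, dtype=float).copy()
    if D > 0 and dt > dx ** 2 / (2.0 * D):
        raise ValueError("time step too large for a stable explicit scheme")
    for step in range(steps):
        padded = np.pad(n, 1, mode="edge")                  # ghost cells give zero-flux boundaries
        laplacian = (padded[:-2] - 2.0 * n + padded[2:]) / dx ** 2
        c = 0.0 if drug is None else drug(step * dt)        # drug concentration at this time
        n = n + dt * (D * laplacian + k * n * (1.0 - n / theta) - alpha * c * n)
    return n

# Hypothetical initial cellularity: a single occupied voxel in a 21-voxel domain.
initial = np.zeros(21)
initial[10] = 0.5
diffusion_only = simulate(initial, k=0.0)                        # spreads, total is conserved
growth = simulate(initial)                                       # spreads and proliferates
treated = simulate(initial, alpha=1.0, drug=lambda t: 1.0)       # constant drug exposure
```

In a calibration setting, parameters such as `k` and `alpha` would be adjusted until the simulated cellularity matches the imaging-derived values at the early visits, and the calibrated model would then be run forward to the later time point.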
RESULTS For 91 patients, our model achieved concordance correlation coefficient (CCC) values of 0.94 and 0.90 between the observed and predicted V1 to V3 changes in total tumor cellularity (ΔTTC) and tumor volume (ΔTV), respectively. The model overestimated tumor volume and underestimated voxel-wise cellularity for a subset of patients with tumor tissue compression from V2 to V3. However, across the cohort, the median of the per-patient median percent difference between the voxel-wise observed and predicted change in NTC(x̄,t) was 0%, indicating a high voxel-wise predictive accuracy.
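Assuming CCC here denotes Lin's concordance correlation coefficient, the sketch below shows how such an agreement score can be computed between observed and predicted values. The example numbers are made up and are not patient data.

```python
import numpy as np

def concordance_correlation(observed, predicted):
    """Lin's CCC: 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    x = np.asarray(observed, dtype=float)
    y = np.asarray(predicted, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    covariance = np.mean((x - mean_x) * (y - mean_y))   # population covariance
    return float(2.0 * covariance / (x.var() + y.var() + (mean_x - mean_y) ** 2))

# Hypothetical values (illustration only).
observed = [1.0, 2.0, 3.0]
shifted = [2.0, 3.0, 4.0]                                   # perfectly correlated but biased by +1
ccc_same = concordance_correlation(observed, observed)      # 1.0
ccc_shifted = concordance_correlation(observed, shifted)    # 4/7, about 0.571
```

Unlike the Pearson correlation, which would be 1 for the shifted example, the CCC penalizes systematic offsets between prediction and observation, which is why it is a common choice for judging predictive agreement.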
CONCLUSION Our mechanism-based mathematical model calibrated to multiparametric MRI data early in a course of NAT can accurately predict tumor status for LABC patients after the NAT course, which supports the potential for personalizing treatment via mathematical modeling.
REFERENCES [1]. Shien T and Iwata H. Jpn J Clin Oncol. 2020. [2]. Jarrett AM et al. Nat Protoc. 2021. [3]. Wu C et al. Cancer Res. 2022. [4]. Barker AD et al. Clin Pharmacol Ther. 2009.
Global Explainability of a Deep Abstaining Classifier for Cancer Pathology Reports
Sayera Dhaubhadel, Jamaludin Mohd-Yusof, Trilce Estrada, Benjamin Hamilton McMahon, Tanmoy Bhattacharya
Objective: To characterize sources of confusion in our real-world deep abstaining classifier (DAC) for automated classification of cancer pathology reports from NCI-SEER registries via global analysis of local explainability results.
Materials and Methods: We use a multitask convolutional neural network (MTCNN)-based DAC for NLP that is tuned to achieve at least 97% accuracy by identifying and abstaining on confusing samples. We then generate local explanations of classification with two methods: local interpretable model-agnostic explanations (LIME) and (gradient • input) techniques. We then develop a pipeline to extract global explainability from tens of thousands of local explanations and provide global insights into classification decisions for the cancer histology task, which accounts for a substantial portion of the misclassifications.
Results: Our DAC obtains ≥ 97% accuracy by identifying both the classes and instances of reports most likely to be correctly classified and abstaining on those samples which are sources of confusion. By comparing several hundred local explanations, we determined that the gradient • input produces qualitatively similar explanations to LIME with significantly improved throughput, enabling an efficient path to global explainability. Application of our global explainability pipeline to the tens of thousands of local explanations from the gradient • input allows us to separate classification mismatch into groups we call label noise, conflicting information, and insufficient information. The 97% accuracy requirement restricted the resolving power of the DAC to the top four classes of both lung and breast cancers, improving the interpretability of its local explanations. This enabled identification of keywords strongly associated with specific classification/confusion categories.
Discussion and Conclusion: Global analysis of tens of thousands of local explainability results enabled us to obtain global insights into sources of confusion in an MTCNN-based DAC. This suggests several specific strategies to iteratively improve our DAC in this complex real-world implementation.
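The gradient • input attribution mentioned above can be sketched on a tiny network. This is an illustrative example only: the one-hidden-layer tanh model, its random weights, and the six-feature input are hypothetical and unrelated to the MTCNN used in the work.

```python
import numpy as np

def forward(x, W1, b1, w2, b2):
    """Tiny network: hidden = tanh(W1 @ x + b1), logit = w2 @ hidden + b2."""
    hidden = np.tanh(W1 @ x + b1)
    return float(w2 @ hidden + b2), hidden

def gradient_times_input(x, W1, b1, w2, b2):
    """Attribution per feature: d(logit)/dx_i multiplied by x_i."""
    _, hidden = forward(x, W1, b1, w2, b2)
    gradient = W1.T @ (w2 * (1.0 - hidden ** 2))   # chain rule through tanh
    return gradient * x

# Hypothetical weights and a bag-of-words style input (illustration only).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 6))
b1 = rng.normal(size=4)
w2 = rng.normal(size=4)
b2 = 0.1
x = np.array([1.0, 0.0, 2.0, 0.0, 0.0, 1.0])
attributions = gradient_times_input(x, W1, b1, w2, b2)
```

Each attribution needs only one forward and one backward pass, whereas LIME fits a local surrogate from many perturbed samples; this difference is consistent with the throughput advantage reported above. Features that are absent from the input (value 0) receive zero attribution by construction.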
Acknowledgement: This work has been supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of the National Institutes of Health. This work was performed under the auspices of the U.S. Department of Energy by Argonne National Laboratory under Contract DE-AC02-06-CH11357, Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, Los Alamos National Laboratory under Contract DE-AC52-06NA25396, and Oak Ridge National Laboratory under Contract DE-AC05-00OR22725.
UNNT: A novel Utility for comparing Neural Net and Tree-based models
Vineeth Gutta, Sunita Chandrasekaran
In this study, we developed a robust framework called UNNT (A novel Utility for comparing Neural Net and Tree-based models) that trains and compares deep learning methods such as CNNs and tree-based methods such as XGBoost on a user-supplied dataset. We applied this software to a single-drug response problem in cancer to identify the best-performing ML method based on the National Cancer Institute 60 (NCI60) dataset. In addition, we studied the computational aspects of training each of these models; our results show that neither is clearly superior during training on either CPUs or GPUs. This shows that when both models have similar error rates for a dataset, the available hardware determines the model choice for training.
AI
Ultrasound Lesion Segmentation with an Unsupervised Learning Approach
Abdalrahman Alblwi and Kenneth E. Barner
Constraint-Based Hierarchical Loss: A Novel Loss Function for Hierarchical Classification
Abhishek Shivanna, Heidi A. Hanson, Adam Spannaus
Integrated Computing Environment for Next Generation Biology: Cloud-based HPC and Big Data Platform
Ramakrishnan Periyasamy, Sandeep Malviya, Vivek Gavane, Renu Gadhari, Kunal Tembhare, Neeraj Bharti, Palash Pullarwar, Prachi Barkale, Preet Jamsandekar, Pallavi Niturkar, Tina Sharma, Archana Achalere, Sunitha Manjari Kasibhatla, Uddhavesh Sonawane, Rajendra Joshi*
Machine Learning Surrogate Model for Molecular Pose Optimization in Drug Discovery
Sean Black, Vineeth Gutta, Sunita Chandrasekaran
Distributed KAN Enhanced Vision Transformers for Brain Tumor Detection
Vijayalakshmi Saravanan, Ephrem A. Yekun, Lakshman Tamil, and Arvind Ramanathan
computational cancer
The Hallmarks of Predictive Oncology
Akshat Singhal, Xiaoyu Zhao, Patrick Wall, Emily So, Guido Calderini, Alexander Partin, Natasha Koussa, Priyanka Vasanthakumari, Sara Jones, Oleksandr Narykov, Yitan Zhu, Farnoosh Abbas-Aghababazadeh, Sisira Kadambat Nair, Jean-Christophe Bélisle-Pipon, Jason I. Griffiths, Athmeya Jayaram, Barbara A. Parker, Kay T. Yeung, Ryan Weil, Aritro Nath, Benjamin Haibe-Kains, Trey Ideker
An Ensemble Machine Learning Model Identifies Metabolite Modulators of Epigenetic Drugs
Scott E. Campit, Rupa Bhowmick, Taoan Lu, Aaditi Vivek Saoji, Ran Jin, Madeline R. Shay, Aaron M. Robida, Sriram Chandrasekaran*
Methods: We integrated global chromatin profiles, epigenetic drug sensitivity data, and metabolomics data from over 600 cancer cell lines in the Cancer Cell Line Encyclopedia (CCLE). Using an ensemble of machine learning models (Least Absolute Shrinkage and Selection Operator (LASSO), stepwise regression, and k-Top Scoring Pairs (kTSP)), we identified significant histone post-translational modification (PTM)-metabolite and metabolite-drug interactions. The predicted interactions were validated through experimental analysis, focusing on metabolites with synergistic or antagonistic relationships with chromatin-modifying drugs. We further analyzed the dynamics of metabolite/PTM interactions during the epithelial-mesenchymal transition (EMT).
Results: Our analysis uncovered novel metabolic dependencies of various epigenetic drugs. The predictions, validated for five metabolites, revealed synergistic or antagonistic interactions with Vorinostat, a histone deacetylase (HDAC) inhibitor, and GSK-J4, an H3K27me2/3 demethylase inhibitor. Specifically, our systems-pharmacology approach indicated that certain metabolites could influence the efficacy of HDAC inhibitors by modulating metabolic flux, as demonstrated previously through flux balance analysis. This supports the notion that metabolic gene expression is predictive of drug sensitivity. Importantly, our interaction network also offers insights into the mechanistic underpinnings of metabolism and epigenome regulation during cellular processes such as EMT.
Conclusion: Our findings highlight the importance of considering environmental variables, such as metabolite availability or nutrient levels, when assessing the efficacy of epigenetic therapeutics. By unveiling metabolic dependencies, this study advances our understanding of how metabolism influences chromatin-modifying drug efficacy and provides a foundation for developing predictive tools in cancer therapy.
pathology
Training the MOSSAIC OncID Classifier on Kentucky Data
Isaac Hands, Sally Ellingson, John Gounley, Heidi A. Hanson, Patrycja Krawczuk, Dakota Murdock, Eric Durbin
The MOSSAIC classifier is a hierarchical self-attention network model created from two software packages. BARDI (Batch-processing Abstraction for Raw Data Integration) is a specialized framework engineered to facilitate the development of reproducible data pre-processing pipelines within machine learning workflows and includes steps to build the vocabulary used in the modeling. The modeling is done with FrESCO (Framework for Exploring Scalable Computational Oncology). We use this framework to make a binary classifier of reportable (positive class) vs non-reportable (negative class) pathology reports.
The Kentucky dataset was created by combining historical pathology reports labeled by an older rule-based classifier with newer pathology reports labeled by the MOSSAIC OncID classifier. For all documents labeled positive by either classifier, a professional Oncology Data Specialist (ODS) manually reviewed the label and further classified the reports into more specific reportable categories. In order to build a binary classifier, we labeled everything manually reviewed by the ODS and determined to be reportable as ‘Cancer’ and everything else as ‘No Cancer’. We took the most recent 10% of documents (n=196,929) as a complete holdout set for testing as it has approximately the same 6:1 ratio of non-reportable to reportable cases that we see in production. The remaining data for training (n=1,772,360) has the opposite class balance, with an approximately 4:1 ratio of reportable to non-reportable cases.
Using the Kentucky data to train the new MOSSAIC classifier and testing on our holdout data resulted in a higher F1 score than the MOSSAIC model trained solely on Seattle data, due to a lower false positive rate (FPR). Our baseline model had an FPR and F1 score of 0.014 and 0.920, respectively, compared to 0.094 and 0.769 using the original model.
Cancer registries optimize for the lowest false negative rate (FNR) while keeping a reasonable FPR and F1 score since it is critical that a SEER cancer registry does not miss cases. We continue to investigate our false negative pathology reports for insights on improving the model for use in production.
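The rates discussed in this abstract follow directly from the confusion matrix of a binary classifier. The sketch below shows the standard definitions; the counts are hypothetical and are not the Kentucky holdout results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix rates for a binary (reportable vs non-reportable) classifier."""
    fpr = fp / (fp + tn)                  # non-reportable reports wrongly flagged as reportable
    fnr = fn / (fn + tp)                  # reportable reports that were missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)               # equals 1 - FNR
    f1 = 2 * precision * recall / (precision + recall)
    return {"fpr": fpr, "fnr": fnr, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts (illustration only).
metrics = binary_metrics(tp=90, fp=10, tn=880, fn=20)
```

With imbalanced classes such as the roughly 6:1 production ratio described above, a small change in FPR translates into many false positives relative to the number of true positives, which is why FPR has a strong effect on F1, while FNR measures the missed cases that registries most want to avoid.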
Towards Interactive Analysis of Whole Slide Multiplex Immunofluorescence Images for Biomarker Discovery on the Alps Supercomputer
Lukas Drescher, Ossia Eichhoff, Patrick Turko, James Whipman, Josephine Yates, Tumor Profiler Consortium, Valentina Boeva, Mitchell P. Levesque
To study spatial behavior, multiplex immunohistochemistry (IHC) has emerged as a popular approach in recent years, allowing the simultaneous visualization of different markers. To gain a holistic understanding of tumor heterogeneity and its interaction with cancer-associated fibroblasts (CAFs), however, whole slide image analysis is required, which poses significant computational challenges and requires frequent iteration with expert feedback. Widely used digital pathology tools are often not designed to handle millions of cells and have runtimes that exceed lab requirements, necessitating a new approach.
In this work, we focus on the spatial analysis of the tumor microenvironment of a set of metastatic melanoma patients undergoing immunotherapy. In a preliminary step, we identify cells with a transcriptional profile compatible with CAFs on a large patient cohort using scRNA-seq and establish marker genes to differentiate them functionally. We then generate hypotheses through cell-cell communication analysis that inform a subsequent spatial analysis, where the role of the CAF phenotypes is analyzed on whole slide multiplexed IHC images spanning the full size of metastasis biopsies.
Building on recent progress in the NVIDIA RAPIDS suite, we construct a platform that enables a near-interactive whole slide spatial analysis experience on the Alps infrastructure, the newly introduced supercomputer at the Swiss National Supercomputing Centre. This involves efficient GPU-based clustering as well as a browser-based visualization that can handle millions of cells, improving analysis speed and visualization scale by up to two orders of magnitude over baseline community implementations. As a result, phenotypes can be retrieved in an efficient, robust and scale-independent way using ensembles of clusterings.
We exemplify this workflow on a lymph node metastasis, where it allows the precise spatial analysis of rare CAF phenotypes, and conclude with an outlook on the scale-out of the platform to a large, multi-centric patient cohort in order to identify predictive spatial biomarkers.