2024

Novae: a graph-based foundation model for spatial transcriptomics data

Quentin Blampey, Hakim Benkirane, Nadege Bercovici, Fabrice André, Paul-Henry Cournède

Preprint on bioRxiv

Spatial transcriptomics is advancing molecular biology by providing high-resolution insights into gene expression within the spatial context of tissues. This context is essential for identifying spatial domains, enabling the understanding of micro-environment organizations and their implications for tissue function and disease progression. To address the limitations of current models on multiple slides, we have designed Novae (https://github.com/MICS-Lab/novae), a graph-based foundation model that extracts representations of cells within their spatial contexts. Our model was trained on a large dataset of nearly 30 million cells across 18 tissues, allowing Novae to perform zero-shot domain inference across multiple gene panels, tissues, and technologies. Unlike other models, it also natively corrects batch effects and constructs a nested hierarchy of spatial domains. Furthermore, Novae supports various downstream tasks, including spatially variable gene or pathway analysis and spatial domain trajectory analysis. Overall, Novae provides a robust and versatile tool for advancing spatial transcriptomics and its applications in biomedical research.

Novae
2024

Multimodal CustOmics: A Unified and Interpretable Multi-Task Deep Learning Framework for Multimodal Integrative Data Analysis in Oncology

Hakim Benkirane, Maria Vakalopoulou, David Planchard, Julien Adam, Ken Olaussen, Stefan Michiels, Paul-Henry Cournède

Preprint on bioRxiv

Characterizing cancer poses a delicate challenge as it involves deciphering complex biological interactions within the tumor’s microenvironment. Histology images and molecular profiling of tumors are often available in clinical trials and can be leveraged to understand these interactions. However, despite recent advances in representing multimodal data for weakly supervised tasks in the medical domain, numerous challenges persist in achieving a coherent and interpretable fusion of whole slide images and multi-omics data. Each modality operates at a distinct biological level, introducing substantial correlations both between and within data sources. In response to these challenges, we propose a deep-learning-based approach designed to represent multimodal data for precision medicine in a readily interpretable manner. While demonstrating superior performance compared to state-of-the-art methods across multiple test cases, our approach also provides robust results and extracts various scores characterizing the activity of each modality and their interactions at the pathway and gene levels. The strength of our method lies in its capacity to unravel pathway activation through multimodal relationships and extend enrichment analysis to spatial data for supervised tasks. We showcase the efficiency and robustness of its predictive capacity and interpretation scores through an extensive exploration of multiple TCGA datasets and validation cohorts, underscoring its value in advancing our understanding of cancer. The method is publicly available on GitHub.

Multimodal CustOmics
2023

CustOmics: A versatile deep-learning based strategy for multi-omics integration

Hakim Benkirane, Yoann Pradat, Stefan Michiels, Paul-Henry Cournède

PLoS Computational Biology

The availability of patient cohorts with several types of omics data opens new perspectives for exploring the disease’s underlying biological processes and developing predictive models. It also comes with new challenges in computational biology in terms of integrating high-dimensional and heterogeneous data in a fashion that captures the interrelationships between multiple genes and their functions. Deep learning methods offer promising perspectives for integrating multi-omics data. In this paper, we review the existing integration strategies based on autoencoders and propose a new customizable one whose principle relies on a two-phase approach. In the first phase, we adapt the training to each data source independently before learning cross-modality interactions in the second phase. By taking into account each source’s singularity, we show that this approach succeeds at taking advantage of all the sources more efficiently than other strategies. Moreover, by adapting our architecture to the computation of Shapley additive explanations, our model can provide interpretable results in a multi-source setting. Using multiple omics sources from different TCGA cohorts, we demonstrate the performance of the proposed method for cancer on test cases for several tasks, such as the classification of tumor types and breast cancer subtypes, as well as survival outcome prediction. We show through our experiments the strong performance of our architecture on seven different datasets of various sizes and provide some interpretations of the results obtained. Our code is available at https://github.com/HakimBenkirane/CustOmics.

CustOmics
2022

Hyper-AdaC: adaptive clustering-based hypergraph representation of whole slide images for survival analysis

Hakim Benkirane, Maria Vakalopoulou, Stergios Christodoulidis, Ingrid-Judith Garberis, Stefan Michiels, Paul-Henry Cournède

Machine Learning for Health

The emergence of deep learning in the medical field has popularized the development of models to predict survival outcomes from histopathology images in precision oncology. Graph-based formalisms have opened interesting perspectives for generating informative representations, as they can be context-aware and model local and global topological structures in the tumor’s microenvironment. However, the critical issue in using graph representations lies in their generalizability. They can suffer from overfitting due to their large sizes or high discrepancies between nodes due to random sampling from whole slide images (WSIs). In addition, standard graph formulations are limited to pairwise interactions, which can sometimes fail to represent the reality observed in histopathology and hinder the interpretability of those interactions. In this work, we present Hyper-AdaC, an adaptive clustering-based hypergraph representation to model high-order correlations among different regions of the WSIs while being compact enough to help graph neural networks generalize in the case of survival prediction. We evaluate our approach on five different publicly available cancer datasets. Our method outperforms most state-of-the-art graph-based methods for survival prediction with WSIs, creating a more efficient and robust alternative to other graph representations. Moreover, due to our formulation, attention maps are depicted at different resolutions depending on the tissue characteristics of each WSI. The code is available at: https://github.com/HakimBenkirane/Hyper-adaC.

Hyper-adaC