Advanced Machine Learning Models for Analyzing Single-cell RNA-Sequencing Data
Date: Monday, February 12th, 2024
Time: 10:00 am - 1:00 pm
Location: Essex Hall 122
The advent of high-throughput scRNA-seq technologies has enabled the study of individual cells and their biological mechanisms. Traditional clustering methods, commonly employed in scRNA-seq data analysis for identifying cell types, face challenges due to the sparsity and high dimensionality of the data. To overcome these limitations, we propose an integrated approach that combines non-linear dimensionality reduction techniques with clustering algorithms.
Our method involves the use of modified locally linear embedding in conjunction with independent component analysis to identify representative clusters of different cell types. We evaluate the performance of this approach across thirteen publicly available scRNA-seq datasets, encompassing various tissues, sizes, and technologies. Gene set enrichment analysis further confirms the effectiveness of our method, demonstrating superior performance compared to existing unsupervised methods across diverse datasets.
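As an illustration only, the combination of modified locally linear embedding, independent component analysis, and clustering described above might be sketched with scikit-learn as follows. The abstract does not state how the steps are ordered, which clustering algorithm is used, or any parameter values, so the ordering (embedding, then ICA, then clustering), the choice of k-means, the log transform, the function name cluster_cells, and all defaults here are assumptions rather than the author's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FastICA
from sklearn.manifold import LocallyLinearEmbedding


def cluster_cells(expr, n_components=5, n_neighbors=15, n_clusters=3, seed=0):
    """Cluster cells from a (cells x genes) count matrix.

    Hypothetical helper: step order and parameters are assumptions.
    """
    # Log-transform the counts to damp the effect of highly expressed genes.
    logged = np.log1p(np.asarray(expr, dtype=float))

    # Non-linear dimensionality reduction with modified LLE.
    # scikit-learn requires n_neighbors >= n_components for method="modified".
    mlle = LocallyLinearEmbedding(
        n_neighbors=n_neighbors,
        n_components=n_components,
        method="modified",
        random_state=seed,
    )
    embedded = mlle.fit_transform(logged)

    # Unmix the embedded coordinates into statistically independent components.
    ica = FastICA(n_components=n_components, random_state=seed)
    sources = ica.fit_transform(embedded)

    # Group cells in the reduced space; k-means is an assumed choice.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(sources)
```

Calling cluster_cells(expr) on a NumPy array with one row per cell returns one integer cluster label per cell; in practice n_clusters would be set from the expected number of cell types or chosen with a cluster validity index.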
We also investigate neural network-based methods combined with self-organizing maps, feature selection approaches for choosing informative marker genes in sparse datasets, and supervised techniques, all aimed at overcoming the high dimensionality and sparsity of scRNA-seq data in cell type identification.
Building on the foundation of identifying cell types, we extend our investigation to intercellular signalling networks. Recognizing the limitations of existing link prediction approaches based on graph-structured data, we introduce a novel method named Subgraph Embedding of Gene expression matrix for prediction of CEll-cell Communication (SEGCECO). SEGCECO utilizes an attributed graph convolutional neural network to predict cell-cell communication from scRNA-seq data.
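To make the idea of graph convolution for link prediction concrete, the sketch below shows one generic graph convolutional layer (the widely used symmetric-normalization rule, ReLU(D^-1/2 (A + I) D^-1/2 H W)) followed by a dot-product score for a candidate cell-cell edge. This is not the SEGCECO architecture, whose details are not given in the abstract; the function names gcn_layer and link_score and the scoring rule are hypothetical.

```python
import numpy as np


def gcn_layer(adj, feats, weights):
    """One generic graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj:     (n, n) symmetric adjacency matrix of the cell graph
    feats:   (n, f) node attributes, e.g. per-cell expression features
    weights: (f, h) weight matrix (learned in a real model)
    """
    # Add self-loops so each node keeps its own features.
    a_hat = adj + np.eye(adj.shape[0])
    # Symmetric degree normalization.
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Aggregate neighbor features, project, and apply ReLU.
    return np.maximum(norm @ feats @ weights, 0.0)


def link_score(emb, i, j):
    """Assumed scoring rule: sigmoid of the dot product of two node embeddings."""
    return 1.0 / (1.0 + np.exp(-(emb[i] @ emb[j])))
```

In a trained model the weights would be fitted so that scores are high for observed communication links and low for sampled non-links.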
To address the challenges of high-dimensional, sparse scRNA-seq data, we employ SoptSC, a similarity-based optimization method, to construct a cell-cell communication network. Our experiments on six datasets from human and mouse pancreas tissue show that SEGCECO outperforms latent feature-based approaches and the state-of-the-art link prediction method, WLNM, achieving an area under the ROC curve of 0.99 and 99% prediction accuracy.
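For readers unfamiliar with how link-prediction results like these are typically scored, the following minimal sketch computes the two metrics from predicted edge scores, assuming the reported ROC value refers to the area under the ROC curve. The function names, the 0.5 decision threshold, and the toy inputs are illustrative assumptions and are not taken from the thesis.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen true link receives a
    higher score than a randomly chosen non-link (ties count as half).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of candidate links classified correctly at the given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

Here labels marks each candidate cell pair as a true link (1) or a non-link (0), and scores holds the model's predicted probability for that pair.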
In summary, our approach, spanning the identification of cell types and the prediction of cell-cell communication, leverages advanced techniques to enhance the analysis of scRNA-seq data. This research contributes to the comprehensive understanding of disease modules and intercellular signalling networks, paving the way for more accurate and insightful investigations in the field of single-cell genomics.
Keywords: single-cell transcriptomics, cellular interaction, graph convolutional network, sub-graph embedding, cell-type identification
Internal Reader: Dr. Dima Alhadidi
Internal Reader: Dr. Saeed Samet
External Reader: Dr. Esam Abdel-Raheem
External Examiner: Dr. Yifeng Li
Advisor(s): Dr. Luis Rueda