MSc Thesis Defense by: Zannatul Ferdoush

Wednesday, April 24, 2024 - 13:00

The School of Computer Science is pleased to present…

A biomarker identification model from protein-protein interaction network using natural language processing and graph convolutional network.



Date: Wednesday, 24 Apr 2024

Time: 1:00 PM

Location: Essex Hall, Room 122


A biomarker identification model that integrates natural language processing (NLP) and a graph convolutional network (GCN) offers a novel way to overcome the limited ability of simple neural networks to capture the contextual semantics of genes, extract spatial feature information, and model nonlinear, complex semantic relations among genes. First, we analyze microarray datasets to identify differentially expressed genes (DEGs) and construct a high-confidence protein-protein interaction (PPI) network. We then use Word2Vec, an NLP algorithm, to preprocess and vectorize gene ontology (GO) annotations, which allows our model to reveal complex biological relationships among genes and deepens our understanding of disease pathogenesis. GO annotations are crucial here because they provide comprehensive information about gene functions, biological processes, and cellular components, clarifying how genes interact within the network. Integrating multi-layered GCNs enables the model to learn complex semantic relations and spatial feature information within the PPI network effectively. Experiments on publicly available datasets of glioblastoma multiforme (GBM), the most aggressive form of brain tumour, demonstrate that our model significantly improves biomarker identification compared with existing state-of-the-art methods, showing its potential for advancing disease research and clinical decision-making. A survival analysis exploring the relationship between the expression levels of the identified biomarkers and GBM patient outcomes further validates these findings. Our study underscores the importance of integrating advanced computational techniques to analyze complex diseases such as GBM comprehensively, and it opens promising avenues for biomarker discovery and therapeutic development.
Keywords: Biomarker, Word2Vec, Graph Convolutional Network, Glioblastoma multiforme
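The abstract combines three ingredients: a high-confidence PPI network, Word2Vec vectors of GO annotations as gene features, and stacked GCN layers. The following is a minimal sketch of how such pieces are commonly wired together, not the thesis implementation. It assumes interaction scores on a 0-1 scale filtered at 0.7 (the conventional "high confidence" cutoff in the STRING database), uses made-up three-dimensional vectors as stand-ins for per-gene Word2Vec GO embeddings, and applies the widely used Kipf and Welling propagation rule H' = ReLU(Â H W) with Â = D^-1/2 (A + I) D^-1/2. Gene names (G1 to G4), features, and weights W1/W2 are hypothetical and untrained, so the printed ranking carries no biological meaning.

```python
import math

# Hypothetical scored interactions (gene_a, gene_b, score in [0, 1]).
# Gene names, scores, features, and weights are illustrative placeholders.
EDGES = [
    ("G1", "G2", 0.95),
    ("G2", "G3", 0.82),
    ("G3", "G4", 0.40),  # below the cutoff, so this edge is dropped
    ("G2", "G4", 0.76),
]
HIGH_CONFIDENCE = 0.7  # assumed STRING-style "high confidence" threshold

# Stand-ins for per-gene Word2Vec vectors of GO annotations (assumed 3-d).
FEATURES = {
    "G1": [0.2, 0.1, 0.7],
    "G2": [0.5, 0.3, 0.1],
    "G3": [0.9, 0.4, 0.2],
    "G4": [0.1, 0.8, 0.3],
}
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.6]]  # 3 -> 2 hidden units (untrained)
W2 = [[1.0], [0.5]]                          # 2 -> 1 score (untrained)


def build_adjacency(edges, cutoff):
    """Keep edges with score >= cutoff; return sorted genes and a 0/1 matrix."""
    genes = sorted({g for a, b, _ in edges for g in (a, b)})
    index = {g: i for i, g in enumerate(genes)}
    adj = [[0.0] * len(genes) for _ in genes]
    for a, b, score in edges:
        if score >= cutoff:
            i, j = index[a], index[b]
            adj[i][j] = adj[j][i] = 1.0  # undirected interaction
    return genes, adj


def normalize(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, adding self-loops."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of x (n x k) and y (k x m)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def gcn_layer(a_norm, h, w, relu=True):
    """One graph convolution: aggregate neighbours, then project with w."""
    out = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in out] if relu else out


genes, adj = build_adjacency(EDGES, HIGH_CONFIDENCE)
a_norm = normalize(adj)
h0 = [FEATURES[g] for g in genes]
h1 = gcn_layer(a_norm, h0, W1)  # hidden representation, one row per gene
scores = [row[0] for row in gcn_layer(a_norm, h1, W2, relu=False)]
ranking = sorted(zip(genes, scores), key=lambda p: p[1], reverse=True)
for gene, score in ranking:
    print(f"{gene}: {score:.3f}")
```

Stacking two layers lets each gene's score depend on neighbours up to two hops away in the filtered network, which is one way a GCN picks up the network-level ("spatial") information the abstract mentions. In practice W1 and W2 would be learned from training data rather than fixed by hand.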
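The abstract validates the identified biomarkers with a survival analysis but does not say which method was used. A common approach, assumed here, is to split patients at the median expression of a candidate biomarker and compare Kaplan-Meier survival curves for the high- and low-expression groups. The sketch implements the Kaplan-Meier product-limit estimator on a small made-up cohort (PATIENTS is hypothetical, not GBM data). A real analysis would also compare the two curves with a statistical test such as the log-rank test, which is omitted here.

```python
from collections import Counter
from statistics import median

# Made-up cohort: (expression of one candidate biomarker, months, event),
# where event = 1 means death was observed and 0 means the patient was censored.
PATIENTS = [
    (2.1, 14, 1), (3.5, 6, 1), (1.2, 20, 0), (4.8, 4, 1),
    (0.9, 18, 1), (3.9, 9, 0), (2.7, 7, 1), (1.5, 16, 1),
]


def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each observed death time."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # deaths plus censorings at each time
    at_risk = len(times)
    survival, curve = 1.0, []
    for t in sorted(leaving):
        if deaths[t]:
            survival *= 1.0 - deaths[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # everyone with time == t leaves the risk set
    return curve


cutoff = median(p[0] for p in PATIENTS)  # assumed median split
high = [p for p in PATIENTS if p[0] > cutoff]
low = [p for p in PATIENTS if p[0] <= cutoff]

curves = {}
for label, group in (("high", high), ("low", low)):
    curves[label] = kaplan_meier([p[1] for p in group], [p[2] for p in group])
    print(label, [(t, round(s, 3)) for t, s in curves[label]])
```

Censored patients still count as at risk at their own follow-up time and only leave the risk set afterwards, which is the usual Kaplan-Meier convention.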
Thesis Committee:
Internal Reader: Dr. Dan Wu
External Reader: Dr. Mir Munir Rahim
Advisor: Dr. Ziad Kobti
Chair: Dr. Kalyani Selvarajah