MSc Thesis Defense Announcement of Li Zhou: "Classification of Breast Cancer Nottingham Prognostic Index using High-dimensional Embedding and Convolutional Neural Networks"

Tuesday, January 25, 2022 - 09:00 to 11:00


The School of Computer Science is pleased to present… 

MSc Thesis Defense by: Li Zhou 

Date: Tuesday January 25th, 2022 
Time:  9:00 am – 11:00 am 
Passcode: If you are interested in attending this event, contact the Graduate Secretary with sufficient notice before the event to obtain the passcode.


The Nottingham Prognostic Index (NPI) is a widely used prognostic measure that predicts survival in operable primary breast cancer. The NPI value is calculated from the size of the tumor, the number of lymph nodes, and the tumor’s grade. This work builds a multi-class prediction model for breast cancer NPI classes. Rapid development in next-generation sequencing has made it possible to measure different biological indicators, collectively called multi-omics data. The availability of multi-omics data sparked the challenge of integrating and analyzing these various biological measures to understand disease progression. High-dimensional embedding techniques are used to present the features in a lower dimension, that is, as a 2-dimensional map. This thesis presents a supervised learning method to predict breast cancer NPI. The objectives of this research are to (i) build a diagnosis system for breast cancer NPI based on multi-omics data; (ii) find gene biomarkers for each NPI class; and (iii) build a novel prediction model based on t-distributed stochastic neighbor embedding (t-SNE) and a residual neural network (ResNet) to integrate multi-omics data in the classification mechanism.
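For readers unfamiliar with the index, the calculation mentioned above can be sketched as follows. This is a minimal illustration of the commonly published NPI formula (0.2 × tumor size in cm + lymph node stage + grade), not code from the thesis. The function names are hypothetical, and the three-band cut-offs (3.4 and 5.4) are an assumption; the NPI classes used in the thesis may be defined differently.

```python
def node_stage(positive_nodes: int) -> int:
    """Map a count of involved lymph nodes to the NPI node stage (1 to 3)."""
    if positive_nodes < 0:
        raise ValueError("positive_nodes must be non-negative")
    if positive_nodes == 0:
        return 1
    if positive_nodes <= 3:
        return 2
    return 3


def npi_score(tumor_size_cm: float, positive_nodes: int, grade: int) -> float:
    """Commonly published NPI formula: 0.2 * size (cm) + node stage + grade."""
    if grade not in (1, 2, 3):
        raise ValueError("grade must be 1, 2, or 3")
    return 0.2 * tumor_size_cm + node_stage(positive_nodes) + grade


def npi_class(score: float) -> str:
    """Assumed three-band grouping; the thesis may use other class boundaries."""
    if score <= 3.4:
        return "good"
    if score <= 5.4:
        return "moderate"
    return "poor"
```

For example, a 5 cm grade 3 tumor with four involved nodes scores 0.2 × 5 + 3 + 3 = 7.0, which falls in the "poor" band under these assumed cut-offs.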
The dataset consists of three omics: gene expression, copy number alteration (CNA), and mRNA. We assembled our proposed model in two ways, concatenated and merged. In the concatenated approach, we perform feature selection, gene similarity network (GSN) template drawing, and coloring using the helix color code for each omic before integrating them into the prediction model. In the merged approach, the gene expression data set is used as a sample data set to draw the GSN template, and the template is then colored by merging the three omics into one GSN template as the three elements of the RGB system for each patient.
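The merged approach described above can be pictured with a small sketch. It assumes each gene already has a 2-D position (for example from an embedding of the gene expression data) and writes one patient's three omic values into the red, green, and blue channels of an image. The function names, the min-max scaling, and the image size are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np


def min_max_scale(values):
    """Scale values to [0, 1]; a constant vector maps to all zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span


def merged_rgb_image(coords, omic_r, omic_g, omic_b, size=32):
    """Build a (size, size, 3) image for one patient.

    coords: (n_genes, 2) gene positions on the 2-D template (assumed given).
    omic_r, omic_g, omic_b: (n_genes,) values of the three omics for the patient.
    """
    coords = np.asarray(coords, dtype=float)
    # Map template coordinates to pixel indices in [0, size - 1].
    cols = np.rint(min_max_scale(coords[:, 0]) * (size - 1)).astype(int)
    rows = np.rint(min_max_scale(coords[:, 1]) * (size - 1)).astype(int)
    image = np.zeros((size, size, 3))
    for channel, omic in enumerate((omic_r, omic_g, omic_b)):
        # Genes that share a pixel overwrite each other in this simple sketch.
        image[rows, cols, channel] = min_max_scale(omic)
    return image
```

An image built this way has the shape a convolutional classifier such as VGG or ResNet expects, which is what lets one network consume all three omics at once.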
We evaluated four models in the two assembled approaches, combining the two embedding techniques with the two classification models. The embedding techniques are t-distributed stochastic neighbor embedding (t-SNE) and the self-organizing map (SOM), while the classification models are visual geometry group (VGG) and residual neural network (ResNet). t-SNE combined with ResNet in the concatenated approach outperformed the other methods in both approaches, with an accuracy of 98.48% and an area under the curve (AUC) of 0.9999. The set of genes extracted from the three omics can serve as potential NPI-associated biomarkers. The findings in the literature confirm the associations between some of these genes and breast cancer prognosis and survival.
Keywords: classification, data integration, multi-omics data, residual neural network. 

MSc Thesis Committee:  

Internal Reader: Dr. Jianguo Lu                  
External Reader: Dr. Huapeng Wu            
Co-Advisor: Dr. Luis Rueda 
Co-Advisor: Dr. Abedalrhman Alkhateeb 
Chair: Dr. Kalyani Selvarajah 



5113 Lambton Tower 401 Sunset Ave. Windsor ON, N9B 3P4 (519) 253-3000 Ext. 3716 (working remotely)