MSc Thesis Defense Announcement of Li Zhou: "Classification of Breast Cancer Nottingham Prognostic Index using High-dimensional Embedding and Convolutional Neural Networks"

Tuesday, January 25, 2022 - 09:00 to 11:00

SCHOOL OF COMPUTER SCIENCE

The School of Computer Science is pleased to present…

MSc Thesis Defense by: Li Zhou

Date: Tuesday January 25th, 2022

Time: 9:00 am – 11:00 am

Meeting URL: https://us06web.zoom.us/j/83587658226?from=addon

Passcode: If interested in attending this event, contact the Graduate Secretary at csgradinfo@uwindsor.ca with sufficient notice before the event to obtain the passcode.

Abstract:

The Nottingham Prognostic Index (NPI) is a widely used prognostic measure that predicts survival in operable primary breast cancer. The NPI value is calculated from the size of the tumor, the number of involved lymph nodes, and the tumor's grade. This work builds a multi-class prediction model for breast cancer NPI classes. Rapid developments in next-generation sequencing have made it possible to measure several types of biological indicators, collectively called multi-omics data. The availability of multi-omics data has raised the challenge of integrating and analyzing these biological measures to understand disease progression. Embedding techniques for high-dimensional data are used to represent the features in a lower dimension, here a 2-dimensional map. This thesis presents a supervised learning method for predicting breast cancer NPI. The objectives of this research are to (i) build a diagnosis system for breast cancer NPI based on multi-omics data; (ii) find gene biomarkers for each NPI class; and (iii) build a novel prediction model based on t-distributed stochastic neighbor embedding (t-SNE) and a residual neural network (ResNet) that integrates multi-omics data in the classification mechanism.
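For readers unfamiliar with the index, the following is a minimal sketch of how an NPI value is conventionally computed from the three clinical inputs named above. It uses the standard published formula (0.2 × tumor size in cm + lymph node stage + grade). The function names and the three class cut-offs (3.4 and 5.4) are illustrative assumptions; the abstract does not state which NPI class boundaries the thesis uses.

```python
def npi_score(tumor_size_cm: float, positive_nodes: int, grade: int) -> float:
    """Standard NPI: 0.2 * size (cm) + lymph node stage (1-3) + grade (1-3)."""
    if grade not in (1, 2, 3):
        raise ValueError("grade must be 1, 2, or 3")
    if tumor_size_cm < 0 or positive_nodes < 0:
        raise ValueError("size and node count must be non-negative")
    # Node stage: 0 involved nodes -> 1, 1-3 nodes -> 2, 4 or more -> 3.
    if positive_nodes == 0:
        node_stage = 1
    elif positive_nodes <= 3:
        node_stage = 2
    else:
        node_stage = 3
    return 0.2 * tumor_size_cm + node_stage + grade


def npi_class(score: float) -> str:
    """Map a score to a prognostic group (assumed cut-offs, not from the thesis)."""
    if score <= 3.4:
        return "good"
    if score <= 5.4:
        return "moderate"
    return "poor"


if __name__ == "__main__":
    score = npi_score(tumor_size_cm=3.0, positive_nodes=2, grade=3)
    print(round(score, 2), npi_class(score))
```

In the thesis, the NPI class is the label to be predicted from multi-omics data rather than computed from clinical measurements; the sketch only illustrates where the class labels come from.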

The dataset consists of three omics: gene expression, copy number alteration (CNA), and mRNA. We assembled our proposed model in two ways, concatenated and merged. In the concatenated approach, we perform feature selection, gene similarity network (GSN) template drawing, and coloring using the helix color code for each omic before integrating them into the prediction model. In the merged approach, the gene expression data set is used as a sample data set to draw the GSN template, and the template is then colored by merging the three omics into one GSN template as the three elements of the RGB system for each patient.
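The merged approach can be pictured with a small sketch: each gene has a fixed pixel position on the GSN template, and for one patient the three omics values of that gene become the red, green, and blue components of the pixel. Everything specific here is an assumption made for illustration: the function names, the min-max scaling to 0-255, and the assignment of omics to channels are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def scale_to_byte(values: Sequence[float]) -> List[int]:
    """Min-max scale values to integers in 0..255 (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant channel carries no contrast; map it to 0.
        return [0 for _ in values]
    return [round(255 * (v - lo) / (hi - lo)) for v in values]


def merged_rgb_image(
    positions: Sequence[Tuple[int, int]],  # (row, col) of each gene on the template
    omic_r: Sequence[float],               # e.g. gene expression per gene
    omic_g: Sequence[float],               # e.g. CNA per gene
    omic_b: Sequence[float],               # e.g. mRNA per gene
    height: int,
    width: int,
) -> List[List[Pixel]]:
    """Build one patient's image: one pixel per gene, one channel per omic."""
    if not (len(positions) == len(omic_r) == len(omic_g) == len(omic_b)):
        raise ValueError("all inputs must have one entry per gene")
    r, g, b = scale_to_byte(omic_r), scale_to_byte(omic_g), scale_to_byte(omic_b)
    # Pixels without a gene stay black.
    image: List[List[Pixel]] = [[(0, 0, 0)] * width for _ in range(height)]
    for idx, (row, col) in enumerate(positions):
        image[row][col] = (r[idx], g[idx], b[idx])
    return image


if __name__ == "__main__":
    img = merged_rgb_image([(0, 0), (1, 1)], [1.0, 3.0], [-1.0, 1.0], [5.0, 5.0], 2, 2)
    print(img)
```

In the concatenated approach, by contrast, each omic would get its own colored template before the results are combined, so the classifier sees three separate maps instead of one three-channel image.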

We evaluated four models in the two assembled approaches, combining the two embedding techniques with the two classification models. The embedding techniques are t-distributed stochastic neighbor embedding (t-SNE) and the self-organizing map (SOM), while the classification models are the Visual Geometry Group (VGG) network and the residual neural network (ResNet). t-SNE combined with ResNet in the concatenated approach outperformed the other methods in both approaches, with an accuracy of 98.48% and an area under the curve (AUC) of 0.9999. The set of genes extracted from the three omics can serve as potential NPI-associated biomarkers. Findings in the literature confirm the associations between some of these genes and breast cancer prognosis and survival.
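As a reminder of what the two reported metrics measure, here is a minimal sketch of accuracy and of a multi-class AUC computed as the macro average of one-vs-rest AUCs. The averaging scheme is an assumption: the abstract does not say how the multi-class AUC was obtained, and all function names below are hypothetical.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence, prob_rows: Sequence[Sequence[float]],
                  classes: Sequence) -> float:
    """Average the one-vs-rest AUC over classes (assumed averaging scheme)."""
    aucs: List[float] = []
    for k, c in enumerate(classes):
        labels = [1 if t == c else 0 for t in y_true]
        scores = [row[k] for row in prob_rows]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    print(accuracy([1, 2, 3, 1], [1, 2, 1, 1]))
    print(binary_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

An AUC close to 1 means that, for each class, nearly every patient in that class receives a higher predicted probability for it than patients outside the class.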

Keywords: classification, data integration, multi-omics data, residual neural network.

MSc Thesis Committee:

Internal Reader: Dr. Jianguo Lu

External Reader: Dr. Huapeng Wu

Co-Advisor: Dr. Luis Rueda

Co-Advisor: Dr. Abedalrhman Alkhateeb

Chair: Dr. Kalyani Selvarajah

5113 Lambton Tower 401 Sunset Ave. Windsor ON, N9B 3P4 (519) 253-3000 Ext. 3716 csgradinfo@uwindsor.ca (working remotely)