MSc Thesis Defense Announcement of Narinder Pal Singh: "An Enhancement to CNN Approach with Synthesized Image Data for Disease Subtype Classification"

Friday, August 6, 2021 - 11:00 to 13:00


The School of Computer Science is pleased to present… 

MSc Thesis Defense by: Narinder Pal Singh 

Date: Friday, August 6, 2021 
Time: 11:00 AM to 01:00 PM 
Passcode: If you are interested in attending this event, contact the Graduate Secretary at


The introduction of genetic testing has profoundly enhanced the prospects for early detection of diseases and for techniques that suggest precision medicines. Subtyping of critical diseases has proven to be an essential part of developing individualized therapies and has led to deeper insights into disease heterogeneity. Studies suggest that variants in particular genes have significant effects on certain types of immune system cells and are also involved in the risk of certain critical illnesses such as cancer. By analyzing the genetic sequence of a patient, disease types and subtypes can be predicted. Recent research has shown that, in this context, the prediction quality of convolutional neural networks (CNNs) using gene intensity features can be improved when the input is structured into 2D images.
Constructed from chromosome locations or from transformations involving kPCA, t-SNE, etc., these two-dimensional images express certain types of relationships among the intensity features. While this approach extends the success of convolutional neural networks to non-image data, obtaining a precise mapping of features onto the images that reflects the relationships among the features is hard, if not impossible. To this end, we propose an enhancement that provides the CNN training procedure with not only samples of the structured image data but also samples of the unstructured raw gene expression data in its original form. The former is fed into the convolutional layers of the network, while the latter is input only to the fully connected layers. The proposed method is applied to The Cancer Genome Atlas (TCGA) dataset for cancer subtypes, using the median values of the expression levels of all expressed genes in an RNA sequence. According to the experiments, the proposed approach improves classification accuracy by 2.7% when applied to the state-of-the-art method with a 2D CNN architecture trained on images constructed from the chromosome locations of the genes. When built on top of the method with a 2D CNN architecture trained on images constructed through a transformation process involving t-SNE, classification accuracy is enhanced by 4.7%. When the proposed approach is implemented on a 1D CNN model with data structured using the covariance between the features, classification accuracy improves by 1%, and an increase of 3% is observed when it is implemented on a 1D CNN model trained with data ordered by chromosome location.
Keywords: Convolutional neural networks, cancer subtype classification, RNA-Seq, precision medicine
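
For readers unfamiliar with the two-input idea described in the abstract, the following is a minimal illustrative sketch in Python with NumPy, not the thesis implementation. It shows a structured 2D image passing through a convolutional step while the raw expression vector bypasses convolution and joins only at the fully connected layers. All names (conv2d_valid, two_branch_forward, init_params), the layer sizes, the random weights, and the 8 x 8 image layout are assumptions made for illustration; the actual architecture, image construction, and training procedure are described in the thesis itself.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Single-channel 'valid' 2D cross-correlation (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def softmax(z):
    """Numerically stable softmax over a 1D vector of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()


def init_params(image_shape, n_genes, n_hidden, n_classes, kernel_size=3, seed=0):
    """Random, untrained weights (hypothetical sizes, for illustration only)."""
    rng = np.random.default_rng(seed)
    conv_len = (image_shape[0] - kernel_size + 1) * (image_shape[1] - kernel_size + 1)
    merged_len = conv_len + n_genes  # conv features + raw expression values
    return {
        "kernel": rng.normal(size=(kernel_size, kernel_size)),
        "W1": rng.normal(scale=0.1, size=(n_hidden, merged_len)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.1, size=(n_classes, n_hidden)),
        "b2": np.zeros(n_classes),
    }


def two_branch_forward(image, raw_expression, params):
    """Forward pass: image -> conv branch; raw vector -> dense layers only."""
    # Branch 1: the structured image goes through convolution and ReLU.
    feature_map = np.maximum(conv2d_valid(image, params["kernel"]), 0.0)
    conv_features = feature_map.ravel()
    # Branch 2: the raw expression vector skips convolution entirely and is
    # concatenated with the flattened convolutional features.
    merged = np.concatenate([conv_features, raw_expression])
    # Fully connected layers see both sources of information.
    hidden = np.maximum(params["W1"] @ merged + params["b1"], 0.0)
    logits = params["W2"] @ hidden + params["b2"]
    return softmax(logits)


# Toy usage with synthetic data: 64 made-up "genes" and 4 made-up subtypes.
rng = np.random.default_rng(1)
n_genes = 64
raw = rng.random(n_genes)
# Placeholder layout only; a real image would come from chromosome locations
# or a transformation such as t-SNE, as the abstract describes.
image = raw.reshape(8, 8)
params = init_params(image.shape, n_genes, n_hidden=16, n_classes=4)
probs = two_branch_forward(image, raw, params)
```

In this sketch the two inputs are joined by simple concatenation before the first dense layer, which is one common way to combine branches; the thesis may merge them differently.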

MSc Thesis Committee: 

Internal Reader: Dr. Ahmad Biniaz   
External Reader: Dr. Balakumar Balasingam
Advisor: Dr. Jessica Chen
Chair: Dr. Asish Mukhopadhyay



5113 Lambton Tower 401 Sunset Ave. Windsor ON, N9B 3P4 (519) 253-3000 Ext. 3716 (working remotely)