Monday, December 14, 2020 - 13:00 to 15:00
SCHOOL OF COMPUTER SCIENCE
The School of Computer Science is pleased to present…
MSc Thesis Defense by: Alexandru Filip
Date: Monday, December 14, 2020
Time: 1:00 pm – 3:00 pm
Zoom URL: https://zoom.us/j/97454548790?
Passcode: If interested in attending this event, please contact the Graduate Secretary at firstname.lastname@example.org for the passcode.
With the increasing quantity of biological data, it is important to develop algorithms that can quickly find patterns in large databases of DNA, RNA and protein sequences. Previous research has successfully applied deep learning methods to motif detection and to the classification of biological sequences. There are, however, limitations to these approaches. Most are limited to finding motifs of a single length. In addition, most research has focused on DNA and RNA, both of which use a four-letter alphabet. Few studies have attempted to apply deep learning methods to the larger, twenty-letter alphabet of proteins.
We present an enhanced deep learning model, called DeePSLiM, capable of detecting predictive short linear motifs (SLiMs) in protein sequences. The model is a shallow network that can be trained quickly on large amounts of data. The SLiMs are predictive because they can be used to classify the sequences into their respective families. The model reached scores of 94.5% on accuracy, precision, recall, F1-score and Matthews correlation coefficient, as well as 99.9% area under the receiver operating characteristic curve (AUROC).
Keywords: Protein, Motif Discovery, Short Linear Motif, Neural Network, Machine Learning
Internal Reader: Dr. Asish Mukhopadhyay
External Reader: Dr. Mohamed Belalia
Advisors: Dr. Luis Rueda and Dr. Alioune Ngom
Chair: Dr. Xiaobu Yuan
MSc Thesis Defense Announcement
5113 Lambton Tower 401 Sunset Ave. Windsor ON, N9B 3P4 (519) 253-3000 Ext. 3716 email@example.com