Deep Multi-View Network for Protein Sequence Classification
MSc Thesis Proposal by: Jaber Al Siam
Date: Wednesday, 28th January 2026
Time: 1:00 PM
Location: Essex Hall 122
Abstract:
Protein family classification is a fundamental problem in computational biology with applications in protein function annotation, structural inference, and drug discovery. While recent deep learning approaches have shown promising results, they often rely on single-view sequence representations, limiting their ability to capture complementary biological signals such as physicochemical properties, contextual semantics, and interaction information.
This thesis investigates a multi-view deep learning framework for protein sequence classification that integrates heterogeneous protein representations. The proposed approach combines four views: physicochemical time series encoded using Gramian Angular Fields, frequency-based geometric representations of amino acid composition, contextual embeddings learned from protein language models, and graph-based embeddings derived from protein–protein interaction networks. Each representation is processed through a dedicated neural branch, and the resulting features are fused to enable joint learning across multiple biological views.
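To illustrate the first of these views, the following is a minimal sketch of a Gramian Angular (Summation) Field computed from a per-residue physicochemical series. It assumes the Kyte-Doolittle hydropathy scale as the property and min-max rescaling to [-1, 1]; the abstract does not state which properties or which GAF variant the thesis uses, and the function names are hypothetical.

```python
import numpy as np

# Kyte-Doolittle hydropathy values (assumed property; the proposal does not
# specify which physicochemical scales are used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gramian_angular_field(series):
    """Return the Gramian Angular Summation Field of a 1-D series."""
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # A constant series carries no ordering information; map it to zero.
        scaled = np.zeros_like(x)
    else:
        # Min-max rescale to [-1, 1] so that arccos is defined.
        scaled = 2.0 * (x - lo) / (hi - lo) - 1.0
    scaled = np.clip(scaled, -1.0, 1.0)  # guard against rounding error
    phi = np.arccos(scaled)              # polar-angle encoding
    # GASF[i, j] = cos(phi_i + phi_j)
    return np.cos(phi[:, None] + phi[None, :])


def sequence_to_gaf(sequence, scale=KYTE_DOOLITTLE):
    """Map an amino acid sequence to a GAF image (hypothetical helper)."""
    # Residues missing from the scale (e.g. 'X') are skipped in this sketch.
    values = [scale[aa] for aa in sequence if aa in scale]
    return gramian_angular_field(values)


gaf = sequence_to_gaf("MKTAYIAKQR")
print(gaf.shape)
```

The result is a symmetric L x L matrix for a sequence of L residues, which can be treated as a single-channel image by a convolutional branch. In practice, sequences of different lengths would need resizing or padding to a common size, a step this sketch omits.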
The objective of this work is to systematically analyze how different protein representations contribute to classification performance and robustness. The proposed framework will be evaluated on large-scale PDB-derived datasets using cross-validation and ablation studies to assess the contribution of each view.
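As a rough sketch of how such an ablation study could be organized, the snippet below enumerates every non-empty subset of the four views so that each can be trained and scored under the same cross-validation protocol. The view labels and the function name are hypothetical placeholders, and the proposal does not say whether all subsets or only leave-one-out configurations will be evaluated.

```python
from itertools import combinations

# Hypothetical labels for the four views described in the abstract.
VIEWS = ("gaf", "composition", "language_model", "ppi_graph")


def ablation_configs(views=VIEWS):
    """Return every non-empty subset of views, smallest subsets first."""
    configs = []
    for k in range(1, len(views) + 1):
        configs.extend(combinations(views, k))
    return configs


configs = ablation_configs()
# Leave-one-out configurations: each drops exactly one view.
leave_one_out = [c for c in configs if len(c) == len(VIEWS) - 1]

for config in configs:
    print("+".join(config))
```

Comparing the full four-view configuration against each leave-one-out configuration gives a direct estimate of what a single view contributes, while the single-view configurations serve as baselines.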
Thesis Committee:
Reader 1: Dr. Jessica Chen
Reader 2: Dr. Ikjot Saini
Advisor: Dr. Alioune Ngom