Training Vision Transformers for Video Action Recognition: PhD Comprehensive Exam by Jonathan Khalil

Thursday, November 30, 2023 - 10:00

The School of Computer Science would like to present…

Training Vision Transformers for Video Action Recognition

PhD Comprehensive Exam by: Jonathan Khalil


Date: November 30, 2023

Time: 10:00 am – 11:00 am

Location: Essex Hall, Room 122



This PhD comprehensive exam examines video action recognition, focusing on Vision Transformers (ViTs) as the core deep learning architecture. The research explores ViTs in depth, emphasizing their ability to model complex spatiotemporal dynamics, and aims to advance the state of the art in video analysis. The motivation is the growing importance of accurate action recognition in industries such as sports, where applications include athlete performance assessment, tactical analysis, and immersive sports content creation. Traditional methods often struggle to capture the intricacies of actions, which makes ViTs a promising candidate for addressing these challenges. The exam opens with a comprehensive literature review that provides historical context on action recognition and traces the evolution of deep learning approaches. It then introduces the ViT architecture, highlighting its core components and its capabilities in handling video data.


Keywords: Vision Transformers (ViTs), Video Action Recognition, Deep Learning, Computer Vision, Human Action Recognition (HAR), Spatiotemporal Dynamics, Attention Mechanisms, Model Training Strategies, Sports Analytics, Sports Video Analysis.


PhD Committee:

Internal Reader: Dr. Sharif Saad

Internal Reader: Dr. Dan Wu

External Reader: Dr. Mohammad Hassanzadeh

(Faculty/department) Faculty of Science / Department of Computer Science
Advisor: Dr. Alioune Ngom