The School of Computer Science would like to present…
Training Vision Transformers for Video Action Recognition
PhD Comprehensive Exam by: Jonathan Khalil
Date: November 30, 2023
Time: 10:00 am – 11:00 am
Location: Essex Hall, Room 122
Abstract
This PhD comprehensive exam examines video action recognition, with a primary focus on Vision Transformers (ViTs) as a deep learning architecture. The research explores ViTs in depth, emphasizing their ability to model complex spatiotemporal dynamics, and aims to advance the state of the art in video analysis. The motivation is the growing importance of accurate action recognition in industries such as sports, with applications spanning athlete performance assessment, tactical analysis, and immersive sports content creation. Traditional methods often struggle to capture the intricacies of actions, which makes ViTs a promising candidate for addressing these challenges. The exam begins with a literature review that provides historical context on action recognition and traces the evolution of deep learning approaches. It then introduces the ViT architecture, highlighting its core components and its capabilities in handling video data.
Keywords: Vision Transformers (ViTs), Video Action Recognition, Deep Learning, Computer Vision, Human Action Recognition (HAR), Spatiotemporal Dynamics, Attention Mechanisms, Model Training Strategies, Sports Analytics, Sports Video Analysis.
PhD Committee:
Internal Reader: Dr. Sharif Saad
Internal Reader: Dr. Dan Wu
External Reader: Dr. Mohammad Hassanzadeh