Monday, April 8, 2024 - 13:00
The School of Computer Science is pleased to present…
Self-supervised methods for Video Search and Retrieval Tasks
PhD Dissertation Proposal by: Jonathan Khalil
Date: April 8, 2024
Time: 1:00 pm
Location: Chrysler Hall South, CS53
Abstract:
With the growth of video data on the internet, efficient methods for video search and retrieval have become imperative. By leveraging self-supervised learning, we aim to overcome the limitations of traditional supervised approaches that rely on labeled data, which is often scarce and costly to obtain. This proposal presents self-supervised methods for video search and retrieval tasks.
The first stage of the research introduces a framework that combines a ResNet with a Transformer, tailored for zero-shot action recognition (ZSAR). The framework aims to learn rich visual representations with visual-semantic associations. In preliminary experiments, without pre-training on additional datasets, the model outperforms existing ZSAR methods, achieving 57.2% top-1 accuracy on benchmark datasets including UCF101, HMDB51, and ActivityNet.
The next stages of the research will investigate cross-modal self-supervised learning techniques that leverage information from video frames, audio tracks, and text descriptions. We will also explore methods for capturing contextual information and temporal dynamics over extended time horizons, allowing the model to understand complex temporal structures and events in videos.
Keywords: Self-Supervised Learning, Video Indexer, Transformer, ResNet, ZSAR.
Thesis Committee:
Internal Reader: Dr. Sherif Saad
Internal Reader: Dr. Dan Wu
External Reader: Dr. Mohammad Hassanzadeh
Advisor: Dr. Alioune Ngom