PhD Dissertation Proposal by: Jonathan Khalil

Monday, April 8, 2024 - 13:00

The School of Computer Science is pleased to present…

Self-supervised methods for Video Search and Retrieval Tasks

Date: April 8, 2024

Time: 1:00 pm

Location: Chrysler Hall South, CS53

Abstract:

With the growth of video data on the internet, efficient methods for video search and retrieval have become imperative. By leveraging self-supervised learning, we aim to overcome the limitations of traditional supervised approaches that rely on labeled data, which is often scarce and costly to obtain. This proposal presents self-supervised methods for video search and retrieval tasks.

The first stage of the research introduces a framework that combines a ResNet with a Transformer, tailored for zero-shot action recognition (ZSAR). The framework aims to learn rich visual representations with visual-semantic associations. In preliminary experiments, without pre-training on additional datasets, the proposed model outperforms existing ZSAR methods, achieving 57.2% top-1 accuracy on benchmark datasets including UCF101, HMDB51, and ActivityNet.

The next stages of the research will investigate cross-modal self-supervised learning techniques that leverage information from video frames, audio tracks, and text descriptions. We will also study methods for capturing contextual information and temporal dynamics over extended time horizons, allowing the model to understand complex temporal structures and events in videos.

Keywords: Self-Supervised Learning, Video Indexer, Transformer, ResNet, ZSAR.

Thesis Committee:

Internal Reader: Dr. Sherif Saad

Internal Reader: Dr. Dan Wu

External Reader: Dr. Mohammad Hassanzadeh

Advisor: Dr. Alioune Ngom

MAC STUDENTS ONLY - Register here