MSc Thesis Proposal by: Rajath Devadatta Bharadwaj

Tuesday, May 7, 2024 - 13:30

The School of Computer Science at the University of Windsor is pleased to present …

Emo TalkGen: Evaluating Multimodal Synthesis for Emotionally Expressive Talking Head Generation using Diffusion models

Date: Tuesday, 07 May 2024

Time: 1:30 pm

Location: Essex Hall, Room 122


To advance the capabilities of talking head generation systems, this thesis proposes an architecture that synthesizes emotionally expressive digital avatars from audio input and a single driver image. The architecture takes a multimodal approach, combining image codec encoders for facial feature extraction, text encoders for linguistic and emotional content analysis, and audio codec encoders with lip regressors to align speech with lip movements. At its core is a diffusion model that integrates these inputs to generate a cohesive and emotionally resonant visual output. This research aims to observe and quantify how accurately the proposed architecture replicates a range of human emotions. Using a dynamic emotion changer, the architecture is tested for its ability to adapt expressions in real time, reflecting subtle changes in the emotional undertones of speech. The evaluation focuses not on surpassing existing models but on analyzing the practical application of this architecture and the advancements it may offer. The outcome of this investigation is expected to contribute significant insights into the viability of such a system for real-world implementation and to set a benchmark for future innovations in the field.
Keywords: Diffusion Models, 3D Morphable Models (3DMM), Talking Head, Lip Sync
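The data flow described in the abstract (per-modality encoders feeding a diffusion model) can be illustrated with a toy sketch. This is not the thesis implementation: the encoders are stand-in linear projections, the weights are untrained, and every name, dimension, and the choice of a small latent vector as the denoising target are assumptions made only for illustration. The noising step follows the standard DDPM forward process.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB = 16  # hypothetical embedding width shared by all stand-in encoders


def encode(x, w):
    """Stand-in encoder: a fixed linear projection followed by tanh."""
    return np.tanh(x @ w)


# Hypothetical inputs: flattened driver image, text features, audio features.
image = rng.normal(size=64)
text = rng.normal(size=32)
audio = rng.normal(size=48)
w_img, w_txt, w_aud = (rng.normal(size=(n, EMB)) * 0.1 for n in (64, 32, 48))

# Fuse the three modality embeddings into one conditioning vector.
cond = np.concatenate(
    [encode(image, w_img), encode(text, w_txt), encode(audio, w_aud)]
)

# Standard DDPM forward process:
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)


def add_noise(x0, t, eps):
    """Noise a clean latent x0 to timestep t using the closed-form schedule."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps


def denoiser(x_t, t, cond, w):
    """Toy conditional noise predictor: a linear map over [x_t, cond, t/T]."""
    features = np.concatenate([x_t, cond, [t / T]])
    return features @ w


x0 = rng.normal(size=8)   # assumed stand-in for one frame's expression latent
eps = rng.normal(size=8)  # the noise the denoiser should learn to predict
t = 50
x_t = add_noise(x0, t, eps)

w_den = rng.normal(size=(8 + 3 * EMB + 1, 8)) * 0.1  # untrained weights
eps_hat = denoiser(x_t, t, cond, w_den)

# The usual noise-prediction training objective (mean squared error).
loss = float(np.mean((eps_hat - eps) ** 2))
```

In a real system each stand-in would be replaced by a trained network, and the denoiser would be applied iteratively from pure noise while conditioned on `cond`; the sketch only shows how the three input streams could meet at the diffusion step.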
Thesis Committee:
Internal Reader: Dr. Imran Ahmed
External Reader: Dr. Scott Mundle
Advisor: Dr. Boubaker Boufama

MAC STUDENTS ONLY - Register here