MSc Thesis Proposal Announcement of Nazia Siddiqui: "Comparison of generative models for Text-to-Image Generation"

Tuesday, August 30, 2022 - 13:00 to 14:30

SCHOOL OF COMPUTER SCIENCE

The School of Computer Science is pleased to present…

MSc Thesis Proposal by: Nazia Siddiqui

Date: Tuesday, August 30, 2022

Time: 1:00 PM – 2:30 PM

Meeting URL: https://us06web.zoom.us/j/86333623688?from=addon

Passcode: If interested in attending this event, contact the Graduate Secretary at csgradinfo@uwindsor.ca with sufficient notice before the event to obtain the passcode.

Abstract:

The development of deep learning algorithms has greatly benefited computer vision, image processing, artificial intelligence, and natural language processing. One such application is text-to-image synthesis, the creation of new images from text descriptions. Recent techniques for text-to-image synthesis offer an intriguing yet straightforward conversion capability from text to image and have become a popular research topic. Synthesizing images from text descriptions has practical and creative applications in computer-aided design, multimodal learning, digital art creation, etc. Non-fungible tokens (NFTs) are a form of digital art that is being used as tokens for trading across the globe. Text-to-image generators let anyone with enough creativity develop digital art, which can be used as NFTs. They can also be beneficial for the development of synthetic datasets. Generative Adversarial Networks (GANs) are generative models that can generate new data resembling a training set. Diffusion models are another type of generative model; they are trained by gradually adding random noise to the data and learning to reverse this diffusion process, so that desired data samples can be created from noise. This thesis compares both types of model to determine which is better at producing images that match the given description. We have implemented the Vector-Quantized GAN (VQGAN) + Contrastive Language-Image Pre-training (CLIP) model, which combines the VQGAN and CLIP machine learning techniques to create images from text input. For both models, we use text input from the MS-COCO dataset. This thesis will assess the models using metrics such as Inception Score and Fréchet Inception Distance. The semantic object accuracy (SOA) score is another metric, which takes into account the caption used during image generation. We plan to incorporate more measures for the analysis.
Further, we intend to compare these with images produced from the same text input by the Guided Language-to-Image Diffusion for Generation and Editing (GLIDE) model, which is a diffusion model.
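
As an illustration of one of the metrics named in the abstract, the Inception Score can be sketched as the exponential of the mean KL divergence between each image's predicted class distribution p(y|x) and the marginal distribution p(y) over all generated images. The sketch below is not the thesis implementation. It assumes the class probabilities have already been obtained from a pretrained classifier (an Inception network, in the standard setup), and the function name and toy inputs are hypothetical.

```python
import math


def inception_score(probs):
    """Inception Score from per-image class probabilities.

    probs: list of lists, where probs[i][j] is p(y=j | x_i) for generated
    image x_i. Obtaining these from a classifier is assumed to happen
    elsewhere.
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    # Sum of KL(p(y|x) || p(y)) over images; zero-probability terms add 0.
    total_kl = 0.0
    for p in probs:
        for j in range(k):
            if p[j] > 0:
                total_kl += p[j] * math.log(p[j] / marginal[j])
    return math.exp(total_kl / n)


# Toy inputs (hypothetical): every image gets the same uniform prediction,
# versus confident predictions spread evenly over three classes.
uniform = [[0.25, 0.25, 0.25, 0.25]] * 3
confident = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(inception_score(uniform), inception_score(confident))
```

Under this definition, identical uniform predictions give the minimum score of 1, while confident predictions spread evenly over K classes give a score of K, so higher values indicate images that are both recognizable and diverse.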

Keywords: Text-to-Image Generation, Generative Models, GANs, Diffusion Models

MSc Thesis Committee:

Internal Reader: Dr. Boubakeur Boufama

External Reader: Dr. Mohammed Khalid

Advisor: Dr. Imran Ahmed


5113 Lambton Tower 401 Sunset Ave. Windsor ON, N9B 3P4 (519) 253-3000 Ext. 3716 csgradinfo@uwindsor.ca