MSc Thesis Proposal Announcement of Nazia Siddiqui: "Comparison of Generative Models for Text-to-Image Generation"

Tuesday, August 30, 2022 - 13:00 to 14:30


The School of Computer Science is pleased to present… 

MSc Thesis Proposal by: Nazia Siddiqui    

Date: Tuesday, August 30, 2022
Time:  1:00 PM – 2:30 PM  
Passcode: If interested in attending this event, contact the Graduate Secretary with sufficient notice before the event to obtain the passcode.


The development of deep learning algorithms has greatly advanced computer vision, image processing, artificial intelligence, and natural language processing. One such application is image synthesis: the creation of new images from text. Recent techniques for text-to-image synthesis offer an intriguing yet straightforward conversion from text to image and have become a popular research topic. Synthesizing images from text descriptions has practical and creative applications in computer-aided design, multimodal learning, digital art creation, and more. Non-fungible tokens (NFTs) are a form of digital art traded across the globe, and text-to-image generators let anyone with enough creativity develop digital art that can be used as NFTs. They can also be beneficial for building synthetic datasets.

Generative Adversarial Networks (GANs) are generative models that learn to generate new data from a training set. Diffusion models are another type of generative model: they gradually add random noise to the data and then learn to reverse the diffusion process, so that desired data samples can be created from noise. This thesis compares both models to determine which is better at producing images that match a given description. We have implemented the VQGAN+CLIP model, which combines the Vector-Quantized GAN (VQGAN) and Contrastive Language-Image Pre-training (CLIP) machine learning techniques to create images from text input. For both models, we use text input from the MS-COCO dataset. This thesis will attempt to assess the models using metrics such as the Inception Score (IS) and the Fréchet Inception Distance (FID). The Semantic Object Accuracy (SOA) score is another metric that takes the caption used during the image generation process into account. We plan to incorporate more measures into the analysis.
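The Fréchet Inception Distance mentioned above compares the Gaussian statistics (mean and covariance) of Inception-network features extracted from real and generated images: FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2(C1 C2)^(1/2)). As a rough illustration of the formula only (the toy "features" below are random vectors, not actual Inception activations), the computation can be sketched as:

```python
import numpy as np

def _trace_sqrt_product(c1, c2):
    # tr((c1 c2)^{1/2}) computed via the symmetric form c1^{1/2} c2 c1^{1/2},
    # which shares eigenvalues with c1 c2 but is safe for eigh.
    vals, vecs = np.linalg.eigh(c1)
    s1 = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    w = np.linalg.eigvalsh(s1 @ c2 @ s1)
    return np.sqrt(np.clip(w, 0.0, None)).sum()

def frechet_distance(mu1, cov1, mu2, cov2):
    # FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2})
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2)
                 - 2.0 * _trace_sqrt_product(cov1, cov2))

# Toy example: stand-ins for Inception features of real vs. generated images
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))
fake = rng.normal(0.5, 1.0, size=(500, 8))
fid = frechet_distance(real.mean(0), np.cov(real, rowvar=False),
                       fake.mean(0), np.cov(fake, rowvar=False))
print(f"toy FID: {fid:.3f}")
```

Lower is better: identical feature distributions give an FID of zero, and the score grows as the generated distribution drifts from the real one.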
Further, we intend to compare these with the images produced from the same text input by the Guided Language-to-Image Diffusion for Generation and Editing (GLIDE) model, which is a diffusion model.
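The forward noising process that diffusion models such as GLIDE learn to reverse can be sampled in closed form: x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps, where alpha_bar_t is the cumulative product of (1 - beta_t). A minimal sketch, assuming an illustrative linear beta schedule (the specific values below are for demonstration, not taken from any particular paper):

```python
import numpy as np

# Illustrative linear beta schedule over T steps
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)   # cumulative signal-retention factor

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0): scale the clean sample and mix in Gaussian noise."""
    eps = rng.normal(size=x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 4))          # stand-in for an image
x_mid, _ = q_sample(x0, 500, rng)     # partially noised
x_end, _ = q_sample(x0, T - 1, rng)   # nearly pure Gaussian noise
```

By the final step alpha_bar is close to zero, so x_T is essentially pure noise; generation then runs a learned network backwards through these steps, predicting and removing the noise at each one.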
Keywords: Text-to-Image Generation, Generative Models, GANs, Diffusion Models 

MSc Thesis Committee:  

Internal Reader: Dr. Boubakeur Boufama 
External Reader: Dr. Mohammed Khalid  
Advisor: Dr. Imran Ahmed 



5113 Lambton Tower 401 Sunset Ave. Windsor ON, N9B 3P4 (519) 253-3000 Ext. 3716