MSc Thesis Defense Announcement of Nazia Siddiqui: "Comparative Study of Generative Models for Text-to-Image Generation"

Thursday, January 19, 2023 - 14:00 to 15:30


The School of Computer Science is pleased to present… 

MSc Thesis Defense by: Nazia Siddiqui 

Date: Thursday, January 19, 2023 
Time: 2:00 pm - 3:30 pm
Location: Essex Hall, Room 122 
Reminder: QR code and sign-in sheet attendance are mandatory.


The development of deep learning algorithms has greatly advanced computer vision applications, image processing methods, Artificial Intelligence, and Natural Language Processing. One such application is image synthesis, the creation of new images from text. Recent techniques for text-to-image synthesis offer an intriguing yet straightforward conversion capability from text to image and have become a popular research topic. Synthesizing images from text descriptions has practical and creative applications in computer-aided design, multimodal learning, digital art creation, etc. Non-Fungible Tokens (NFTs) are a form of digital art that is traded as tokens across the globe. Text-to-image generators let anyone with enough creativity develop digital art, which can be used as NFTs; they can also be beneficial for the development of synthetic datasets. Generative Adversarial Networks (GANs) are generative models that can generate new data from a training set. Diffusion Models are another type of generative model: they gradually add random noise to the data and then learn to reverse the diffusion process, creating the desired data samples from noise. This thesis compares both models to determine which is better at producing images that match a given description. We have implemented the Vector-Quantized GAN (VQGAN) + Contrastive Language-Image Pre-training (CLIP) model, which combines the VQGAN and CLIP machine learning techniques to create images from text input. The diffusion model we have implemented is Guided Language to Image Diffusion for Generation and Editing (GLIDE). For both models, we use text input from the MS-COCO dataset. This thesis assesses and compares the images generated from text by both models using metrics such as Inception Score (IS) and Fréchet Inception Distance (FID). Semantic Object Accuracy (SOA) is an additional metric that takes the caption used during image generation into account in the analysis.
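The Fréchet Inception Distance mentioned above compares the statistics of feature vectors extracted from real and generated images. As an illustrative sketch only (not the thesis's actual evaluation code), the FID formula can be computed in NumPy, assuming the Inception features have already been extracted into two arrays:

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussian fits of two feature sets (N x D arrays).

    FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^{1/2}),
    where mu and C are the mean and covariance of the image features.
    Lower values mean the generated distribution is closer to the real one.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((C_r C_g)^{1/2}) equals the sum of square roots of the
    # eigenvalues of C_r @ C_g (nonnegative for PSD covariances).
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)
```

In practice the features come from a pretrained Inception-v3 network (its 2048-dimensional pooling layer), so identical image sets yield an FID near zero and dissimilar sets yield large values.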
Keywords: Text-to-Image Generation, Generative Models, GANs, Diffusion Models

MSc Thesis Committee:  

Internal Reader: Dr. Boubakeur Boufama      
External Reader: Dr. Mohammed Khalid       
Advisor: Dr. Imran Ahmad 
Chair: Dr. Robin Gras 






5113 Lambton Tower 401 Sunset Ave. Windsor ON, N9B 3P4 (519) 253-3000 Ext. 3716