
MSc Thesis Proposal Announcement by Qinglan Lu: "A Comparative Study of Word Embedding Algorithms"

Friday, August 7, 2020 - 10:00 to 11:00

SCHOOL OF COMPUTER SCIENCE 

The School of Computer Science is pleased to present… 

MSc Thesis Proposal by: Qinglan Lu 

 
Date: Friday August 7th, 2020 
 
Time:  10:00 AM – 11:00 AM 
 
 
 

Abstract:  

 
One of the trends in Natural Language Processing (NLP) is the use of word embedding. Its aim is to build a low-dimensional vector representation of words from a corpus of text. Skip-Gram with Negative Sampling (SGNS) from Word2Vec and GloVe are two representative word embedding methods. Existing papers reach different conclusions about the performance of these two methods. This thesis focused on GloVe and studied its commonalities and differences with SGNS. We explained the similarities between the objective functions of these two models, and showed that the objective of SGNS is similar to the objective of a specialized form of GloVe, although their cost functions are defined differently. We then compared the two methods in detail to determine which is more efficient. To show the impact of the choice of hyper-parameters and benchmarks on the performance of both algorithms, we trained both methods on Text8 to tune the hyper-parameters, including vector dimension, window size and Xmax. The trained models were evaluated on 8 word similarity tasks (WS353, WS353 Similarity, WS353 Relatedness, Bruni MEN, Radinsky MTurk, Rare Words, RG and MC) and 2 word analogy tasks (Google and MSR). Through hyper-parameter tuning, we found that SGNS outperforms GloVe in most tasks. Inspired by the differences between their window styles, we changed the weighting scheme applied in GloVe's window to that of SGNS. We trained this new method on Text8 with the tuned hyper-parameters, and evaluated it on the same tasks. Compared to GloVe, our method improves on all of the similarity tasks.
 
Keywords: word embedding, word co-occurrences, GloVe, Word2Vec. 
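
As a rough illustration of the window weighting difference mentioned in the abstract, the sketch below counts co-occurrences under two schemes. GloVe weights a context word at distance d by 1/d, while the dynamic window used in Word2Vec keeps a context word at distance d with probability (L - d + 1)/L for a maximum window size L. Whether the thesis uses exactly this linear form is an assumption; the function names and the toy sentence are hypothetical and are not taken from the thesis. The last function is the count weighting f(x) from the GloVe paper, in which Xmax appears as x_max.

```python
from collections import defaultdict


def harmonic_weight(distance, window):
    """GloVe-style window weight: a context word d positions away counts 1/d."""
    return 1.0 / distance


def linear_weight(distance, window):
    """Expected weight under a Word2Vec-style dynamic window: (L - d + 1) / L.

    Assumption: this is the SGNS-style scheme the abstract refers to.
    """
    return (window - distance + 1) / window


def cooccurrence_counts(tokens, window, weight_fn):
    """Accumulate symmetric, distance-weighted co-occurrence counts."""
    counts = defaultdict(float)
    for i, center in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            w = weight_fn(d, window)
            counts[(center, tokens[j])] += w
            counts[(tokens[j], center)] += w
    return dict(counts)


def glove_count_weight(x, x_max=100.0, alpha=0.75):
    """GloVe's f(x): down-weights rare pairs and caps frequent pairs at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


# Hypothetical toy sentence, used only to show the two schemes side by side.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
glove_style = cooccurrence_counts(tokens, window=4, weight_fn=harmonic_weight)
sgns_style = cooccurrence_counts(tokens, window=4, weight_fn=linear_weight)
print(glove_style[("cat", "on")], sgns_style[("cat", "on")])
```

With a window of 4, the pair ("cat", "on") at distance 2 receives weight 1/2 under the harmonic scheme and 3/4 under the linear one, so the linear scheme gives relatively more mass to distant context words.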
 
 

Thesis Committee:  

 
Internal Reader: Dr. Jessica Chen              
 
External Reader: Dr. Mehdi S. Monfared                
 
Advisor: Dr. Jianguo Lu 
 
 

MSc Thesis Proposal Announcement  Vector Institute approved Artificial Intelligence topic

5113 Lambton Tower 401 Sunset Ave. Windsor ON, N9B 3P4 (519) 253-3000 Ext. 3716 csgradinfo@uwindsor.ca