MSc Thesis Defense Announcement of Qinglan Lu: "Improve GloVe Word Embedding Using Linear Weighting Scheme for Word Similarity Tasks"

Thursday, May 13, 2021 - 13:00 to 15:00

SCHOOL OF COMPUTER SCIENCE 

The School of Computer Science is pleased to present… 

MSc Thesis Defense by: Qinglan Lu 

 
Date: Thursday, May 13th, 2021 
Time: 1:00 pm – 3:00 pm 
Passcode: If interested in attending the event, contact the Graduate Secretary at csgradinfo@uwindsor.ca
 

Abstract:  

One of the trends in Natural Language Processing (NLP) is the use of word embedding. Its aim is to build a low-dimensional vector representation of words from text corpora. GloVe (Global Vectors for Word Representation) and SGNS (Skip-Gram with Negative Sampling) are two representative word embedding methods. Existing papers reach different conclusions on the performance of these two methods. This thesis focuses on GloVe and studies its commonalities and differences with SGNS. 
 
Word co-occurrence is the cornerstone of all word embedding algorithms. One difference between GloVe and SGNS is the definition of co-occurrence. The weight of co-occurring words tapers off with the distance between them, and GloVe and SGNS adopt different weighting schemes. In SGNS, the weight decreases linearly with the distance. In GloVe, the weight decreases harmonically, giving less weight to the words in the middle of the window. We propose GloVe_L (GloVe Linear), which changes the weighting scheme to linear weighting. We found that GloVe_L consistently outperforms GloVe in word similarity tasks. This conclusion is supported by extensive experiments on eight word evaluation benchmarks using a Wikipedia training corpus. The thesis also explores the impact of hyper-parameters on the result, including window size and Xmax in GloVe. Another interesting observation is that GloVe_L does not work well for word analogy tasks. 
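
To illustrate the two weighting schemes described in the abstract, the following is a minimal Python sketch, not the thesis implementation. It assumes the commonly cited forms of the weights: 1/d for the harmonic scheme and (L - d + 1)/L for the linear scheme, where d is the distance between two words and L is the window size. The function names (harmonic_weight, linear_weight, weighted_cooccurrence) are hypothetical, and the exact formula used in GloVe_L may differ.

```python
from collections import defaultdict


def harmonic_weight(distance: int) -> float:
    """Harmonic decay (assumed GloVe-style): a word at distance d counts 1/d."""
    return 1.0 / distance


def linear_weight(distance: int, window: int) -> float:
    """Linear decay (assumed SGNS-style): (window - d + 1) / window."""
    return (window - distance + 1) / window


def weighted_cooccurrence(tokens, window, weight_fn):
    """Accumulate symmetric, distance-weighted co-occurrence counts.

    weight_fn maps a distance (1..window) to a weight.
    """
    counts = defaultdict(float)
    for i, center in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            w = weight_fn(d)
            # Count the pair in both directions so the table is symmetric.
            counts[(center, tokens[j])] += w
            counts[(tokens[j], center)] += w
    return dict(counts)


if __name__ == "__main__":
    window = 10
    for d in range(1, window + 1):
        print(d, round(harmonic_weight(d), 3), round(linear_weight(d, window), 3))
```

Under these assumptions, with a window of 10, a word at distance 5 receives a weight of 0.6 in the linear scheme but only 0.2 in the harmonic scheme, which is the sense in which harmonic weighting gives less weight to words in the middle of the window.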
 
Keywords: word co-occurrence, GloVe, Word2Vec, word embedding 
 

MSc Thesis Committee 

Internal Reader: Jessica Chen      
External Reader: Mehdi S. Monfared  
Advisor: Jianguo Lu  
Chair: Arunita Jaekel    
 
 
 


 
5113 Lambton Tower 401 Sunset Ave. Windsor ON, N9B 3P4 (519) 253-3000 Ext. 3716 csgradinfo@uwindsor.ca