MSc Thesis Defense by Zeeshan Mansoor: Improving Document Embedding Using Retrofitting

Tuesday, April 16, 2019 - 10:00 to 12:00

SCHOOL OF COMPUTER SCIENCE

The School of Computer Science at the University of Windsor is pleased to present …

Improving Document Embedding Using Retrofitting

 

MSc Thesis Defense by:

Zeeshan Mansoor

 

Date:  Tuesday, April 16th, 2019

Time:  10:00 am – 12:00 pm

Location: 3105, Lambton Tower 

 

Abstract: 

Data-driven learning of document vectors that capture the linkage between documents is of great importance in natural language processing (NLP) for downstream tasks such as information retrieval, classification, and clustering. Documents are inherently linked to one another, through links in the case of web pages and through citations in the case of academic papers. Methods like PV-DBOW try to capture the semantic representation of a document using only its textual information and ignore the linkage information altogether. Conversely, methods developed for network representation learning, like Node2Vec, capture the linkage information between documents but ignore the textual information. In this thesis, we propose a method based on retrofitting, a technique for refining word embeddings using a semantic lexicon. Our approach incorporates both the textual and the network information when learning the document representation. Our experimental results show that the method improves the classification score by 4%, and we analyze the best weight for adding network information to the embeddings. Furthermore, we introduce a new dataset containing both network and content information.
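To illustrate the general idea of retrofitting applied to documents, here is a minimal sketch. It is not the thesis's implementation: the function name `retrofit`, the parameters `alpha` and `beta`, the uniform neighbour weighting, and the fixed iteration count are all assumptions made for illustration. Each document starts from its text-only vector and is repeatedly moved to a weighted average of that original vector and the current vectors of the documents it is linked to.

```python
def retrofit(text_vecs, links, alpha=1.0, beta=1.0, iterations=10):
    """Pull each document vector toward the vectors of its linked neighbours.

    All names and defaults here are hypothetical, chosen for illustration.
    text_vecs: dict mapping doc id -> list of floats (text-only embedding)
    links:     dict mapping doc id -> iterable of linked doc ids
    alpha:     weight keeping a document near its original text embedding
    beta:      weight pulling a document toward each linked neighbour
    """
    # Work on copies; the original text vectors stay fixed as anchors.
    new_vecs = {doc: list(vec) for doc, vec in text_vecs.items()}
    for _ in range(iterations):
        for doc, original in text_vecs.items():
            neighbours = [n for n in links.get(doc, ())
                          if n in new_vecs and n != doc]
            if not neighbours:
                continue  # unlinked documents keep their text embedding
            total = alpha + beta * len(neighbours)
            updated = []
            for k in range(len(original)):
                neighbour_sum = sum(new_vecs[n][k] for n in neighbours)
                updated.append(
                    (alpha * original[k] + beta * neighbour_sum) / total
                )
            new_vecs[doc] = updated
    return new_vecs


# Toy example: documents "a" and "b" cite each other, "c" has no links.
docs = {"a": [0.0, 0.0], "b": [2.0, 2.0], "c": [5.0, -1.0]}
citations = {"a": ["b"], "b": ["a"]}
retrofitted = retrofit(docs, citations)
```

In this sketch, `beta` plays the role of the weight given to network information: with `beta=0` the text embeddings are returned unchanged, and larger values pull linked documents closer together.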
 
 

Thesis Committee:

Internal Reader: Dr. Ziad Kobti   
External Reader: Dr. Guoqing Zhang
Advisor: Dr. Jianguo Lu  
Chair: Dr. Robin Gras
 

Thesis Defense Announcement

 
5113 Lambton Tower, 401 Sunset Avenue, Windsor ON, N9B 3P4, (519) 253-3000, Ext. 3716, csgradinfo@uwindsor.ca