Monday, February 10, 2020 - 10:00 to 11:00
SCHOOL OF COMPUTER SCIENCE
The School of Computer Science is pleased to present…
MSc Thesis Proposal by: Jayanth Prakash Kulkarni
Date: Monday, February 10, 2020
Time: 10:00 AM – 11:00 AM
Location: Lambton Tower, 3105
Word embedding refers to a set of techniques for learning short, dense vector representations of words. Research on word embedding has been an impetus for recent advances in artificial neural networks, and its success has had a wide impact in areas such as natural language processing, complex networks, and knowledge representation and discovery. Since the seminal work of Yoshua Bengio, many neural-network-based algorithms have been proposed to improve performance. One approach is to train the model on phrases rather than on individual words alone. This approach raises two sub-problems: how to identify the phrases, and how to use them in a variety of embedding algorithms.

For the first problem, Tomas Mikolov uses an empirical formula to identify phrases. We explore improvements over this ad hoc extraction method by experimenting with a variety of metrics from information theory, such as pointwise mutual information (PMI). A naïve application of PMI can be problematic for rare phrases, because a pair of words that each occur only once, and only together, receives a very high score. We will explore variants of PMI to overcome this. Our method will be evaluated on real data collected from the keywords of academic papers.
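The abstract does not spell out Mikolov's formula or the PMI variants to be studied. As a minimal sketch, the Python below assumes the bigram score published by Mikolov et al. in 2013, score(wi, wj) = (count(wi wj) - δ) / (count(wi) × count(wj)), where the discount δ suppresses pairs built from very infrequent words, and compares it with plain PMI and with normalized PMI (NPMI), one well-known variant that is not necessarily one the thesis will use. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def count_ngrams(tokens):
    """Return unigram counts and adjacent-bigram counts for a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def mikolov_score(pair, unigrams, bigrams, delta=1.0):
    """Mikolov et al. (2013) phrase score:
    (count(wi wj) - delta) / (count(wi) * count(wj))."""
    wi, wj = pair
    return (bigrams[pair] - delta) / (unigrams[wi] * unigrams[wj])

def pmi(pair, unigrams, bigrams, n):
    """PMI = log p(wi, wj) / (p(wi) p(wj)).
    Simplification: the token count n normalizes both unigrams and bigrams."""
    wi, wj = pair
    p_xy = bigrams[pair] / n
    p_x, p_y = unigrams[wi] / n, unigrams[wj] / n
    return math.log(p_xy / (p_x * p_y))

def npmi(pair, unigrams, bigrams, n):
    """Normalized PMI = PMI / -log p(wi, wj), which lies in [-1, 1]."""
    p_xy = bigrams[pair] / n
    return pmi(pair, unigrams, bigrams, n) / -math.log(p_xy)

# Hypothetical toy corpus: "zeta quux" is a pair of singleton words.
TOKENS = ("neural network training uses a neural network and "
          "a neural network model zeta quux").split()

if __name__ == "__main__":
    uni, bi = count_ngrams(TOKENS)
    n = len(TOKENS)
    for pair in [("neural", "network"), ("zeta", "quux")]:
        print(pair,
              round(pmi(pair, uni, bi, n), 3),
              round(npmi(pair, uni, bi, n), 3),
              round(mikolov_score(pair, uni, bi), 3))
```

On this toy corpus the singleton pair "zeta quux" gets a higher PMI than "neural network" (log 14 versus log(14/3)), and NPMI rates both as perfect collocations (1.0), while a discount of δ = 1 drives the singleton's Mikolov score to 0. This illustrates the rare-phrase issue the abstract raises.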
For the second problem, we will apply the extracted phrases in a variety of embedding algorithms, including Word2vec and GloVe. The resulting embeddings will be evaluated using standard benchmarks for word similarity and word analogy tasks.
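The abstract does not name the benchmarks; widely used examples are WordSim-353 for similarity and the Google analogy test set released with word2vec. A minimal sketch of the two standard protocols follows, assuming embeddings are available as a plain dict from token to vector: similarity is scored by the Spearman rank correlation between cosine similarities and human ratings, and an analogy a : b :: c : ? is answered with the vector-offset rule (often called 3CosAdd). All function names, vectors, and ratings here are hypothetical toy values, not benchmark data; a phrase token would simply be another dictionary key, such as a hypothetical "neural_network".

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def similarity_eval(emb, pairs):
    """pairs: (word1, word2, human_rating) triples."""
    model = [cosine(emb[a], emb[b]) for a, b, _ in pairs]
    human = [s for _, _, s in pairs]
    return spearman(model, human)

def analogy(emb, a, b, c):
    """Answer a : b :: c : ? via argmax cos(d, b - a + c), excluding a, b, c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(emb[w], target))

# Hypothetical toy embeddings and ratings, for illustration only.
EMB = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.8, 0.0],
    "man":   [0.1, 0.9, 0.1, 0.0],
    "woman": [0.1, 0.1, 0.9, 0.0],
    "table": [0.0, 0.1, 0.1, 0.9],
}
PAIRS = [("king", "queen", 8.0), ("king", "man", 6.0),
         ("man", "woman", 7.0), ("queen", "table", 1.0)]

if __name__ == "__main__":
    print(similarity_eval(EMB, PAIRS))
    print(analogy(EMB, "man", "king", "woman"))
```

Real benchmarks use the same two steps at scale; the only changes are loading the rating or analogy files and the trained vectors.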
Internal Reader: Dr. Dan Wu
External Reader: Dr. Mohamed Belalia
Advisor: Dr. Jianguo Lu
MSc Thesis Proposal Announcement
5113 Lambton Tower 401 Sunset Ave. Windsor ON, N9B 3P4 (519) 253-3000 Ext. 3716 firstname.lastname@example.org