MSc Thesis Defense Announcement of Jayanth Prakash Kulkarni: "Multi-Word Terminology Extraction and Its Role in Document Embedding"

Friday, January 8, 2021 - 11:00 to 13:00


The School of Computer Science is pleased to present… 

MSc Thesis Defense by: Jayanth Prakash Kulkarni 

Date: Friday, January 8, 2021
Time: 11:00 am – 1:00 pm
Passcode: If you are interested in attending the event, contact the Graduate Secretary for the passcode.


Automated terminology extraction is a crucial task in natural language processing and ontology construction. Termhood can be inferred using linguistic and statistical techniques; this thesis focuses on statistical methods. Inspired by feature selection techniques in document classification, we experiment with a variety of metrics, including PMI (pointwise mutual information), MI (mutual information), and chi-square. We find that PMI is better at identifying the top keywords in a domain, whereas MI recognizes more keywords overall. Based on this observation, we propose a hybrid approach, called HMI, that combines the strengths of PMI and MI. HMI outperforms both PMI and MI. We verify this result by measuring the overlap between the extracted keywords and the author-identified keywords in arXiv data. On corpora of computer science and physics papers, HMI reaches a top-100 hit rate of 0.96.
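
For readers unfamiliar with the feature-selection metrics named above, the sketch below scores a candidate term against a domain using the standard 2×2 document-count table from text classification: n11 = domain documents containing the term, n10 = out-of-domain documents containing it, n01 = domain documents without it, n00 = out-of-domain documents without it. This is a minimal illustration under that assumption, not the thesis's implementation. HMI is not defined in this abstract and is therefore not shown, and the `top_k_hit_rate` helper (fraction of the top-k extracted terms that appear among author keywords) is a hypothetical reading of "top-100 hit rate".

```python
import math


def pmi(n11, n10, n01, n00):
    """Pointwise mutual information of term presence and domain membership (bits)."""
    n = n11 + n10 + n01 + n00
    if n11 == 0:
        return float("-inf")  # term never co-occurs with the domain
    return math.log2(n * n11 / ((n11 + n10) * (n11 + n01)))


def mutual_information(n11, n10, n01, n00):
    """Expected MI over all four cells of the term/domain contingency table (bits)."""
    n = n11 + n10 + n01 + n00
    # Each cell: (joint count, term-side marginal, domain-side marginal).
    cells = [
        (n11, n11 + n10, n11 + n01),  # term present, in domain
        (n10, n11 + n10, n10 + n00),  # term present, out of domain
        (n01, n01 + n00, n11 + n01),  # term absent, in domain
        (n00, n01 + n00, n10 + n00),  # term absent, out of domain
    ]
    total = 0.0
    for joint, term_marg, dom_marg in cells:
        if joint > 0:  # 0 * log 0 is taken as 0
            total += (joint / n) * math.log2(n * joint / (term_marg * dom_marg))
    return total


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for independence of term presence and domain."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return num / den


def top_k_hit_rate(ranked_terms, author_keywords, k=100):
    """Hypothetical helper: share of the top-k ranked terms found in the author keywords."""
    top = ranked_terms[:k]
    if not top:
        return 0.0
    return sum(1 for term in top if term in author_keywords) / len(top)


# A term perfectly associated with the domain in 10 documents:
print(pmi(5, 0, 0, 5), mutual_information(5, 0, 0, 5), chi_square(5, 0, 0, 5))
```

Note that PMI looks only at the n11 cell relative to the marginals, so it does not reward a term for being frequent, while MI weights every cell by its joint probability; this difference in the formulas is one plausible reason the two metrics rank keywords differently.
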
We also demonstrate that terminology can improve document embeddings. In this experiment, we treat each machine-identified multi-word term as a single word and use the transformed text as input for document embedding. Compared with representations learned from unigrams only, we observe an F1-score improvement of over 5.67% on document classification tasks using arXiv data.
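
The preprocessing step described above, treating a multi-word term as one word, can be sketched as a greedy longest-match pass that joins each recognized term into a single token before the text reaches an embedding model. This is a minimal sketch under assumptions: the underscore joiner, the longest-match rule, the already-lowercased tokens, and the function name `merge_terms` are illustrative choices, not details taken from the thesis, and the embedding model itself is not shown.

```python
def merge_terms(tokens, terms, joiner="_"):
    """Replace occurrences of multi-word terms with single joined tokens.

    tokens: list of (already lowercased) word tokens.
    terms:  iterable of token tuples, e.g. ("support", "vector", "machine").
    Longer terms are tried first at each position (greedy longest match).
    """
    term_set = {tuple(t) for t in terms}
    max_len = max((len(t) for t in term_set), default=1)
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest possible span first, down to two-word terms.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in term_set:
                out.append(joiner.join(candidate))
                i += n
                break
        else:
            # No multi-word term starts here; keep the unigram.
            out.append(tokens[i])
            i += 1
    return out


tokens = "we train a support vector machine on word embeddings".split()
terms = [("support", "vector", "machine"), ("support", "vector"), ("word", "embeddings")]
print(merge_terms(tokens, terms))
```

Preferring the longest match keeps "support vector machine" as one token instead of splitting it into "support_vector" plus "machine"; the merged token lists can then be fed to any document-embedding method in place of the unigram-only text.
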
Keywords: Terminology extraction, document embedding, pointwise mutual information, mutual information, chi-squared

Thesis Committee:  

Internal Reader: Dr. Dan Wu        
External Reader: Dr. Mohamed Belalia    
Advisor: Dr. Jianguo Lu 
Chair: Dr. Saeed Samet   



5113 Lambton Tower, 401 Sunset Ave., Windsor, ON N9B 3P4, (519) 253-3000 Ext. 3716