MSc Thesis Defense Announcement of Yan Gao: "Author Name Disambiguation using Co-training"

Friday, August 14, 2020 - 14:00 to 15:30



The School of Computer Science is pleased to present… 


MSc Thesis Defense by: Yan Gao 

Date: August 14th, 2020 
Time: 2:00PM – 3:30PM


In bibliometrics, author name ambiguity means that an author's name is not a reliable identifier for associating academic papers with their authors. Author name ambiguity has been a problem in bibliometrics and for service providers like Google Scholar. Author Name Disambiguation (AND) is often tackled using classification techniques: labelled papers are provided, and each paper is assigned to the correct author according to its text and citations. Two issues stand out when applying classification methods to AND. The first is that a paper has multiple views (the paper text and the citation network). The second is the paucity of training data: few papers are labelled.
To cope with these two issues, we propose using co-training for AND. Co-training uses the two views to classify papers iteratively and adds the top selected papers to the training pool. We demonstrate that co-training outperforms the baseline multi-view classification algorithm. We also experiment with the hyper-parameters of the co-training algorithm.
The experiments are conducted on the PubMed dataset, where authors are labelled with ORCID. Papers are represented by two embeddings, learnt separately from the paper content and the paper citation network. The baseline classifiers for comparison are logistic regression and SVM.
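
For readers unfamiliar with co-training, the loop described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the thesis implementation: it swaps in a simple nearest-centroid classifier for the classifiers used in the thesis, the two views are tiny hand-made 2-D vectors rather than learnt embeddings, and every name here (centroid_fit, centroid_predict, co_train, rounds, top_k) is hypothetical.

```python
def centroid_fit(X, y):
    """Fit a nearest-centroid classifier: one mean vector per label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def centroid_predict(model, x):
    """Return (label, confidence); confidence is the squared-distance
    margin between the nearest and second-nearest centroid."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)), lab)
        for lab, c in model.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin


def co_train(view_a, view_b, labels, rounds=5, top_k=1):
    """labels[i] is an author id, or None if paper i is unlabelled.
    In each round, each view in turn trains on the shared pool and
    moves its top_k most confident predictions into that pool."""
    labels = list(labels)  # work on a copy; the input is left untouched
    for _ in range(rounds):
        for view in (view_a, view_b):
            train = [i for i, lab in enumerate(labels) if lab is not None]
            pool = [i for i, lab in enumerate(labels) if lab is None]
            if not pool:
                return labels
            model = centroid_fit([view[i] for i in train],
                                 [labels[i] for i in train])
            scored = sorted(
                ((centroid_predict(model, view[i]), i) for i in pool),
                key=lambda item: item[0][1],
                reverse=True,
            )
            for (label, _margin), i in scored[:top_k]:
                labels[i] = label  # pseudo-label joins the training pool
    return labels


# Toy data (assumed for illustration): six papers, two candidate authors.
# view_a stands in for a text embedding, view_b for a citation embedding.
view_a = [(0.0, 0.1), (5.0, 5.1), (0.3, 0.2), (4.8, 5.2), (0.5, 0.1), (5.3, 4.9)]
view_b = [(0.2, 0.0), (4.9, 5.0), (0.1, 0.4), (5.1, 4.7), (0.4, 0.3), (4.6, 5.2)]
seed_labels = [0, 1, None, None, None, None]  # only two papers are labelled

predicted = co_train(view_a, view_b, seed_labels, rounds=2, top_k=1)
print(predicted)
```

Here rounds and top_k play the role of the kind of hyper-parameters the abstract mentions; the values used are arbitrary choices for the toy data, not settings from the thesis.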

Thesis Committee:  

Internal Reader: Dr. Christie Ezeife 
External Reader: Dr. Sang-Chul Suh 
Advisor: Dr. Jianguo Lu 
Chair: Dr. Arunita Jaekel 



5113 Lambton Tower 401 Sunset Ave. Windsor ON, N9B 3P4 (519) 253-3000 Ext. 3716