Friday, August 14, 2020 - 14:00 to 15:30
SCHOOL OF COMPUTER SCIENCE
The School of Computer Science is pleased to present…
MSc Thesis Defense by: Yan Gao
Date: August 14th, 2020
Time: 2:00PM – 3:30PM
Zoom URL: https://zoom.us/j/91552395421?
In bibliometrics, author name ambiguity means that an author's name is not a reliable identifier for associating academic papers with their authors. Author name ambiguity is a problem both in bibliometrics and for service providers such as Google Scholar. Author Name Disambiguation (AND) is often tackled using classification techniques: given a set of labelled papers, papers are assigned to the correct authors based on the paper text and paper citations. When classification methods are applied to AND, two issues stand out. First, a paper has multiple views (paper text and citation network). Second, training data is scarce: few papers are labelled.
To cope with these two issues, we propose using co-training for AND. Co-training uses the two views to classify papers iteratively, adding the top-selected papers to the training pool in each round. We demonstrate that co-training outperforms the baseline multi-view classification algorithm. We also study the effect of the co-training algorithm's hyper-parameters.
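For readers unfamiliar with co-training, the loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation. Each paper is assumed to be given as two embedding vectors (a text view and a citation view), plus a small seed set of labelled papers. To stay dependency-free, the sketch swaps in a nearest-centroid classifier instead of the logistic regression or SVM named in the abstract, and it uses the distance gap between the two nearest centroids as a confidence score. The function names, the `rounds` and `k` parameters, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def centroids(X, y):
    """Mean embedding per author label (stand-in nearest-centroid classifier)."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def predict(cents, x):
    """Return (label, confidence); confidence is the gap between the two nearest centroids."""
    dists = sorted((math.dist(c, x), lab) for lab, c in cents.items())
    if len(dists) == 1:
        return dists[0][1], float("inf")
    return dists[0][1], dists[1][0] - dists[0][0]


def co_train(view_a, view_b, labels, rounds=5, k=1):
    """Hypothetical co-training loop over two views of the same papers.

    view_a, view_b: per-paper embeddings for the two views, in the same order.
    labels: dict mapping paper index -> author for the seed labelled papers.
    In each round, each view's classifier is trained on the shared pool and
    labels its k most confident unlabelled papers, which join the pool.
    """
    labels = dict(labels)  # do not mutate the caller's seed labels
    for _ in range(rounds):
        unlabelled = [i for i in range(len(view_a)) if i not in labels]
        if not unlabelled:
            break
        for view in (view_a, view_b):
            idx = sorted(labels)
            cents = centroids([view[i] for i in idx], [labels[i] for i in idx])
            # Score papers not yet labelled (the other view may have just added some).
            scored = sorted(
                ((predict(cents, view[i]), i) for i in unlabelled if i not in labels),
                key=lambda t: t[0][1],
                reverse=True,
            )
            for (label, _conf), i in scored[:k]:
                labels[i] = label
    return labels


# Toy data (invented): six papers by two authors, "A" and "B".
view_a = [[0, 0], [0.1, 0.2], [0.2, 0.1], [5, 5], [5.1, 4.9], [4.8, 5.2]]   # "text" view
view_b = [[1, 0], [0.9, 0.1], [1.1, -0.1], [0, 1], [0.1, 0.9], [-0.1, 1.1]]  # "citation" view

if __name__ == "__main__":
    print(co_train(view_a, view_b, {0: "A", 3: "B"}, rounds=2, k=1))
```

Here `k` (papers added per view per round) and `rounds` play the role of the kind of hyper-parameters the abstract mentions studying. Sharing a single training pool between the two views follows the abstract's wording; the original co-training formulation instead has each view's classifier label examples for the other.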
Experiments are conducted on the PubMed dataset, in which authors are labelled with ORCID identifiers. Papers are represented by two embeddings, learnt separately from the paper content and from the paper citation network. Logistic regression and SVM serve as the baseline classifiers for comparison.
Internal Reader: Dr. Christie Ezeife
External Reader: Dr. Sang-Chul Suh
Advisor: Dr. Jianguo Lu
Chair: Dr. Arunita Jaekel
MSc Thesis Defense Announcement
5113 Lambton Tower 401 Sunset Ave. Windsor ON, N9B 3P4 (519) 253-3000 Ext. 3716 email@example.com