Friday, November 6, 2020 - 14:00 to 17:00
SCHOOL OF COMPUTER SCIENCE
The School of Computer Science is pleased to present…
PhD Dissertation Defense by: Fen Zhao
Date: Friday November 6th, 2020
Time: 2:00 pm – 5:00 pm
Zoom URL: https://zoom.us/j/92559237179?
Passcode: If interested in attending the event, contact the Graduate Secretary at firstname.lastname@example.org to request the passcode.
Academic networks are derived from scholarly data. They are heterogeneous in the sense that different types of nodes are involved, such as papers and authors. This dissertation studies such heterogeneous networks for two purposes: learning vector representations of authors and measuring academic influence.
Skip-Gram Negative Sampling (SGNS) was developed for language modeling and, following its success there, has been extended to learn embeddings from networks. Following this trend, heterogeneous network embedding algorithms are also mostly derived from SGNS, with additional restrictions on Random Walk patterns that are commonly called MetaPaths. MetaPath-based Random Walks produce traces of mixed node types, and the different node types are projected into a single low-dimensional space. We propose that different types of nodes should be projected into different spaces. More specifically, we conduct Random Walks to generate traces that contain nodes of mixed types, then separate the traces into layers so that each layer contains nodes of one type only. Such stratification improves, by a large margin, on embeddings that are derived from the mixed traces. Our proposed algorithm, Stratified Embedding for Heterogeneous Networks (SEHN), is the stratified version of Metapath2vec and improves on the state-of-the-art Metapath2vec by up to 24%. The efficacy of stratification is also demonstrated on two classic network embedding algorithms, DeepWalk and Node2vec. The results are validated on two heterogeneous academic networks. We also demonstrate that SEHN outperforms embeddings of the homogeneous author networks that are induced from the corresponding heterogeneous networks.
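The stratification step described above can be illustrated with a minimal sketch. It assumes a toy author-paper network and uniform random walks; the node names, the `random_walk` and `stratify` helpers, and the walk length are all hypothetical and are not taken from the dissertation.

```python
import random

def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes over adjacency dict `adj`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

def stratify(walk, node_type):
    """Split one mixed-type trace into per-type traces, preserving visit order."""
    layers = {}
    for node in walk:
        layers.setdefault(node_type[node], []).append(node)
    return layers

# Toy bipartite author-paper network (hypothetical data).
adj = {
    "a1": ["p1", "p2"],
    "a2": ["p1"],
    "a3": ["p2"],
    "p1": ["a1", "a2"],
    "p2": ["a1", "a3"],
}
node_type = {n: ("author" if n.startswith("a") else "paper") for n in adj}

rng = random.Random(0)
walk = random_walk(adj, "a1", 9, rng)   # mixed trace: author, paper, author, ...
layers = stratify(walk, node_type)      # {"author": [...], "paper": [...]}
```

In a full pipeline, each layer's traces would presumably be passed to a separate SGNS trainer so that authors and papers receive embeddings in different spaces; that training step is omitted here.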
The Random Walk technique is also widely used to measure node importance in a network. On academic data, influence has traditionally been measured by citation count and metrics derived from it. PageRank, a Random Walk-based ranking algorithm, has been used on paper citation networks to give higher weight to citations from more influential papers. A better approach is to add authors to the citation network so that the importance of authors and papers is evaluated recursively within the same framework. This dissertation proposes a new weighted heterogeneous author-paper network, which contains both citation relations and authorship relations. Our method can eliminate the mutual citation issue of the paper citation network, as well as the self-citation issue of the author citation network. Tested on two large networks, our method outperforms 10 other methods in terms of the number of award winners among top-ranked authors. Specifically, our method improves on citation ranking by up to 104.29%.
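As a rough illustration of ranking authors and papers within one framework, the sketch below runs standard weighted PageRank by power iteration on a tiny author-paper graph. The edge directions, the edge weights, and the toy data are assumptions made for this example; the weighting scheme actually proposed in the dissertation is not described in this abstract.

```python
def pagerank(edges, damping=0.85, iters=100):
    """Weighted PageRank by power iteration.

    edges: dict mapping (src, dst) -> positive weight.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_weight = {n: 0.0 for n in nodes}
    for (src, _), w in edges.items():
        out_weight[src] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for (src, dst), w in edges.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank

# Hypothetical network: p2 and p3 cite p1; a1 wrote p1; a2 wrote p2 and p3.
edges = {
    ("p2", "p1"): 1.0, ("p3", "p1"): 1.0,   # citation relations
    ("p1", "a1"): 1.0, ("a1", "p1"): 1.0,   # authorship relations
    ("p2", "a2"): 1.0, ("p3", "a2"): 1.0,
    ("a2", "p2"): 0.5, ("a2", "p3"): 0.5,
}
rank = pagerank(edges)
```

In this toy graph the twice-cited paper `p1` and its author `a1` end up ranked above `p2`, `p3`, and `a2`, which shows how citation weight can flow to authors through authorship edges.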
Keywords: Heterogeneous Network, Network Embeddings, PageRank, Author Ranking
Internal Reader: Dr. Arunita Jaekel
Internal Reader: Dr. Yung H. Tsin
External Reader: Dr. Abdulkadir Hussein (Mathematics and Statistics)
External Examiner: Dr. Shengrui Wang (Université de Sherbrooke)
Advisor: Dr. Jianguo Lu
Chair: Dr. Ali Polat
5113 Lambton Tower 401 Sunset Ave. Windsor ON, N9B 3P4 (519) 253-3000 Ext. 3716 email@example.com