PhD Dissertation Defense by Roohollah Etemadi: "Scalable Algorithms on Large Graphs Based on Sampling"

Tuesday, April 23, 2019 - 13:30 to 16:00

SCHOOL OF COMPUTER SCIENCE

The School of Computer Science at the University of Windsor is pleased to present...

PhD. Dissertation Defense by:

Roohollah Etemadi

Date: Tuesday, April 23, 2019
Time: 1:30 pm - 4:00 pm
Location: 3105, Lambton Tower
 

Abstract: 

Analyzing real-world networks is a computationally intensive task due to the sheer size of the networks. More importantly, direct analysis is impossible when the network data is not entirely accessible. For instance, user networks in Twitter and Facebook are not available for third parties to explore their properties directly. Thus, sampling-based algorithms are indispensable. Deriving unbiased estimators and their error bounds are two important tasks in sampling methods. Existing methods use the properties of original network graphs to construct the confidence interval (CI). However, the true values of such properties are unknown in the sampling scenario. Therefore, practitioners need to use the statistics of sampled data to obtain the error bound of estimation. Furthermore, for biased estimators, they need to quantify, estimate, and correct the bias using statistics of sampled data. This dissertation addresses the CI and bias problems in real-world network analysis. It uses estimations of the number of triangles (Δ) and clustering coefficient (C) as a case study. Our methods can be utilized in other sampling problems. We propose two new methods to estimate Δ based on random edge sampling in both streaming and non-streaming models. The methods outperform the state-of-the-art methods consistently and can be better by orders of magnitude when the graph is very large. More importantly, we proved the improvement ratio analytically and verified our results extensively in real-world networks. The analytical results are achieved by simplifying the variances of the estimators based on the assumption that the graph is very large. We believe that such big data assumption can lead to interesting results not only in triangle estimation but also in other sampling problems. Next, we study the estimation of C in both streaming and non-streaming sampling models. Despite numerous algorithms proposed in this area, the bias and variance of the estimators remain an open problem. We quantify the bias using Taylor expansion and find that the bias can be determined by the structure of the sampled data. Based on the understanding of the bias, we give new estimators that correct the bias. The results are derived analytically and verified in 56 real networks ranging in different sizes and structures. The experiments reveal that the bias ranges widely from data to data. The relative bias can be as high as 4% in non-streaming model and 2% in streaming model or can be negative.

 

Committee members:

External Reader: Dr. Jimmy Huang, School of Information Technology, York University

Outside Department Reader: Dr. Majid Ahmadi, Department of Electrical and Computer

Engineering, University of Windsor

Department Readers: Dr. Dan Wu and Dr. Mehdi Kargar

Advisors: Dr. Jianguo Lu and Dr. Yung H. Tsin              

Chair: Dr. Yuntong Wang, Department of Economics

 

Dissertation Defense Announcement

5113 Lambton tower, 401 Sunset Ave., Windsor ON, N9B 3P4 (519) 253-3000 Ext. 3716, csgradinfo@uwindsor.ca

 

(519)253-3000