The School of Computer Science is pleased to present…
Clustering Numerical Datasets by Clique Partitioning
MSc Thesis Defense by: Jiajie Yang
Date: Tuesday, August 12, 2025
Time: 1:00 pm
Location: Essex Hall, Room 122
Pairwise clustering problems seek an optimal partition of a given set of data points according to their pairwise similarities or distances. These problems have been studied extensively, with well-known results in the literature. When the pairwise relationships take both positive and negative values, the number of clusters need not be given in advance, and promising techniques have been explored in this setting to reach provable global optimality efficiently by way of integer linear programming. These techniques were developed for categorical datasets. In this paper, we present a new perspective on problem definitions for numerical datasets, offering a pathway to extend these established exact global optimization methods to numerical data. This is done by negating the pairwise distances and then applying a parameterized translation of the origin. We study the impact of the translation on the optimal number of clusters and explore the relationship between the clustering outcomes and the underlying characteristics of the pairwise similarities within the dataset. The insights gained offer guidance for the informed selection of clustering techniques and parameter settings.
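As a rough illustration of the idea in the abstract, the sketch below negates each pairwise distance and shifts it by a parameter, then searches for the partition that maximizes the total within-cluster weight. Several things here are assumptions and not taken from the thesis: the use of Euclidean distance, the parameter name t, the function names, the within-cluster-sum objective, and above all the brute-force enumeration of partitions, which stands in for the integer linear programming approach and is only feasible for a handful of points.

```python
from itertools import combinations
import math


def shifted_similarities(points, t):
    """Return w with w[i][j] = t - d(i, j).

    The distance is negated and then translated by t (an assumed
    parameter name), so close pairs get positive weights and distant
    pairs get negative ones.
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        w[i][j] = w[j][i] = t - math.dist(points[i], points[j])
    return w


def partitions(items):
    """Yield every set partition of items (exponential; tiny inputs only)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing block in turn ...
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]
        # ... or into a new block of its own.
        yield [[first]] + part


def best_partition(points, t):
    """Exhaustively find the partition with the largest within-cluster weight."""
    w = shifted_similarities(points, t)

    def score(part):
        return sum(w[i][j] for block in part for i, j in combinations(block, 2))

    return max(partitions(list(range(len(points)))), key=score)


if __name__ == "__main__":
    # Two tight pairs of points, far apart from each other.
    pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
    for t in (0.5, 2.0, 10.0):
        print(t, len(best_partition(pts, t)))
```

On this toy input the number of clusters is never supplied: a small t makes every weight negative and leaves each point alone, a moderate t recovers the two tight pairs, and a large t makes every weight positive and merges everything into one cluster.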
Program Reader: Dr. Jianguo Lu
Program Reader 2: Dr. Asish Mukhopadhyay
Advisor: Dr. Jessica Chen
Chair: Dr. Peter Tsin