The School of Computer Science is pleased to present…
Title: Generalization of a Global Optimal Solution for a Clustering Problem in Qualitative Data Analytics
MSc Thesis Proposal by: Jiajie Yang
Date: Friday January 05, 2024
Time: 3:00 pm – 4:30 pm
Location: Essex Hall Room 122
Abstract:
The clustering problems are optimization problems that emerged from various fields, including machine learning, statistics, and image processing. Many techniques like the k-means, DBSCAN, etc. have been developed to tackle the problems, with great success in terms of both accuracy and efficiency. Due to the NP-hardness of the problems, however, most of the popular algorithms suffer from the local-optima phenomenon. Owing to this, we follow the research work on the global optimization problems with the branch and bound scheme so that, keeping the constraints on the computation time, if a provably optimal solution cannot be obtained, we could terminate the search for the global optima prematurely while expecting not only the best solution currently found but also its provable level of closeness to the global optimum. The study on the global optimization of data clustering has greatly advanced along both the quantitative and the qualitative approaches. Defining the relationship between data points in terms of similarity instead of Euclidean distance, the qualitative approach is typically applicable to the datasets with categorical values. A prominent study shows that some clustering problems in this regard can be formulated into the clique partition problem where some classical ILP methods turn out to be quite effective in reaching quality lower bounds. In the present thesis work, we study the limitations of this line of research and seek for possible ways to extend its applicability across a wider range of datasets.
Keywords:
Integer Linear Programming, Global Optimization, Qualitative Data Analytics, Clustering Techniques
Thesis Committee:
Internal Reader: Dr. Jianguo Lu
External Reader: Dr. Yash Aneja
Advisor: Dr. Jessica Chen