MSc Thesis Proposal Announcement of Rayhaan Pirani: "Optimization of Anomaly Detection Time in Large Datasets: A Case Study in Bank Loan Defaults"

Tuesday, April 11, 2023 - 12:00 to 13:30


The School of Computer Science is pleased to present…  

MSc Thesis Proposal by: Rayhaan Pirani 

Date: Tuesday, April 11th, 2023 
Time:  12:00pm – 1:30pm  
Location: Essex Hall, Room 122 
Reminders:
1. Two-part attendance is mandatory (sign-in sheet and QR code).
2. Arrive 5-10 minutes before the event starts. LATECOMERS WILL NOT BE ADMITTED. Due to demand, admission is not guaranteed once the room has reached capacity, even if you arrive early.
3. Please be respectful of the presenter by NOT knocking on the door for admittance once it has been closed, whether or not the presentation has begun. If the room is at capacity, overflow (i.e., sitting on the floor) is not permitted, as it violates the Fire Safety code.
4. Be respectful of the decision of the advisor/host of the event if you are not admitted. The School of Computer Science has numerous events occurring soon.



Given the rise in loan defaults, especially since the onset of the COVID-19 pandemic, predicting whether a customer might default on a loan is necessary for risk management. In this thesis, we propose an early warning system architecture that uses anomaly detection, motivated by the unbalanced nature of real-world loan default data: most customers do not default on their loans, and only a tiny percentage do. We aim to evaluate potential anomaly detection methods for their suitability in handling unbalanced datasets. We conduct a comparative study of different classification and anomaly detection approaches on four loan default datasets, comparing them using standard evaluation metrics such as accuracy, precision, recall, F1 score, training and prediction time, and area under the receiver operating characteristic (ROC) curve. We then evaluate various anomaly detection methods on the same datasets and compare the metrics to identify the best detection method for incorporation into an early warning system that is versatile, fast, scalable, real-time, probabilistic, and works on unbalanced datasets. This thesis thus aims to address several gaps in current research: the versatility of the best detection approach across loan types, the evaluation of prediction speed alongside performance, the probabilistic nature of early warning systems as opposed to binary classification prediction systems, the use of anomaly detection instead of classification methods, and the comparison of multiple approaches in the same context.
Keywords: Anomaly Detection, Unbalanced Dataset, Early Warning System, Loan Default. 
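
As a rough illustration of why the abstract lists precision, recall, and F1 score alongside accuracy, the sketch below computes these metrics on synthetic labels with a 2% default rate. The labels, the two hypothetical predictors, and the helper names (`confusion_counts`, `evaluate`) are assumptions made for this example only; they are not data, methods, or results from the thesis.

```python
# Illustrative only: synthetic labels, not data or results from the thesis.
# Label 1 means "default" (the rare, positive class); label 0 means "repaid".

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from binary labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical portfolio: 1000 loans, of which 20 default (2%).
y_true = [1] * 20 + [0] * 980

# Hypothetical predictor A: always predicts "repaid".
always_repaid = [0] * 1000

# Hypothetical predictor B: flags 30 loans, 15 of which truly default.
detector = [1] * 15 + [0] * 5 + [1] * 15 + [0] * 965

baseline = evaluate(y_true, always_repaid)
flagged = evaluate(y_true, detector)
print("always repaid:", baseline)
print("detector:     ", flagged)
```

On these made-up labels both predictors reach 98% accuracy, yet the first catches none of the defaults (recall 0) while the second catches 15 of 20 (recall 0.75). This is the kind of gap that accuracy alone hides on an unbalanced dataset.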

MSc Thesis Committee:  

Internal Reader: Dr. Alioune Ngom 
External Reader: Dr. Bharat Maheshwari 
Advisor: Dr. Ziad Kobti 

Vector Institute approved artificial intelligence topic


5113 Lambton Tower, 401 Sunset Ave., Windsor, ON N9B 3P4, (519) 253-3000 Ext. 3716