Monday, January 19, 2026 - 13:00
The School of Computer Science is pleased to present…
Evaluating Large Language Models to Support Software Engineering Tasks
MSc Thesis Defense by: Nafisha Binte Moin
Date: Monday, January 19, 2026
Time: 1:00 pm
Location: Lambton Tower Room 3105
Abstract: Large Language Models (LLMs) are increasingly used in software engineering (SE) tasks such as test generation and commit analysis, yet questions remain regarding the effectiveness of retrieval-augmented generation (RAG) techniques and the reliability of LLM outputs in practice. This research addresses these gaps through two complementary studies on automated unit test generation and bug-fix commit annotation. First, we study LLM-based unit test generation, evaluating four prompt strategies and integrating sparse (BM25, BM25L) and dense (SBERT-based FAISS, LSH, ANNOY, and HNSW) retrievers within a RAG framework. Results show that a few-shot instructional prompt achieves the highest correctness (99% pass rate) and branch coverage (72.56%). RAG further improves test robustness and diversity, with dense retrievers, particularly SBERT with HNSW, consistently outperforming sparse approaches. Compared with tests produced by Pynguin, LLM-generated tests are more often executable, better structured, and more semantically meaningful. Second, we evaluate LLMs as automated annotators for bug-fix commit identification across six GitHub repositories comprising over 23,000 commits. Experiments with GPT-4o and Claude 4.5 Sonnet under zero-shot, few-shot, and RAG configurations show that RAG-enhanced models achieve the best performance, with F1-scores between 0.70 and 0.85 and recall up to 0.95, substantially outperforming keyword-based heuristics. Overall, our results demonstrate that RAG-enhanced LLMs offer an effective and scalable solution for improving both unit test generation and bug-fix commit annotation in real-world SE settings. Datasets and implementations are publicly available.
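
For readers unfamiliar with the dense-retrieval side of such a RAG pipeline, the sketch below illustrates the general idea in Python: focal methods are embedded with an SBERT model, indexed with HNSW, and the nearest (method, test) examples are folded into a few-shot prompt. This is a minimal sketch, not the thesis's actual pipeline; the model name, the hnswlib library, the toy corpus, and the prompt template are all illustrative assumptions.

    # Minimal sketch of dense retrieval for RAG-based test generation.
    # Assumptions: SBERT model "all-MiniLM-L6-v2", the hnswlib library
    # for the HNSW index, and a toy corpus of (focal method, test) pairs.
    import hnswlib
    from sentence_transformers import SentenceTransformer

    # Toy retrieval corpus: focal methods paired with known-good unit tests.
    corpus = [
        ("def add(a, b): return a + b",
         "def test_add(): assert add(2, 3) == 5"),
        ("def is_even(n): return n % 2 == 0",
         "def test_is_even(): assert is_even(4) and not is_even(7)"),
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical SBERT choice
    embeddings = model.encode([src for src, _ in corpus],
                              normalize_embeddings=True)

    # Build an HNSW index over the focal-method embeddings (cosine space).
    index = hnswlib.Index(space="cosine", dim=embeddings.shape[1])
    index.init_index(max_elements=len(corpus), ef_construction=200, M=16)
    index.add_items(embeddings, list(range(len(corpus))))

    def build_few_shot_prompt(focal_method: str, k: int = 1) -> str:
        """Retrieve the k nearest examples and fold them into a prompt."""
        query = model.encode([focal_method], normalize_embeddings=True)
        labels, _ = index.knn_query(query, k=k)
        shots = "\n\n".join(
            f"# Example method:\n{corpus[i][0]}\n# Example test:\n{corpus[i][1]}"
            for i in labels[0]
        )
        return f"{shots}\n\n# Write a pytest unit test for:\n{focal_method}"

    print(build_few_shot_prompt("def double(x): return 2 * x"))

The resulting prompt would be sent to an LLM; sparse retrievers such as BM25 would replace the embedding-plus-index step with lexical scoring over the same corpus.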
Keywords: Large Language Models, Software Engineering, Unit Test Generation, Retrieval-Augmented Generation, Prompt Engineering
Thesis Committee:
Defense Chair: Dr. Ikjot Saini
Reader 1: Dr. Jessica Chen
Reader 2: Dr. Dan Wu
Advisor: Dr. Muhammad Asaduzzaman
