The School of Computer Science is pleased to present…
Evaluating Large Language Models to Support Software Engineering Tasks
MSc Thesis Proposal by:
Nafisha Binte Moin
Date: Thursday, September 11, 2025
Time: 11:00 am
Location: Erie Hall, Room 2127
Large Language Models (LLMs) are rapidly transforming the software engineering (SE) domain by assisting with tasks such as code generation, bug localization, vulnerability detection, and test generation. While various techniques have been developed to provide task-specific instructions and enhance the performance of LLMs on SE tasks, their applicability and limitations have not been systematically evaluated. For example, Retrieval-Augmented Generation (RAG) techniques enhance LLM performance by supplying external context drawn from an information retrieval system, yet the impact of retriever selection on LLM performance remains unclear. This research addresses this gap by focusing on the problem of automatic unit test generation using LLMs. Manually creating unit tests is time-consuming, error-prone, and difficult to scale, particularly for dynamically typed languages like Python, and existing automated test generation tools (e.g., Pynguin) often struggle to produce fault-revealing or semantically meaningful tests. To tackle these challenges, this study evaluates prompt engineering and RAG techniques for automated unit test generation, comparing the effectiveness of different retrievers. Experimental results show that branch coverage improves significantly (by approximately 30%) with RAG techniques compared to traditional few-shot prompting, while other metrics such as mutation score, line coverage, and pass rate remain comparable. In addition, retrieval reduces the number of non-executable test scripts generated by the LLM.
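To illustrate the general idea of retrieval-augmented prompting for unit test generation, the following is a minimal sketch, not the thesis implementation: the example corpus, the similarity-based retriever, and the placeholder function generate_tests_with_llm are all assumptions introduced here for clarity; the actual study compares real retrievers and model APIs.

```python
# Minimal illustrative sketch of RAG-style prompt assembly for unit test
# generation. The corpus, retriever, and LLM call below are hypothetical
# stand-ins, not the tooling evaluated in the thesis.
from difflib import SequenceMatcher

# Hypothetical corpus of (source function, reference unit test) pairs.
EXAMPLE_CORPUS = [
    ("def add(a, b):\n    return a + b",
     "def test_add():\n    assert add(2, 3) == 5"),
    ("def is_even(n):\n    return n % 2 == 0",
     "def test_is_even():\n    assert is_even(4)\n    assert not is_even(7)"),
]


def retrieve_examples(focal_code: str, k: int = 1):
    """Rank corpus entries by textual similarity to the function under test."""
    ranked = sorted(
        EXAMPLE_CORPUS,
        key=lambda pair: SequenceMatcher(None, focal_code, pair[0]).ratio(),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(focal_code: str) -> str:
    """Assemble a prompt that prepends retrieved examples to the focal function."""
    parts = ["Write pytest unit tests for the final function.\n"]
    for src, test in retrieve_examples(focal_code):
        parts.append(f"# Example function:\n{src}\n# Example tests:\n{test}\n")
    parts.append(f"# Function under test:\n{focal_code}\n# Tests:")
    return "\n".join(parts)


if __name__ == "__main__":
    focal = "def multiply(a, b):\n    return a * b"
    print(build_prompt(focal))
    # The assembled prompt would then be sent to the model, e.g.:
    # tests = generate_tests_with_llm(build_prompt(focal))  # hypothetical call
```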
Reader 1: Dr. Jessica Chen
Reader 2: Dr. Dan Wu
Advisor: Dr. Muhammad Asaduzzaman
Registration Link (Only MAC students need to pre-register)