The School of Computer Science is pleased to present…
Evaluating Large Language Models to Support Software Engineering Tasks
MSc Thesis Proposal by:
Nafisha Binte Moin
Date: Thursday, September 11, 2025
Time: 11:00 am
Location: Erie Hall, Room 2127
Large Language Models (LLMs) are rapidly transforming the software engineering (SE) domain by assisting with tasks such as code generation, bug localization, vulnerability detection, and test generation. While various techniques have been developed to provide task-specific instructions and enhance the performance of LLMs on SE tasks, their applicability and limitations have not been systematically evaluated. For example, Retrieval-Augmented Generation (RAG) techniques enhance LLM performance by supplying external context drawn from an information retrieval system, yet the impact of retriever selection on LLM performance remains unclear. This research addresses this gap by focusing on the problem of automatic unit test generation using LLMs. Manually creating unit tests is time-consuming, error-prone, and difficult to scale, particularly for dynamically typed languages like Python, and existing automated test generation tools (e.g., Pynguin) often struggle to produce fault-revealing or semantically meaningful tests. To tackle these challenges, this study evaluates prompt engineering and RAG techniques for automated unit test generation, comparing the effectiveness of different retrievers. Experimental results show that branch coverage improves significantly (by approximately 30%) with RAG techniques compared to traditional few-shot prompting, while other metrics such as mutation score, line coverage, and pass rate remain comparable. In addition, retrieval reduces the number of non-executable test scripts generated by the LLM.
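To illustrate the general idea of retrieval-augmented prompting for unit test generation, the following is a minimal sketch, not the thesis implementation: the example corpus, the similarity-based retriever, and the placeholder function generate_tests_with_llm are all assumptions introduced here for clarity; the actual study compares real retrievers and model APIs.

```python
# Minimal illustrative sketch of RAG-style prompt assembly for unit test
# generation. The corpus, retriever, and LLM call below are hypothetical
# stand-ins, not the tooling evaluated in the thesis.
from difflib import SequenceMatcher

# Hypothetical corpus of (source function, reference unit test) pairs.
EXAMPLE_CORPUS = [
    ("def add(a, b):\n    return a + b",
     "def test_add():\n    assert add(2, 3) == 5"),
    ("def is_even(n):\n    return n % 2 == 0",
     "def test_is_even():\n    assert is_even(4)\n    assert not is_even(7)"),
]


def retrieve_examples(focal_code: str, k: int = 1):
    """Rank corpus entries by textual similarity to the function under test."""
    ranked = sorted(
        EXAMPLE_CORPUS,
        key=lambda pair: SequenceMatcher(None, focal_code, pair[0]).ratio(),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(focal_code: str) -> str:
    """Assemble a prompt that prepends retrieved examples to the focal function."""
    parts = ["Write pytest unit tests for the final function.\n"]
    for src, test in retrieve_examples(focal_code):
        parts.append(f"# Example function:\n{src}\n# Example tests:\n{test}\n")
    parts.append(f"# Function under test:\n{focal_code}\n# Tests:")
    return "\n".join(parts)


if __name__ == "__main__":
    focal = "def multiply(a, b):\n    return a * b"
    print(build_prompt(focal))
    # The assembled prompt would then be sent to the model, e.g.:
    # tests = generate_tests_with_llm(build_prompt(focal))  # hypothetical call
```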
Reader 1: Dr. Jessica Chen
Reader 2: Dr. Dan Wu
Advisor: Dr. Muhammad Asaduzzaman
Registration Link (Only MAC students need to pre-register)