The School of Computer Science would like to present…
Evaluation of LLM-Generated Code
PhD Comprehensive Exam by: Salma Aly
Date: April 20th, 2026
Time: 10:30 AM – 12:30 PM
Location: MS Teams
Abstract: Large Language Models (LLMs) have become highly capable at generation tasks such as code generation. However, evaluating whether the generated code is correct remains a challenge: existing assessment methods often provide incomplete or misleading signals. Despite widespread reliance on benchmarks and metrics such as execution-based testing, these approaches capture only limited aspects of correctness and do not reflect the properties required for reliable deployment in real-world software systems. This presentation therefore examines the current evaluation landscape through a systematic analysis of recent literature. It will present a multi-dimensional view of code correctness, analyze how existing benchmarks and metrics align with these dimensions, and identify key limitations and trade-offs across evaluation metrics. Finally, we will discuss the implications of these findings for future research and the need for more comprehensive, context-aware evaluation strategies.
Keywords: Large Language Models (LLMs); Code Generation; Code Evaluation.
PhD Doctoral Committee:
External Reader: Kevin Granville
Internal Reader: Muhammad Asaduzzaman
Internal Reader: Jianguo Lu
Advisor(s): Ziad Kobti, Hussein Assaf
Microsoft Teams meeting
Meeting ID: 235 146 845 288 992
Passcode: 5jY7wD2r
