The School of Computer Science would like to present…
Evaluation of LLM-Generated Code
PhD Comprehensive Exam by: Salma Aly
Date: April 20th, 2026
Time: 10:30 AM – 12:30 PM
Location: MS Teams
Abstract: Large Language Models (LLMs) have become highly capable at generation tasks such as code generation. However, evaluating whether the generated code is correct remains a challenge: existing assessment methods often provide incomplete or misleading signals. Despite widespread reliance on benchmarks and metrics such as execution-based testing, these approaches capture only limited aspects of correctness and do not reflect the properties required for reliable deployment in real-world software systems. This presentation therefore examines the current evaluation landscape through a systematic analysis of recent literature. It will present a multi-dimensional view of code correctness, analyze how existing benchmarks and metrics align with these dimensions, and identify key limitations and trade-offs across evaluation metrics. Finally, we will discuss the implications of these findings for future research and the need for more comprehensive, context-aware evaluation strategies.
Keywords: Large Language Models (LLMs); Code Generation; Code Evaluation.
PhD Doctoral Committee:
External Reader: Kevin Granville
Internal Reader: Muhammad Asaduzzaman
Internal Reader: Jianguo Lu
Advisor(s): Ziad Kobti, Hussein Assaf
Microsoft Teams meeting
Meeting ID: 235 146 845 288 992
Passcode: 5jY7wD2r
