MSc Thesis Proposal: HLFEval: A Hybrid LLM-Based Framework for Sentence-Level Evaluation of LLM Text Responses by Nehme Haidura

Friday, May 22, 2026 - 14:30

HLFEval: A Hybrid LLM-Based Framework for Sentence-Level Evaluation of LLM Text Responses

MSc Thesis Proposal by:

Nehme Haidura

Date: Friday, May 22nd, 2026

Time: 2:30 PM

Location: Microsoft Teams

Join: https://teams.microsoft.com/meet/260674284509541?p=dHs6lwD0WemQWxeH15

Meeting ID: 260 674 284 509 541

Passcode: Uh7PK2rR

Abstract:

Evaluating Large Language Models (LLMs) remains a challenging problem, as existing methods often focus on isolated aspects of response quality while overlooking important dimensions such as fluency, safety, and factuality. In this thesis, we propose HLFEval, a hybrid LLM-based framework for sentence-level evaluation of generated responses across four complementary dimensions: semantic relevance, fluency, safety, and factuality. The framework combines established evaluation methods, including GPT-2 perplexity for fluency, Detoxify for safety, and an ensemble of open-source LLM judges for factuality, into a unified composite score using learned weights. Semantic relevance is measured with a novel LLM-assisted token-level scoring method that decomposes text into linguistic categories and constructs similarity heatmaps between reference and candidate responses, providing an alternative to BERTScore. The framework is evaluated on 12 non-overlapping batches under both independent and sequential training settings. Experimental results demonstrate stable convergence and consistent improvements under sequential training, with competitive performance under independent training. These findings highlight the potential of HLFEval as a transparent and adaptive framework for multidimensional evaluation of generated responses.
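
To illustrate the idea of a composite score, the sketch below combines four per-sentence dimension scores into one value with a weighted average. It is only an illustration under stated assumptions, not the method from the thesis: the abstract does not give the scoring formula, so the weighted-average form, the normalization of every dimension score to [0, 1], the averaging over sentences, and all names (SentenceScores, composite_score, response_score) are hypothetical. The weights are passed in as plain numbers; how HLFEval learns them is not described in the abstract.

```python
from dataclasses import dataclass

# The four dimensions named in the abstract.
DIMENSIONS = ("relevance", "fluency", "safety", "factuality")


@dataclass(frozen=True)
class SentenceScores:
    """Per-sentence scores, each assumed to be normalized to [0, 1]."""
    relevance: float
    fluency: float
    safety: float
    factuality: float


def composite_score(scores: SentenceScores, weights: dict) -> float:
    """Weighted average of the four dimension scores (assumed form)."""
    total = sum(weights[d] for d in DIMENSIONS)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[d] * getattr(scores, d) for d in DIMENSIONS)
    return weighted / total


def response_score(sentences: list, weights: dict) -> float:
    """Mean composite score over all sentences of a response (assumed aggregation)."""
    if not sentences:
        raise ValueError("response has no sentences")
    return sum(composite_score(s, weights) for s in sentences) / len(sentences)


if __name__ == "__main__":
    # Placeholder weights; in the proposal these would be learned.
    weights = {d: 1.0 for d in DIMENSIONS}
    response = [
        SentenceScores(relevance=0.8, fluency=0.6, safety=1.0, factuality=0.4),
        SentenceScores(relevance=1.0, fluency=1.0, safety=1.0, factuality=1.0),
    ]
    print(response_score(response, weights))
```

Dividing by the sum of the weights keeps the composite in the same [0, 1] range as its inputs, so scores stay comparable when the weights change between batches.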

Thesis Committee:

Internal Reader: Dr. Jianguo Lu

External Reader: Dr. Esam Abdel-Raheem

Advisor(s): Dr. Ziad Kobti, Dr. Hussein Assaf
