CERA: Context-Engineered Reviews Architecture for LLM-based Synthetic Dataset Generation
MSc Thesis Defense by:
Kap Thang
Date: April 28th, 2026
Time: 11:00 am – 12:30 pm
Location: Essex Hall 122
Abstract:
Aspect-Based Sentiment Analysis models require large-scale annotated datasets, which are scarce, expensive to create, and prone to class imbalance. While large language models offer a promising route to synthetic data generation, existing approaches lack factual grounding, struggle with the polite phenomenon, and provide limited aspect-level control. This thesis presents CERA (Context-Engineered Reviews Architecture), a training-free, three-phase framework for generating realistic, controllable synthetic review text. CERA integrates a Composition Phase, in which a Subject Intelligence Layer provides agentic web-search factual grounding and multi-agent verification; a Generation Phase with configurable polarity balance and demographic-grounded personas; and an Evaluation Phase using multi-dimensional quality assessment. We evaluate CERA across three review domains (laptop, restaurant, hotel) using intrinsic text quality metrics, extrinsic evaluation via latent aspect detection, a factual grounding ablation study, and a user study. CERA achieves corpus diversity on par with real data, whereas heuristic prompting collapses; it also generalizes across domains and scales to 8,000 reviews with broadly stable semantic fidelity. A factual grounding ablation across six subjects and 360 datasets demonstrates that the Subject Intelligence Layer is essential: the full CERA pipeline achieves a Factual Score of 64–86% on novel subjects, while conditions without the layer collapse to below 10%. A user evaluation (N=50) shows that CERA reviews are selected as real 30% of the time in a triplet identification task (chance level 33%), compared with 18% for heuristic prompting.
Keywords: Synthetic Data Generation, Aspect-Based Sentiment Analysis, Large Language Models, Controllable Text Generation
Thesis Committee:
Internal Reader: Dr. Arunita Jaekel
External Reader: Dr. Mahsa Hosseini
Advisor: Dr. Luis Rueda
