CERA: Context-Engineered Reviews Architecture for Synthetic ABSA Dataset Generation - MSc Thesis Proposal by: Kap Thang

Wednesday, December 17, 2025 - 11:00

The School of Computer Science is pleased to present…

CERA: Context-Engineered Reviews Architecture for Synthetic ABSA Dataset Generation

MSc Thesis Proposal by: Kap Thang
Date: December 17th, 2025
Time: 11:00 am – 12:30 pm
Location: Essex Hall Room 122
Abstract:

Aspect-Based Sentiment Analysis (ABSA) models require large-scale annotated datasets, which are scarce, expensive to create, and prone to class imbalance. While Large Language Models (LLMs) offer a promising route to synthetic data generation, existing approaches lack factual grounding, struggle with the “polite phenomenon” (the tendency to generate overly positive content), and provide limited aspect-level control. We propose CERA (Context-Engineered Reviews Architecture), a training-free, three-phase framework for generating realistic, controllable synthetic review datasets with implicit aspect structure, suitable for semi-supervised ABSA training via pseudo-labeling. CERA integrates: (1) a Composition Phase with a Subject Intelligence Layer for factual grounding, (2) a Generation Phase with configurable polarity balance and temperature control, and (3) an Evaluation Phase that uses multi-dimensional metrics (lexical quality, semantic similarity, and corpus diversity) to ensure generated reviews meet research standards. Preliminary experiments using LADy-kap demonstrate that LLM-generated synthetic reviews achieve up to 93.2% of the performance of real human-annotated datasets on implicit aspect detection tasks.

Keywords: Synthetic Data Generation, Aspect-Based Sentiment Analysis, Large Language Models, Controllable Text Generation
 
Thesis Committee:
Internal Reader: Dr. Arunita Jaekel         
External Reader: Dr. Mahsa Hosseini      
Advisor: Dr. Luis Rueda
