The School of Computer Science is pleased to present…
CERA: Context-Engineered Reviews Architecture for Synthetic ABSA Dataset Generation
Aspect-Based Sentiment Analysis (ABSA) models require large-scale annotated datasets, which are scarce, expensive to create, and often class-imbalanced. Large Language Models (LLMs) are a promising source of synthetic training data, but existing approaches lack factual grounding, exhibit the “polite phenomenon” (a tendency to generate overly positive content), and offer limited aspect-level control. We propose CERA (Context-Engineered Reviews Architecture), a training-free, three-phase framework for generating realistic, controllable synthetic review datasets with implicit aspect structure, suitable for semi-supervised ABSA training via pseudo-labeling. CERA comprises: (1) a Composition Phase, whose Subject Intelligence Layer provides factual grounding; (2) a Generation Phase with configurable polarity balance and temperature control; and (3) an Evaluation Phase that scores generated reviews on multiple dimensions (lexical quality, semantic similarity, and corpus diversity) to ensure they meet research standards. Preliminary experiments with LADy-kap show that LLM-generated synthetic reviews reach up to 93.2% of the performance of real, human-annotated datasets on implicit aspect detection tasks.
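The abstract does not say how CERA enforces polarity balance or measures corpus diversity. As a purely illustrative sketch (not CERA's implementation), the Python below makes two assumptions. First, "configurable polarity balance" is treated as a set of target label proportions that is turned into a per-review polarity plan using largest-remainder rounding. Second, distinct-n (unique n-grams divided by total n-grams), a common diversity measure, stands in for the corpus-diversity metric. The function names `plan_polarities` and `distinct_n` are hypothetical.

```python
import random
from math import floor


def plan_polarities(n_reviews, balance, seed=0):
    """Hypothetical helper (not CERA's code): assign a target polarity to each
    of n_reviews so label counts match the requested weights as closely as
    possible (largest-remainder rounding), then shuffle deterministically."""
    total = sum(balance.values())
    if total <= 0:
        raise ValueError("balance weights must sum to a positive number")
    # Exact fractional quota for each polarity label.
    quotas = {label: n_reviews * w / total for label, w in balance.items()}
    counts = {label: floor(q) for label, q in quotas.items()}
    # Give the slots lost to rounding down to the largest fractional remainders.
    leftover = n_reviews - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda lab: quotas[lab] - counts[lab], reverse=True)
    for label in by_remainder[:leftover]:
        counts[label] += 1
    plan = [label for label, c in counts.items() for _ in range(c)]
    random.Random(seed).shuffle(plan)  # fixed seed -> reproducible order
    return plan


def distinct_n(reviews, n=2):
    """Assumed diversity proxy: unique n-grams / total n-grams over the corpus
    (whitespace tokenization, lowercased)."""
    ngrams = []
    for text in reviews:
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0
```

For example, `plan_polarities(10, {"positive": 2, "negative": 2, "neutral": 1})` yields four positive, four negative and two neutral targets in shuffled order, which a generator could then use to condition each prompt. Fixing the counts up front, rather than sampling each label independently, is one way to counter the drift toward positive content described above.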
