The School of Computer Science at the University of Windsor is pleased to present …
Summary Augmenter: a Text Augmentation Framework to Improve Summarization Quality
PhD Seminar by: Ala Alam Falaki
Date: Friday, December 1st, 2023
Time: 12:00 PM to 1:00 PM
Location: Essex Hall Room 105
Data augmentation in Natural Language Processing (NLP) faces various challenges that hinder its widespread adoption, in contrast to its ubiquitous use in computer vision. The difficulty is greater still for text summarization, where both the article and the summary must be considered. We review the effect of back-translation augmentation and present diverse beam search decoding and masking as methods for generating synthetic data for text summarization. The approaches are evaluated by ROUGE score, novelty, and summary length to analyze their effectiveness. Our proposed framework is based on multiple combinations of back translation and masking for articles, along with diverse augmentation for summaries. Although the framework is applicable to networks of any size, we chose BART-large, a relatively small model, in order to conduct a larger number of experiments. The experiments demonstrated superior performance across all specified metrics when compared to fine-tuning BART-large on the CNN/Dailymail dataset. Specifically, we showed a significant improvement in novelty: increases of 158% and 56% for bigrams and unigrams, respectively. Our approach could reduce some copyright concerns that arise when generated content is similar to human writing.
Keywords: Data Augmentation; Automatic Text Summarization; Natural Language Processing.
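For readers unfamiliar with the novelty metric mentioned in the abstract, the following is a minimal sketch of one common way to compute it: the fraction of n-grams in a summary that do not occur in the source article. The function names, the whitespace tokenization, and the lowercasing are assumptions made for illustration only; the seminar's actual implementation may differ.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def novelty(article, summary, n):
    """Fraction of summary n-grams that never appear in the article.

    Assumption: simple lowercased whitespace tokenization, not the
    tokenizer used in the presented work.
    """
    article_ngrams = set(ngrams(article.lower().split(), n))
    summary_ngrams = ngrams(summary.lower().split(), n)
    if not summary_ngrams:
        return 0.0  # summary too short to contain any n-gram
    novel = sum(1 for gram in summary_ngrams if gram not in article_ngrams)
    return novel / len(summary_ngrams)


if __name__ == "__main__":
    article = "the cat sat on the mat"
    summary = "a cat sat quietly"
    print(novelty(article, summary, 1))  # unigram novelty
    print(novelty(article, summary, 2))  # bigram novelty
```

Under this definition, a higher score means the summary reuses less of the article's exact wording, which is the sense in which the abstract reports gains for unigrams and bigrams.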
PhD Doctoral Committee:
Internal Reader: Dr. Luis Rueda
Internal Reader: Dr. Dan Wu
External Reader: Dr. Jonathan Wu
Advisor(s): Dr. Robin Gras