The School of Computer Science is pleased to present…
Mitigating The Shortcomings of Language Models: Strategies for Handling Memorization & Adversarial Attacks
MSc Thesis Defense by Aly Kassem
Date: Wednesday, November 1, 2023
Time: 10:00 AM - 11:00 AM
Location: Essex Hall Room 105
Abstract:
Deep learning models have recently achieved remarkable progress in Natural Language Processing (NLP), specifically in classification, question answering, and machine translation. However, NLP models face challenges related to both performance and privacy. Performance-wise, even small perturbations in the input can significantly alter a model's prediction. This highlights the importance of generating natural adversarial attacks to analyze the weaknesses of NLP models and bolster their robustness through adversarial training (AT). On the privacy side, Large Language Models (LLMs) are trained on vast amounts of data that may include sensitive information; if exposed, this information poses a risk to personal privacy. LLMs have exhibited the ability to memorize portions of their training data and reproduce them verbatim when prompted by adversaries. To address these limitations, we propose an end-to-end framework. The framework employs proximal policy gradient, a reinforcement learning approach, with the language model (LM) acting as the policy learner, enabling it both to generate natural adversarial attacks and to learn a "Dememorization Privacy Policy" that mitigates the risks associated with memorization. Our results show that the framework is effective in generating adversarial attacks and in learning a policy that mitigates privacy risks in language models, outperforming state-of-the-art baseline methods in the literature.
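For attendees unfamiliar with proximal policy methods, the short sketch below illustrates the clipped surrogate objective that such approaches optimize when treating a model as the policy. It is a minimal, self-contained illustration with made-up numbers, not the framework or implementation presented in the thesis; the function and values shown are assumptions for demonstration only.

    import numpy as np

    def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
        """Clipped surrogate objective used by proximal policy methods.

        logp_new   : log-probabilities of sampled actions under the current policy
        logp_old   : log-probabilities of the same actions under the policy that
                     generated them (held fixed during the update)
        advantages : advantage estimates, e.g. reward minus a baseline
        """
        ratio = np.exp(logp_new - logp_old)              # pi_new / pi_old
        clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
        # Elementwise minimum keeps the update pessimistic about large policy shifts.
        return np.mean(np.minimum(ratio * advantages, clipped * advantages))

    # Toy example: three sampled generations with hypothetical advantages.
    logp_old = np.array([-2.1, -0.9, -1.5])
    logp_new = np.array([-1.8, -1.0, -1.2])
    advantages = np.array([0.6, -0.3, 1.1])              # illustrative values only
    print(ppo_clipped_objective(logp_new, logp_old, advantages))

In an LM-as-policy setting, the "actions" would be generated tokens and the advantages would be derived from a task-specific reward; those details are specific to the thesis and are not reproduced here.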
Thesis Committee:
Internal Reader: Dr. Luis Rueda
External Reader: Dr. Mitra Mirhassani
Advisor: Dr. Sherif Saad
Chair: Shafaq Khan