The School of Computer Science is pleased to present…
Mitigating The Shortcomings of Language Models: Strategies for Handling Memorization & Adversarial Attacks
MSc Thesis Defense by Aly Kassem
Date: Wednesday, November 1, 2023
Time: 10:00 AM - 11:00 AM
Location: Essex Hall Room 105
Abstract:
Deep learning models have recently achieved remarkable progress in Natural Language Processing (NLP), specifically in classification, question answering, and machine translation. However, NLP models face challenges related to both performance and privacy. Performance-wise, even small perturbations in the input can significantly alter a model's prediction. This highlights the importance of generating natural adversarial attacks to analyze the weaknesses of NLP models and bolster their robustness through adversarial training (AT). On the privacy side, Large Language Models (LLMs) are trained on vast amounts of data that may include sensitive information; if exposed, this information poses a risk to personal privacy. LLMs have exhibited the ability to memorize portions of their training data and reproduce them verbatim when prompted by adversaries. To address these limitations, we propose an end-to-end framework. The framework employs proximal policy gradient, a reinforcement learning approach, with the language model (LM) acting as the policy learner, enabling it both to generate natural adversarial attacks and to learn a "Dememorization Privacy Policy" that mitigates the risks associated with memorization. Our results show that the framework is effective in generating adversarial attacks and in learning a policy that mitigates privacy risks in language models, outperforming state-of-the-art baseline methods in the literature.
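For attendees unfamiliar with proximal policy methods, the short sketch below illustrates the clipped surrogate objective that such approaches optimize when treating a model as the policy. It is a minimal, self-contained illustration with made-up numbers, not the framework or implementation presented in the thesis; the function and values shown are assumptions for demonstration only.

    import numpy as np

    def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
        """Clipped surrogate objective used by proximal policy methods.

        logp_new   : log-probabilities of sampled actions under the current policy
        logp_old   : log-probabilities of the same actions under the policy that
                     generated them (held fixed during the update)
        advantages : advantage estimates, e.g. reward minus a baseline
        """
        ratio = np.exp(logp_new - logp_old)              # pi_new / pi_old
        clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
        # Elementwise minimum keeps the update pessimistic about large policy shifts.
        return np.mean(np.minimum(ratio * advantages, clipped * advantages))

    # Toy example: three sampled generations with hypothetical advantages.
    logp_old = np.array([-2.1, -0.9, -1.5])
    logp_new = np.array([-1.8, -1.0, -1.2])
    advantages = np.array([0.6, -0.3, 1.1])              # illustrative values only
    print(ppo_clipped_objective(logp_new, logp_old, advantages))

In an LM-as-policy setting, the "actions" would be generated tokens and the advantages would be derived from a task-specific reward; those details are specific to the thesis and are not reproduced here.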
Thesis Committee:
Internal Reader: Dr. Luis Rueda
External Reader: Dr. Mitra Mirhassani
Advisor: Dr. Sherif Saad
Chair: Shafaq Khan