SCHOOL OF COMPUTER SCIENCE
The School of Computer Science is pleased to present…
Mitigating the Shortcomings of Language Models: Strategies For Handling Memorization & Adversarial Attacks
MSc Thesis Proposal by:
Aly M. Kassem
Date: Monday, July 3rd, 2023
Time: 10:30 AM - 11:30 AM
Location: Essex Hall 122
Reminders:
1. Two-part attendance is mandatory (sign-in sheet and QR code).
2. Arrive 5-10 minutes before the event starts. LATECOMERS WILL NOT BE ADMITTED. Note that due to demand, admission is not guaranteed once the room has reached capacity, even if you arrive "early".
3. Please be respectful of the presenter by NOT knocking on the door for admittance once the door has been closed, whether or not the presentation has begun. If the room is at capacity, overflow seating (e.g., sitting on the floor) is not permitted, as this violates the Fire Safety Code.
4. Please respect the decision of the advisor/host of the event if you are not admitted.
The School of Computer Science has numerous events occurring soon.
Abstract:
Deep learning models have recently achieved remarkable progress in Natural Language Processing (NLP), specifically in classification, question answering, and machine translation. However, NLP models face challenges related to both performance and privacy. Performance-wise, even small perturbations of the input can significantly change a model's prediction. This highlights the importance of generating natural adversarial attacks to analyze the weaknesses of NLP models and bolster their robustness through adversarial training (AT). On the privacy side, Large Language Models (LLMs) are trained on vast amounts of data, which may include sensitive information; if exposed, this poses a risk to personal privacy. LLMs have exhibited the ability to memorize portions of their training data and reproduce them verbatim when prompted by adversaries. To address these limitations, we propose an end-to-end framework. This framework employs proximal policy gradient methods, a reinforcement learning approach, in which the language model (LM) itself acts as the policy learner, enabling it both to generate adversarial attacks and to learn a "Dememorization Privacy Policy" that mitigates the risks associated with memorization. Our results show that the framework is effective at generating adversarial attacks and at learning a policy that mitigates privacy risks in language models.
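As a rough illustration of the reinforcement-learning formulation described in the abstract (not the thesis' actual implementation), the sketch below combines a standard PPO clipped objective with a hypothetical reward that trades off attack success against a semantic-similarity penalty. All function names, the lambda weight, and the toy tensors standing in for real model outputs are assumptions made purely for illustration.

# Illustrative sketch only: a minimal PPO-style update for an attacker policy,
# with a hypothetical reward (attack success minus a similarity penalty).
# The thesis' actual models and reward design are not reproduced here.
import torch

def ppo_clip_loss(new_logp, old_logp, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be minimized)."""
    ratio = torch.exp(new_logp - old_logp)              # pi_new / pi_old
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    return -torch.min(unclipped, clipped).mean()

def reward(attack_success, semantic_sim, lam=0.5):
    """Hypothetical reward: encourage flipping the victim model's prediction
    while staying semantically close to the original input."""
    return attack_success - lam * (1.0 - semantic_sim)

# Toy tensors standing in for quantities a real pipeline would compute.
torch.manual_seed(0)
old_logp = torch.randn(8)                    # log-probs of sampled attacks under the old policy
new_logp = old_logp + 0.1 * torch.randn(8)   # log-probs under the current policy
attack_success = torch.randint(0, 2, (8,)).float()   # 1 if the victim model's label flipped
semantic_sim = torch.rand(8)                 # similarity between original and perturbed text

adv = reward(attack_success, semantic_sim)
adv = (adv - adv.mean()) / (adv.std() + 1e-8)   # normalize advantages
print("PPO loss:", ppo_clip_loss(new_logp, old_logp, adv).item())

For the dememorization task, one would analogously reward the policy for reducing verbatim overlap with training text while preserving fluency; the exact reward design is part of the thesis and is not specified here.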
Thesis Committee:
Internal Reader: Dr. Luis Rueda
External Reader: Dr. Mitra Mirhassani
Advisor: Dr. Sherif Saad
