PhD Thesis Defense: Privacy Vulnerabilities and Defenses Strategies in Genomic Large Language Model Embeddings: An Analysis of Reconstruction Attacks and Privacy-Preserving Approaches, by Reem Al-Saidi

Friday, February 6, 2026 - 11:00

Privacy Vulnerabilities and Defenses Strategies in Genomic Large Language Model Embeddings: An Analysis of Reconstruction Attacks and Privacy-Preserving Approaches

PhD Thesis Defense by: Reem Al-Saidi

Date: Friday, February 6, 2026

Time: 11:00 AM - 1:00 PM

Location: EH 186

Abstract

Transformer-based large language models have enabled powerful approaches to DNA sequence analysis through high-dimensional embeddings. However, these embeddings retain enough genetic information to pose serious privacy risks, potentially exposing disease susceptibility, ancestry, or familial relationships. This research addresses the challenge of balancing utility with genetic privacy against single-nucleotide reconstruction attacks along three directions.

Direction 1, Systematic Vulnerability Assessment: we analyze single-nucleotide reconstruction attacks across a range of transformer architectures, from adapted general-purpose models (BERT, GPT-2, RoBERTa, XLNet) to specialized genomic architectures (DNABERT, Nucleotide Transformer, DNAGPT) and hybrid CNN-Transformer systems (Enformer, EpiGePT, SegmentNT). Unlike prior work by Pan et al., which examined only pre-trained embeddings, we evaluate both pre-trained and fine-tuned states, revealing that fine-tuning can improve privacy in certain architectures.

Direction 2, Advanced Evaluation Metrics: we introduce novel leakage quantification metrics that go beyond conventional accuracy measures. Our Error-Based Privacy Gain metric captures nuanced changes in reconstruction difficulty through logarithmic error ratios, while our Pareto-based Privacy-Delta Score enables systematic assessment of the privacy-utility trade-off across model configurations.

Direction 3, Privacy Preservation Mechanisms: we propose Vulnerability-Enhanced Selective Privacy Adaptation (VESPA), a novel position-aware defense framework that selectively applies protection based on model vulnerability, biological relevance, and nucleotide correlations. This overcomes a limitation of existing privacy-preservation methods, which apply uniform protection across all embeddings. VESPA achieves a 57–64% reduction in reconstruction attack success while maintaining over 96% downstream task accuracy.

This research provides a comprehensive foundation for the secure, responsible use of genomic embeddings in clinical and research settings.
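
For readers unfamiliar with the two ideas named under Direction 2, the following is a minimal sketch of what a logarithmic error ratio and a Pareto comparison of configurations could look like. It is not the thesis's definition of the Error-Based Privacy Gain or the Privacy-Delta Score, which the abstract does not spell out. The function names, the `eps` smoothing term, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def log_error_ratio(err_defended, err_baseline, eps=1e-12):
    """Assumed form of a log error ratio (not taken from the thesis).

    Compares the attacker's reconstruction error with a defense to the
    error without it. A positive value means reconstruction got harder.
    """
    return math.log((err_defended + eps) / (err_baseline + eps))


def pareto_front(configs):
    """Return the names of non-dominated configurations.

    configs is a list of (name, attack_success, utility) tuples.
    Lower attack success and higher utility are both considered better.
    """
    front = []
    for name, atk, util in configs:
        dominated = any(
            (a2 <= atk and u2 >= util) and (a2 < atk or u2 > util)
            for _, a2, u2 in configs
        )
        if not dominated:
            front.append(name)
    return front


# Illustrative, made-up numbers (not results from the thesis).
example_configs = [
    ("A", 0.80, 0.98),  # no defense: high attack success, high utility
    ("B", 0.40, 0.85),  # strong defense that costs utility
    ("C", 0.30, 0.95),  # better than B on both axes
    ("D", 0.45, 0.84),  # worse than B and C on both axes
]

if __name__ == "__main__":
    print(log_error_ratio(0.5, 0.25))
    print(pareto_front(example_configs))
```

In this toy comparison a configuration stays on the front only if no other configuration is at least as good on both axes and strictly better on one, which is the usual sense of a privacy-utility trade-off curve.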

Doctoral Committee:

PhD External Examiner: Yifeng Li, Brock University

Internal Reader: Dr. Pooya Moradian Zadeh

Internal Reader: Dr. Saeed Samet

External Reader: Dr. Mitra Mirhassani

Advisor(s): Dr. Ziad Kobti

Chair: Dr. Zhenzong Ma, Odette School of Business

Registration Link (only MAC students need to pre-register)