Technical Workshop - Synthetic Health Data: A Double-Edged Sword for Privacy and Utility in Genomic (2nd Offering) by: Reem Al-Saidi

Tuesday, May 27, 2025 - 10:00

The School of Computer Science presents...

Synthetic Health Data: A Double-Edged Sword for Privacy and Utility in Genomic (2nd Offering)

Presenter: Reem Al-Saidi

Date: Tuesday, May 27, 2025

Time: 10:00 am

Location: 4th Floor (Workshop space) at 300 Ouellette Avenue (School of Computer Science Advanced Computing Hub)

Abstract:

This workshop focuses on synthetic data generation techniques that enable responsible sharing of sensitive genomic and health data. As concerns about privacy risks and data misuse grow, synthetic data has emerged as a promising way to advance research while reducing the risk of individual re-identification. Nevertheless, doubts persist about whether synthetic data genuinely safeguards privacy, particularly when it is produced by powerful models trained on confidential genomic sequences. The workshop explores state-of-the-art techniques for generating synthetic genomic and health datasets, including generative adversarial networks (GANs), variational autoencoders (VAEs), and differentially private mechanisms. It also examines the limitations of these methods, particularly rare variant leakage, memorization risks, and utility degradation in downstream analyses such as clustering and classification. Participants will gain practical insight into the privacy–utility trade-off, learn how to assess both privacy leakage and data utility, and understand the legal and ethical frameworks necessary for working with synthetic genomic data.

Workshop Outline:

Part 1: Introduction to Synthetic Genomic and Health Data

Rationale for using synthetic data in genomics and health research
Regulatory pressures and trust considerations

Part 2: Generation Techniques

Generative models: GANs, VAEs, transformers
Differential privacy integration in synthetic generation
Applications to genetic datasets and patient health records

Part 3: Limitations and Privacy Risks

Memorization and reconstruction attacks
Membership inference in synthetic datasets
Representation challenges in diverse genomes

Part 4: Measuring Utility and Privacy

Utility metrics: predictive accuracy, clustering validity, biological relevance
Privacy metrics: distance-based tests, DP bounds, adversarial risk evaluations
Case studies with benchmark datasets

Part 5: Future Outlook

Emerging defenses and privacy audits
Use of synthetic data in regulated environments
Open problems and research directions

Prerequisites:

Foundational understanding of genomic data structures and privacy concerns in health data
Basic familiarity with generative AI models
Interest or experience in analyzing genomic or electronic health record (EHR) data
Awareness of ethical and legal responsibilities in handling sensitive patient-level data

Biography:

Reem Al-Saidi is a Ph.D. student in the School of Computer Science at the University of Windsor. Her research focuses on applying privacy and security techniques to AI tools, establishing trust and reputation in AI applications, and assessing bias and fairness in NLP models.

Registration Link (only MAC students need to pre-register)