Molecular Representation Learning Based on Contrastive Learning By Ali Forooghi

Wednesday, March 25, 2026 - 13:00

The School of Computer Science would like to present… 

Molecular Representation Learning Based on Contrastive Learning

PhD Dissertation Proposal by: Ali Forooghi

Date: Wednesday, 25 March 2026

Time: 1:00 pm – 2:30 pm

Location: MH109 (Memorial Hall)

Abstract:

Large language models (LLMs) provide a promising direction for learning molecular representations from text-like inputs, yet most molecular contrastive learning methods still rely on graph- or SMILES-level augmentations that may unintentionally distort chemical structure. We propose MolPACL, a prompt-augmentation-based supervised contrastive learning framework that incorporates high-level chemical semantics while preserving molecular identity. MolPACL generates multiple semantically consistent prompt views for each molecule from its SMILES string and physicochemical descriptors using diverse templates and lightweight lexical perturbations. These views are combined with task-aware class positives and negatives to form contrastive batches, and the model is trained using a supervised objective based on the Soft Nearest Neighbor loss. Experiments on MoleculeNet benchmarks show that the proposed approach achieves strong performance on both classification and regression tasks while reducing training cost: it requires no additional molecular pretraining and uses a relatively small pretrained LLM.
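
As a rough illustration of the two ideas named in the abstract, the sketch below builds several text views of one molecule from fixed templates and computes a Soft Nearest Neighbor loss over a small batch of embeddings. It is not the MolPACL implementation: the templates, the function names `build_prompt_views` and `soft_nearest_neighbor_loss`, the temperature default, and the choice to skip anchors that have no same-class example in the batch are all assumptions made for this example. The abstract's lexical perturbations and the LLM encoder are omitted.

```python
import math

# Hypothetical templates: the abstract does not list the ones MolPACL uses.
TEMPLATES = [
    "Molecule with SMILES {smiles}. Properties: {props}.",
    "SMILES: {smiles}. Physicochemical descriptors: {props}.",
    "The compound {smiles} has the following properties: {props}.",
]


def build_prompt_views(smiles, descriptors):
    """Return one text view per template; the SMILES string is never altered."""
    props = ", ".join(f"{name} = {value}" for name, value in descriptors.items())
    return [t.format(smiles=smiles, props=props) for t in TEMPLATES]


def soft_nearest_neighbor_loss(embeddings, labels, temperature=1.0):
    """Soft Nearest Neighbor loss over a batch (pure Python, for illustration).

    For each anchor i, compares the similarity mass of same-label points
    with the mass of all other points, using exp(-squared distance / T).
    """
    n = len(embeddings)
    total, counted = 0.0, 0
    for i in range(n):
        same, everything = 0.0, 0.0
        for j in range(n):
            if i == j:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
            w = math.exp(-d2 / temperature)
            everything += w
            if labels[j] == labels[i]:
                same += w
        # Assumption: anchors with no same-class positive are skipped.
        if same > 0.0:
            total += -math.log(same / everything)
            counted += 1
    return total / counted if counted else 0.0


views = build_prompt_views("CCO", {"molecular weight": 46.07, "logP": -0.31})
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
loss_clustered = soft_nearest_neighbor_loss(points, [0, 0, 1, 1])
loss_mixed = soft_nearest_neighbor_loss(points, [0, 1, 0, 1])
```

In this toy batch the loss is lower when the labels agree with the two spatial clusters than when they cut across them, which is the behaviour a supervised contrastive objective is meant to reward.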

Keywords: Contrastive Learning, Molecular Representation

Doctoral Committee:

External Reader: Dr. Mitra Mirhassani

Internal Reader: Dr. Dan Wu

Internal Reader: Dr. Pooya Moradian Zadeh

Advisor(s): Dr. Alioune Ngom, Dr. Luis Rueda

 

Registration Link (for MAC students only)

 
