PhD Seminar "Can Large Language Models Understand Molecules?" by: Shaghayegh Sadeghi

Wednesday, August 7, 2024 - 10:30

The School of Computer Science at the University of Windsor is pleased to present …

Can large language models understand molecules?

 

PhD Seminar by:

Shaghayegh Sadeghi

 

Date: Wednesday, August 7th, 2024

Time: 10:30 AM

Location: Odette Building, Room B03

 

Abstract:
Large Language Models (LLMs) such as OpenAI's GPT and Meta AI's Llama are gaining recognition in cheminformatics, particularly for understanding the Simplified Molecular Input Line Entry System (SMILES), a line notation for representing chemical structures. These LLMs can encode SMILES strings into vector representations.

Inspired by SentEval, MTEB, and DeepChem, we introduce MolEval to evaluate LLM embeddings of molecular structures. Because computing these embeddings is traditionally costly on standard hardware, MolEval offers a repository of pre-computed molecule embeddings together with a platform for evaluating embeddings derived from molecular structures, streamlining the assessment process for researchers.

Our study focuses on the performance of LLMs in embedding SMILES strings for tasks such as molecular property prediction. This work lays the foundation for future advancements in using LLMs for molecular embeddings. We plan to expand MolEval with more tasks as consensus on optimal molecule embedding evaluations evolves, aiming to standardize research outputs.

 

Doctoral Committee:

Internal Reader: Dr. Luis Rueda

Internal Reader: Dr. Saeed Samet

External Reader: Dr. Mohammad Hassanzadeh

Advisor(s): Dr. Jianguo Lu, Dr. Alioune Ngom
