CLIP-Enhanced CIR with Schemas

Tuesday, October 28, 2025 - 14:30

The School of Computer Science is pleased to present…

MSc Thesis Proposal by: Tina Aminian

Date: Tuesday, October 28, 2025

Time: 2:30 PM

Location: Dillon Hall, Room 354

Abstract:

Composed Image Retrieval (CIR) is the task of retrieving a similar image from a database given a reference image and a textual modification, thereby capturing the user’s intent regarding how the image should be changed. It has various applications, such as e-commerce and general image search engines. In recent years, numerous CIR models have been proposed, demonstrating strong performance, particularly when textual modifications are expressed as simple, attribute-based phrases. However, CIR remains challenging, especially in domains such as fashion retail, where the vocabulary used to express retrieval requirements is rich and fine-grained.
We propose an attribute-based approach to CIR, built on top of the CLIP model, in which the attribute schema is data-dependent. We constructed the training set by pairing and automatically annotating images from the Deep Fashion Multi-Modal dataset. The annotation process followed a specific schema and leveraged the capabilities of a Multimodal Large Language Model (Qwen2.5). For validation, we used a Large Language Model (LLM) to interpret the modification texts and extract their semantic meaning.
Based on this structured data, we propose a more precise methodology for filtering and fine-tuning. The initial experimental results indicate that this approach achieves improved performance over several existing CIR and zero-shot CIR baselines.
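
For readers unfamiliar with the task, the following is a minimal illustrative sketch of a generic CIR pipeline, not the method proposed in the thesis. It assumes that image and text embeddings (for example, from CLIP) are already available as plain vectors, fuses the reference image and the modification text by simple addition, optionally filters the gallery by attribute values, and ranks the remaining candidates by cosine similarity. All names (`compose_query`, `retrieve`, the `sleeve` attribute) and the toy three-dimensional vectors are hypothetical.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def compose_query(image_emb, text_emb):
    """Fuse reference-image and modification-text embeddings by addition.

    Additive fusion is a common simple baseline; it is an assumption here,
    not the fusion used in the proposal.
    """
    fused = [i + t for i, t in zip(normalize(image_emb), normalize(text_emb))]
    return normalize(fused)


def retrieve(query, gallery, required=None, top_k=3):
    """Keep gallery items matching every required attribute, then rank them."""
    required = required or {}
    candidates = [
        item for item in gallery
        if all(item["attrs"].get(k) == v for k, v in required.items())
    ]
    ranked = sorted(candidates, key=lambda item: cosine(query, item["emb"]),
                    reverse=True)
    return [item["id"] for item in ranked[:top_k]]


# Toy data standing in for real embeddings and schema annotations.
gallery = [
    {"id": "a", "emb": [1.0, 1.0, 0.0], "attrs": {"sleeve": "long"}},
    {"id": "b", "emb": [1.0, 0.0, 0.0], "attrs": {"sleeve": "short"}},
    {"id": "c", "emb": [0.0, 1.0, 0.2], "attrs": {"sleeve": "long"}},
    {"id": "d", "emb": [0.0, 0.0, 1.0], "attrs": {"sleeve": "long"}},
]
query = compose_query(image_emb=[1.0, 0.0, 0.0], text_emb=[0.0, 1.0, 0.0])

unfiltered = retrieve(query, gallery, top_k=2)
filtered = retrieve(query, gallery, required={"sleeve": "long"}, top_k=2)
print(unfiltered, filtered)
```

In this toy example the attribute filter removes the short-sleeved item before ranking, which illustrates why structured attributes can help when the modification text names a specific, fine-grained requirement.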

Thesis Committee:

Internal Reader: Dr. Jianguo Lu

External Reader: Dr. Muhammad Asaduzzaman

Advisor: Dr. Jessica Chen

Registration Link (only MAC students need to pre-register)