
The School of Computer Science is pleased to present…

CFs-SETRec: LLM Model-based Generative Recommendation Capturing Short to Long-Term CF User-Item Dependencies

MSc Thesis Proposal by: Behrad Ghiasi


Date: Friday, October 24th, 2025

Time: 2:00 pm 

Location: Essex Hall, Room 122


Abstract:
Large Language Models (LLMs) are advanced neural networks trained on vast amounts of text, capable of understanding and generating language with remarkable flexibility. In recommender systems, these models have enabled a new paradigm called LLM-based generative recommendation, which differs fundamentally from multi-stage recommenders (MSRs) in its pipeline structure. Unlike conventional MSRs, which separate item retrieval and ranking into distinct phases, generative recommendation employs a single generative AI model to directly produce recommended item identifiers, leveraging LLMs' ability to learn complex patterns from both user context (e.g., purchase history) and item information (e.g., product descriptions). For example, after a user buys a camera, a traditional system first retrieves candidates such as lenses, tripods, and memory cards, then ranks them to place the lens at the top. A generative model instead encodes the history and descriptions directly and generates the lens in one step, bypassing separate retrieval and ranking. Unlike traditional multi-stage pipelines, which risk losing information between stages, generative systems reduce error propagation, lower latency, and capture richer user–item dependencies. To enable this approach, each item requires an identifier: a set of one or more vectors that capture different aspects of the item, allowing user histories to be represented as token sequences for next-item prediction.
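To make the identifier idea concrete, here is a minimal Python sketch (the item names, token values, and the item_identifiers table are invented for illustration, not taken from the thesis): a purchase history is flattened into one token sequence, and a generated identifier is mapped back to an item by reverse lookup.

# Each item gets an identifier: a small set of tokens (hypothetical values).
item_identifiers = {
    "camera":      (101, 7, 42),   # e.g., (CF token, semantic token, semantic token)
    "lens":        (102, 7, 43),
    "tripod":      (103, 8, 44),
    "memory_card": (104, 9, 45),
}
token_to_item = {tokens: item for item, tokens in item_identifiers.items()}

def encode_history(history):
    """Flatten a user's item history into one token sequence."""
    tokens = []
    for item in history:
        tokens.extend(item_identifiers[item])
    return tokens

history_tokens = encode_history(["camera"])   # the user just bought a camera
# A generative recommender conditions on history_tokens and emits the next
# item's identifier tokens directly, with no retrieval or ranking stages.
predicted = (102, 7, 43)                      # stand-in for the model's output
print(token_to_item[predicted])               # -> lens

Everything here except the generation step itself is bookkeeping between items and their token sets; the model's job is to produce the next identifier given the history tokens.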

Current identifier approaches face significant limitations. Single-token identifier methods (e.g., DreamRec23, E4SRec23) represent each item with one learned vector, either collaborative-filtering-based or semantics-based, which simplifies generation but compresses heterogeneous signals into a single embedding point. This under-expressiveness causes representation collapse, in which different items end up with similar embeddings; the result weakens personalization and exacerbates cold-start problems. Token-sequence identifiers (e.g., BIGRec23, IDGenRec24) expand representation capacity by describing items with multiple tokens, improving expressiveness and enabling finer item distinctions. However, they suffer from beam-search local optima during generation and from latency caused by step-by-step autoregressive decoding. Recent RQ-VAE-based methods such as TIGER23 (coarse-to-fine codebooks) and LETTER24 (RQ-VAE with semantic and CF guidance) attempt to diversify item codes and mitigate collapse, yet their success has remained limited.
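For intuition on the coarse-to-fine codebooks used by RQ-VAE-style identifiers such as TIGER23, the following sketch performs plain residual quantization; the random codebooks and all sizes are stand-ins (a real RQ-VAE learns its codebooks jointly with an encoder and decoder), so this only illustrates how an embedding becomes a short code.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 codebook levels, 8 codewords per level, 4-dim embeddings.
levels, num_codewords, dim = 3, 8, 4
codebooks = [rng.normal(size=(num_codewords, dim)) for _ in range(levels)]

def residual_quantize(x):
    """Return a coarse-to-fine code: one codeword index per level."""
    residual = x.copy()
    code = []
    for codebook in codebooks:
        idx = int(np.argmin(np.linalg.norm(codebook - residual, axis=1)))
        code.append(idx)                     # nearest codeword at this level
        residual = residual - codebook[idx]  # quantize what is left over
    return tuple(code)

item_embedding = rng.normal(size=dim)        # e.g., from an item-text encoder
print(residual_quantize(item_embedding))     # e.g., (5, 2, 7): the item's code

Items with similar embeddings share the coarse (early) indices and differ only in the fine (late) ones, which is what lets such codes serve as hierarchical identifiers.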

To address these limitations, particularly embedding collapse, this thesis builds upon SETRec25, which assigns each item multiple tokens: one CF token from a pre-trained model and several semantic tokens from item descriptions. SETRec25 also uses sparse attention masking to prevent within-item token dependencies and query-guided simultaneous generation to produce all next-item tokens in parallel rather than sequentially. For example, if one semantic token represents colour and another represents size, there is no meaningful order between them; this order-agnostic nature is what the model leverages. Our contribution, CFs-SETRec, enhances this foundation by incorporating a second CF token from a different model: instead of a single CF vector, we compute two collaborative filtering signals, SASRec18 (capturing sequence-aware, short- to medium-term patterns) and iALS08 (capturing global, long-term user affinities). This dual-CF approach maintains representation diversity, reduces collapse risk, and captures broader behavioural patterns spanning recent trends and persistent preferences, which proves especially valuable under data sparsity and cold-start conditions. On Amazon Toys (cold-start evaluation), CFs-SETRec achieves a +10% improvement in R@5 and a +5% improvement in R@10 over SETRec25.
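As a rough sketch of the sparse attention masking described above (the per-item token layout with two CF tokens from SASRec18 and iALS08 plus two semantic tokens, and all sizes, are assumptions for illustration), the mask below lets a token attend to tokens of earlier items and to itself, but never to sibling tokens of the same item, matching the order-agnostic set identifier:

import numpy as np

tokens_per_item = 4   # assumed layout: [SASRec CF, iALS CF, semantic, semantic]
num_items = 3         # length of a toy user history

def build_sparse_mask(num_items, tokens_per_item):
    """True = attention allowed; a token sees earlier items' tokens and
    itself, but not sibling tokens within its own item."""
    n = num_items * tokens_per_item
    mask = np.zeros((n, n), dtype=bool)
    for q in range(n):
        for k in range(n):
            earlier_item = k // tokens_per_item < q // tokens_per_item
            mask[q, k] = earlier_item or q == k   # self-attention stays allowed
    return mask

print(build_sparse_mask(num_items, tokens_per_item).astype(int))

Because no token depends on its siblings, all tokens of the next item can be produced simultaneously from separate queries rather than one after another, which is what makes the query-guided parallel generation possible.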


Keywords: Large Language Model (LLM), Recommender Systems, Collaborative Filtering, Semantic Embedding, Set Identifier
 
Thesis Committee:

Internal Reader: Dr. Muhammad Asaduzzaman 

External Reader: Dr. Afshin Rahimi

Advisor: Dr. Christie Ezeife


Registration Link (only MAC students need to pre-register)