MSc Thesis Proposal: LLM-guided Multi-intent Code Comment Generation with Stack Overflow Context by Jenish Modi

Friday, January 30, 2026 - 12:00

LLM-guided Multi-intent Code Comment Generation with Stack Overflow Context

MSc Thesis Proposal by: Jenish Modi

Date: January 30, 2026

Time: 12:00 pm

Location: Lambton Tower Room 3105

Abstract:

Code comments are critical artifacts for software readability, maintainability, and developer productivity. However, developers often find comments difficult to write, or they produce comments that are low-quality, incomplete, fail to reveal intent, or cover only a narrow view of the code. Existing automated comment generation systems tend to produce short, single-intent outputs and often ignore the multiple dimensions of a comment that matter to developers: what the code does, why it does it, how it works, and its contextual properties. Furthermore, the lack of large, intent-labeled corpora has hindered progress in multi-intent comment generation. We present MICoGen-R (Multi-Intent Code Comment Generator Agent, RAG-based on Stack Overflow examples), a framework with four key components: (1) retrieval of semantically similar code–comment pairs from a curated Stack Overflow corpus, (2) few-shot prompting of large language models (LLMs) with the retrieved pairs to produce intent-aligned, context-aware comments, (3) automated intent labeling using a pre-trained LLM, and (4) a developer-in-the-loop feedback mechanism for iteratively improving the generated multi-intent comments.
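
Components (1) and (2) can be illustrated with a minimal sketch. It is not MICoGen-R's implementation and rests on stated assumptions: a bag-of-identifiers cosine similarity stands in for whatever semantic similarity measure the framework actually uses, and the `Example` record, the intent labels, `retrieve`, and `build_prompt` are hypothetical names introduced only for illustration. The assembled prompt would then be sent to an LLM.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Example:
    code: str
    comment: str
    intent: str  # hypothetical label set, e.g. "what", "why", "how", "property"


def tokenize(text: str) -> Counter:
    # Crude stand-in for a learned code embedding: count identifier/word tokens.
    return Counter(t.lower() for t in re.findall(r"[A-Za-z_]+", text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_code: str, corpus: list[Example], k: int = 2) -> list[Example]:
    # Rank corpus entries by similarity to the query snippet; keep the top k.
    q = tokenize(query_code)
    ranked = sorted(corpus, key=lambda ex: cosine(q, tokenize(ex.code)), reverse=True)
    return ranked[:k]


def build_prompt(query_code: str, shots: list[Example], intents: list[str]) -> str:
    # Few-shot prompt: retrieved (code, intent, comment) examples, then the query.
    parts = ["Write one comment per requested intent for the final code snippet."]
    for ex in shots:
        parts.append(f"Code:\n{ex.code}\n[{ex.intent}] {ex.comment}")
    parts.append(f"Code:\n{query_code}\nIntents: {', '.join(intents)}")
    return "\n\n".join(parts)


# Usage with a tiny, made-up corpus (not Stack Overflow data).
corpus = [
    Example("def read_file(path):\n    with open(path) as f:\n        return f.read()",
            "Reads the whole file into a string.", "what"),
    Example("def add(a, b):\n    return a + b",
            "Returns the sum of two numbers.", "what"),
    Example("def read_lines(path):\n    with open(path) as f:\n        return f.readlines()",
            "Uses a context manager so the file is always closed.", "why"),
]
query = "def load(path):\n    with open(path) as fh:\n        return fh.read()"
top = retrieve(query, corpus, k=2)
prompt = build_prompt(query, top, ["what", "why", "how"])
print(prompt)
```

Swapping `tokenize`/`cosine` for a pretrained code-embedding model and a vector index would keep the same retrieve-then-prompt structure.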

We evaluated MICoGen-R with three LLMs (Gemini 2.5 Pro, CodeLlama 7B Instruct, and GPT-4o Mini) to analyze how the choice of model affects multi-intent comment generation. We use BERTScore as the primary metric of comment quality because it has been shown to correlate strongly with human judgment. Preliminary results show promising mean BERTScores across the models when the generated comments are compared with the human-written, intent-based comments in the MICoGen-R database. These results demonstrate the feasibility and practicality of multi-intent comment generation and indicate that MICoGen-R has the potential to ease developer effort in software development.
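
For reference, BERTScore greedily matches each token embedding in one comment to its most similar token embedding in the other, averages those cosine similarities into precision and recall, and combines them into an F1 score. The sketch below computes this for a single candidate/reference pair and assumes the token vectors are already given. Real BERTScore takes contextual embeddings from a pretrained transformer and can optionally apply IDF weighting and baseline rescaling. The function name `bertscore` and the toy 2-D vectors are hypothetical; a mean score, as reported above, would average the per-pair values. In practice, the `bert-score` Python package computes the score directly from strings via `bert_score.score`.

```python
import math

Vector = list[float]


def normalize(v: Vector) -> Vector:
    # Unit-normalize so that a dot product equals cosine similarity.
    n = math.sqrt(sum(x * x for x in v))
    return v if n == 0 else [x / n for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def bertscore(cand: list[Vector], ref: list[Vector]) -> tuple[float, float, float]:
    cand = [normalize(v) for v in cand]
    ref = [normalize(v) for v in ref]
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(dot(r, c) for c in cand) for r in ref) / len(ref)
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(dot(c, r) for r in ref) for c in cand) / len(cand)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy "embeddings": one candidate token vs. two reference tokens.
p, r, f = bertscore([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=1.000 R=0.500 F1=0.667
```

The candidate covers only one of the two reference tokens, so precision stays perfect while recall, and therefore F1, drops.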

Keywords: LLM, Code Comment Generation, Retrieval-Augmented Generation, Stack Overflow

Thesis Committee:

Reader 1: Dr. Saeed Samet

Reader 2: Dr. Jessica Chen

Advisor: Dr. Muhammad Asaduzzaman