LLM-guided Multi-intent Code Comment Generation by leveraging Crowdsourcing Knowledge in Stack Overflow
MSc Thesis Defense by: Jenish Modi
Date: 8th May 2026
Time: 11 AM
Location: 122 Essex Hall
Abstract:
Code comments play a critical role in software development by supporting program comprehension, maintenance, and collaboration; however, in real-world systems, comments are often incomplete, outdated, or limited to a single aspect of code functionality. Existing automatic code comment generation approaches typically produce a single generic summary and fail to capture diverse developer needs such as rationale, usage guidance, and implementation details. To address these issues, this thesis introduces MICoGen-R, a retrieval-augmented multi-intent code comment generation framework that integrates semantic retrieval with large language models (LLMs). The approach reformulates comment generation as a one-to-many problem: for a given code snippet, it generates multiple intent-specific comments, including What, Why, How-to-use, How-it-is-done, and Property. The framework uses CodeBERT embeddings and FAISS indexing to retrieve semantically similar examples, which are incorporated into structured prompts that guide LLM-based comment generation. Evaluation using BERTScore shows that MICoGen-R outperforms baseline approaches such as DOME and few-shot LLM methods. Manual validation and a human study further confirm that the generated comments are clear, relevant, and aligned with developer intent. Overall, the results demonstrate that combining retrieval-augmented generation with multi-intent modeling improves the quality and usefulness of automatically generated code comments.
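The retrieve-then-prompt flow described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: to stay self-contained it assumes a toy bag-of-tokens vector in place of CodeBERT embeddings and a brute-force cosine search in place of a FAISS index, and the prompt wording, function names, and corpus layout are all hypothetical.

```python
import math
import re
from collections import Counter

# The five intents named in the abstract.
INTENTS = ["What", "Why", "How-to-use", "How-it-is-done", "Property"]


def embed(code: str) -> Counter:
    """Toy stand-in for a CodeBERT embedding (assumption): a bag of tokens."""
    return Counter(re.findall(r"[A-Za-z_]+", code.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Brute-force top-k search, standing in for a FAISS index (assumption)."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda ex: cosine(q, embed(ex["code"])), reverse=True)
    return ranked[:k]


def build_prompt(code: str, intent: str, examples: list) -> str:
    """Assemble a structured few-shot prompt for one intent (hypothetical layout)."""
    parts = [f"Write a '{intent}' comment for the final snippet."]
    for ex in examples:
        parts.append(f"Code:\n{ex['code']}\nComment ({intent}): {ex['comments'][intent]}")
    parts.append(f"Code:\n{code}\nComment ({intent}):")
    return "\n\n".join(parts)


def build_all_prompts(code: str, corpus: list, k: int = 2) -> dict:
    """One-to-many step: one prompt per intent, sharing the retrieved examples."""
    examples = retrieve(code, corpus, k)
    return {intent: build_prompt(code, intent, examples) for intent in INTENTS}
```

Each entry in `corpus` is assumed to be a dict with a `code` string and a `comments` dict keyed by intent. The returned prompts would then be sent to an LLM, one call per intent; that call is left out because the abstract does not name the model or its interface.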
Thesis Committee:
Reader 1: Dr. Saeed Samet
Reader 2: Dr. Jessica Chen
Advisor: Dr. Muhammad Asaduzzaman
Chair: Dr. Andreas Maniatis