PhD Seminar: Automatic Query-Intent Annotation: A Log-Free Agentic LLM Framework by Zahra Taherikhonakdar

Wednesday, April 8, 2026 - 13:00

Automatic Query-Intent Annotation: A Log-Free Agentic LLM Framework

PhD Seminar by: Zahra Taherikhonakdar

Date: Wednesday, April 8

Time: 1:00 PM

Location: Microsoft Teams

Abstract:

Short and ambiguous queries remain a major challenge for search engines, often reducing the effectiveness of information retrieval (IR) systems. Prior work shows that identifying a query’s intent, such as navigational, informational, or transactional, can significantly improve retrieval accuracy and interpretability. However, most intent-labeling methods depend on user search-log histories, which not only require extensive collection and preprocessing but also introduce substantial privacy and data provenance concerns, limiting their availability to researchers and their deployability in privacy-sensitive domains. This paper introduces a log-free, agentic annotation framework that uses a pipeline of LLM-driven agents to automatically assign intent labels to queries. The system comprises four components: a loader, an LLM-based annotator (zero-shot or few-shot), a validator enforcing structural and confidence constraints, and an evaluator that compares predictions against gold labels when available or uses an LLM-as-judge otherwise. We adopt a five-class taxonomy (navigational, factual, transactional, instrumental, and abstain) and evaluate the approach on labeled data (orcas-i-2m) as well as unlabeled benchmarks including antique, clueweb09-b, robust04, gov2, and dbpedia. Empirical results show that the few-shot variant consistently outperforms zero-shot prompting, and that the end-to-end pipeline maintains stable cross-domain performance under GPT-5-based auditing. Because the framework operates without relying on user interaction logs, it significantly reduces privacy exposure while lowering annotation cost and improving scalability. Automating query-intent annotation avoids dependence on user search logs, thereby mitigating privacy risks while reducing the time and cost of labeling large-scale datasets.

Join Teams Meeting 

Meeting ID: 211 084 028 399 98
Passcode: D4Mo3Qm7

PhD Doctoral Committee:

Internal Reader: Dr. Jianguo Lu

Internal Reader: Dr. Luis Rueda

External Reader: Dr. Tanja Collet-Najem

Advisor(s): Dr. Ziad Kobti