MSc Thesis Proposal - “No Query Left Behind: Query Refinement via Language Backtranslation” by: Delaram Rajaei

Tuesday, March 5, 2024 - 13:00 to 14:30

The School of Computer Science is pleased to present…

“No Query Left Behind: Query Refinement via Language Backtranslation”

MSc Thesis Proposal by: Delaram Rajaei

 

Date: Tuesday, 05 Mar 2024

Time: 1:00 – 2:30 pm

Location: Odette School of Business, Room B02

 

Abstract:
Web queries are often brief and ambiguous because users are unsure how to express their information needs, which makes it difficult for search engines to retrieve relevant documents. Query refinement aims to improve the relevance of search results by rewriting users' original queries into refined versions. State-of-the-art query refinement models have been trained on web query logs that are prone to topic drift, resulting in suboptimal performance. To fill this gap, only a few works have proposed generating benchmark datasets of (query → refined query) pairs through an exhaustive application of unsupervised or supervised modifications to the original query while controlling for topic drift. In this paper, by contrast, we propose leveraging natural language backtranslation, a round-trip translation of a query from a source language to a target language and back, as a simple yet effective approach to generating gold-standard benchmark datasets whose pairs are almost surely in the same semantic context, with no topic drift. Backtranslation can (1) uncover terms that are left latent in a query because they are commonly known in the source language, (2) augment an original query with context-aware synonymous terms from target languages, and (3) help with the semantic disambiguation of polysemous terms and collocations. Our extensive experiments across five TREC query sets and ten languages from seven language families validated the effectiveness of our method. We have open-sourced our research at https://github.com/fani-lab/RePair.
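
As a rough illustration of the round-trip idea described in the abstract, the sketch below builds (query → refined query) pairs by translating each query into a target language and back, keeping only the pairs where the round trip changed the wording. It is not the RePair implementation: the names `make_toy_translator`, `backtranslate`, and `build_pairs`, and the two tiny English/French lexicons, are hypothetical stand-ins for a real machine translation model.

```python
from typing import Callable, Dict, List, Tuple

# A translator is any function that maps text to text.
Translator = Callable[[str], str]


def make_toy_translator(lexicon: Dict[str, str]) -> Translator:
    """Build a word-by-word translator from a lexicon (illustration only).

    Words missing from the lexicon pass through unchanged.
    """
    def translate(text: str) -> str:
        return " ".join(lexicon.get(tok, tok) for tok in text.lower().split())
    return translate


def backtranslate(query: str, forward: Translator, backward: Translator) -> str:
    """Round-trip a query: source language -> target language -> source language."""
    return backward(forward(query))


def build_pairs(
    queries: List[str], forward: Translator, backward: Translator
) -> List[Tuple[str, str]]:
    """Return (query, refined query) pairs where the round trip changed the query."""
    pairs = []
    for query in queries:
        refined = backtranslate(query, forward, backward)
        normalized = " ".join(query.lower().split())
        if refined != normalized:  # an unchanged query offers no refinement
            pairs.append((query, refined))
    return pairs


# Hypothetical toy lexicons; a real system would call a translation model.
EN_TO_FR = {"cheap": "abordable", "car": "voiture", "rental": "location"}
FR_TO_EN = {"abordable": "affordable", "voiture": "car", "location": "rental"}

to_french = make_toy_translator(EN_TO_FR)
to_english = make_toy_translator(FR_TO_EN)

pairs = build_pairs(["cheap car rental", "car rental"], to_french, to_english)
print(pairs)
```

In this toy setup the round trip replaces "cheap" with the synonym "affordable", loosely mirroring point (2) of the abstract, while the query that comes back unchanged is dropped. Passing the translators in as plain functions means a real translation model could be substituted without changing the pairing logic.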
 
Keywords: Query Reformulation, Natural Language Backtranslation, Gold Standard Generation
 
Thesis Committee:
Internal Reader: Dr. Jianguo Lu (School of Computer Science)
External Reader: Dr. Tanja Collet-Najem (Department of Languages, Literatures and Cultures)
Advisor: Dr. Hossein Fani (School of Computer Science)
 