The School of Computer Science is pleased to present…
MSc Thesis Defense Announcement
Matches Made in Heaven or Somewhere: Personalized Query Refinement Gold Standard Generation Using Transformers
MSc Thesis Defense by: Yogeswar Lakshmi Narayanan
Date: September 12th, 2023
Time: 12 PM to 2 PM
Location: Essex Hall, Room 122
Abstract:
The foremost means of information retrieval, search engines, have difficulty searching knowledge repositories, e.g., the
web, because they are not tailored to users' differing information needs. User queries are, more often than not, underspecified
or contain ambiguous terms that also retrieve irrelevant documents. Query refinement is the process of transforming
users' queries into new refined versions without semantic drift to enhance the relevance of search results. Prior query refiners
have been benchmarked on ad-hoc web retrieval datasets under the weak assumption that users' input queries improve
gradually within a search session. Existing methods have also employed additional metadata, such as session history or users'
click-throughs, to enrich the query context. However, one crucial contextual cue has been overlooked: the user context.
Moreover, personalized query refinement remains largely unexplored in the context of recent advancements in transformers and large language
models in general. To address these problems, (i) we contribute RePair, an open-source configurable toolkit, to
generate large-scale gold standard benchmark datasets from a variety of domains for the task of query refinement. RePair takes
a dataset of queries and their relevance judgements (e.g., msmarco or aol), a sparse or dense information retrieval method (e.g.,
bm25, colbert), and an evaluation metric (e.g., map), and outputs refined versions of queries, each guaranteed to improve
relevance under the retrieval method in terms of the evaluation metric. RePair leverages the text-to-text transfer transformer
(t5) to generate gold standard datasets for any input query set and is designed with extensibility in mind. Out of the
box, we have generated and publicly shared gold-standard datasets for aol and msmarco.passage whilst benchmarking these
gold standard datasets with state-of-the-art supervised query suggestion models and exploring t5 as an alternative model for
query suggestion. (ii) We propose leveraging t5 to incorporate user context by adding a user-tailored pretext to the input
sequence as a prior condition to generate personalized reformulations of queries in the output sequence. Our experiments on the
aol query log demonstrated the effectiveness of t5 in personalized query reformulation without any loss of generality to other
conditional transformers. Our codebase is publicly available at https://github.com/fani-lab/RePair.
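The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the RePair implementation: the function names, the toy term-overlap retriever (a stand-in for bm25 or colbert), and the "user: ... query: ..." pretext format are all assumptions made for illustration. The sketch keeps only those candidate refinements whose average precision exceeds that of the original query, and shows one hypothetical way to prepend a user-tailored pretext to the input sequence.

```python
from typing import Callable, Dict, List, Set, Tuple


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant doc ids."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def toy_retrieve(query: str, corpus: Dict[str, str]) -> List[str]:
    """Hypothetical stand-in for a real retriever: rank docs by shared-term count."""
    terms = set(query.lower().split())
    scored = [
        (len(terms & set(text.lower().split())), doc_id)
        for doc_id, text in corpus.items()
    ]
    # Highest overlap first, ties broken by doc id; docs with no overlap are dropped.
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for overlap, doc_id in ordered if overlap > 0]


def select_refinements(
    original: str,
    candidates: List[str],
    relevant: Set[str],
    retrieve: Callable[[str], List[str]],
    metric: Callable[[List[str], Set[str]], float],
) -> List[Tuple[str, float]]:
    """Keep candidates that score strictly higher than the original query."""
    base = metric(retrieve(original), relevant)
    scored = [(c, metric(retrieve(c), relevant)) for c in candidates]
    better = [(c, s) for c, s in scored if s > base]
    return sorted(better, key=lambda pair: -pair[1])


def add_user_pretext(user_id: str, query: str) -> str:
    """Assumed input format: prefix the query with a user-specific pretext."""
    return f"user: {user_id} query: {query}"
```

In this sketch the candidates would come from a generator such as t5, and the retained pairs form the gold standard for the original query. Swapping in another retriever or metric only requires passing different callables.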
Thesis Committee:
Internal Reader: Dr. Jianguo Lu
External Reader: Dr. Mohammad Hassanzadeh
Advisor: Dr. Hossein Fani
Chair: Dr. Dan Wu
