MSc Thesis Defense Announcement of Malita Dodti: "BERT for Question Answering in Biomedical Domain"

Thursday, December 8, 2022 - 14:00 to 16:00

SCHOOL OF COMPUTER SCIENCE

The School of Computer Science is pleased to present…

MSc Thesis Defense by: Malita Dodti

Date: Thursday, December 8, 2022

Time: 2:00 pm - 4:00 pm

Location: Essex Hall, Room 122

Reminders: Two-part attendance is mandatory. Please arrive 5-10 minutes before the event starts; LATECOMERS WILL NOT BE ADMITTED once the door has been closed and the presentation has begun. Please be respectful of the presenter by NOT knocking on the door for admittance.

Abstract:

Question Answering (QA) is a complex Natural Language Processing (NLP) task. It involves understanding a question, retrieving relevant material, and generating a suitable answer. Its major challenges are building proper representations of the language and producing a suitable answer to a given question. Pre-training neural language models has significantly improved many NLP tasks. In particular, BERT is a deeply bidirectional, pre-trained language representation that has performed well on NLP tasks, including question answering. In this thesis, we study the application of BERT to automated response generation for biomedical text mining. This application is motivated by the growing volume of biomedical papers, which demands better techniques for automatically extracting and summarizing biomedical information and for automatically answering queries. To answer biomedical questions successfully, the scarcity of large, expert-annotated biomedical datasets must be addressed. In this thesis, we augment the data by deriving new samples from existing ones with varying context lengths. We study how dynamic changes in passage length affect model performance, which gives a better understanding of optimal context lengths. To learn how the models behave when unanswerable questions are present, we use datasets with various ratios of answerable to unanswerable questions; the experiments show that the behaviour of the prediction models varies significantly across training and testing sets. During the experiments, a new span selection technique is implemented for predicting answers. According to the experiments, it offers a satisfactory improvement in the effectiveness of state-of-the-art techniques for question answering in biomedical text mining.
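For readers less familiar with extractive QA, the sketch below illustrates two ideas from the abstract under stated assumptions. It is not the thesis's method: the abstract does not describe the new span selection technique or the exact augmentation procedure. `best_span` shows the standard BERT decoding baseline, where a span (i, j) is scored as start_logit[i] + end_logit[j] and the [CLS] token at index 0 supplies a "no answer" score for unanswerable questions (the SQuAD 2.0 convention). `context_variants` is a hypothetical way to vary context length: it keeps the answer-bearing sentence plus a window of neighbouring sentences. Both function names, the sentence-window strategy, and the `max_answer_len` and `null_threshold` defaults are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def context_variants(sentences: List[str], answer_idx: int,
                     window_sizes: List[int]) -> List[str]:
    """Hypothetical augmentation: passages of different lengths around the answer.

    For each window size w, keep the answer sentence plus up to w sentences
    on each side, clipped to the passage boundaries.
    """
    variants: List[str] = []
    for w in window_sizes:
        lo = max(0, answer_idx - w)
        hi = min(len(sentences), answer_idx + w + 1)
        passage = " ".join(sentences[lo:hi])
        # Skip duplicates, e.g. once the window already covers the whole passage.
        if passage not in variants:
            variants.append(passage)
    return variants


def best_span(start_logits: Sequence[float], end_logits: Sequence[float],
              max_answer_len: int = 30,
              null_threshold: float = 0.0) -> Optional[Tuple[int, int]]:
    """Standard BERT extractive-QA decoding with a no-answer option.

    Index 0 is assumed to be [CLS]; its start + end score is the no-answer score.
    Real decoders also skip question and [SEP] tokens; that masking is omitted here.
    Returns (start, end) token indices, or None to predict "no answer".
    """
    null_score = start_logits[0] + end_logits[0]
    best_score = float("-inf")
    best: Optional[Tuple[int, int]] = None
    n = len(start_logits)
    for i in range(1, n):
        # Enforce end >= start and span length <= max_answer_len.
        for j in range(i, min(n, i + max_answer_len)):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score, best = score, (i, j)
    # Predict "no answer" when the null score beats the best span by more than the threshold.
    if best is None or null_score - best_score > null_threshold:
        return None
    return best


if __name__ == "__main__":
    sents = ["A.", "B.", "C answer.", "D.", "E."]
    print(context_variants(sents, 2, [0, 1, 5]))
    print(best_span([0.0, 2.0, 1.0], [0.0, 1.0, 3.0]))
```

Raising `null_threshold` makes the decoder more willing to return a span, which is one knob that interacts with the ratio of answerable to unanswerable questions seen during training and testing.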

Keywords: Question Answering, Natural Language Processing, BERT (Bidirectional Encoder Representations from Transformers), Pre-training, Fine-tuning technique, Domain-specific model, Language representation, Text mining.

MSc Thesis Committee:

Internal Reader: Dr. Luis Rueda

External Reader: Dr. Huiming Zhang

Advisor: Dr. Jessica Chen

Chair: Dr. Asish Mukhopadhyay


5113 Lambton Tower, 401 Sunset Ave., Windsor, ON N9B 3P4 | (519) 253-3000 Ext. 3716 | csgradinfo@uwindsor.ca