Transformer-Based Sentence Classification in the Medical Domain: Evaluation of Pretrained and PubMed-Fine-Tuned Models - MSc Thesis Defense by: Tamanna Kaiser

The School of Computer Science is pleased to present…

Transformer-Based Sentence Classification in the Medical Domain: Evaluation of Pretrained and PubMed-Fine-Tuned Models
MSc Thesis Defense by: Tamanna Kaiser

 

Date: Thursday, May 8th, 2025

Time: 9:00 AM

Location: Essex Hall, Room 186

 

Abstract:

Automatic sentence-level classification of medical documents can improve information organization and support more efficient clinical decision-making. This study evaluates the effectiveness of eight transformer-based large language models for biomedical sentence classification. The models were fine-tuned on the PubMed 20k RCT training set using a composite loss function that combines cross-entropy, focal loss, and dice loss to address class imbalance and improve generalization.
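
For illustration, the following is a minimal PyTorch sketch of how such a composite loss could be assembled. The class name, the per-term weights, the focal exponent, and the smoothing constant are illustrative assumptions, not the values used in the thesis, and the soft-dice formulation over class probabilities is one possible reading of the dice term.

```python
# Illustrative composite loss: cross-entropy + focal + soft dice.
# All hyperparameters below are assumptions for demonstration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CompositeLoss(nn.Module):
    def __init__(self, focal_gamma=2.0, w_ce=1.0, w_focal=1.0, w_dice=1.0, eps=1e-6):
        super().__init__()
        self.focal_gamma = focal_gamma
        self.w_ce, self.w_focal, self.w_dice = w_ce, w_focal, w_dice
        self.eps = eps

    def forward(self, logits, targets):
        # logits: (batch, num_classes); targets: (batch,) integer class labels
        ce = F.cross_entropy(logits, targets, reduction="none")

        # Focal term: down-weight well-classified examples by (1 - p_t)^gamma
        p_t = torch.exp(-ce)
        focal = ((1.0 - p_t) ** self.focal_gamma) * ce

        # Soft dice term on class probabilities vs. one-hot targets
        probs = F.softmax(logits, dim=-1)
        one_hot = F.one_hot(targets, num_classes=logits.size(-1)).float()
        intersection = (probs * one_hot).sum(dim=0)
        union = probs.sum(dim=0) + one_hot.sum(dim=0)
        dice = 1.0 - (2.0 * intersection + self.eps) / (union + self.eps)

        return (self.w_ce * ce.mean()
                + self.w_focal * focal.mean()
                + self.w_dice * dice.mean())


if __name__ == "__main__":
    # Example on random logits for a 5-class problem (e.g., PubMed RCT section labels)
    loss_fn = CompositeLoss()
    logits = torch.randn(8, 5)
    labels = torch.randint(0, 5, (8,))
    print(loss_fn(logits, labels).item())
```

Weighting the focal and dice terms alongside plain cross-entropy is one common way to keep gradients well behaved on frequent classes while still emphasizing rare or hard examples.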

 
We began with eight pretrained transformer-based language models, which were subsequently fine-tuned using the training split of the PubMed 20k RCT dataset. For evaluation, both the pretrained and PubMed-fine-tuned versions of each model were tested on two datasets. First, performance was measured on the official PubMed 20k RCT test set to assess in-domain effectiveness. Second, the MTSamples dataset was used to evaluate model generalizability to unstructured clinical narratives. To simulate real-world conditions, we constructed two test subsets from MTSamples: one balanced to ensure equal class representation, and one imbalanced to reflect the natural label distribution. All evaluations on MTSamples were conducted without any additional training, allowing us to assess performance across both balanced and imbalanced clinical settings.
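
As a rough sketch of how balanced and imbalanced evaluation subsets could be drawn from a labeled sentence collection, the snippet below samples an equal number of sentences per class for the balanced split and keeps the natural label distribution for the imbalanced one. The column names and per-class sample size are hypothetical stand-ins, not the actual MTSamples preprocessing.

```python
# Hypothetical construction of balanced vs. imbalanced test subsets from a
# labeled DataFrame with columns "sentence" and "label" (assumed names).
import pandas as pd


def make_subsets(df, label_col="label", per_class=100, seed=42):
    # Balanced subset: up to `per_class` sentences per class
    balanced = (
        df.groupby(label_col, group_keys=False)
          .apply(lambda g: g.sample(n=min(per_class, len(g)), random_state=seed))
          .reset_index(drop=True)
    )
    # Imbalanced subset: keep the natural label distribution unchanged
    imbalanced = df.reset_index(drop=True)
    return balanced, imbalanced
```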

 
Results show that PubMed-fine-tuned domain-specific models, particularly ClinicalBERT, outperform general-purpose counterparts. ClinicalBERT achieved 97.15% accuracy and 96.93% F1-score on the PubMed 20k RCT test set, and 95.20% accuracy and 95.10% F1-score on the balanced MTSamples subset. On the imbalanced MTSamples subset, ClinicalBERT maintained strong performance with 91.80% accuracy and 90.60% F1-score, demonstrating resilience to real-world class distribution skew. Ablation studies further confirmed the effectiveness of the composite loss design in enhancing model robustness.

These findings demonstrate that transformer-based models, when fine-tuned on domain-specific data with an optimized loss function, offer a reliable and scalable solution for classifying both structured and unstructured clinical text.

 

Keywords: Medical Text Classification, Transformer-Based Models, Sentence Classification
 
Thesis Committee:

Internal Reader: Dr. Muhammad Asaduzzaman

External Reader: Dr. Abdul A. Hussein

Advisor: Dr. Dan Wu

Chair: Dr. Christie Ezeife

 


 

Registration Link (only MAC students need to pre-register)