The School of Computer Science is pleased to present…
MSc Thesis Defense by: Tamanna Kaiser
Date: Thursday, May 8th, 2025
Time: 9:00 AM
Location: Essex Hall, Room 186
Automatic sentence-level classification of medical documents can improve information organization and make clinical decision-making more efficient. This study evaluates the effectiveness of eight transformer-based large language models for biomedical sentence classification. The models were fine-tuned on the PubMed 20k RCT training set using a composite loss function that combines cross-entropy, focal loss, and dice loss to address class imbalance and improve generalization.
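To make the loss design concrete, here is a minimal sketch of how such a composite objective might be implemented in PyTorch; the weights (w_ce, w_focal, w_dice), the focal parameter gamma, and the soft-dice formulation are illustrative assumptions, not values or code taken from the thesis.

    import torch
    import torch.nn.functional as F

    def composite_loss(logits, targets, gamma=2.0, eps=1e-6,
                       w_ce=1.0, w_focal=1.0, w_dice=1.0):
        # Cross-entropy over the sentence-class logits.
        ce = F.cross_entropy(logits, targets)

        # Focal loss: down-weight easy examples by (1 - p_t)^gamma.
        probs = F.softmax(logits, dim=-1)
        p_t = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
        focal = (-(1.0 - p_t) ** gamma * p_t.clamp_min(eps).log()).mean()

        # Soft dice loss against one-hot targets, averaged over classes.
        one_hot = F.one_hot(targets, num_classes=logits.size(-1)).float()
        intersection = (probs * one_hot).sum(dim=0)
        dice = 1.0 - ((2 * intersection + eps) /
                      (probs.sum(dim=0) + one_hot.sum(dim=0) + eps)).mean()

        # Weighted sum of the three objectives.
        return w_ce * ce + w_focal * focal + w_dice * dice

In practice, such weights would be tuned on a validation split rather than fixed in advance.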
Starting from eight pretrained transformer-based language models, we fine-tuned each on the training split of the PubMed 20k RCT dataset. For evaluation, both the pretrained and the fine-tuned version of each model were tested on two datasets: the official PubMed 20k RCT test set, to assess in-domain effectiveness, and the MTSamples dataset, to evaluate generalizability to unstructured clinical narratives. To simulate real-world conditions, we constructed two test subsets from MTSamples: one balanced to ensure equal class representation and one imbalanced to reflect the natural label distribution. All MTSamples evaluations were conducted without any additional training, allowing us to assess performance in both balanced and imbalanced clinical settings.
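As an illustration of the subset construction described above, the sketch below shows one plausible way to derive a balanced and an imbalanced test set from labeled MTSamples sentences; the sampling scheme and function name are hypothetical, not the exact procedure used in the thesis.

    import random
    from collections import defaultdict

    def build_test_subsets(sentences, labels, seed=42):
        random.seed(seed)
        by_class = defaultdict(list)
        for sentence, label in zip(sentences, labels):
            by_class[label].append(sentence)

        # Balanced subset: the same number of sentences per class,
        # capped by the size of the smallest class.
        n_per_class = min(len(group) for group in by_class.values())
        balanced = [(s, label) for label, group in by_class.items()
                    for s in random.sample(group, n_per_class)]

        # Imbalanced subset: keep the natural label distribution unchanged.
        imbalanced = list(zip(sentences, labels))
        return balanced, imbalanced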
Results show that PubMed-fine-tuned domain-specific models, particularly ClinicalBERT, outperform their general-purpose counterparts. ClinicalBERT achieved 97.15% accuracy and a 96.93% F1-score on the PubMed 20k RCT test set, and 95.20% accuracy and a 95.10% F1-score on the balanced MTSamples subset. On the imbalanced MTSamples subset, it maintained strong performance with 91.80% accuracy and a 90.60% F1-score, demonstrating resilience to real-world skew in the class distribution. Ablation studies further confirmed that the composite loss design enhances model robustness.
These findings demonstrate that transformer-based models, when fine-tuned on domain-specific data with an optimized loss function, offer a reliable and scalable solution for sentence classification across both structured and unstructured clinical text.
Internal Reader: Dr. Muhammad Asaduzzaman
External Reader: Dr. Abdul A. Hussein
Advisor: Dr. Dan Wu
Chair: Dr. Christie Ezeife