School of Computer Science
Technical Workshop Series: Introduction to Hugging Face Libraries for Natural Language Processing
Presenter: Ali Abbasi Tadi
Date: Friday, December 1, 2023
Time: 3:30 pm - 4:30 pm
Location: Advanced Computing Hub, 4th Floor (Workshop space), 300 Ouellette Avenue (School of Computer Science Advanced Computing Hub)
LATECOMERS WILL NOT BE ADMITTED once the presentation has begun.
Abstract:
This talk provides a comprehensive introduction to the Hugging Face ecosystem and its main components, making it an excellent starting point for anyone interested in Natural Language Processing (NLP). The Transformers library is a powerful NLP tool built on models that rely on attention mechanisms rather than sequential computation to solve real-life problems. The Hugging Face Hub offers close to 2,000 datasets along with layered APIs, allowing programmers to interact easily with the hosted models through roughly 31 supported libraries. The Datasets library provides a collection of datasets that can be used for training and testing models. The Tokenizers library preprocesses raw text before it is fed into a model. Fine-tuning is the process of taking a pre-trained model and adapting it to a new task by training it on a new dataset. The talk gives an overview of all of the above libraries through a practical example.
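As a rough illustration of how these pieces fit together (a minimal sketch, not the presenter's actual demo), the snippet below loads a dataset with the Datasets library, tokenizes it, and fine-tunes a pretrained Transformers model with the Trainer API; the model name "distilbert-base-uncased" and the "imdb" dataset are assumptions chosen for the example.

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Datasets library: pull a ready-made benchmark dataset from the Hub.
# (The "imdb" dataset is an illustrative choice, not the workshop's dataset.)
dataset = load_dataset("imdb")

# Tokenizer: convert raw text into the numeric inputs the model expects.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

# Transformers library: a pretrained encoder with a fresh classification head.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# Fine-tuning: adapt the pretrained model to the new task by training on the
# new dataset. A small subset and one epoch keep the sketch quick to run.
args = TrainingArguments(output_dir="finetune-demo", num_train_epochs=1,
                         per_device_train_batch_size=8)
trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"].shuffle(seed=42).select(range(1000)))
trainer.train()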
Workshop Outline:
Hugging Face setup (see the brief quickstart sketch after this outline)
Introduction to the Transformers library, encoders, and decoders
Introduction to the Datasets library
Introduction to the Tokenizers library
Fine-tuning models
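A minimal quickstart sketch for the setup step, assuming a standard pip environment; the package list and the sentiment-analysis example are illustrative, not the workshop's actual demo code.

# Install the Hugging Face packages covered in the workshop (plus PyTorch):
#   pip install transformers datasets tokenizers torch

from transformers import pipeline

# The pipeline API bundles a pretrained model, its tokenizer, and
# post-processing behind a single call.
classifier = pipeline("sentiment-analysis")
print(classifier("Hugging Face makes NLP easy to get started with."))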
Prerequisites:
Familiarity with the PyTorch (torch) library, neural networks, and Natural Language Processing concepts.
Biography:
Ali is pursuing his Ph.D. in Computer Science at the University of Windsor. His main research interest is security and privacy in machine learning. He has published work on private clustering in top conferences and peer-reviewed journals. He has received various scholarships from the University of Windsor and placed 5th in the iDASH 2022 competition. He has been an invited speaker at the Advanced Computing Hub at the University of Windsor. He is currently developing methods for the secure computation of transcriptomics data.