Mining Git Repositories with PyDriller – Part I: Understanding Git and Repository Mining Basics (2nd Offering) - JLR Challenge #4 Technical Workshop by Mehedi Hasan Shanto

Tuesday, November 4, 2025 - 15:00
School of Computer Science – JLR Challenge #4 Technical Workshop

Mining Git Repositories with PyDriller – Part I: Understanding Git and Repository Mining Basics (2nd Offering)

Presenter: Mehedi Hasan Shanto

Date: Tuesday, November 4th, 2025

Time: 3:00 PM

Location: Workshop Space, 4th Floor - 300 Ouellette Ave., School of Computer Science Advanced Computing Hub

Abstract

This workshop provides a hands-on introduction to mining and analyzing software repositories using PyDriller, a Python framework that simplifies access to Git data. Participants will first explore how Git records the evolution of software projects through commits, authors, timestamps, and code changes. The session will then demonstrate how PyDriller converts raw commit logs into structured, analyzable Python objects—making it easier to extract insights such as developer activity, commit frequency, and project evolution patterns.
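
As a rough illustration of turning commits into analyzable Python objects, the sketch below walks a repository with PyDriller and then counts commits per author (developer activity) and per month (commit frequency). It assumes PyDriller 2.x, where `Repository(...).traverse_commits()` yields commit objects exposing `hash`, `author.name`, `author_date`, and `msg`. The `CommitRecord` type and the helper function names are hypothetical, chosen only for this example.

```python
"""Sketch: summarize developer activity and commit frequency with PyDriller.

Assumes PyDriller 2.x (pip install pydriller). CommitRecord, load_commits,
commits_per_author and commits_per_month are hypothetical names for this example.
"""
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CommitRecord:
    """Plain snapshot of the commit fields used below (hypothetical type)."""

    hash: str
    author: str
    date: datetime
    msg: str


def load_commits(repo: str) -> list[CommitRecord]:
    """Walk the history of a local path or remote URL and keep a few fields."""
    # Imported lazily so the counting helpers can be used without PyDriller installed.
    from pydriller import Repository

    return [
        CommitRecord(c.hash, c.author.name, c.author_date, c.msg)
        for c in Repository(repo).traverse_commits()
    ]


def commits_per_author(records: list[CommitRecord]) -> Counter:
    """Developer activity: number of commits attributed to each author name."""
    return Counter(r.author for r in records)


def commits_per_month(records: list[CommitRecord]) -> Counter:
    """Commit frequency: number of commits per calendar month of the author date."""
    return Counter(r.date.strftime("%Y-%m") for r in records)
```

For example, `commits_per_author(load_commits("path/to/repo")).most_common(5)` would list the five most active author names. Counting by `author.name` treats aliases (one person committing under several names or emails) as separate developers, so some identity merging may be needed before drawing conclusions.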

Through interactive examples, attendees will learn how to connect to a GitHub repository, traverse commit histories, and extract essential metadata for software analytics. The workshop will emphasize the importance of version control data in empirical software engineering, showcasing how it can be used to support research, automate reporting, and drive evidence-based development practices. This first session sets the stage for deeper repository analysis on Day 2, where participants will move from basic mining to advanced metrics and developer behaviour analytics.
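
A minimal sketch of the connect, traverse, and extract steps, written as a small reporting script. It assumes PyDriller 2.x, where `Repository` accepts either a local path or a remote URL (remote repositories are cloned to a temporary folder) and an optional `since` datetime filter. Only `hash`, `msg`, `author.name`, `author.email`, `author_date`, `insertions`, and `deletions` are read from each commit; `FIELDS`, `commit_to_row`, and `export_history` are hypothetical names, not part of PyDriller.

```python
"""Sketch: export basic commit metadata from a Git repository to CSV.

Assumes PyDriller 2.x. FIELDS, commit_to_row and export_history are
hypothetical names for this example, not part of PyDriller's API.
"""
import csv
from datetime import datetime
from typing import Optional

FIELDS = ["hash", "author", "email", "author_date", "subject", "insertions", "deletions"]


def commit_to_row(commit) -> dict:
    """Flatten one PyDriller Commit (or any object with the same attributes) into a dict."""
    lines = commit.msg.splitlines()
    return {
        "hash": commit.hash,
        "author": commit.author.name,
        "email": commit.author.email,
        "author_date": commit.author_date.isoformat(),
        "subject": lines[0] if lines else "",  # first line of the commit message
        "insertions": commit.insertions,
        "deletions": commit.deletions,
    }


def export_history(repo: str, out_path: str, since: Optional[datetime] = None) -> int:
    """Traverse the commit history and write one CSV row per commit; returns the row count."""
    from pydriller import Repository  # lazy import: only needed when actually mining

    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        for commit in Repository(repo, since=since).traverse_commits():
            writer.writerow(commit_to_row(commit))
            count += 1
    return count
```

Calling `export_history("https://github.com/ishepard/pydriller", "commits.csv", since=datetime(2024, 1, 1))` would clone the repository, walk the commits authored since that date, and return the number of rows written. Keeping the row conversion separate from the traversal makes it easy to test without network access.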

Workshop Outline:

1. Introduction to Repository Mining
   Why studying version control data matters
   Applications in research and industry

2. Git Fundamentals
   How Git tracks code history and collaboration
   Key commands and underlying concepts

3. Introduction to PyDriller
   Overview, installation, and core capabilities
   Understanding commits, authors, and metadata

4. Hands-on Demonstration
   Connecting to a repository
   Extracting commit messages, authors, and timestamps

5. Interactive Exercise & Discussion
   Running PyDriller on real repositories
   Exploring developer activity patterns

6. Wrap-Up & Next Steps
   Key takeaways and preview of advanced analysis in Part II
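
For the Git Fundamentals topic, a small standard-library example of the idea underneath Git's history tracking: every object is content-addressed. A file version is stored as a blob whose ID is the SHA-1 of the header `blob <size>` and a NUL byte, followed by the raw bytes; trees and commits are hashed the same way, so a commit ID pins down a full snapshot and its ancestry. The sketch assumes the default SHA-1 object format (Git also supports a newer SHA-256 format), and `blob_id` is a hypothetical helper name.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a file's contents (SHA-1 object format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Running `git hash-object` on a file holding exactly those bytes should print the same ID, which is why identical file contents are stored only once no matter how many commits reference them.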

Prerequisites:

Basic understanding of Python programming and familiarity with Git concepts.
Participants should have access to Jupyter Notebook or Google Colab for the live demo.

Biography

Mehedi Hasan Shanto is a Ph.D. student in the School of Computer Science at the University of Windsor, specializing in software engineering, large language models (LLMs), and repository mining. His research focuses on understanding how AI and empirical methods can evaluate, predict, and automate software development activities. He has experience working with GitHub data, software analytics, and LLM-based evaluation frameworks. Shanto’s passion lies in bridging the gap between software repository data and intelligent automation, helping developers and researchers turn raw version control history into actionable insights.

Registration Link (Only for MAC students to pre-register)