GPU Performance Metrics for CNNs: Profiling and Optimization (1st Offering)
Presenter: Farzaneh Kazemzadeh
Date: Monday, November 24th, 2025
Time: 2:00 PM
Location: 4th Floor - 300 Ouellette Ave., School of Computer Science, Advanced Computing Hub
Efficiently deploying Convolutional Neural Networks (CNNs) on GPUs requires not only implementation knowledge but also a solid understanding of performance profiling. This workshop focuses on GPU-level performance metrics that determine the speed and scalability of CNN models. Participants will learn about key profiling indicators such as kernel execution time, GPU occupancy, memory throughput, and tensor-shape impact on runtime. The session also explores how NVIDIA Nsight and Triton profiling tools can be used to interpret GPU utilization and identify bottlenecks. By the end of this workshop, attendees will have a clear understanding of how to analyze CNN performance beyond model accuracy, aligning with real-world practices in CUDA and Triton optimization.
Topics:
- Overview of CNN computation on GPUs
- GPU performance indicators: execution time, SM usage, memory throughput
- Introduction to CUDA profiling and Nsight tools
- Understanding tensor dimensions and batch-size effects
- Comparing kernel fusion and optimization techniques in CUDA and Triton
- Discussion on interpreting profiling results and performance improvement strategies
Prerequisites:
- Basic understanding of CNN architecture and GPU computing
- Familiarity with CUDA programming concepts
Farzaneh Kazemzadeh is a PhD student in Computer Science at the University of Windsor. Her research focuses on trustworthy AI, particularly on privacy-preserving machine learning, with applications in genomics and social networks. Her current work explores memorization and privacy risks in large language models.
Registration Link (only MAC students need to pre-register)