MSc Thesis Defense of Daniel Obawole: "Comparative Study of Reinforcement Learning Methods in Path Planning"

Thursday, September 23, 2021 - 11:30 to 13:30


The School of Computer Science is pleased to present… 

MSc Thesis Defense by: Daniel Obawole 

Date: Thursday, September 23, 2021
Time: 11:30 am – 1:30 pm
Passcode: If interested in attending this event, contact the Graduate Secretary with sufficient notice before the event to obtain the passcode.


To perform a wide variety of tasks and achieve human-level performance in complex real-world environments, an intelligent agent must be able to learn from a dynamically changing environment. In general, agents are limited in obtaining an accurate description of the environment from what they perceive, because they may not have complete information about it. The present research focuses on reinforcement learning algorithms, which form a distinct category in the field of machine learning because of their trial-and-error approach. Reinforcement learning is used to solve control problems on the basis of received rewards. The core of its learning task is defined by a reward function, where an unsuitable choice of action results in more negative rewards. The reinforcement learning framework incorporates the notion of cumulative reward over time, enabling an agent to select actions that promote long-term results. Q-learning and SARSA are two popular methods of this kind. The two methods are similar, except that Q-learning follows an off-policy strategy while SARSA is an on-policy algorithm. In this thesis, we compare the Q-learning and SARSA algorithms for the global path planning of an agent in a grid-world game environment in order to verify their efficiency in different scenarios. Simulations were performed in a grid-world environment comprising static obstacles with a density of 30%. The results demonstrate that both approaches reach the optimal policy with a complete success rate over the learning episodes in the test cases. The comparison shows that the Q-learning algorithm outperforms the SARSA algorithm by 34% in terms of computation time, as both approaches tend toward more negative rewards while arriving at the optimal path. However, with a 12% higher convergence ratio, the SARSA approach better avoids large penalties from exploratory moves and consequently yields a safer route as the optimal path.
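The off-policy/on-policy distinction at the heart of the comparison can be sketched in a few lines. This is a minimal illustration only, not the thesis's implementation: the grid size, reward values, obstacle-free layout, and hyperparameters (ALPHA, GAMMA, EPS) below are assumptions for demonstration. The only difference between the two learners is which next-state Q-value they bootstrap on.

```python
import random
from collections import defaultdict

# Toy grid world (assumed): 4x4, start (0, 0), goal (3, 3), no obstacles.
GRID = 4
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1              # assumed hyperparameters

def step(state, action):
    """Move within the grid bounds; -1 per step, +10 on reaching the goal."""
    x, y = state
    nxt = (min(max(x + action[0], 0), GRID - 1),
           min(max(y + action[1], 0), GRID - 1))
    if nxt == (GRID - 1, GRID - 1):
        return nxt, 10.0, True
    return nxt, -1.0, False

def eps_greedy(Q, state):
    """Behaviour policy used by both learners: explore with probability EPS."""
    if random.random() < EPS:
        return random.randrange(len(ACTIONS))
    qs = [Q[(state, a)] for a in range(len(ACTIONS))]
    return qs.index(max(qs))

def train(on_policy, episodes=500, seed=0):
    """on_policy=True trains with the SARSA update; False with Q-learning."""
    random.seed(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done = (0, 0), False
        a = eps_greedy(Q, state)
        while not done:
            nxt, r, done = step(state, ACTIONS[a])
            a2 = eps_greedy(Q, nxt)
            if on_policy:
                # SARSA: bootstrap on the action actually taken next.
                target = r + (0.0 if done else GAMMA * Q[(nxt, a2)])
            else:
                # Q-learning: bootstrap on the greedy (max) action.
                best = max(Q[(nxt, b)] for b in range(len(ACTIONS)))
                target = r + (0.0 if done else GAMMA * best)
            Q[(state, a)] += ALPHA * (target - Q[(state, a)])
            state, a = nxt, a2
    return Q
```

Because SARSA's target includes the exploratory action it will actually take, states adjacent to large penalties look worse to it, which is why it tends to learn safer routes, while Q-learning's max operator learns the shortest path regardless of exploration risk.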
Keywords: Machine learning, reinforcement learning, Q-learning, SARSA algorithm

MSc Thesis Committee:  

Internal Reader: Dr. Dan Wu            
External Reader: Dr. Myron Hlynka 
Advisor: Dr. Jessica Chen 
Chair: Dr. Scott Goodwin 


5113 Lambton Tower, 401 Sunset Ave., Windsor, ON N9B 3P4, (519) 253-3000 Ext. 3716 (working remotely)