MSc Thesis Proposal - Crafting High-Quality Lateral Movement Dataset by Anas Mabrouk

Monday, March 4, 2024 - 10:00 to 12:00

The School of Computer Science is pleased to present…

Crafting High-Quality Lateral Movement Dataset

MSc Thesis Proposal by: Anas Mabrouk


Date: Monday, 04 Mar 2024

Time: 10:00 am – 12:00 pm

Location: Essex Hall, Room 122


In cybersecurity, the persistent evolution of cyber threats poses an ongoing challenge for organizations and individuals. Among the sophisticated tactics employed by threat actors, “lateral movement” (LM) has emerged as a pivotal strategy for adversaries seeking to maneuver within compromised network environments. Most recent research on detection models, including models for LM detection, relies on machine learning. Therefore, high-quality datasets are essential for effective training and evaluation. We examined the existing open-source datasets used in LM detection, advanced persistent threat (APT) detection, intrusion detection, and threat hunting to determine whether they contain LM attacks. We identified several shortcomings in these datasets, including simplistic architectures that fail to capture the complexity of enterprise networks. Furthermore, many datasets offer only partial data sources, such as network flow or authentication data. The literature places little emphasis on the methodologies used to collect attack records from system and network logs. Additionally, existing datasets contain few LM instances, with limited diversity in the techniques employed. Typically, LM instances consist of a few hops and occur over short periods. Moreover, most datasets are outdated and do not reflect recent attack patterns. Our primary contribution is the development of a benchmark dataset that addresses these limitations. We have constructed a realistic architecture and are gathering comprehensive system logs and network traffic data. We have also devised realistic attack patterns encompassing recent tactics, techniques, and procedures. Furthermore, we have implemented a robust data labeling technique to extract attack-related records accurately. Lastly, we have established a methodology for generating realistic benign data.
Thesis Committee:
Internal Reader: Dr. Saeed Samet
External Reader: Dr. Ning Zhang
Advisors: Dr. Sherif Saad and Dr. Mohammad Mamun