PhD Comprehensive Examination Announcement by Abdulrauf Aremu Gidado: "Influence Maximization for Product Recommendation with Sequential Patterns on Big Data"

Thursday, September 1, 2022 - 14:30 to 16:30

PhD Comprehensive Exam by: Abdulrauf Aremu Gidado

Date: Thursday, September 1st, 2022

Time: 2:30 PM to 4:30 PM

Location: Lambton Tower, Room 3105

Reminder: ***NEW*** Mandatory two-part attendance: Scan provided QR code to record attendance PLUS write information on attendance sheet.

Abstract:

The discovery, exploitation, and maximization of influential entities (e.g., users, products, or services) in recommender systems (e.g., Amazon, Facebook Marketplace, Walmart) and the proliferation of bigdata emerging from social software such as Facebook, Amazon, and Google have been trending topics in recent times. These recommender systems produce unstructured data (such as video, audio, emojis, and text) that is large (e.g., thousands of petabytes in size) and too complex to be analyzed, managed, and processed by traditional data mining techniques such as association rule mining and classification. Data with this large size and lack of structure is referred to as bigdata. Hence, we formally define bigdata as given a non-empty set (where is the unstructured large datasets), a function is called a big operation, if and only if for any is then defined as a collection of several unstructured or semi-structured data stores, such as NoSQL (i.e., Not Only SQL) databases and non-textual data (usually stored in files such as images and videos), that cannot be handled by the traditional data mining techniques used by existing recommender systems. Therefore, the pertinent part of the operation is to transform the set of recommendation tasks into a bigdata scenario with containing all or a subset of the features of bigdata (such as volume, variety, veracity, velocity, value, and visualization). The need for bigdata-producing systems (e.g., Facebook, Twitter) to meet data management needs that could not be handled by traditional relational databases led to the introduction of NoSQL (i.e., Not Only SQL) databases such as Cassandra, CouchDB, Elasticsearch, Bigtable, and Dynamo.
These NoSQL databases are designed as distributed databases tailored toward handling semi-structured data, with a focus on availability, high performance, scalability, and data replication, unlike traditional relational databases, which emphasize immediate data consistency, powerful query languages, and structured data storage. Furthermore, bigdata-producing systems with their respective marketplaces (i.e., e-commerce), such as Facebook Marketplace and Instagram Marketplace, are regarded as systems with influences because of the relationships between their users.

The problem of influence maximization is defined as follows: given (i) a directed social network, modeled as a graph G whose nodes (vertices V) are users U or interconnected products in a recommender system and whose edges represent links or relationships between those nodes, (ii) a set of weights associated with the edges, representing the strengths or probabilities of influence among users or products, (iii) a stochastic influence propagation model that governs how a certain behaviour diffuses among users, and (iv) a cardinality budget k, find a set of at most k nodes (called the “seed nodes”) that maximizes the expected number of influenced nodes in the network. This is the foundational idea behind today’s viral marketing technologies.
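The problem statement above can be illustrated with a minimal sketch. It assumes the independent cascade model as the stochastic propagation model and a simple greedy seed selection with Monte Carlo estimation of the spread; the abstract does not state which model or algorithm the thesis uses, so these choices, along with the function names and the toy graph, are assumptions made only for illustration.

```python
import random

def simulate_cascade(graph, seeds, rng):
    """Run one independent-cascade simulation; return the set of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                # A newly active node gets one chance to activate each
                # inactive out-neighbour, succeeding with probability p.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def expected_spread(graph, seeds, runs, rng):
    """Estimate the expected number of influenced nodes by averaging simulations."""
    total = sum(len(simulate_cascade(graph, seeds, rng)) for _ in range(runs))
    return total / runs

def greedy_seeds(graph, k, runs=200, seed=0):
    """Pick up to k seed nodes, each time adding the node with the best estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen = []
    for _ in range(min(k, len(nodes))):
        candidates = sorted(nodes - set(chosen))  # sorted for reproducible tie-breaking
        best = max(candidates,
                   key=lambda n: expected_spread(graph, chosen + [n], runs, rng))
        chosen.append(best)
    return chosen

# Hypothetical toy network: graph[u][v] is the probability that u influences v.
toy_graph = {
    "alice": {"bob": 0.6, "carol": 0.4},
    "bob": {"dave": 0.5},
    "erin": {"frank": 0.7},
}
print(greedy_seeds(toy_graph, k=2))
```

The greedy loop re-estimates the spread for every candidate in every round, which is simple to follow but expensive on large graphs; it is shown here only to make the four inputs of the problem (graph, edge weights, propagation model, budget k) concrete.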

Existing systems such as MRPFP, kDDP-Miner, BigPFM, and NCF that attempt to mine patterns on bigdata databases with influences either mine large datasets using extensions of the popular MapReduce programming model, which divides computational tasks and executes them in parallel, or mine top-k products from temporal databases. Recent systems such as NCF also attempt to model implicit influences or interactions in recommender systems using neural networks. However, the performance of these systems is limited, or in some scenarios they are completely inapplicable, when dealing with unstructured data. This is because all existing recommendation systems, for both bigdata and traditional databases, rely on predefined database schemas. Also, one of the major constraints restricting sequential pattern mining on bigdata databases, particularly document-based NoSQL stores, is the inability to store data uniquely, which leads to data redundancy (i.e., multiple copies of the same data stored in the same database).

To allow efficient pattern mining on NoSQL databases for adequate recommendation, and to allow existing corporations with large relational databases to take advantage of NoSQL databases, this thesis (i) proposes a Block as a Value (BaaV) framework for extracting data from a relational schema and mapping it into NoSQL to enable faster data retrieval for existing large relational databases, (ii) models item influences on document-based NoSQL databases, and (iii) proposes an influence maximization technique for sequential pattern mining on NoSQL databases with influences (i.e., influential nodes) for product recommendation.
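The abstract does not describe how the BaaV framework performs its mapping. The sketch below only illustrates the general idea of storing a block of related relational rows as the value under a single key in a document store; the function name, the column names, and the `_id`/`block` document layout are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict

def rows_to_blocks(rows, key_column):
    """Group flat relational rows into one document per key, holding a block of rows."""
    blocks = defaultdict(list)
    for row in rows:
        # The key column becomes the document id; the remaining columns
        # form one entry in that key's block.
        value = {col: val for col, val in row.items() if col != key_column}
        blocks[row[key_column]].append(value)
    return [{"_id": key, "block": block} for key, block in blocks.items()]

# Hypothetical purchase rows as they might come out of a relational table.
purchase_rows = [
    {"customer_id": 1, "product": "laptop", "day": 1},
    {"customer_id": 2, "product": "phone", "day": 1},
    {"customer_id": 1, "product": "mouse", "day": 3},
]
for document in rows_to_blocks(purchase_rows, "customer_id"):
    print(document)
```

Under this layout, one lookup by key returns a customer's whole purchase sequence, which is the kind of access a sequential pattern miner would need, instead of a scan or join over the original table.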

Keywords: Bigdata, NoSQL Databases, Recommendation Systems, Sequential Pattern Mining, Influence Maximization, Block as a Value, E-Commerce.

PhD Doctoral Committee:

Internal Reader: Dr. Alioune Ngom

Internal Reader: Dr. Curtis Bright

External Reader: Dr. Christian Trudeau (Department of Economics)

Advisor: Dr. Christie Ezeife

5113 Lambton Tower 401 Sunset Ave. Windsor ON, N9B 3P4 (519) 253-3000 Ext. 3716 csgradinfo@uwindsor.ca