Friday, April 5, 2024 - 12:00
The School of Computer Science is pleased to present...
Mining for Product Recommendation on Document-Based NoSQL Big Data
PhD Dissertation Defense by: Abdulrauf Aremu Gidado
Date: Friday, 05 April 2024
Time: 12:00 pm
Location: Essex Hall, Room 122
Abstract:
Most large corporations, such as Amazon and Facebook, still keep their core solutions (e.g., payments) on relational databases and use non-relational Big Data (i.e., NoSQL) database management systems only for non-core systems (e.g., shopping carts) that favor availability and scalability through partitioning at the cost of consistency. NoSQL systems are built around the CAP (Consistency, Availability and Partition tolerance) theorem, under which a system satisfies two of these properties while trading off the third. The need for system availability and scalability drives the use of NoSQL models, while the lack of the consistency and robust query engines obtainable in relational databases impedes their adoption. To mitigate these drawbacks, researchers and companies such as Amazon, Google and Facebook have developed 'SQL over NoSQL' systems such as Amazon’s Dynamo, Google's Spanner, Facebook’s Memcache, Zidian2019, Apache Hive and SparkSQL. These systems create an SQL-like query engine layer over NoSQL systems but suffer from data redundancy because they process unnormalized NoSQL databases (e.g., document stores), which lack the consistency obtainable in relational databases. Their query engines are also not relationally complete, because they cannot process all relational-algebra-based queries that a relational database can. This thesis presents a ‘NoSQL over SQL’ system, the inverse of existing ‘SQL over NoSQL’ Big Data processing approaches such as Zidian2019, which transforms data into a key-value format before building an SQL query engine layer on the NoSQL data. The thesis approach is motivated by (i) the need for existing systems to fully deploy NoSQL data store functionalities without the limitation of building an extra SQL layer for querying, and (ii) the ability to integrate image similarities into the e-commerce mining process by taking advantage of the ease of storing and retrieving images as text in document-oriented NoSQL databases.
To allow appropriate storage and retrieval of data in document-based NoSQL databases without data redundancy or inconsistency, while encouraging both horizontal and vertical partitioning, this work proposes a NoSQL over SQL Block as a Value (BaaV) data storage strategy. In the relational database model, a relation is represented as R(k, A1, A2, ..., An) with a key attribute k = k1, k2, ..., kn, where each ki is the primary key to some set of attributes (a block) Ai, i = 1, 2, ..., m, from the original relation R. Our NoSQL BaaV model, by contrast, is represented as a block of tuples (K, B), where K is the block key attribute from the original set k, and B is a block of relations from the original relations retrievable with K. NoSQL BaaV represents a relation as R(K, r1, r2, ..., rm), with a key attribute K and a set of m relations (i.e., r) called blocks B. Each r ∈ B contains a set of its own attributes and is denoted r(k, A1, A2, ..., Ap), with a key attribute k and a set of p attributes, as in a typical relational model. The relations r1, r2, ..., rm in R of the NoSQL BaaV database are related through foreign key relationships. The thesis also addresses the data inconsistency problem of existing NoSQL-based stores by using the NoSQL BaaV model together with a leader-node strategy in the NoSQL store cluster for read/write operations, while retaining an in-sync replica node, similar to Apache Kafka's data replication strategy. Additionally, we vectorize item images and integrate item-item image similarity scores into the e-commerce customer historical purchase database to enhance sequential pattern recommendation, using a proposed Image Enhanced Historical Sequential Pattern Recommendation (iHSPRec) system.
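As a rough illustration of the R(K, r1, r2, ..., rm) shape described above, the following Python sketch groups flat relational rows into one document per block key. It is only a sketch under stated assumptions, not the thesis's implementation: the names `BaaVDocument` and `to_baav`, and the `customer_id` / `orders` fields, are hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical names throughout; this only illustrates the (K, B) shape.
Row = Dict[str, Any]


@dataclass
class BaaVDocument:
    """One document R(K, r1, ..., rm): a block key K plus named blocks of rows."""
    block_key: Any
    blocks: Dict[str, List[Row]] = field(default_factory=dict)


def to_baav(customers: List[Row], orders: List[Row]) -> Dict[Any, BaaVDocument]:
    """Group flat relational rows under the customer key they reference."""
    docs = {
        c["customer_id"]: BaaVDocument(
            block_key=c["customer_id"],
            blocks={"customer": [c], "orders": []},
        )
        for c in customers
    }
    for o in orders:
        # The foreign key customer_id selects the document; an unknown key
        # raises KeyError, loosely mirroring a foreign-key violation.
        docs[o["customer_id"]].blocks["orders"].append(o)
    return docs


customers = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Bo"}]
orders = [
    {"order_id": 10, "customer_id": 1, "item": "pen"},
    {"order_id": 11, "customer_id": 1, "item": "ink"},
]
docs = to_baav(customers, orders)
```

Here each row is stored once inside the block of its owning key, so a single lookup by K retrieves all related rows without a join.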
To improve the accuracy of pattern mining on NoSQL databases for recommendation, and to allow corporations with large relational databases to take advantage of NoSQL databases, this thesis (i) proposes a Block as a Value (BaaV) framework for extracting data and mapping it from NoSQL into a relational schema to enable faster data retrieval for existing large relational databases, (ii) integrates item-item image similarity scores into customers' purchase histories to enhance sequential pattern recommendation, using item images stored in a document-based NoSQL database, and (iii) proposes a sequential pattern mining technique on the NoSQL BaaV document-oriented database. Using existing ‘SQL over NoSQL’ benchmark systems, relational databases and real-life datasets in our experiments, we demonstrate that our NoSQL over SQL system outperforms existing relational databases and SQL over NoSQL systems, and is novel in ensuring data consistency, scalability and query execution while improving data storage and retrieval in large database systems without data loss and improving performance on NoSQL databases.
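The item-item image similarity scores in contribution (ii) can be pictured with a small Python sketch. The abstract does not say which similarity measure or image vectorizer is used, so cosine similarity over already-computed image vectors is an assumption here, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def item_item_similarity(
    vectors: Dict[str, Sequence[float]]
) -> Dict[Tuple[str, str], float]:
    """Score every unordered pair of items by the similarity of their image vectors."""
    items = sorted(vectors)
    return {
        (a, b): cosine_similarity(vectors[a], vectors[b])
        for i, a in enumerate(items)
        for b in items[i + 1:]
    }


# Toy image vectors (assumed to come from some upstream vectorizer).
image_vectors = {"mug": [1.0, 0.0], "cup": [1.0, 0.0], "lamp": [0.0, 1.0]}
scores = item_item_similarity(image_vectors)
```

Scores of this kind could then be stored alongside each customer's purchase sequence so that visually similar items inform the recommendation step.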
Keywords: Big Data, Document-Oriented NoSQL Databases, Recommendation Systems, Sequential Pattern Mining, Block as a Value, E-Commerce.
Doctoral Committee:
Internal Reader: Dr. Alioune Ngom
Internal Reader: Dr. Curtis Bright
External Reader: Dr. Christian Trudeau
External Examiner: Dr. Carson Kai-Sang Leung
Advisor: Dr. Christie Ezeife
Chair: Dr. Andrew Swan