The School of Computer Science is pleased to present...
Enhancing E-commerce Dataset Recommendation Using BERT and Named Entity Recognition
MSc Thesis Proposal by: Ayomide Elijah Oduba
Date: August 8th, 2024
Time: 10 am – 11:30 am
Location: Essex Hall, room 122
Abstract:
The growth of e-commerce activity and research has increased the demand for effective e-commerce dataset recommendation systems that support data-driven decision-making, such as adjusting pricing strategies and improving customer satisfaction. For instance, a business analyst might need to analyze the relationship between pricing changes and customer feedback on electronic products to improve pricing strategies and customer satisfaction. However, existing dataset recommendation systems such as ZhangRec23, WangRec22, and GDS19 exhibit several limitations, including a lack of focus on e-commerce datasets, difficulty handling complex queries, and insufficient quality of domain dataset metadata (descriptive information about a dataset, such as its title and description). Accurate and comprehensive metadata is crucial for identifying and retrieving relevant datasets. For example, a search for datasets to analyze “the impact of seasonal sales on customer reviews for electronic products” might receive incomplete recommendations that fail to integrate these aspects: Dataset A contains seasonal sales data for electronic products but no information on customer reviews, while Dataset B contains customer reviews for electronic products but no information on seasonal sales. ZhangRec23 is primarily designed for biomedical datasets and, when adapted for e-commerce, relies heavily on the quality and completeness of metadata, which is often inconsistent or incomplete in e-commerce datasets. WangRec22 employs collaborative filtering but does not handle complex queries well, often producing recommendations that only partially address the query. GDS19 is a keyword-based approach to recommending datasets that cannot capture the semantic context of a query.
This thesis proposes the E-commerce Datasets Mining Recommendation System (EDMRec), an extension of the ZhangRec23 system that aims to address these shortcomings by providing accurate and relevant e-commerce dataset recommendations. EDMRec uses content-based filtering, advanced data processing, and machine learning techniques, structured into three primary layers: Data Collection, Data Processing, and Query Processing. The data processing techniques include Named Entity Recognition (NER), which enriches incomplete metadata by extracting and adding missing contextual information, together with Term Frequency-Inverse Document Frequency (TF-IDF) and Bidirectional Encoder Representations from Transformers (BERT), a deep learning model designed to understand the context of words in a text; both TF-IDF and BERT are used to convert textual data into numerical vectors. The recommendation technique combines keyword relevance from TF-IDF with semantic relevance from BERT embeddings, with the goal of producing precise and contextually appropriate recommendations. This approach is expected to improve the accuracy and utility of dataset recommendations, enabling more effective use of e-commerce data for a range of analytical purposes.
Keywords: Data mining, content-based filtering, e-commerce dataset recommendation systems, natural language processing.
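The idea of combining keyword relevance with semantic relevance, as described in the abstract, can be illustrated with a minimal Python sketch. This is not the EDMRec implementation: the abstract does not say how the two signals are merged, so the weighted sum with a hypothetical parameter alpha is an assumption, as are all function names. To keep the sketch dependency-free, TF-IDF is computed by hand, and the semantic vectors are small hand-written lists standing in for BERT embeddings, which in practice would come from a pretrained model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (a dict) per text, using smoothed IDF."""
    tokenized = [tokenize(t) for t in texts]
    n = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in doc_freq.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({t: (c / len(toks)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine_sparse(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cosine_dense(u, v):
    """Cosine similarity between two dense vectors stored as lists."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_rank(query, descriptions, query_emb, desc_embs, alpha=0.5):
    """Rank dataset descriptions by alpha * keyword + (1 - alpha) * semantic.

    ASSUMPTION: the weighted sum and alpha are illustrative only.
    query_emb and desc_embs stand in for BERT embeddings.
    """
    # The query is vectorized together with the descriptions so that
    # all vectors share the same IDF weights.
    vecs = tfidf_vectors([query] + descriptions)
    q_vec, d_vecs = vecs[0], vecs[1:]
    scored = []
    for i, (d_vec, d_emb) in enumerate(zip(d_vecs, desc_embs)):
        keyword = cosine_sparse(q_vec, d_vec)
        semantic = cosine_dense(query_emb, d_emb)
        scored.append((alpha * keyword + (1 - alpha) * semantic, i))
    return sorted(scored, reverse=True)


# Hypothetical dataset descriptions and toy stand-in embeddings.
descriptions = [
    "Seasonal sales records for electronic products",
    "Customer reviews for electronic products",
    "Weather observations for coastal cities",
]
query = "seasonal sales and customer reviews for electronic products"
query_emb = [1.0, 1.0, 0.0]
desc_embs = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.0], [0.0, 0.0, 1.0]]

for score, idx in hybrid_rank(query, descriptions, query_emb, desc_embs):
    print(f"{score:.3f}  {descriptions[idx]}")
```

Setting alpha to 1.0 reduces the ranking to keyword matching alone, while 0.0 uses only the embedding similarity. A real system would need to tune this weight, or replace the weighted sum with another combination rule.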
Thesis Committee:
Internal Reader: Dr. Muhammad Asaduzzaman
External Reader: Dr. Yahong Zhang
Advisor: Dr. Christie Ezeife
Chair: Dr. Usama Mir