A pipeline for mining association rules from large datasets of retailers invoices

Agapito G.;Calabrese B.;Guzzi P. H.;Cannataro M.
2019-01-01

Abstract

Massive data generation now affects many domains, including marketing (for example, the electronic invoices, or e-invoices, of large retailers), web access logs, healthcare and the life sciences. Dataset sizes keep growing because cheap connected devices, such as mobile devices, RFID tags and wireless sensor networks, make data easy to collect. The collected data often need to be gathered into a consistent, integrated and comprehensive form before they can be used for knowledge discovery: without adequate cleaning, transformation and structuring before the analysis, it is hard to mine useful knowledge. Once the data are prepared in this way, data mining lets users extract knowledge from large collections of invoice documents. This paper proposes a pipeline for preprocessing and mining association rules from the commercial documents of large retailers. The preprocessing stage performs merging, cleaning, formatting and summarization. The methodology can improve the quality of large retailers' data by reducing the amount of irrelevant data and making the remaining data suitable for association rule mining (ARM). Applying the methodology to a real invoice dataset provided by an Italian retailer yielded 36 significant association rules that highlight customers' purchasing behavior.
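As a rough illustration of the kind of pipeline the abstract describes, the following minimal Python sketch merges invoice lines into per-invoice item sets, cleans and normalises them, and then derives pairwise association rules from support and confidence. It is not the authors' implementation: the function names `preprocess` and `mine_pair_rules`, the `(invoice_id, item)` row layout, the thresholds and the toy data are all assumptions made for this example, since the record does not specify the tools, schema or parameters used in the paper.

```python
from collections import defaultdict
from itertools import combinations


def preprocess(rows):
    """Merge invoice lines into one item set per invoice, dropping unusable rows."""
    baskets = defaultdict(set)
    for invoice_id, item in rows:
        # Cleaning: skip rows with a missing invoice id or an empty item name.
        if not invoice_id or not item or not item.strip():
            continue
        # Formatting: normalise item names so spelling variants are merged.
        baskets[invoice_id].add(item.strip().lower())
    return list(baskets.values())


def mine_pair_rules(baskets, min_support=0.5, min_confidence=0.7):
    """Return sorted rules (antecedent, consequent, support, confidence) over item pairs."""
    n = len(baskets)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for basket in baskets:
        for item in basket:
            item_count[item] += 1
        for pair in combinations(sorted(basket), 2):
            pair_count[pair] += 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # fraction of invoices containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]  # estimate of P(y | x)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Hypothetical toy invoice lines: (invoice id, item description).
rows = [
    ("inv1", "Pasta"), ("inv1", "Tomato sauce"),
    ("inv2", "pasta "), ("inv2", "tomato sauce"), ("inv2", "Wine"),
    ("inv3", "Pasta"), ("inv3", "Wine"),
    ("inv4", "tomato sauce"), ("inv4", ""),
    (None, "Wine"),
]

baskets = preprocess(rows)
rules = mine_pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent} (support={support:.2f}, confidence={confidence:.2f})")
```

On this toy input only one rule passes both thresholds (wine implies pasta). A real invoice dataset would typically call for a full frequent-itemset algorithm such as Apriori or FP-Growth instead of the pairwise counting used here.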
2019
9781450360852
Association Rules Mining
Data mining
Electronic Invoice
Machine Learning
Preprocessing

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12317/62313