Parallel and distributed association rule mining in life science: A novel parallel algorithm to mine genomics data

Agapito G; Guzzi PH; Cannataro M
2021-01-01

Abstract

Association rule mining (ARM) is widely employed in several scientific areas and application domains, and many different algorithms for learning association rules from databases have been introduced. Despite the many existing algorithms, there is still room for novel approaches tailored to new kinds of datasets, because the efficiency of such algorithms often depends on the type of dataset analyzed. For instance, classical ARM algorithms present some drawbacks for biological datasets produced by microarray technologies, in particular those containing Single Nucleotide Polymorphisms (SNPs): they require long execution times even with small datasets. Therefore, improving the performance of such algorithms by leveraging parallel computing is a growing research area. The main contributions of this paper are a comparison among different sequential, parallel, and distributed ARM techniques, and the presentation of a novel ARM algorithm, named Balanced Parallel Association Rule Extractor from SNPs (BPARES), that employs parallel computing and a novel balancing strategy to improve response time. BPARES improves performance without losing accuracy, uses the available computational power more efficiently, and reduces memory consumption.
2021
Association rules mining; Parallel data mining; Genomics data

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12317/9906
Citations
  • PMC: not available
  • Scopus: 16
  • ISI: 11