The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, have made possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per single experiment, thus the analysis of this enormous flow of data poses several challenges in term of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the parallel preprocessing and statistical analysis of genomics data, able to face high dimension of data and resulting in good response time. The proposed system is able to find statistically significant biological markers able to discriminate classes of patients that respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.

Parallel processing of genomics data

Agapito G;Guzzi PH;Cannataro M
2016-01-01

Abstract

The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, have made possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per single experiment, thus the analysis of this enormous flow of data poses several challenges in term of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the parallel preprocessing and statistical analysis of genomics data, able to face high dimension of data and resulting in good response time. The proposed system is able to find statistically significant biological markers able to discriminate classes of patients that respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.12317/19193
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 3
  • ???jsp.display-item.citation.isi??? 2
social impact