A method for modelling and executing customized pipelines in serverless computing


Serverless computing is an emerging cloud service for executing distributed applications on cloud infrastructure. The ability to execute functions without managing any infrastructure has led to the wide adoption of this paradigm in several fields, e.g., data processing and, above all, parallel computing. Processing large-scale genomic data requires substantial computational resources and is highly time-consuming, and this need for greater computing capacity has driven the increasing use of serverless technology. In this paper, we present a method for modelling and executing customized pipelines in serverless computing. We applied it to transcript-level expression analysis of RNA sequencing (RNA-seq) samples, focusing on the most computationally expensive step: mapping reads to a reference genome. Our method is implemented as an Amazon Web Services (AWS) Lambda function deployed within our own serverless architecture. Because the parallel instances invoked in AWS Lambda are managed by the provider and incur negligible latency, average computation times are similar across experiments on similar samples. We observed a substantial running-time advantage, measuring improvements of up to 79.84% and 90.10% on the concurrent analysis of 10 samples compared with local environments equipped with a 3.8 GHz 8-vcore CPU and a 3.8 GHz 16-vcore CPU, respectively.
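A minimal sketch, assuming Python with the boto3 SDK, of the two quantitative ideas in the abstract: submitting the read-mapping step for each sample as a separate, concurrent AWS Lambda invocation, and turning a percentage improvement in running time into a speedup factor. The Lambda function name, the `{"sample": ...}` payload schema, and the helper names (`make_lambda_invoker`, `run_concurrently`, `percent_improvement`, `speedup_from_improvement`) are hypothetical and not taken from the authors' implementation. The improvement formula assumes the reported percentages are reductions relative to the local running time.

```python
"""Hypothetical sketch: fan out one AWS Lambda invocation per RNA-seq sample
and express the running-time gain as a percentage improvement."""
import json
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def make_lambda_invoker(function_name: str) -> Callable[[str], Dict[str, Any]]:
    """Return a callable that synchronously invokes a Lambda function for one sample.

    Assumptions: the function name and the {"sample": ...} payload schema are
    hypothetical; boto3 must be installed and AWS credentials configured.
    """
    import boto3  # imported lazily so the rest of the sketch runs without AWS

    client = boto3.client("lambda")  # low-level boto3 clients are thread-safe

    def invoke(sample: str) -> Dict[str, Any]:
        response = client.invoke(
            FunctionName=function_name,
            InvocationType="RequestResponse",  # block until the mapping step returns
            Payload=json.dumps({"sample": sample}).encode("utf-8"),
        )
        return json.loads(response["Payload"].read())

    return invoke


def run_concurrently(samples: List[str], invoke: Callable[[str], Any]) -> List[Any]:
    """Call `invoke` once per sample in parallel; results keep the input order."""
    if not samples:
        return []
    # One client-side thread per sample: each thread mostly waits on the remote
    # function, while the actual compute is provided by the serverless platform.
    with ThreadPoolExecutor(max_workers=len(samples)) as pool:
        return list(pool.map(invoke, samples))


def percent_improvement(local_seconds: float, serverless_seconds: float) -> float:
    """Running-time reduction relative to the local baseline, in percent (assumed definition)."""
    return (local_seconds - serverless_seconds) / local_seconds * 100.0


def speedup_from_improvement(percent: float) -> float:
    """Convert a percentage reduction in running time into a speedup factor."""
    return 1.0 / (1.0 - percent / 100.0)


if __name__ == "__main__":
    # Stand-in for a real Lambda call: each "mapping job" just sleeps 0.2 s.
    def fake_invoke(sample: str) -> Dict[str, Any]:
        time.sleep(0.2)
        return {"sample": sample, "status": "mapped"}

    samples = [f"sample_{i:02d}" for i in range(10)]
    start = time.perf_counter()
    results = run_concurrently(samples, fake_invoke)
    elapsed = time.perf_counter() - start
    print(f"{len(results)} samples in {elapsed:.2f} s")  # roughly 0.2 s, not 2 s
    for pct in (79.84, 90.10):
        print(f"{pct}% improvement -> {speedup_from_improvement(pct):.2f}x speedup")
```

Under the assumed definition, improvements of 79.84% and 90.10% correspond to speedups of roughly 4.96x and 10.10x over the 8-vcore and 16-vcore local environments, respectively. To target a real deployment, `fake_invoke` would be replaced by `make_lambda_invoker("...")` with the actual function name.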


Cinaglia P.; Cannataro M.

2023-01-01



Year: 2023

Keywords:
AWS
bioinformatics
parallel analysis
pipeline
serverless

Publication type: 4.1 Contribution in conference proceedings

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12317/92777

Warning! The data displayed have not been validated by the university.
