Identifying Candidate Gene–Disease Associations via Graph Neural Networks

Cinaglia P.; Cannataro M.
2023-01-01

Abstract

Real-world objects are usually defined in terms of their relationships or connections. A graph (or network) naturally expresses this model through nodes and edges. In biology, networks can be classified into several types depending on what the nodes and edges represent, gene–disease associations (GDAs) included. In this paper, we present a solution based on a graph neural network (GNN) for identifying candidate GDAs. We trained our model on an initial set of well-known, curated inter- and intra-relationships between genes and diseases. The model is based on graph convolutions, using multiple convolutional layers, each followed by a point-wise non-linearity. Embeddings were computed for the input network, built on a set of GDAs, to map each node to a vector of real numbers in a multidimensional space. Results showed an AUC of 95% for training, validation, and testing, which in a real case translated into a positive response for 93% of the Top-15 (highest dot product) candidate GDAs identified by our solution. The experiments were conducted on the DisGeNET dataset; the DiseaseGene Association Miner (DG-AssocMiner) dataset from Stanford’s BioSNAP was also processed, for performance evaluation only.
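
As an illustration of the pipeline summarized in the abstract (stacked graph convolutions, a point-wise non-linearity after each layer, node embeddings, and ranking of candidate GDAs by dot product), the following is a minimal sketch and not the authors' implementation. The toy network, the layer sizes, the choice of ReLU, the symmetric normalization, and the random untrained weights are all assumptions made for illustration; every function and variable name is hypothetical.

```python
import math
import random


def normalized_adjacency(n, edges):
    """Dense D^-1/2 (A + I) D^-1/2 for an undirected graph (assumed normalization)."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]  # at least 1 because of the self-loops
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain dense matrix product on lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def relu(x):
    """Point-wise non-linearity (ReLU is an assumption; the abstract names none)."""
    return [[max(0.0, v) for v in row] for row in x]


def gcn_embed(a_hat, features, weights):
    """Apply one graph convolution per weight matrix, each followed by ReLU."""
    h = features
    for w in weights:
        h = relu(matmul(matmul(a_hat, h), w))
    return h


def score(h, u, v):
    """Dot product of two node embeddings, used to rank candidate links."""
    return sum(a * b for a, b in zip(h[u], h[v]))


# Hypothetical toy network: nodes 0-2 are genes, nodes 3-4 are diseases.
genes, diseases = [0, 1, 2], [3, 4]
known_gdas = {(0, 3), (1, 3), (1, 4)}    # inter-relationships (gene-disease)
intra_edges = {(0, 1), (1, 2), (3, 4)}   # intra-relationships (gene-gene, disease-disease)
n = len(genes) + len(diseases)
a_hat = normalized_adjacency(n, known_gdas | intra_edges)

# One-hot input features and random, untrained weights (illustration only).
rng = random.Random(0)
features = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
dims = [n, 8, 4]
weights = [[[rng.uniform(-1.0, 1.0) for _ in range(dims[k + 1])]
            for _ in range(dims[k])] for k in range(len(dims) - 1)]
embeddings = gcn_embed(a_hat, features, weights)

# Candidate GDAs are gene-disease pairs that are not already known.
candidates = [(g, d) for g in genes for d in diseases if (g, d) not in known_gdas]
ranked = sorted(candidates, key=lambda p: score(embeddings, *p), reverse=True)
for g, d in ranked:
    print(f"gene {g} - disease {d}: {score(embeddings, g, d):.4f}")
```

In a trained model the weights would be learned from the known associations, so the ranking printed here carries no biological meaning; the sketch only shows how embeddings and dot-product scores fit together.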
deep learning
gene disease associations
graph neural network
link prediction
neural network
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12317/88537