A web-based tool to analyze semantic similarity networks

Milano, M; Guzzi, Ph; Cannataro, M; Veltri, Pierangelo

doi:10.1145/2649387.2660801

In computational biology, biological entities such as genes or proteins are usually annotated with terms drawn from the Gene Ontology (GO). The functional similarity among the terms of an ontology is evaluated using Semantic Similarity Measures (SSMs). An SSM takes as input two or more GO terms and outputs a numeric value in the [0..1] interval representing their similarity. The use of SSMs to evaluate the functional similarity among gene products annotated with GO terms is attracting broad interest among researchers [2]. More recently, the extensive application of SSMs has led to the introduction of so-called Semantic Similarity Networks (SSNs), i.e. edge-weighted graphs in which the nodes are concepts (e.g. proteins) and each edge is weighted by the semantic similarity between the pair of nodes it connects. These networks are constructed by computing similarity values between genes or proteins and then linking the nodes whose similarity is greater than zero. A possible source of bias is the presence of meaningless edges with low similarity scores, so a thresholding preprocessing step can improve the construction of an SSN. Many network thresholding methods exist, based for example on a global threshold or on local thresholds. However, the internal characteristics of SSMs [3] argue against global thresholds: a small region of relatively low similarity values may be an artifact of the measure even when the proteins or genes involved are highly similar. Local thresholds, in turn, may be influenced by local noise and may in general introduce biases in different regions of the network. In a previous work, we presented a novel hybrid thresholding method that employs both local and global approaches. The threshold is chosen so that nearly-disconnected components emerge, and the presence of these components is assessed by calculating the eigenvalues of the Laplacian matrix.
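The role of the Laplacian spectrum can be illustrated with a small sketch. It is not the hybrid method referred to above: it applies a single global cutoff to a toy similarity matrix whose values are invented for illustration, and the names `threshold_network` and `laplacian_eigenvalues` are hypothetical helpers. The only point it makes is that the number of (near-)zero eigenvalues of L = D - W equals the number of connected components, and that a small second eigenvalue indicates components that are nearly disconnected.

```python
import numpy as np

def threshold_network(similarity, tau):
    """Return a copy of the matrix without self-loops and without edges below tau."""
    w = np.array(similarity, dtype=float)  # np.array copies, so the input is untouched
    np.fill_diagonal(w, 0.0)
    w[w < tau] = 0.0
    return w

def laplacian_eigenvalues(weights):
    """Eigenvalues (ascending) of the graph Laplacian L = D - W of a symmetric W."""
    w = np.asarray(weights, dtype=float)
    laplacian = np.diag(w.sum(axis=1)) - w
    return np.linalg.eigvalsh(laplacian)

# Toy data (assumed values): two tight groups of three nodes, weakly linked.
strong = np.full((3, 3), 0.9)
weak = np.full((3, 3), 0.1)
similarity = np.block([[strong, weak], [weak, strong]])
np.fill_diagonal(similarity, 1.0)

for tau in (0.05, 0.5):
    eig = laplacian_eigenvalues(threshold_network(similarity, tau))
    n_components = int(np.sum(eig < 1e-9))  # zero eigenvalues = connected components
    print(f"tau={tau}: components={n_components}, eigenvalues={np.round(eig, 3)}")
```

With the low cutoff the graph stays connected but its second eigenvalue is small compared with the rest of the spectrum; with the higher cutoff the weak links disappear and a second zero eigenvalue appears.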
Thresholding an SSN so that nearly-disconnected components emerge has a biological counterpart in the structure of biological networks. Many works have shown that biological networks tend to have a modular structure in which hub proteins (i.e. relevant proteins) have many connections [1]. Hub proteins usually connect small modules (or communities), i.e. small dense regions with few links to other regions [4], in which proteins share a common function. For these reasons, a tool able to manage and analyze SSNs is needed. Consequently, we developed SSN-Analyzer, a web-based tool able to build and preprocess SSNs. SSN-Analyzer has been implemented using the Shiny framework and the R statistical language. The tool enables the calculation of semantic similarity from an input dataset of genes/proteins, as well as the construction of SSNs and their preprocessing. It provides a simple graphical user interface that gives the user easy access to the tool's functionalities, as depicted in Figure 1. The user may provide as input a list of proteins with the related annotations for each one, and may select the organism of the genes/proteins dataset, the ontology (MF, BP, or CC) and a semantic similarity measure. The measure may be chosen among those implemented in the R package csbl.go: Resnik, ResnikGraSM, Lin, Lin with the GraSM option (LinGraSM hereafter), JiangConrath, JiangConrath with the GraSM option (JiangConrath-GraSM hereafter), Relevance, Kappa, Cosine, WeightedJaccard, and Czekanowski Dice. The output file (Figure 1(c)) is a semantic similarity matrix that represents the adjacency matrix of the SSN. Finally, the user may simplify the network by choosing between two different thresholding methods.
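As a rough illustration of how a similarity matrix can be read as the adjacency matrix of an SSN, and of how a global and a local cutoff behave differently, the following sketch may help. It is written in Python rather than R, it does not reproduce SSN-Analyzer or its two thresholding methods, and the protein names, similarity values, function names and the "mean plus k standard deviations" local rule are all assumptions made for the example.

```python
from statistics import mean, pstdev

def edges_from_matrix(names, matrix):
    """List (a, b, weight) for every unordered pair with similarity > 0."""
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if matrix[i][j] > 0:
                edges.append((names[i], names[j], matrix[i][j]))
    return edges

def global_threshold(edges, tau):
    """Keep the edges whose weight is at least tau (one cutoff for the whole network)."""
    return [e for e in edges if e[2] >= tau]

def local_threshold(edges, k=0.0):
    """Keep an edge if it reaches mean + k * std of the weights at either endpoint."""
    incident = {}
    for a, b, w in edges:
        incident.setdefault(a, []).append(w)
        incident.setdefault(b, []).append(w)
    cutoff = {n: mean(ws) + k * pstdev(ws) for n, ws in incident.items()}
    return [(a, b, w) for a, b, w in edges if w >= cutoff[a] or w >= cutoff[b]]

# Hypothetical proteins and similarity values, for illustration only.
names = ["P1", "P2", "P3", "P4"]
matrix = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.15, 0.0],
    [0.1, 0.15, 1.0, 0.4],
    [0.0, 0.0, 0.4, 1.0],
]

edges = edges_from_matrix(names, matrix)
print("all edges:      ", edges)
print("global tau=0.5: ", global_threshold(edges, 0.5))
print("local  k=0:     ", local_threshold(edges))
```

In this toy network the global cutoff removes the P3-P4 edge because its weight is low in absolute terms, whereas the local rule keeps it because it is the strongest link of both of its endpoints.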