Background

Protein-protein interactions (PPIs) are critical for virtually every biological process. Diverse experimental techniques for detecting PPIs have been developed and have improved dramatically in the last decade, e.g., yeast two-hybrid (Y2H), affinity chromatography, co-immunoprecipitation (Co-IP), and fluorescence resonance energy transfer (FRET) [1, 2]. Advances in chip technology have also made it possible to use protein chips to detect PPIs under diverse conditions in a high-throughput manner [1]. High-throughput PPI screens have also been carried out for various organisms, including yeast [2], worm [3], fruit fly [4], and human [5]. The large amount of data accumulated from these sources poses a major challenge for assessing data reliability and for searching, analyzing, and filtering PPIs.

In order to facilitate PPI searching, a number of systems provide batch input and output functionality, such as Genes2Networks [6], Ulysses [7], T1DBase [8], and the Arabidopsis Interactions Viewer [9]. Genes2Networks provides a dynamic linkable three-color web-based network map, with a statistical analysis report that identifies significant intermediate nodes used to connect the query lists. In Ulysses, users can project model organism gene properties onto homologous human genes to perform interolog analysis. T1DBase provides various aspects of information regarding type 1 diabetes and includes an interaction network viewer. In addition to the type 1 diabetes PPI network, this viewer can also be used to construct other networks of interest. The Arabidopsis Interactions Viewer mainly focuses on the Arabidopsis PPI information and is designed for an interactome of Arabidopsis predicted from interacting orthologs in yeast, worm, fruit fly, and human. Using these services and packages, networks in different species or conditions can be searched, downloaded and visualized.

The services described above can readily perform searches and construct networks from user-supplied queries. However, analyzing these networks requires other software packages, which may have incompatible input formats and complex interfaces. Several network analysis tools are available for PPI network evaluation, such as Pajek [10], CentiBiN [11], and NetworkAnalyzer [12]. These tools support the calculation of node centralities, such as degree centrality, closeness centrality, betweenness centrality, and clustering coefficient. The analysis of node centrality characteristics in a network serves as an efficient means to understand the relative roles and features of each node. Various studies have suggested that proteins with larger numbers of interactions (hubs) are more critical [13

Figure 2

The analysis results and downloadable items provided by POINeT. (A) Among the downloadable items, attr-Query records the input query genes. The ppi-AllPPI table contains all PPIs resulting from the query. The nodes involved in ppi-AllPPI are recorded in the attr-Interactor table. Nodes with degree >= 2 are defined as mediators and recorded in the attr-Hub table. The nodes in the attr-Hub table form a network, denoted ppi-Degree2. If both interactors of an interaction are present in the attr-Query table, the interaction is recorded in ppi-QQPPI, and the interactors in the ppi-QQPPI network are recorded in the attr-QQ table. POINeT merges ppi-QQPPI, ppi-GOPPI, and ppi-InterologsPPI into ppi-FilteredPPI, a network containing PPIs with higher reliability and biological significance. (B) A simple PPI network illustrates the components of the network. Query nodes are marked with red circles; mediators (nodes connecting at least two nodes) other than query nodes are marked with blue circles. QQPPIs are shown as black lines, GOPPIs as red lines, and InterologousPPIs as green lines.

Protein-protein interaction filtering component

Interaction Filtering Using Biological Characteristics

POINeT provides three types of filtered PPIs: PPIs among queries (Query-Query PPI), PPIs in which both interactors share the same GO terms (GO PPI), and interolog PPIs. Various studies, e.g., [32], have shown that proteins sharing the same GO terms are more likely to interact with each other, so POINeT offers the option to select PPIs whose interactors share GO terms. Using the ortholog information available for various species, PPI networks can be mapped to different model organisms. For every species available in POINeT, interolog PPIs can be inferred from the experimental PPIs of other species. For example, predicted human PPIs can be inferred from the experimental PPIs of mouse, worm, fly, yeast, and even Arabidopsis (though the number of predicted PPIs from the latter is much smaller than those from the other model organisms). In short, POINeT provides functions to filter experimental PPIs and to infer interolog PPIs. Through these settings, PPIs among proteins with similar biological functions can be filtered and revealed, permitting an in-depth analysis of unsorted PPIs.
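As an illustration only, the two filters and the interolog inference described above can be sketched in Python as follows. The function names, the representation of a PPI as a pair of gene symbols, and the `go_terms` and `orthologs` dictionaries are hypothetical and do not reflect POINeT's actual implementation.

```python
def filter_ppis(ppis, queries, go_terms):
    """Return (query-query PPIs, PPIs whose interactors share a GO term)."""
    queries = set(queries)
    # Query-Query PPI: both interactors appear in the query list.
    qq = [(a, b) for a, b in ppis if a in queries and b in queries]
    # GO PPI: the two interactors have at least one GO term in common.
    go = [(a, b) for a, b in ppis
          if go_terms.get(a, set()) & go_terms.get(b, set())]
    return qq, go


def infer_interologs(source_ppis, orthologs):
    """Map PPIs of a source species onto orthologs in a target species."""
    inferred = set()
    for a, b in source_ppis:
        for ta in orthologs.get(a, ()):
            for tb in orthologs.get(b, ()):
                # Store each undirected pair once, in sorted order.
                inferred.add(tuple(sorted((ta, tb))))
    return inferred


# Toy data (hypothetical gene symbols and annotations).
ppis = [("A", "B"), ("A", "C"), ("C", "D")]
go_terms = {
    "A": {"GO:0006915"},
    "C": {"GO:0006915", "GO:0008283"},
    "D": {"GO:0008283"},
}
qq_ppi, go_ppi = filter_ppis(ppis, ["A", "B", "D"], go_terms)
interologs = infer_interologs([("y1", "y2")], {"y1": ["A"], "y2": ["B", "E"]})
```

In this sketch a single shared GO term is enough to keep an interaction; a stricter criterion (for example, requiring a shared term below a given depth in the ontology) would only change the condition in `filter_ppis`.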

Interaction Filtering Using Tissue-Specific Expression Profiles

SymAtlas [33] includes tissue-specific expression data for 79 tissue types from human and mouse. The expression profiles of NCI60 cell lines from SymAtlas are also incorporated. SymAtlas used human and mouse U133A microarrays from Affymetrix, along with the custom-made chips GNF1H (for human) and GNF1M (for mouse). Each probe on the microarrays can be mapped to its corresponding gene with conversion tables provided by Affymetrix and the Genomics Institute of the Novartis Research Foundation (GNF). With this information, the expression levels of interactors (genes) in PPI networks can be presented in an integrative way based on user-selected tissues or cell lines. In addition to tissue-specific genes, tissue-specific PPIs can also be filtered and inferred with these expression profiles.
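One simple way to derive tissue-specific PPIs from such profiles is to keep an interaction only when both interactors are expressed in the selected tissue. The sketch below assumes this criterion; the expression cutoff, the function name, and the data layout are hypothetical, since the text does not specify how POINeT decides that a gene is expressed.

```python
def tissue_specific_ppis(ppis, expression, tissue, threshold):
    """Keep PPIs whose two interactors both reach `threshold` in `tissue`."""
    def expressed(gene):
        # Genes without a measurement are treated as not expressed.
        return expression.get(gene, {}).get(tissue, 0.0) >= threshold

    return [(a, b) for a, b in ppis if expressed(a) and expressed(b)]


# Toy expression values per gene and tissue (hypothetical numbers).
expression = {
    "A": {"liver": 850.0, "heart": 40.0},
    "B": {"liver": 420.0, "heart": 510.0},
    "C": {"liver": 15.0, "heart": 900.0},
}
network = [("A", "B"), ("B", "C")]

liver_ppis = tissue_specific_ppis(network, expression, "liver", threshold=200.0)
heart_ppis = tissue_specific_ppis(network, expression, "heart", threshold=200.0)
```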

Protein filtering component

Protein Filtering Using Centralities

The analysis of node centrality characteristics in a network serves as an efficient means to understand the relative roles and features of each node. Several centrality measurements are available in POINeT, including degree centrality, closeness centrality, eccentricity, radiality, and centroid values. The meanings and detailed descriptions of these centralities are available in a textbook [34]. Degree centrality is the number of edges associated with a node, normalized to a quantity from 0 to 1 by dividing by the maximum degree in the sub-network. High-degree nodes in a protein interaction network tend to correspond to essential proteins, so degree may be a good predictor of biological importance [13]. Closeness centrality (CC) can identify nodes that are closer to other nodes in the biological network [35]. In our implementation, larger values indicate that the paths from the given node to all other nodes are shorter. Eccentricity is the longest shortest-path distance from a given node to any other node in the network. In graph theory, the set of vertices with the minimum eccentricity is denoted as the center of a graph. Radiality centrality (RC) is similar to closeness centrality. The maximum shortest path length of the network is subtracted from the path lengths from one node to all other nodes; the differences are then summed and averaged, and the absolute value is taken [36]. Compared to nodes with smaller radiality, nodes with larger values are closer to all other nodes. Centroid values identify optimal positions (nodes with positive values) in a network. Before the calculation of centrality values, POINeT identifies the sub-networks included in ppi-AllPPI. An individual sub-network can be selected for centrality analysis. Some centralities, such as CC, RC, and centroid values, can by definition only be evaluated on connected graphs. The results of these calculations can all be downloaded directly from the web page. These centrality values can also be applied to prioritize nodes in the network.
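The following Python sketch illustrates four of the measures described above on a toy network. It is not POINeT's code: the closeness normalization, (n - 1) divided by the sum of distances, is one common convention assumed here, the radiality follows the verbal description given in the text, and centroid values are omitted. All names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def connected_components(adj):
    """Split the network into its connected sub-networks (sets of nodes)."""
    seen, components = set(), []
    for node in adj:
        if node not in seen:
            component = set(bfs_distances(adj, node))
            seen |= component
            components.append(component)
    return components


def centralities(adj):
    """Centralities for a connected sub-network with at least two nodes."""
    n = len(adj)
    max_degree = max(len(neighbors) for neighbors in adj.values())
    dist = {u: bfs_distances(adj, u) for u in adj}
    diameter = max(max(d.values()) for d in dist.values())
    result = {}
    for u in adj:
        total = sum(dist[u].values())  # the distance to itself is 0
        result[u] = {
            "degree": len(adj[u]) / max_degree,
            "closeness": (n - 1) / total,
            "eccentricity": max(dist[u].values()),
            # |mean over other nodes of (distance - diameter)|
            "radiality": abs((total - (n - 1) * diameter) / (n - 1)),
        }
    return result


# Toy network: one five-node sub-network and one isolated pair.
network = {
    "A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"},
    "D": {"B", "E"}, "E": {"D"},
    "F": {"G"}, "G": {"F"},
}
components = connected_components(network)
largest = max(components, key=len)
sub_network = {u: network[u] & largest for u in largest}
scores = centralities(sub_network)
```

Here node B has the highest degree, closeness, and radiality and the lowest eccentricity, which matches the intuition that it sits at the center of the five-node sub-network.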

Protein Filtering Using Sub-Network Specificity Scores

Biological networks are likely composed of several sub-networks or functional modules contributing to diverse biological processes [37]. A node may have negligible impact on the global network or its global properties, yet be influential in a sub-network with a specific functionality. Therefore, it is desirable to devise a measurement that reflects the sub-network specificity of nodes. Moreover, it has been shown that data fusion using rank combinations can improve the specificity of the ranking results [38].

Thus, two scores were proposed and merged in this work. One score is the ratio between the sub-network degree and the global degree of a given node:

$$r_i = \frac{d_i^{N}}{d_i^{G}}$$

where $i$ is the designated node, $d_i^{N}$ is the degree of node $i$ in sub-network $N$, and $d_i^{G}$ is the degree of node $i$ in the global network. The score refers to the proportion of interactions contributed to the sub-network by node $i$. A larger score implies that the node has a higher preference for the given sub-network.
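As a small worked example, the ratio score can be computed from two edge lists, one for the sub-network and one for the global network. The function names and the toy edges below are hypothetical.

```python
def degree(edges, node):
    """Number of edges in `edges` that involve `node`."""
    return sum(node in edge for edge in edges)


def specificity_ratio(node, sub_edges, global_edges):
    """Sub-network degree divided by global degree (0.0 for isolated nodes)."""
    global_degree = degree(global_edges, node)
    if global_degree == 0:
        return 0.0
    return degree(sub_edges, node) / global_degree


# Toy networks: the sub-network is a subset of the global edges.
global_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
sub_edges = [("A", "B"), ("B", "C")]

ratio_a = specificity_ratio("A", sub_edges, global_edges)  # 1 of 3 edges
ratio_b = specificity_ratio("B", sub_edges, global_edges)  # 2 of 2 edges
```

Node B places all of its interactions inside the sub-network, whereas node A places only one third of them there, so B is the more sub-network-specific node under this score.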

The other score is based on the statistics of node degree distributions in randomly sampled sub-networks. A bootstrap method was used to sample the degree of node $i$ in 1000 random sub-networks of the same size as the designated one. The Z-score for the degree of node $i$ is calculated as follows:

$$z_i = \frac{d_i^{N} - \mu}{\sigma}$$

where $d_i^{N}$ is the degree of node $i$ in the designated sub-network, $\mu$ is the mean of the degree distribution of node $i$ in random sub-networks, and $\sigma$ is the standard deviation of that random degree distribution. The Z-score provides a statistical evaluation of the significance of the degree of node $i$, namely whether the degree of node $i$ is likely to have resulted from the random sampling of sub-networks.
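The sampling procedure can be sketched as below. The text does not state how the random sub-networks are drawn, so this sketch makes an explicit assumption: each random sub-network is a random set of nodes of the same size as the designated one, always containing node i, and the degree of i is counted within that set. The function names, the fixed seed, and the toy network are likewise hypothetical.

```python
import random
import statistics


def induced_degree(node, members, adj):
    """Degree of `node` counting only neighbors inside `members`."""
    return sum(1 for neighbor in adj[node] if neighbor in members)


def degree_z_score(node, sub_nodes, adj, n_samples=1000, seed=0):
    """Z-score of the sub-network degree of `node` against random node sets."""
    rng = random.Random(seed)
    sub_nodes = set(sub_nodes)
    observed = induced_degree(node, sub_nodes, adj)
    others = [v for v in adj if v != node]
    k = len(sub_nodes) - 1  # the node itself is always kept in the sample
    samples = [
        induced_degree(node, set(rng.sample(others, k)), adj)
        for _ in range(n_samples)
    ]
    mu = statistics.mean(samples)
    sigma = statistics.pstdev(samples)
    # The Z-score is undefined when every random sample gives the same degree.
    return (observed - mu) / sigma if sigma > 0 else float("nan")


# Toy global network: a hub with three partners attached to a 16-node chain.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("c", "n0")]
edges += [(f"n{i}", f"n{i + 1}") for i in range(15)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

z_hub = degree_z_score("hub", {"hub", "a", "b", "c"}, adj)
```

Because all three partners of the hub lie inside the designated sub-network, its observed degree is far above what random node sets of the same size produce, and the resulting Z-score is positive.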

These two scores are highly correlated since they are based on the same concept – the differential distribution of node degrees in sub-networks and the global network. If most of the interactions of a node are contributed to a given sub-network, we assume that this node is significant to this sub-network and not to the other sub-networks or the global network. That is, the node is "specific" to the designated sub-network. However, there are minor disagreements on the local ranks given by these two scores. To make the most out of the two scores, a data fusion model has been applied to merge the two scores [38]:

where $R(r_i)$ is the rank of node $i$ by the ratio score, and $R(z_i)$ is the rank of node $i$ by the Z-score. $S3$ refers to the "Sub-network Specificity Score," which is the rank obtained by combining the two proposed scores.
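To illustrate rank-based fusion in general, the sketch below combines the two rankings by taking their unweighted mean. This combination rule is an assumption made for the example and may differ from the fusion model of [38] used by POINeT; the function names, the tie handling, and the toy scores are also hypothetical.

```python
def ranks(scores):
    """Rank nodes by score, 1 = highest; tied scores share the same rank."""
    ordered = sorted(scores.values(), reverse=True)
    return {node: ordered.index(score) + 1 for node, score in scores.items()}


def fuse_ranks(ratio_scores, z_scores):
    """Combine two rankings by their unweighted mean (assumed fusion rule)."""
    ratio_rank = ranks(ratio_scores)
    z_rank = ranks(z_scores)
    return {node: (ratio_rank[node] + z_rank[node]) / 2 for node in ratio_scores}


# Toy scores for three nodes (hypothetical values).
ratio_scores = {"A": 1.0, "B": 0.5, "C": 0.75}
z_scores = {"A": 2.1, "B": 2.5, "C": 0.3}

combined = fuse_ranks(ratio_scores, z_scores)
prioritized = sorted(combined, key=combined.get)  # most specific node first
```

In this toy case the two scores disagree on the top node (A by ratio, B by Z-score), and the combined rank places A first because it ranks well under both.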

Output component

The query results of POINeT can be downloaded in multiple formats, including Excel, sif (simple interaction format), and plain text (txt). In sif format, ppi-AllPPI, ppi-Degree2, ppi-FilteredPPI, and all attributes can be downloaded. Tissue-specific expression profiles can also be exported into individual attribute files. The query results in sif format can be easily integrated with tissue-specific expression profiles and visualized in Cytoscape [39]. However, the Excel and txt formats do not support the export of tissue-specific expression profiles.
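For readers unfamiliar with the sif format, each line names a source node, an interaction type, and a target node. The sketch below writes a network in that form together with a simple tab-delimited node attribute table. The `pp` interaction label, the attribute table layout, and the function names are assumptions made for illustration and may differ from the files POINeT actually exports.

```python
def to_sif(ppis, interaction="pp"):
    """Serialize PPIs as tab-delimited sif lines: source, type, target."""
    return "\n".join(f"{a}\t{interaction}\t{b}" for a, b in ppis)


def to_attribute_table(values, name):
    """Serialize one node attribute (e.g. expression in a tissue) as a table."""
    lines = [f"node\t{name}"]
    lines += [f"{node}\t{value}" for node, value in sorted(values.items())]
    return "\n".join(lines)


# Toy export: two interactions and liver expression for two nodes.
sif_text = to_sif([("A", "B"), ("A", "C")])
liver_table = to_attribute_table({"B": 420.0, "A": 850.0}, "liver")
```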

Network viewer

POINeT provides a simple built-in network viewer that requires no additional software installation. Networks and tissue-specific expression profiles can be visualized directly in the browser. The viewer supports zooming and panning of the networks. The concept of layers from geographic information systems (GIS) [40] was adopted: different output results are defined as different layers. Through the selection of different layers, ppi-QQPPI, ppi-GOPPI, and ppi-InterologsPPI can be displayed individually or as a merged network, ppi-FilteredPPI. The labels on each layer can be turned on or off, as can the labeling of selected nodes. Nodes can also be selected to display their associated interactions, PubMed IDs, and Gene Ontology annotations, along with links to external databases. Tissue-specific expression values are treated as attributes of the nodes in the network. Using the layer concept, expression values from different tissues can be selected and displayed for the same nodes, so that users can compare gene expression levels of the same PPI network across tissue types.