Background

Network modularity represents the degree to which a network is divided into modules of separate community structures. A highly modularized network has dense connectivity between the nodes within each module but sparse connectivity between the nodes of different modules. Many plugins based on the Cytoscape platform [1] have been developed for modularity analysis in biological networks. For example, clusterMaker [2] implemented several clustering algorithms such as k-means, k-medoid, SCPS, and AutoSOME to visualize a structure of modules within biological networks. GIANT [3] was proposed to investigate topological or functional relationships in a metabolic network by performing a clustering analysis and a functional cartography of nodes. Another well-known plugin is NeMo [4], which can identify diverse network communities by means of a neighbor-sharing score based on a hierarchical agglomerative clustering method. These plugins have a limitation, though, in that they focus only on the structural analysis of a network and its visualization, without any consideration of dynamics analysis. This restricts their use to undirected networks such as protein–protein networks, or to analysis of directed networks that ignores the direction information.

Herein we note previous studies showing that dynamical behaviors, particularly robustness, of biological networks can be highly affected by their modularity characteristics. For instance, a recent study reported that a modular organization of cancer signaling networks is associated with the patient survivability, which suggests a relationship between modularity and network robustness [5]. Also, the robustness against state perturbations of a human signaling network was negatively correlated to network modularity [6]. Modular stabilizing in protein–protein interaction networks can be recombined to create highly robust chimeric proteins in evolution [7]. It has been also argued that modularity reduces robustness against mutation in metabolic networks [8]. Because of the importance of network modularity and robustness, there is a pressing need to develop a tool that can analyze both simultaneously. Accordingly, we devised a novel Cytoscape app called MORO that can analyze a relationship between dynamical robustness and structural modularity in biological networks represented by directed graphs. In addition, to make it possible to analyze very large-scale networks, we implemented the robustness computation portion of the app as a parallel algorithm by using the OpenCL library. It was also designed to efficiently visualize how the detected modules are located relative to each other. Furthermore, it elucidates analysis results of centrality and gene ontology (GO) enrichment of modules. Moreover, it provides a batch-mode simulation function to validate whether a result observed in a biological network is consistently conserved in many randomly organized networks. In this study, we tested our app in a case study investigating large-scale signaling networks and observed that modularity and robustness are negatively correlated, similar to previous findings [6]. It was verified by means of batch-mode simulation that these findings hold in random networks. Moreover, we found some GO terms which are differently enriched between the largest module and the rest of the modules, and it was shown that the module size is positively correlated with five centrality values. In summary, our app can efficiently analyze the relationship between modularity and robustness in large signaling networks.

Methods

Figure 1 illustrates the main process of our app. Firstly, a directed network is loaded for analysis. Next, the app computes the modularity and robustness of the network. In particular, the robustness algorithm was implemented in parallel computation by using the OpenCL library. The results can be visualized in three modes: a detailed visualization mode, a brief visualization with absolute relations, and a brief visualization with relative relations. Also, the results can be summarized in tables that include centrality and gene ontology analyses. Details of this process are given in the following subsections.

Fig. 1
figure 1

The overall process to analyze the relationship between the network robustness and modularity in MORO. After a directed network is loaded for analysis, the network modularity and robustness are calculated. In particular, the time consuming part is processed in parallel by using multi-core CPU or GPU. The analysis result can be checked by three types of visualization modes and a summary table. The centrality values and GO analysis of modules are additionally provided

Network modularity

Given a network represented by a directed graph G(VA) where V and A are the sets of nodes and interactions, respectively, we employ the modularity measure introduced in a previous study [9]. A partition P = {V 1, V 2, …, V M } of V is a set of nonempty disjoint subsets of V that covers V (i.e. V i  ∩ V j  = ∅ for all i, j ∈ {1, 2, …, M} and i ≠ j, and U M i = 1 V i  = V). Then, the modularity of the partition M(P) is defined as \( M(P)={\displaystyle {\sum}_{i=1}^M\left(\frac{\omega_{V_i{V}_i}}{\omega }-\frac{\omega_{V_i}^{in}{\omega}_{V_i}^{out}}{\omega^2}\right)} \), where \( {\omega}_{V_i{V}_i} \) is the number of interactions whose starting and ending nodes are both included in module V i , \( {\omega}_{V_i}^{out} \), and \( {\omega}_{V_i}^{in} \) are the numbers of interactions whose starting or ending nodes only, respectively, are included in module V i , and ω is the total number of interactions in the network. Then, the modularity of the network is defined as M(G) = max P  M(P). However, it is difficult to obtain the optimal partition. In our app, the modularity value of a network is averaged over 30 trials by using an optimization algorithm proposed in a previous study [10].

Robustness dynamics in a Boolean network model

A Boolean network model has been used to examine robustness-related dynamics of signaling networks and has been employed to investigate the dynamics of various biological networks [1117]. A Boolean network is represented by a directed graph G(V, A) where V = {v 1v 2, …, v N } is a set of Boolean variables and A is a set of ordered pairs of Boolean variables called directed links. Each v i  ∈ V has a value of 1 (‘on’) or 0 (‘off’) that represents the state of the corresponding element. A directed link (v i , v j ) has a positive (‘activating’) or negative (‘inhibiting’) relationship from v i to v j . In this model, each state s(t) = (v 1(t), v 2(t), …, v N (t)) at time t transits to the next state s(t + 1) according to the set of update rules F = {f 1, f 2, …, f N }, i.e., s(t + 1) = F(s(t)), where we randomly choose either a logical conjunction or disjunction for f i with a uniform probability distribution. For instance, if a Boolean variable v has a positive relationship from v 1, a negative relationship from v 2 and a positive relationship from v 3, then the conjunction and disjunction update rules are \( v\left(t+1\right)={v}_1(t)\wedge {\overline{v}}_2(t)\wedge {v}_3(t) \) and \( v\left(t+1\right)={v}_1(t)\vee {\overline{v}}_2(t)\vee {v}_3(t) \), respectively. In the case of the conjunction, the value of v at time t + 1 is 1 only if the values of v 1, v 2 and v 3 at time t are 1, 0 and 1, respectively. A state of G is defined as a vector of values v 1 through v N . A state trajectory starts from an initial state s(0) and eventually converges to either a fixed-point or limit-cycle attractor. Because these attractors can represent diverse biological network behaviors such as multistability, homeostasis, and oscillation, a change in the converging attractor can be interpreted as a loss of robustness. We denote the attractor converged to starting from an initial state s(0) by 〈s〉. The network is considered to be robust against mutation at v i if 〈s〉 is equal to \( \left\langle {s}_{{\overline{v}}_i}\right\rangle \), where \( {\overline{v}}_i\left(=\neg {v}_i\right) \) indicates the state perturbation of s subjected to v i . This concept to measure robustness has been widely used [1820]. More specifically, the robustness of a network γ(G) is defined as follows:

$$ \gamma (G)=\frac{1}{N\cdot \left|S\right|}{\displaystyle \sum_{s\in S}}{\displaystyle \sum_{i=1}^N}I\left(\left\langle s\right\rangle =\left\langle {s}_{{\overline{v}}_i}\right\rangle \right), $$

where S is the set of whole states (i.e. S = 2N), and I(⋅) is an indicator function. Because |S| is a very large number, we used a sample subset \( \tilde{S}\subseteq S \) with \( \left|\tilde{S}\right|=2N \) instead of S to calculate γ(G). Given a partition P = {V 1, V 2, …, V M }, we employed the in-module and out-module robustness of a module V i , γ in (V i ), and γ out (V i ), respectively, defined in [6] as follows:

$$ {\gamma}_{in}\left({V}_i\right)=\frac{1}{\left|\tilde{S}\right|}{\displaystyle \sum_{s\in \tilde{S}}}{\displaystyle \sum_{v\in {V}_i}}\frac{H\left({{\displaystyle \prod}}_{V_i}\left\langle s\right\rangle, \kern0.75em {{\displaystyle \prod}}_{V_i}\left\langle {s}_{\overline{v}}\right\rangle \right)}{\left|{V}_i\right|} $$

and

$$ {\gamma}_{out}\left({V}_i\right)=\frac{1}{\left|\tilde{S}\right|}{\displaystyle \sum_{s\in \tilde{S}}}{\displaystyle \sum_{v\in {V}_i}}\frac{H\left({{\displaystyle \prod}}_{V\backslash {V}_i}\left\langle s\right\rangle, \kern0.75em {{\displaystyle \prod}}_{V\backslash {V}_i}\left\langle {s}_{\overline{v}}\right\rangle \right)}{\left|{V}_i\right|}, $$

where \( {{\displaystyle \prod}}_{V_i}\left\langle s\right\rangle \) represents a projection operator to extract the partial attractor of a given subset V i  ⊆ V from an attractor 〈s〉, and H(〈s〉, 〈s ′ 〉) denotes a similarity measure between two attractors 〈s〉 and 〈s ′ 〉. More particularly, given 〈s〉 = s 0 → s 1 → … → s l − 1 and \( \left\langle {s}^{\prime}\right\rangle ={s}_0^{\prime}\to {s}_1^{\prime}\to \dots \to {s}_{l^{\prime }-1}^{\prime } \) (1 ≤ l ≤ l′ is assumed without loss of generality), H(〈s〉, 〈s′〉) is defined as follows:

$$ H\left(\left\langle s\right\rangle, \left\langle {s}^{\prime}\right\rangle \right)=\frac{1}{l^{\prime }}{\displaystyle \sum_{j=0}^{l-1}}\left(1-\frac{h\left({s}_j,{s}_j^{\prime}\right)}{N}\right) $$

where h is the Hamming distance (i.e. the number of different bits between two binary sequences). Then, the in-module and out-module robustness of a network, γ in (V i ) and γ out (V i ), respectively, are defined as follows:

$$ {\gamma}_{in}(G)=\frac{1}{M}{\displaystyle \sum_{i=1}^M}{\gamma}_{in}\left({V}_i\right) $$

and

$$ {\gamma}_{out}(G)=\frac{1}{M}{\displaystyle \sum_{i=1}^M}{\gamma}_{out}\left({V}_i\right) $$

Parallel computation of robustness

In our app, we employed a Boolean network model to compute robustness. In particular, we further calculated in-module and out-module robustness which represent how much the module subject to a perturbation and the groups of other modules, respectively, are robust against the perturbation. Unfortunately, it is very time-consuming to compute robustness. To reduce the running time, we implemented the robustness calculation part of the app as a parallel algorithm by using the OpenCL library (see Additional file 1: Text S1).

A batch-mode simulation on random Boolean networks

We developed a function for a batch-mode simulation on random Boolean networks (RBNs) to examine if a finding in biological networks holds in RBNs or not similarly in a previous study [12, 19, 2126]. The batch-mode simulation requires two steps for configuring parameters. The first step is to select an RBN generation model from among five models: Barabási-Albert (BA) model [27], Erdős-Rényi (ER) model [28], an Erdős-Rényi variant model [29] and two shuffling models [23, 30, 31], and the second step is to set the number of considered initial-states and the type of update-rule schemes (see the subsection “Robustness dynamics in a Boolean network model” for details). Once computations of modularity and robustness are completed, all results are saved in a resulting file, “net_based_result.txt” which describes modularity and robustness results of each RBN (see Additional file 1: Text S2).

Visualization of relations between modules

Our app provides three types of visualizations to show the relationship between modules. The first type is a detailed visualization mode in which all nodes and interactions of the loaded network are shown and the nodes are grouped into modules placed by using the Cytoscape group attributes layout. The second type is a brief visualization mode with absolute relations, in which a group node corresponds to a detected module and the weight of a link between group nodes denotes the number of interactions between a pair of modules. The last mode is the same as the second mode except that the weight of a link denotes the ratio of the number of interactions between a pair of modules to the maximal possible number of interactions between them, that is w/(n 1 n 2), where w is the number of actual interactions between the pair of modules, and n 1 and n 2 are the numbers of nodes included in each of the modules.

Module centrality and GO analyses

Many previous studies have shown that the centrality properties of genes/proteins in biological networks are strongly related to their functional roles in a topological or dynamical sense. To extend this concept to module-based centrality analysis, we implemented a function to examine five centrality measures including degree [32], closeness [33], betweenness [34], stress [35] and eigenvector [36] of modules (see Additional file 1: Text S3). Besides, we developed a GO analysis function to compare the functional difference between two groups of modules. To this end, we first identify two groups of genes by selecting some modules of interest. Then, Entrez gene id is mapped to UniProtKB by utilizing the web service at http://www.uniprot.org/ [37], and all relevant GO terms are extracted by using the web service at http://www.ebi.ac.uk/QuickGO/ [38]. Finally, GO terms which are most differently enriched between the two gene groups are listed in a table or exported into a text file.

Results and discussion

In this section, we tested MORO with two large-scale signalling networks, the canonical cell signaling network (STKE; http://stke.sciencemag.org) and the human signal transduction network (HSN; http://www.bri.nrc.ca/wang) which consist of 754 proteins and 1624 interactions, and 5443 genes and 37,663 interactions, respectively.

Analysis of modularity and robustness

The analysis and visualization results of the STKE and HSN networks are shown in Fig. 2 and Additional file 2: Figure S1, respectively. In particular, Fig. 2(a) and in Additional file 2: Figure S1(a) explain various summarized results including the number of modules, modularity, robustness, in-/out-module robustness, and centrality values. Specifically, the number of modules were 16 and 22, the modularity values were 0.72825 and 0.54534, and the robustness values were 0.67721 and 0.75400 in the STKE and HSN networks, respectively. By selecting the visualization option, we can observe the relation between the detected modules in three different modes: a detailed mode (Fig. 2(b) and in Additional file 2: Figure S1(b)), a brief mode with absolute relations (Fig. 2(c) and in Additional file 2: Figure S1(c)), and a brief mode with relative relations (Fig. 2(d) and in Additional file 2: Figure S1(d)). In the detailed mode, each module is represented by a circular group of genes and all interactions between the genes are presented in the network. In other words, the visualized network is actually same with the first given network except that the genes belonging to a same module are located close to each other. On the other hand, each module is represented by a single node and a relation between modules is represented by a directed link in both of the brief modes. The only difference between the two brief modes is that the weight of a link means the number of interactions between a pair of modules in the brief mode with absolute relations, whereas it means the ratio of the number of interactions between a pair of modules to the maximal possible number of interactions between them. By properly specifying the appearance ratio parameter which is defined the ratio of the number of interactions to be visible over the total number of interactions between modules, we can retrieve more reduced information about the brief relations between modules. For example about the STKE network, Fig. 2(e) and (f) shows the visualization results reduced from Fig. 2(c) and (d), respectively, by specifying the appearance ratio to 0.3. Then, we can identified which module is strongly interacting with or isolated from other modules (see Additional file 2: Figure S1(e) and (f) for the result of the HSN network).

Fig. 2
figure 2

Analysis results of the STKE network by MORO. a A summary table. Modularity and robustness results in module and network levels are listed in the upper and the lower tables, respectively. b Result of the detailed visualization mode. We found a total of 16 modules each of which is represented by a circular list of genes. c-d Results of the brief visualization mode with absolute and relative relations, respectively. Each module is represented by a single group node whose radius is proportional to the number of nodes belonging to the module. The weight of a link denotes the number of interactions between the corresponding pair of modules and the ratio of the number of interactions between a pair of modules to the maximal possible number of interactions between them in (c) and (d), respectively. e-f The reduced visualization results. They are subnetworks induced from (c) and (d), respectively, by removing all links except about 30% of links with the highest weight values (This is performed by specifying the appearance ratio parameter in MORO)

To validate effectiveness of our app, we also conducted the same case study about large-scale signaling networks as in a previous study [6] which reported that the network modularity tends to be negatively correlated to the robustness against state perturbations. To reproduce such a negative relationship between network modularity and the robustness in this study, we generated 6400 random Boolean networks and computed the robustness and the modularity of each network by using MORO. We note that this extensive simulation could be conducted in a practical time by the parallel implementation of main functions in MORO. As a result, we could observe the same negative relationship between the modularity and the robustness, consistent to the result in [6] (see Additional file 2: Figure S2(a)). In addition, we observed that the results of STKE and HSN are very close to the trend line of the random Boolean networks. Moreover, we could also observe that the in-module robustness is clearly negatively correlated with the network modularity (Additional file 2: Figure S2(b)), whereas the out-module robustness is not (Additional file 2: Figure S2(c)). In addition, the in-module robustness was positively correlated with the network robustness (Additional file 2: Figure S2(d)), whereas the out-module robustness was not (Additional file 2: Figure S2(e)). As explained in the previous study, we could also conclude that the negative relationship between network robustness and modularity is mainly caused by the relationship between in-module robustness and network modularity through intensive simulations using our app.

Time performance analysis

To show the computational cost of MORO, we examined the running time in calculating robustness and modularity in the HSN and STKE networks. We tested the app on a system with an NVIDIA GeForece GTX 680 GPU with 1536 processor at 1GHz, seven-core Intel(R) Core i7-4770 K CPU 3.50 GHz, and 16 GB of memory. Table 1 shows the result. In case of the HSN network, the speedup by the GPU-based parallel computation over the single-CPU was slightly greater than that by multi-core CPU, and both speedups were proportional to the number of considered initial states. On the other hand, it is interesting that the speedup by multi-core CPU was greater than that by GPU, and both were not proportional to the number of initial states in case of the STKE network. We infer that the analysis of the STKE network was terminated before the parallel computation power is fully utilized due to the relatively small size of the network. Taken together, we can efficiently analyze the relation between robustness and modularity in large-scale networks by parallel computation with two options, multi-core CPU and GPU.

Table 1 Running time of MORO

Module centrality analysis

After we obtain the modular structure of a network, we can analyse the centrality of modules based on the brief mode visualization result. Specifically, we consider a module network where a node and a link represent a module and the set of interactions between a pair of modules, respectively. Then, we can examine five well-known centrality values such as degree, closeness, betweenness, stress, and eigenvector in the module network. In this case study, we examined the change of the centrality values against the module size, which is defined by the number genes belonging to a module, in the STKE (Fig. 3) and HSN (Additional file 2: Figure S3) networks. It is interesting that all centrality measures or all except closeness showed the positive relations with the module size in the STKE and HSN networks, respectively. In other words, the module was likely to be more central as the module size gets larger. To investigate if this property is reserved in random networks, we generated two groups of 100 random networks by shuffling interactions of the STKE and HSN networks while preserving a degree distribution, and examined the change of the centrality values against the module size (see Additional file 2: Figures S4 and S5). Similar to the result in the signaling networks, the module size was positively correlated with the centrality values in the random networks. This suggests that the hub modules might play an important role in the community network [3941]. Additionally, we examined the relationship between the in-/out-module robustness and the module centrality values in the STKE and HSN networks (see Additional file 2: Figures S6 and S7). Unlike the relation with the module size, the in-/out-module robustness was not significantly correlated with the centrality values. In other words, the centrality of modules cannot indicate the in-/out-module robustness in the signaling network.

Fig. 3
figure 3

Changes of module centrality values against the module size in the STKE network. a-e Results with respect to degree, closeness, betweenness, stress, and eigenvalue. The module size which is defined as the number of nodes belonging to the module showed positive relationships with all module centrality measures. The correlation coefficients in (a)-(e) were 0.75339, 0.564168, 0.599553, 0.657316, and 0.511411, respectively, with all p-value < 10−4

GO analysis

It is possible to analyze GO enrichment [42] by using MORO. To show this function, we first specified two groups of genes, which consist of the genes in the largest module (1042 genes) and the rest of genes (4401 genes), respectively, in the HSN network. Table 2 shows all GO terms which were more enriched in the largest module than in the others: cytoplasm, nucleus, and protein complex in cellular component terms; protein, metal ion, nucleotide, and DNA bindings in molecular function terms; gene expression, viral process, and regulation of DNA-templated transcription in biological processes terms. As a result, MORO can provide the useful information about GO analysis between any two groups of modules.

Table 2 GO analysis in the HSN network

Conclusions

Many recent reports have reported that robust behavior against mutations might be correlated to the modularity of a signaling network. Motivated by these results, we developed a novel Cytoscape app called MORO, which can analyze the relationship between network robustness and modularity. We implemented it in parallel by using the OpenCL library to allow application to very-large-scale networks. In addition, our app can provide information about topological relations between modules by means of various visualization modes and centrality analysis. MORO includes also five centrality measures which can examine how centrally each module is positioned in terms of relations among the modules. Moreover, it can conveniently analyze the gene ontology enrichment of modules only if Entrez id of gene is given. A batch-mode simulation function was also included to allow verification of whether a finding is a design principle of random networks. In the future, MORO will be extended to consider various types of mutations such as a knockout and edge mutation, and to analyze publicly-available signaling networks represented by ordinary differential equations by devising a conversion method from continuous models to Boolean networks.