Introduction

The reduction of large complex systems to elementary units and their interactions is one of the primary modes of operation in science. Much of our understanding of the natural world stems from our knowledge of such basic building blocks, ranging from elementary particles, to atoms and molecules, to living cells and organisms. Typically, elementary units at smaller scales combine in specific patterns to form larger units of increasing complexity and variety.

There are also many indications that higher order structures play an important role in the structural and functional organization of complex networks. For instance, many real-world networks contain certain small connectivity patterns, known as network motifs, in much larger numbers than expected in random graphs with conditionally independent edges [1]. In biological and technological networks these small recurring circuit elements appear to contribute to function by performing modular tasks, a view further supported by the observation that network motifs are broadly shared among networks representing systems with similar functions [2].

One of the major difficulties in quantifying higher order network structures such as network motifs is the sheer number of potential interaction patterns that can exist in groups of even moderate size. For instance, in undirected networks there are 11,117 different connected motifs on 8 vertices, and in directed networks there are 9,364 such motifs on just 5 vertices. As a result, counting subgraphs quickly becomes an ineffective way of describing local network structures as the size of the subgraphs is increased. Moreover, subgraphs are coupled through a complex web of dependencies, which further complicates the problem of formulating concise yet informative descriptors of the local structure of networks. As the size, and hence potential variety, of the patterns included in descriptions of local network structures is increased, one is further faced with computational challenges due to the graph isomorphism problem and the complexity of subgraph enumeration.
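This combinatorial explosion can be checked directly for small sizes. The brute-force sketch below (a toy illustration, not part of the presented method) counts unlabelled connected graphs on n vertices by canonicalizing edge sets over all vertex permutations; it reproduces the well-known counts 1, 2, 6, 21 for n = 2, …, 5, and becomes hopelessly slow well before n = 8, which is why the 11,117 motifs on 8 vertices pose a serious computational obstacle.

```python
from itertools import combinations, permutations

def is_connected(n, E):
    """Depth-first search connectivity check on vertices 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in E:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def connected_motif_count(n):
    """Count unlabelled connected graphs on n vertices by brute force:
    enumerate all edge subsets, keep connected ones, and deduplicate by a
    canonical form taken over all vertex permutations. Feasible only for small n."""
    pairs = list(combinations(range(n), 2))
    motifs = set()
    for r in range(n - 1, len(pairs) + 1):  # a connected graph needs >= n-1 edges
        for E in combinations(pairs, r):
            if not is_connected(n, E):
                continue
            canon = min(
                tuple(sorted(tuple(sorted((p[u], p[v]))) for u, v in E))
                for p in permutations(range(n))
            )
            motifs.add(canon)
    return len(motifs)

print([connected_motif_count(n) for n in range(2, 6)])  # [1, 2, 6, 21]
```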

In this article we introduce a nonparametric method that is based on the assumption that networks are made of higher order interactions that can take the form of any simply connected motif. Consequently, we consider generative models where networks are constructed using not only edges but also explicit copies of higher order subgraphs [3,4,5]. More specifically, we focus on maximum entropy models that result from constraining the types and distributions of atomic subgraphs used to construct the network [5]. In contrast to exponential random graphs [6], which result from maximizing the entropy under constraints on expected subgraph counts, these models can be studied analytically in terms of their properties [3,4,5]. Generative models with higher order motifs are also closely related to other representations and models of higher order interactions, namely hypergraphs and simplicial complexes. Recent extensive research efforts in the area of hypergraphs and simplicial complexes [7] have yielded methods and models for analysing the structural and dynamical properties of networked systems that comprise higher order interactions. For instance, recent studies have shown that the presence of higher order interactions can lead to markedly different behaviour in dynamical systems [8,9,10]. Despite this recent surge of interest in higher order networks, most empirical data sets only contain pairwise interactions. Consequently, methods for extracting higher order interactions from graphs [11] and other types of data [12,13] are of much current interest.

In hypergraphs and simplicial complexes higher order interactions are clique-like, i.e. nodes participating in an interaction are assumed to interact uniformly with all other nodes participating in it. Although restricting higher order interactions to cliques greatly simplifies models of higher order interactions, from an inference perspective allowing only cliques is problematic since in principle higher order interactions could take the form of any motif. This is especially relevant for directed networks, for instance, as these usually do not contain large numbers of fully connected subgraphs. In order to circumvent these restrictions we consider higher order network representations, called subgraph configurations [5], that can be understood as generalized hypergraphs where higher order interactions can take the form of any simply connected subgraph/motif. Nevertheless, in many networks the method will identify cliques as higher order interactions, as it implicitly selects motifs based on their density, symmetry and frequency, making cliques primary candidates for higher order interactions if present in the network under consideration.

From a methodological standpoint the presented method is similar to inference-based methods for community detection that use Stochastic Block Models (SBMs) as generative models. The SBM [14] and its degree-corrected variant [15] have been at the centre of much of the progress in community detection, in particular in approaches based on the statistical inference of SBMs [15,16,17,18,19]. Statistical inference methods have also been applied to time-dependent and multilayer networks [20] and have found use in methods for the reconstruction of networks from noisy data [21,22] and dynamical processes on graphs [23]. Although inference-based methods have mostly concentrated on the SBM and its variants, methods that consider alternative generative models have recently started to emerge. For instance, ref. 11 introduced a method for inferring hypergraph representations from graph data that relies on random hypergraphs as generative models.

Our method differs from other approaches to motif analysis [1] in that it seeks explicit decompositions of networks into recurring subgraphs rather than comparing subgraph counts in the network against expected counts under a null model. In other words, instead of defining network motifs in terms of deviations from a null model of the network, we treat motifs as statistically significant higher order interaction patterns that are to be inferred from the data. For this we follow a nonparametric Bayesian approach which, being based on the principle of parsimony, naturally balances goodness of fit and model complexity; it does not require ad hoc assumptions regarding the types or frequencies of higher order motifs present in the data and makes it possible to infer concise sets of higher order interactions from within large sets of candidate motifs. The presented approach builds upon previous work [24] by formulating the problem in a more formal Bayesian framework and by using more realistic degree-corrected models that constrain not only the types and frequencies of atomic subgraphs but also their distributions across the nodes of the network. As shall be demonstrated later, degree-corrected models are in general a better fit for real-world networks and hence result in a significantly improved method. Overall, our results demonstrate that many empirical networks can be represented more parsimoniously by including higher order interactions in their representations. Moreover, being based on statistical inference, the method also allows us to fit networks to generative models that more accurately reflect their higher order structure [3,4,5]. These models are higher order generalizations of the configuration model and are amenable to analytic and numeric studies of the topological and dynamical implications of higher order structures.

Results

Generative models for networks with higher–order motifs

We base our approach on generative models where vertices are connected not only by single edges but also by copies of higher order subgraphs, which we call “atoms”. Assuming such a generative model leads us to consider latent spaces that correspond to the atomic subgraphs added to the graph during the generation process. We call these latent states subgraph configurations. Subgraph configurations are decompositions of a network into smaller subgraphs and can be thought of as generalized hypergraphs where hyperedges can have the form of any connected motif. More formally, a subgraph configuration C on a set of vertices V is a set of subgraphs of the complete graph KV on V. Given such a C we define the atoms of C, M(C), as the set of motifs occurring in C. For a graphical illustration of a subgraph configuration see Fig. 1. Note that any subgraph configuration C can be transformed into a graph G by simply taking the union of the edges contained in the subgraphs in C, i.e. E(G) = ⋃S∈C E(S). We say a subgraph configuration C covers G if E(G) = ⋃S∈C E(S).
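As a concrete illustration, a subgraph configuration can be represented as a set of edge sets. The toy example below (with hypothetical vertices and atoms, not taken from the paper's data) projects a configuration onto a graph and checks the cover condition E(G) = ⋃S∈C E(S):

```python
# A subgraph is a frozenset of undirected edges; a configuration C is a set of subgraphs.
def edge(u, v):
    return frozenset({u, v})

# Hypothetical toy configuration on vertices 0..7: two single edges, a triangle, a 4-cycle.
C = {
    frozenset({edge(0, 1)}),
    frozenset({edge(1, 2)}),
    frozenset({edge(2, 3), edge(3, 4), edge(2, 4)}),
    frozenset({edge(4, 5), edge(5, 6), edge(6, 7), edge(7, 4)}),
}

def project(C):
    """Project a configuration onto a graph: E(G) is the union of E(S) over S in C."""
    return set().union(*C)

def covers(C, E_G):
    """C covers G if the union of its subgraph edge sets equals E(G)."""
    return project(C) == set(E_G)

E_G = project(C)
print(len(E_G))        # 9 distinct edges in the projected graph
print(covers(C, E_G))  # True by construction
```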

Fig. 1: Example of a subgraph configuration.
figure 1

A subgraph configuration consisting of two single edges, a triangle, a 4-cycle and a diamond. Edges are coloured according to the type of higher order subgraph.

In the formulation of our method we use maximum entropy ensembles of subgraph configurations that result from imposing hard constraints on the counts and distributions of atomic subgraphs used to construct the network [5]. In these models, which are also known as microcanonical ensembles, all subgraph configurations that satisfy the given constraints are equi-probable. Consequently, the likelihood of configurations can be computed by simply counting the number of configurations that are compatible with the given constraints. Another advantage of using microcanonical models is that every configuration is compatible with a single set of parameters, which allows marginal probabilities to be computed in closed form without evaluating costly integrals.

Motifs, automorphisms and orbits

First, we review some basic concepts that are essential in formulating the generative model, starting with graph isomorphism. Two graphs G and H are said to be isomorphic if there exists a bijection ϕ: V(G) → V(H) such that \((v,v{\prime} )\in E(G)\ \iff \ (\phi (v),\phi (v{\prime} ))\in E(H)\). Being isomorphic is an equivalence relation whose equivalence classes correspond to unlabelled graphs, also known as motifs. We denote motifs using lower-case letters and write G ≃ g. Automorphisms are special types of isomorphisms that map a graph to itself; they are essentially vertex permutations that leave the structure of the graph unchanged. The set of all automorphisms of G forms a group, which we denote Aut(G). Finally, the orbits of a graph are subsets of vertices that can be mapped onto each other by Aut(G) and correspond to classes of structurally identical vertices of G (see Fig. 2). The ith orbit of G is denoted OG,i.

Fig. 2: Examples of motifs and their orbits.
figure 2

Some undirected and directed motifs with vertex colours indicating different orbits.
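Both concepts are easy to compute by brute force for small motifs. The sketch below (a minimal illustration, using the diamond motif, i.e. a 4-clique minus one edge) enumerates Aut(G) as the set of edge-preserving vertex permutations and derives the orbits from it:

```python
from itertools import permutations

def automorphisms(vertices, edges):
    """All vertex permutations that map the edge set onto itself."""
    E = {frozenset(e) for e in edges}
    autos = []
    for p in permutations(vertices):
        phi = dict(zip(vertices, p))
        if {frozenset({phi[u], phi[v]}) for u, v in edges} == E:
            autos.append(phi)
    return autos

def orbits(vertices, autos):
    """Group vertices that some automorphism maps onto each other."""
    return {frozenset({phi[v] for phi in autos}) for v in vertices}

# The diamond: a 4-clique minus one edge.
V = [0, 1, 2, 3]
E = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]  # edge (2, 3) is missing
A = automorphisms(V, E)
print(len(A))        # |Aut| = 4 (swap 0 and 1, swap 2 and 3, independently)
print(orbits(V, A))  # two orbits: the degree-3 vertices {0, 1} and the degree-2 vertices {2, 3}
```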

In the remainder of the text we shall not explicitly distinguish between undirected and directed graphs because in the context of subgraph configurations this distinction can be reduced to the set of atomic subgraphs and the character of their symmetries. The only modification required for the directed case is in the definition of graph isomorphisms which for directed graphs are required to also conserve edge directions.

Microcanonical models with fixed subgraph counts

The simplest type of subgraph configuration model (SGCM) on N vertices with atom set M is the ensemble of all subgraph configurations that contain exactly nm subgraphs of type m ∈ M. In the microcanonical ensemble all such configurations have equal probability, which can be computed by simply counting the total number of such configurations:

$$P(C\mid {\mathbf{n}}_{m},M)={\left(\prod\limits_{m\in M}\binom{|{\mathcal{H}}_{N,m}|}{n_{m}}\right)}^{-1},$$
(1)

where \({\mathcal{H}}_{N,m}\) is the set of all m-subgraphs of the complete graph KN. It follows from the definition of the automorphism group that:

$$|{\mathcal{H}}_{N,m}|=\frac{N!}{(N-|m|)!\,|{\rm{Aut}}(m)|}.$$
(2)

In such models atomic subgraphs are distributed uniformly over the vertices and hence we shall refer to these types of models as homogeneous SGCMs. When M contains only the single edge motif, homogeneous models reduce to the classical Erdős–Rényi random graph \(G_{N,e}\) where all graphs with N vertices and e edges are equi-probable. Throughout this article we assume that networks are sparse, i.e. nm = O(N), for which \(\log (P)\) in Eq. (1) can be approximated using Stirling's formula [24].
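Equation (2) can be verified for small cases by enumerating the distinct copies of a motif in KN and comparing with the formula. The brute-force check below (a toy illustration, not the method's actual counting routine) does this for the triangle and the 3-path in K5:

```python
from itertools import permutations
from math import factorial

def count_placements(N, motif_edges, k, aut_size):
    """Count distinct copies of a k-vertex motif in the complete graph K_N by
    enumerating injections and deduplicating, then compare with
    N! / ((N - k)! * |Aut(m)|) from Eq. (2)."""
    copies = set()
    for p in permutations(range(N), k):
        copies.add(frozenset(frozenset({p[u], p[v]}) for u, v in motif_edges))
    formula = factorial(N) // (factorial(N - k) * aut_size)
    return len(copies), formula

# Triangle on 3 vertices: |Aut| = 3! = 6.
print(count_placements(5, [(0, 1), (1, 2), (0, 2)], 3, 6))  # (10, 10)
# Path on 3 vertices: |Aut| = 2 (swap the endpoints).
print(count_placements(5, [(0, 1), (1, 2)], 3, 2))          # (30, 30)
```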

Subgraph configuration models and random graphs

Distributions over subgraph configurations, such as the one introduced above, can be turned into distributions over graphs by projecting subgraph configurations onto graphs. In general we shall assume that the graphs under consideration are simple, i.e. that they do not contain parallel edges or self loops. In this case subgraph configurations can be projected onto graphs by simply taking the union of the edges of the subgraphs in a given configuration C. Using this projection, any distribution over subgraph configurations {C}, denoted P(C), can be mapped onto a distribution over graphs where the probability of a graph G is given by the sum of the probabilities of the subgraph configurations whose projection is G, i.e.:

$$P(G)=\sum\limits_{C| {\bigcup }_{s\in C}E(s)=E(G)}P(C).$$
(3)

Note that subgraph configurations differ from other types of latent states, such as community assignments in the SBM, in that each subgraph configuration maps to a unique graph. Therefore, subgraph configurations can be seen as a general class of exact/lossless graph representations that includes many known graph representations, including edge lists, adjacency lists and bipartite representations.
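Equation (3) can be made concrete for a toy case. The sketch below (a hypothetical example with candidate subgraphs chosen by hand) enumerates every configuration built from three single edges and one triangle whose projection is the triangle graph; the probability of that graph would be the sum of P(C) over exactly these configurations:

```python
from itertools import combinations

# Candidate subgraphs available to the generator: three single edges and a triangle.
e01, e12, e02 = frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2})
subgraphs = [frozenset({e01}), frozenset({e12}), frozenset({e02}),
             frozenset({e01, e12, e02})]
target = {e01, e12, e02}  # E(G) for the triangle graph

# Enumerate every configuration (subset of candidate subgraphs) whose projection is G.
covering = [C for r in range(1, len(subgraphs) + 1)
            for C in combinations(subgraphs, r)
            if set().union(*C) == target]
print(len(covering))  # 9: any subset containing the triangle atom (8), plus the three edges alone
```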

Degree–corrected models

In homogeneous SGCMs atomic subgraphs are distributed uniformly over the vertices, which in turn results in graphs with Poisson-type degree distributions [3]. Therefore homogeneous models are in general not a good fit for empirical networks, which often have highly heterogeneous or power-law type degree distributions. Thus we consider degree-corrected subgraph configuration models where one constrains not only the counts of atomic subgraphs but also the number of atomic subgraphs attached to each vertex. For this we consider orbit degree sequences of configurations, where dm,i(C)(v) specifies the number of m-subgraphs in C that are attached to v at orbit i. For instance, the central vertex of the configuration in Fig. 1 has edge degree 1, triangle degree 1 and 4-cycle degree 1. Whereas these motifs each have a single orbit, the diamond has two orbits, one corresponding to the vertices of degree 2 and one to the vertices of degree 3, for which the central vertex of the configuration in Fig. 1 has degrees 0 and 1, respectively. Note that the orbit degree sequence is defined with respect to a subgraph configuration C and should not be confused with the number of subgraphs of type m containing a certain vertex v in the graph. As in the case of fixed subgraph counts, the microcanonical ensemble of subgraph configurations having orbit degree sequence dm,i(v) is defined as the uniform ensemble over all configurations having orbit degree sequence dm,i(C)(v). As a result, the probability of configurations can be calculated by counting the number of configurations with orbit degree sequence dm,i(C)(v) [5]. A brief derivation of the likelihood can be found in the Methods section. The degree-corrected subgraph configuration model is closely related to the model by Karrer and Newman introduced in ref. 4, which is defined in terms of a stub-matching process similar to the (edge) configuration model.

Note that the degree of a vertex v (counting parallel edges) is fully determined by the orbit degree:

$$d(v)=\sum\limits_{m,i}{d}_{m,i}(v)d({O}_{m,i}),$$
(4)

where d(Om,i) is the degree of vertices in orbit Om,i; hence the degree-corrected model can model networks with heterogeneous degree distributions. Although we discard parallel edges in our formulation, the expected number of such multi-edges is O(1) for sparse models.
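The identity in Eq. (4) can be checked against the worked example from the text: the central vertex of Fig. 1 has edge, triangle and 4-cycle degree 1 and diamond orbit degrees 0 and 1. A minimal sketch:

```python
# Orbit degrees d_{m,i}(v) of the central vertex of the configuration in Fig. 1,
# as described in the text: one edge, one triangle, one 4-cycle, and one diamond
# attachment at the diamond's degree-3 orbit.
orbit_degree = {
    ("edge", 0): 1,
    ("triangle", 0): 1,
    ("4-cycle", 0): 1,
    ("diamond", 0): 0,  # degree-2 orbit of the diamond
    ("diamond", 1): 1,  # degree-3 orbit of the diamond
}
# d(O_{m,i}): the degree of a vertex sitting at each orbit of the motif itself.
orbit_vertex_degree = {
    ("edge", 0): 1,
    ("triangle", 0): 2,
    ("4-cycle", 0): 2,
    ("diamond", 0): 2,
    ("diamond", 1): 3,
}
# Eq. (4): total degree of v (counting parallel edges) from its orbit degrees.
d_v = sum(orbit_degree[o] * orbit_vertex_degree[o] for o in orbit_degree)
print(d_v)  # 1 + 2 + 2 + 3 = 8
```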

Coarse grained degree corrected models

Constraining the atomic degrees at the level of orbits can in certain cases significantly increase the model complexity of DC-SGCMs, because the model requires a degree sequence of length ∣V∣ to be specified for every orbit of each atom in M. Although the Bayesian approach safeguards against over-fitting by balancing goodness of fit and parametric complexity, using models with high parametric complexity can inhibit the detection of regularities, i.e. lead to under-fitting, due to the high complexity cost associated with including such regularities in the model. Thus, in the context of statistical inference, degree-corrected models that have lower model complexity for the same set of atoms also need to be considered. It should, however, be reiterated that the final choice should be made on the basis of how well each model fits the data, for instance, as we do in this paper, via the posterior odds ratio.

The parametric complexity of DC-SGCMs can be reduced by aggregating components of the orbit degree sequence at different levels of granularity. One option is to constrain atomic degrees at the level of motifs, i.e. by fixing the sum dm(v) = ∑idm,i(v) instead of each dm,i(v) separately, which results in a model that requires ∣M∣ degree sequences. The parametric complexity can be reduced even further by constraining only the total number of atomic subgraphs attached to each vertex, i.e. dt(v) = ∑m,idm,i(v), which results in a model that requires a single degree sequence. In the case of directed graphs we also consider a model where orbits are grouped according to their in- and out-degrees, because grouping orbits according to atoms will in general mix in- and out-degrees, which in turn results in models where edge directions are random. For this we group orbits into three groups corresponding to orbits that have only incoming edges (\({d}_{in}(v)={\sum }_{m,i| {d}_{out}({O}_{m,i}) = 0}{d}_{m,i}\)), only outgoing edges, and both incoming and outgoing edges, respectively. In this model the atomic degree sequence has 3 components and hence can be seen as a generalization of the configuration model in which the in-, out- and mutual-edge degrees are conserved separately [1]. The likelihood of coarse-grained ensembles can be obtained in closed form by the same procedure as for the orbit degree model [5].

Inference

Our goal is to infer a subgraph configuration C for a given graph G. Hence, we consider P(C∣G), which can be obtained via Bayes' theorem:

$$P(C| G)=\frac{P(G| C)P(C)}{P(G)},$$
(5)

where P(G∣C) = 1 if ⋃s∈CE(s) = E(G) and 0 otherwise, and P(C) is given by the prior over subgraph configurations. Note that subgraph configurations are latent structures that fully determine the graph and therefore differ from other latent spaces, such as node partitions in the SBM, that do not uniquely determine the graph.

We first consider the case of degree-corrected subgraph configuration models. For an atomic degree sequence dm,i with motif set M and count vector nm we assume a nested prior with a conditional dependence structure of the following form:

$$P(C)=P(C\mid {\mathbf{d}}_{m,i})P({\mathbf{d}}_{m,i}\mid {\mathbf{n}}_{m})P({\mathbf{n}}_{m}\mid M,e(G))P(M),$$
(6)

where e(G) is the number of edges in G. The above form applies to all variants of the degree–corrected SGCM, the only difference being the number of components in the degree sequence. Similarly, in the case of the homogeneous SGCMs we consider a prior that has the following form:

$$P(C)=P(C\mid {\mathbf{n}}_{m})P({\mathbf{n}}_{m}\mid M,e(G))P(M).$$

We seek to identify a subgraph configuration with maximum posterior probability (MAP) and call such a configuration a MAP-configuration. Note that this amounts to inferring the latent state of the model rather than its parameters; however, the two problems are closely related through Eq. (3).

Model selection

The model that is most suitable for describing a given network can be selected via the posterior odds ratio. For this we first find the MAP-configuration for each variant of the subgraph configuration model, including the homogeneous models. In the degree-corrected case we consider models that constrain the distributions of atomic subgraphs at the level of orbits, atoms and the total number of subgraphs. For directed networks we additionally consider the directed orbit degree model. We then select the model–configuration combination with the highest posterior probability, which is equivalent to selecting the configuration with the shortest description length. A more detailed explanation of the model selection procedure is given in the Methods section.

Minimum description length

The Bayesian formulation outlined above is equivalent to finding a subgraph configuration that has Minimum Description Length (MDL) [25]. Although the equivalence of MDL and Bayesian approaches holds more broadly, in our case it is directly evident due to the discrete nature of the parameters. The description length of a configuration decomposes as:

$$\Sigma (C)=S(C\mid {\mathbf{d}},{\mathbf{n}}_{m},M)+\epsilon ({\mathbf{d}},{\mathbf{n}}_{m},M),$$
(7)

where Σ(C) is the description length (DL), \(S(C)=-\log P(C\mid {\mathbf{d}},{\mathbf{n}}_{m},M)\) is the entropy, which corresponds to the information required to specify the positions of the subgraphs given the model parameters, and \(\epsilon ({\mathbf{d}},{\mathbf{n}}_{m},M)=-\log P({\mathbf{d}},{\mathbf{n}}_{m},M)\) is the information required to describe the parameters of the model, i.e. the model complexity. Consequently, finding a MAP-configuration is equivalent to finding a configuration with minimum description length.
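For the homogeneous model the entropy term S can be computed in closed form from Eqs. (1) and (2). The sketch below (a toy comparison with made-up counts, ignoring the model complexity term ε and the priors) compares the entropy of describing a hypothetical 100-vertex network as 330 single edges versus as 30 edges plus 100 triangles; in this example the higher order description has lower entropy:

```python
from math import comb, factorial, log

def log_num_configs(N, atoms):
    """Entropy term S for the homogeneous SGCM: the log of the number of
    configurations, i.e. sum over atoms m of log C(|H_{N,m}|, n_m), per
    Eqs. (1) and (2). Atoms are given as {(k, |Aut|): n_m}."""
    S = 0.0
    for (k, aut_size), n_m in atoms.items():
        H = factorial(N) // (factorial(N - k) * aut_size)  # |H_{N,m}|, Eq. (2)
        S += log(comb(H, n_m))
    return S

N = 100
edges_only = {(2, 2): 330}                  # 330 single edges (k=2, |Aut|=2)
with_triangles = {(2, 2): 30, (3, 6): 100}  # 30 edges plus 100 triangles (same 330 edges)
print(log_num_configs(N, edges_only))       # entropy of the edge-only description (nats)
print(log_num_configs(N, with_triangles))   # smaller: the triangle description is shorter
```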

The atomic structure of real world networks

Due to the computational complexity of the problem we restrict the set of motifs included in the analysis to connected motifs up to size 5 for directed networks and up to size 8 for undirected networks. There are 9,578 such motifs in the directed case and 12,112 in the undirected case. Despite the large number of potential motifs included in the analysis, our method is able to identify concise sets of atoms (see ∣Mopt∣ in Table 1) that cover a large proportion of the edges in each of these networks.

Table 1 Description lengths of MAP-configurations

Model selection

In this section we compare different subgraph configuration models and their fit to the data. For this, we consider the different models introduced so far, namely the homogeneous microcanonical ensemble, the orbit degree-corrected model, the atomic degree model and the total degree model. For directed networks we also consider the directed degree model, where orbits are aggregated based on edge directions. The MAP configurations corresponding to the different types of degree-corrected models tend to be quite similar, both in terms of the atoms and the subgraphs they contain. In general, though, the number of atoms in the MAP configuration tends to increase as the level of detail at which the atomic degree sequence is conserved decreases, since this decreases the model complexity cost of incorporating new atoms into the configuration.

Once we obtain the MAP-configuration for each model we select the optimal configuration-model pair by comparing their posterior odds ratio/description lengths, see Table 1. We find that for undirected networks the total degree model results in the shortest description length, except for the Malaria Genes network. For the directed networks we find that in general the directed degree model or total degree model result in the shortest description lengths.

Our results show that the inclusion of higher order interactions in representations of real-world networks leads to much more parsimonious representations when compared to their counterparts that include only dyadic interactions (i.e. single edges), with reductions in DL ranging from 297 to 13,007 nats. Note that this reduction in description length directly relates to the posterior odds ratio of the MAP configuration CM and the configuration consisting of all single edges, i.e. E(G):

$$\frac{P({C}_{M}\mid G)}{P(E(G)\mid G)}=\exp (\Sigma (E(G))-\Sigma ({C}_{M})),$$

where Σ(E(G)) is the DL of E(G) as given by the edge-only configuration model. In other words, we find that the networks in Table 1 are exponentially more likely to have been generated by the higher order models corresponding to their MAP configurations than by the edge-only configuration model. Similarly, for the real-world networks we analysed we find that degree-corrected models in general result in significantly shorter description lengths than non-degree-corrected models, indicating that degree-corrected models are an overall better fit to the data.

In the following section we provide a more detailed discussion of the MAP configurations for some of the networks in Table 1. The results for the remaining networks are discussed in Supplementary Note 5.

Co-authorship network of network scientists

We first consider a widely studied co-authorship network between network scientists [26]. We find that the MAP configuration of this network (see Fig. 3) contains 15 nontrivial atoms other than the single edge that cover over 85% of the edges (see Table 2). As expected in a collaboration network, the MAP configuration contains many cliques corresponding to publications with more than two authors. In addition to cliques we also find higher order interactions where two high degree coauthors collaborate with the same group of researchers (Atoms 3 and 12 in Table 2) as well as patterns where two high degree authors collaborate with the same group of lower degree nodes but not with each other (Atoms 5, 7, 8, 13 and 14 in Table 2).

Fig. 3: Visualization of the MAP-configuration.
figure 3

Visualization of the largest connected component of the collaboration network between network scientists. Colours represent different atomic subgraph types in the MAP-configuration. A high resolution plot of the full network can be found in Supplementary Fig. 1.

Table 2 Atoms of the scientific collaboration network

Human connectome

Next we consider a directed connectome of the human brain from ref. 27. The network has 1,015 vertices and 4,008 edges. We find that the MAP configuration contains 20 non-trivial patterns which cover ~75% of the edges. The MAP configuration contains a large number of feed-forward loops (FFLs) (Atom 1 in Table 3) and bi-fan motifs (Atom 5 in Table 3) usually associated with neuronal networks. We also find a large number of acyclic square-shaped motifs (Atoms 2 and 3 in Table 3) that can be interpreted as 4-node generalizations of the FFL, where an input node is connected to an output node via both direct connections and indirect connections in the form of longer directed paths. In addition to these basic types we also recover more complex atoms that can be interpreted as various combinations of these lower order atoms. Such larger arrangements of smaller motifs have recently been studied in ref. 28. However, our results indicate that in the connectome lower order motifs combine in denser and more complex patterns than the pairwise combinations of lower order motifs studied in ref. 28 (see Atoms 12–20 in Table 3). Interestingly, all atoms of the directed connectome contain at least one output node in the form of a node that has only incoming links.

Table 3 Atoms of the directed human connectome

E. coli metabolic network

Next we consider the metabolic network of E. coli, which is part of a collection of 43 metabolic networks from ref. 29. For the E. coli metabolic network the MAP-configuration contains 11 non-trivial atoms (Table 4) which cover about 97% of the edges. The MAP configuration for the E. coli metabolic network bears a strong resemblance to a decomposition of the network into metabolic pathways. We find that the atoms in the MAP configuration of the metabolic network capture important features of metabolism and include motifs commonly used in biochemistry to describe metabolic pathways. For instance, cycles (Atoms 3 and 9 in Table 4) and long bidirectional paths (Atom 10 in Table 4) are known to play a crucial role in regulating core metabolic pathways such as gluconeogenesis [30] and substrate cycles [31]. We also find atoms that can be interpreted as irreversible pathways (Atoms 2 and 6 in Table 4), where one metabolite is transformed into another in a controlled step-wise fashion, and atoms that can be interpreted as breaking down complex molecules into simpler ones which are then in turn combined to construct other larger molecules (Atoms 4 and 8 in Table 4). We also analysed the remaining 42 metabolic networks in the data set [29] and found that the method finds very similar atoms across the networks in the data set (see Supplementary Note 5).

Table 4 Atoms of the metabolic network

C.elegans neuronal network of synaptic connections

Finally, we consider the network of synaptic connections in the adult hermaphrodite worm C. elegans [32]. This network has 454 nodes and 4,841 directed edges. The network corresponding to male adults and the undirected networks representing gap junctions are discussed in Supplementary Note 5. The MAP configuration for the synaptic network contains 24 nontrivial atoms that cover approximately 70% of the edges. We find that the MAP configuration contains a large number of bi-fan motifs (Atom 4 in Table 5) and multiple motifs that correspond to various combinations of feed-forward loops (Atoms 6, 7, 8 and 10 in Table 5), indicating that the feed-forward loops usually associated with neuronal networks are organized into larger atoms in the network. In addition, the MAP configuration also contains multiple atoms resulting from symmetric combinations of the triangular connection pattern (A ↔ B, C → A, C → B), i.e. Atoms 12, 14, 18 and 19 in Table 5. The combinations we find generally contain three or more smaller motifs in specific arrangements, suggesting the presence of motifs that go beyond the pairwise combinations of lower order motifs studied in ref. 28. Finally, we also observe atoms containing chains and cycles of bidirectional edges, such as Atoms 1, 2, 15, 17, 20 and 21 in Table 5, and directed cliques. We find the same general classes of atoms in the MAP configuration of the network of the male C. elegans, for which the results can be found in Supplementary Note 5.

Table 5 Atoms of the synaptic connection network of C. elegans

Discussion

Despite the ubiquity and importance of network data, many problems are still outstanding in the modelling and analysis of networks, especially in terms of local or microscopic features. The quantitative analysis of local network structures poses theoretical and computational challenges, not least due to the large variety of potential local structures and their highly interdependent nature. In this paper, we addressed some of these problems by introducing a nonparametric method that is based on the assumption that networks are made of not only edges but also higher order building blocks. When considering real data this assumption naturally leads to the questions of whether the network under consideration can be effectively decomposed into higher order building blocks, and what shapes or types of building blocks occur in such an optimal decomposition. We address this problem by introducing an objective function, in the form of a prior, that allows decompositions of networks into small subgraphs to be evaluated and compared on the basis of the statistical evidence provided by the data. By considering large sets of potential higher order interactions simultaneously, in a holistic fashion, the method is able to infer concise sets of patterns from within thousands of candidates, which for many empirical networks include higher order interactions of known structural and functional significance.

Due to its non-parametric nature, the method does not require any prior assumptions on the types or frequencies of higher order interactions, allowing these to be inferred from the data. Furthermore, the method produces not merely a set of statistically significant patterns but an explicit decomposition of the network into higher order subgraphs, which can be used to examine the higher order structure of the network and its topological and dynamical implications. For instance, using these representations in detection methods for higher order communities, such as33, could provide insights into the interplay between local and large scale network structures.

The method also addresses the long-standing problem of modelling the prevalence of triangles and other highly connected subgraphs observed in many real-world networks, by providing a fit of the network to analytically tractable models that better reflect its local structure than models based on conditionally independent dyadic interactions. Our empirical results show that including higher order interactions leads to more parsimonious representations for many real-world networks, providing empirical support for the inclusion of higher order interactions in models and representations of complex networks, even when the network is initially given in terms of dyadic interactions only.

Our analysis still leaves some important questions open. For instance, the generative models we use implicitly assume that pairs of atomic subgraphs very rarely intersect at more than one vertex. Models that incorporate more general intersections and couplings between higher order subgraphs, and their use in inference-based methods, could therefore lead to further refinements of our approach. Another possibility would be to incorporate community structure into the generative models, which could potentially lead to models and representations that capture structure over multiple scales.

Finally, while in this article we opted for a greedy heuristic to find an optimal representation, developing alternative inference algorithms is an area of interest for future research, especially MCMC algorithms that could provide a more refined view of the posterior distribution. Ideally, such algorithms would combine optimization with subgraph discovery, which is especially relevant for large networks where exhaustively enumerating subgraphs poses computational challenges.

Methods

Likelihood for degree-corrected SGCMs

In the degree-corrected microcanonical ensemble5 the likelihood is given by \(\log (P(C| {{{{{\bf{d}}}}}}_{m,i}))=-\log (\Omega ({{{{{\bf{d}}}}}}_{m,i}))\), where Ω(dm,i) is the number of all configurations with atomic degree sequence dm,i. In order to compute the likelihood we consider the model by Karrer and Newman4, which generalizes the edge configuration model to general sets of atoms. In this model, atomic stubs or partial subgraphs are attached to each vertex v, reflecting its orbit degree dm,i(v). These are then matched at random in appropriate combinations to generate a subgraph configuration with the desired orbit degree sequence. The matching processes for different atoms are independent, and the number of possible matchings for a given m with atomic degree sequence dm,i is:

$${\Omega }_{m}({{{{{\bf{d}}}}}}_{m,i})=\frac{{\prod }_{i}(| {O}_{m,i}| {n}_{m})!}{{({\prod }_{i}| {O}_{m,i}| !)}^{{n}_{m}}{n}_{m}!{\prod }_{i,v}{d}_{m,i}(v)!}{\mu }_{m}^{{n}_{m}},$$
(8)

where nm is the number of m subgraphs and

$${\mu }_{m}=\frac{{\prod }_{i}| {O}_{m,i}| !}{| Aut(m)| }$$

is the number of distinct m-subgraphs that can be formed given the orbit memberships of its vertices.
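For small atoms, both ∣Aut(m)∣ and μm can be computed by brute force. The following is a minimal illustrative sketch for undirected atoms (not the paper's implementation), enumerating all vertex permutations that preserve the edge set:

```python
# Brute-force |Aut(m)| and mu_m = prod_i |O_{m,i}|! / |Aut(m)| for small
# undirected atoms, given as an edge list on vertices 0..n-1.
from itertools import permutations
from math import factorial

def automorphisms(n, edges):
    """Permutations of range(n) that map the edge set onto itself."""
    eset = {frozenset(e) for e in edges}
    return [p for p in permutations(range(n))
            if {frozenset((p[u], p[v])) for u, v in edges} == eset]

def orbits(n, autos):
    """Vertex orbits under the automorphism group."""
    seen, orbs = set(), []
    for v in range(n):
        if v not in seen:
            orb = {p[v] for p in autos}
            seen |= orb
            orbs.append(orb)
    return orbs

def mu(n, edges):
    """Number of distinct m-subgraphs given the orbit memberships."""
    autos = automorphisms(n, edges)
    num = 1
    for orb in orbits(n, autos):
        num *= factorial(len(orb))
    return num // len(autos)

# Triangle: |Aut| = 6, a single orbit of size 3, so mu = 3!/6 = 1.
print(mu(3, [(0, 1), (1, 2), (0, 2)]))               # -> 1
# 4-cycle: |Aut| = 8, a single orbit of size 4, so mu = 4!/8 = 3.
print(mu(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))       # -> 3
```

The 4-cycle result matches the fact that there are exactly three distinct 4-cycles on four labelled vertices.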

However, Eq. (8) includes cases where two or more stubs attached to the same vertex are matched together, resulting in a subgraph where two or more vertices are contracted into a single vertex. Moreover, the expression also includes cases where a given subgraph is created more than once. To discount such cases one can consider the probability Pv(dm,i) that no stubs attached to the same vertex are matched together and the probability P1(dm,i) that no subgraph is created more than once during the generation process, resulting in the following expression for \(P(C| {{{{{\bf{d}}}}}}_{m,i},M)={\prod }_{m}{({\Omega }_{m}({{{{{\bf{d}}}}}}_{m,i}){P}_{v}({{{{{\bf{d}}}}}}_{m,i}){P}_{1}({{{{{\bf{d}}}}}}_{m,i}))}^{-1}\):

$$\log (P(C| {{{{{\bf{d}}}}}}_{m,i},M)) = -\sum\limits_{m}\left[-\log ({n}_{m}!)-{n}_{m}\log (| {{{{\rm{Aut(m)}}}}}| )\right.\\ +\sum\limits_{i}\left[\log ((| {O}_{m,i}| {n}_{m})!)-\sum\limits_{v}\log ({d}_{m,i}(v)!)\right]\\ -\frac{| {{{{\rm{Aut(m)}}}}}| {n}_{m}^{2}}{2}\prod\limits_{i}\frac{1}{{({n}_{m}| {O}_{m,i}| )}^{| {O}_{m,i}| }}{\left(\frac{\langle {d}_{m,i}^{2}\rangle }{\langle {d}_{m,i}\rangle }-1\right)}^{| {O}_{m,i}| }\\ \left.-\frac{1}{2}\left[ | m| \left(\frac{ \big\langle {\big({\sum }_{i}{d}_{m,i}\big)}^{2}\big\rangle }{ \big\langle {\sum }_{i}{d}_{m,i}\big\rangle }-1\right)-\sum\limits_{i}\left(\frac{\big\langle {d}_{m,i}^{2}\big\rangle }{\big\langle {d}_{m,i}\big\rangle }-1\right)\right]\right].$$
(9)

When the only atom is the single edge, the above calculations are equivalent to counting the number of graphs with a given degree sequence, and the expressions above reduce to the known results given in34,35 in both the undirected and directed cases. Similar closed form expressions can be obtained for the other variants of the model using the same techniques5. A brief derivation of these likelihoods is given in Supplementary Note 1.
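As a concrete special case, take m to be the single undirected edge: there is a single orbit containing both endpoints, so \(| {O}_{m,1}| =2\), \({n}_{m}=E\), \(| Aut(m)| =2\) and \({\mu }_{m}=2!/2=1\). Substituting into Eq. (8) gives

$${\Omega }_{{{{\rm{edge}}}}}({{{{{\bf{d}}}}}})=\frac{(2E)!}{{2}^{E}\,E!\,{\prod }_{v}d(v)!},$$

which is the familiar stub-matching count for the undirected edge configuration model.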

Priors on atoms

Perhaps the most challenging aspect of finding an optimal subgraph configuration is identifying a suitable set of atoms. For this, one requires priors over subsets of atoms/motifs. Although ideally the set of potential atoms should be kept as general as possible, in practice one is faced with the fact that the number of motifs increases super-exponentially with their size, as does the computational complexity of finding subgraphs. Therefore, in most practical settings one is forced to restrict the set of potential atoms included in the analysis. For such finite sets of candidate motifs \({{{{\mathcal{M}}}}}\) one could in principle use an uninformative prior that assigns the same probability to every non-empty subset M of \({{{{\mathcal{M}}}}}\), i.e. \(P(M)={({2}^{| {{{{\mathcal{M}}}}}| }-1)}^{-1}\). However, such a prior would assign each atom the same prior probability regardless of its size or structure, whereas intuitively we would expect larger and more complex atoms to be less likely than simpler ones such as the single edge. Therefore we consider a prior which assumes that motifs occur independently with probabilities 0 < pm < 1, resulting in a prior of the form:

$$P(M)=\prod\limits_{m\in M}{p}_{m}\times \prod\limits_{m\notin M}(1-{p}_{m}).$$
(10)

In order to assign the probabilities to motifs we consider a prior that is inspired by Rissanen’s universal prior for integers36. For this we initially assume that the set of candidate motifs \({{{{\mathcal{M}}}}}\) is infinite. To ensure that the prior is proper we require that \({\sum }_{m\in {{{{\mathcal{M}}}}}}{p}_{m}=\alpha \, < \, \infty \) for which the prior can be written as:

$$P(M)=\frac{1}{Z}\times \prod\limits_{m\in M}\frac{{p}_{m}}{1-{p}_{m}},$$
(11)

where \(Z={\prod }_{m\in {{{{\mathcal{M}}}}}}{(1-{p}_{m})}^{-1}\), which is well defined because \({\sum }_{m\in {{{{\mathcal{M}}}}}}{p}_{m}=\alpha \, < \, \infty \). Following Rissanen’s construction for the integers, we want the prior to be maximally non–informative, which in this setting translates to the condition that the entropy diverges to infinity. The entropy is given by:

$$H({P}_{M})=\sum h({p}_{m}),$$
(12)

where \(h({p}_{m})=-{p}_{m}\log ({p}_{m})-(1-{p}_{m})\log (1-{p}_{m})\) is the binary entropy. Making use of the expansion \(h(p)=-p\log (\frac{p}{e})+O({p}^{2})\) for pm << 1, the divergence condition becomes equivalent to the divergence of \({\sum }_{m\in {{{{\mathcal{M}}}}}}-{p}_{m}\log ({p}_{m})\). Without loss of generality we assume that motifs in \({{{{\mathcal{M}}}}}\) are partially ordered according to pm (i.e. \(m{\prime} \le m\ \iff \ {p}_{m^{\prime} } \, \le \, {p}_{m}\)). Putting the normalization aside, this reduces to Rissanen’s definition36 of a universal prior. The above conditions have no unique solution, but such priors can be shown to agree with respect to their leading order term. Consequently, one way of constructing a prior that satisfies the above conditions is to order motifs (i.e. map them onto the integers) in a way that reflects prior expectations regarding the likelihoods of motifs, and to set pm equal to the probability of the integer index n(m) of m under a universal prior for the integers, for instance \({p}_{m}={2}^{-{\log }_{2}^{* }(n(m)) + c}\). Here \({\log }_{2}^{* }(\cdot )\) is the iterated logarithm, given by the sum \({\log }_{2}^{* }(x)={\log }_{2}(x)+{\log }_{2}({\log }_{2}(x))+...\) over positive terms, and c is a normalizing constant. Approaches to ordering motifs are discussed in Supplementary Note 2.
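The iterated logarithm and the resulting probabilities are straightforward to compute. The following is a minimal sketch; the integer index n(m) and the choice c = 0 (i.e. ignoring the normalization) are illustrative assumptions:

```python
# Iterated logarithm log2*(x) and universal-prior style motif probabilities
# p_m = 2^(-log2*(n(m)) + c), with n(m) a hypothetical motif index.
from math import log2

def log2_star(x):
    """Sum log2(x) + log2(log2(x)) + ... over the positive terms."""
    total = 0.0
    while x > 1:
        x = log2(x)
        total += x
    return total

def p_m(n, c=0.0):
    """Prior probability assigned to the motif with integer index n."""
    return 2.0 ** (-log2_star(n) + c)

# Probabilities decay as the index grows, so motifs placed later in the
# ordering (larger, less expected) receive smaller prior probability:
print(p_m(2), p_m(4), p_m(16))  # -> 0.5 0.125 0.0078125
```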

Prior for count of atoms

For the number nm of atomic subgraphs we consider the maximum entropy prior subject to the condition

$$\left\langle \sum\limits_{m}{n}_{m}{e}_{m}\right\rangle =E,$$
(13)

where E is the number of edges in G and em is the number of edges of m. This results in a prior that has the following form:

$$P({{{{{\bf{n}}}}}}_{{{{{\bf{m}}}}}}| E,M)={Z}^{-1}{e}^{-\lambda {\sum }_{m}{n}_{m}{e}_{m}}.$$
(14)

where Z is the normalizing constant:

$$Z=\sum\limits_{{{{{\bf{n}}}}}\in {{\mathbb{Z}}}^{+| M| }}{e}^{-\lambda {\sum }_{m}{n}_{m}{e}_{m}}=\prod\limits_{m}\frac{1}{{e}^{\lambda {e}_{m}}-1}.$$

Imposing the constraint given by Eq. (13) we get:

$$E = -\frac{\partial \log (Z)}{\partial \lambda }\\ = {\sum}_{m}\frac{{e}_{m}}{1-{e}^{-\lambda {e}_{m}}},$$

which can be solved numerically for λ.
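As a sketch of this numerical step (a minimal stdlib-only illustration, not the authors' code; the atom set and edge counts below are hypothetical), λ can be found by bisection, since the right-hand side is strictly decreasing in λ:

```python
# Solve E = sum_m e_m / (1 - exp(-lambda * e_m)) for lambda by bisection.
# `edge_counts` lists e_m for each atom m in M.
from math import exp

def expected_edges(lam, edge_counts):
    return sum(e / (1 - exp(-lam * e)) for e in edge_counts)

def solve_lambda(E, edge_counts, tol=1e-10):
    # expected_edges decreases in lambda and tends to sum(edge_counts)
    # as lambda -> infinity, so a solution requires E > sum(edge_counts).
    lo, hi = 1e-12, 1.0
    while expected_edges(hi, edge_counts) > E:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_edges(mid, edge_counts) > E:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical atoms: single edge (e=1), triangle (e=3), bi-fan (e=4),
# in a network with E = 100 edges.
lam = solve_lambda(100, [1, 3, 4])
print(round(expected_edges(lam, [1, 3, 4]), 6))  # recovers E = 100
```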

Priors for atomic degree sequences

The number of components in the atomic degree sequence depends on the set of atoms and the type of model under consideration. Moreover, different components of the atomic degree sequence can in principle differ significantly with respect to their distributions. In order to ensure that the prior effectively captures potential regularities in different types of atomic degree sequences we consider two potential priors. Given a component of an atomic degree sequence, we then evaluate both priors and choose the one that assigns the degree sequence under consideration the higher probability.

We first discuss the priors for one-dimensional degree sequences, which form the basis of our prior for multi-dimensional degree sequences. Given the sum over degrees ∑vd(v) = D of the degree sequence d(v) on N vertices, one can simply assume that every degree sequence compatible with this condition is equally likely, with probability:

$${P}_{1}(d| D)={\left(\left(\begin{array}{c}N\\ D\end{array}\right)\right)}^{-1},$$
(15)

where \(\left(\left(\begin{array}{c}N\\ D\end{array}\right)\right)=\left(\begin{array}{c}N+D-1\\ D\end{array}\right)\) is the number of D-combinations with repetition from a set of size N, i.e. the number of degree sequences on N vertices summing to D.
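As a small illustrative sketch (not from the paper's code), P1 can be evaluated directly from the multiset coefficient:

```python
# The uniform prior P_1(d|D) over degree sequences on N vertices with
# total degree D, via the multiset coefficient C(N+D-1, D).
from math import comb, log

def multiset_coeff(n, k):
    """Number of k-combinations with repetition from a set of size n,
    i.e. the number of sequences of n non-negative integers summing to k."""
    return comb(n + k - 1, k)

def log_p1(N, D):
    """log P_1(d|D): every compatible degree sequence is equally likely."""
    return -log(multiset_coeff(N, D))

# Sanity check on a tiny case: the degree sequences on N = 3 vertices with
# total degree D = 2 are (2,0,0), (0,2,0), (0,0,2), (1,1,0), (1,0,1)
# and (0,1,1), six in all.
print(multiset_coeff(3, 2))  # -> 6
```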

However, this prior favours Poisson-type degree distributions, in contrast to the highly heterogeneous degree distributions found in many real-world networks. Hence we also consider the hyper-prior introduced in ref. 16, which is compatible with more general degree distributions. In this prior the degree sequence is conditioned on a degree distribution η = {nk}, where nk is the number of vertices having degree k, i.e.:

$${P}_{2}(d| D)=P(d| {{{{\eta }}}})P({{{{\eta }}}}| D).$$
(16)

Assuming uniform priors for both factors:

$$P(d| \eta )=\frac{{\prod }_{k}{n}_{k}!}{N!},$$

and

$$P(\eta | D)=q{(D,N)}^{-1},$$

where q(m, n) is the number of restricted partitions of the integer m into at most n non-zero parts. Although no exact closed form expression is known for q(m, n), in practice it can be efficiently approximated to a high degree of accuracy, as described in16.
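For illustration, q(m, n) can also be computed exactly for moderate m via the standard recurrence q(m, n) = q(m, n − 1) + q(m − n, n). The following minimal sketch is a stand-in for, not the approximation of, ref. 16:

```python
# Exact q(m, n): partitions of m into at most n non-zero parts.
from functools import lru_cache

@lru_cache(maxsize=None)
def q(m, n):
    if m == 0:
        return 1          # the empty partition
    if m < 0 or n == 0:
        return 0
    # at most n-1 parts, plus partitions with exactly n parts
    # (subtract 1 from each of the n parts to get q(m - n, n)).
    return q(m, n - 1) + q(m - n, n)

# Partitions of 5 into at most 2 parts: 5, 4+1 and 3+2.
print(q(5, 2))  # -> 3
```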

Multi dimensional atomic degree sequences

The simplest way of generating multi-dimensional degree sequences is to assume that each component of the sequence is generated independently by a one-dimensional prior \(P({d}_{k}| {D}_{k})\), where P is chosen to be whichever of P1 (Eq. (15)) and P2 (Eq. (16)) assigns dk the higher probability:

$$P({{{{{\bf{d}}}}}}| {{{{{\bf{D}}}}}})=\prod\limits_{k}\max [{P}_{1}({d}_{k}| {D}_{k}),{P}_{2}({d}_{k}| {D}_{k})].$$

Here the range of k depends on the model under consideration. For instance, if we consider the orbit degree model, k runs over the orbits of the atoms of the model, whereas if we consider the atomic degree model, k runs over the atoms of the model.

Inference algorithm

Finding a subgraph configuration that maximizes the posterior is a set covering problem37 in which we seek to cover the edges of G using subgraphs of G while maximizing the posterior. Covering problems are known to be NP-complete, and hence it is unlikely that the problem can be solved exactly in polynomial time. Therefore we consider a greedy heuristic that, starting from an empty configuration, at each step identifies the atom that is most effective in covering edges, where the effectiveness of an atom is measured in terms of description length per edge. The effectiveness of atom m is determined by finding a set of m-subgraphs Cm that minimizes:

$${\sigma }_{m,t}=\frac{\Delta {\Sigma }_{t}({C}_{m})}{| (E(G)-E({C}_{t}))\cap E({C}_{m})| },$$
(17)

where E(Ct) is the set of edges covered by the configuration Ct at iteration t, E(Cm) is the set of edges contained in the subgraphs in Cm, and ΔΣt(Cm) is the change in Σ when Cm is added to the current state, i.e. Σ(Ct ∪ Cm) − Σ(Ct). Finding a set of m-subgraphs that minimizes σm,t is itself a non-trivial problem, which we approximate using a heuristic that seeks to identify a maximal-cardinality set of non-intersecting m-subgraphs containing no edges covered by Ct. For this, the algorithm iteratively finds m-subgraphs on the set of edges not yet covered, visiting lower-degree nodes first in order to increase the number of non-intersecting subgraphs in Cm. At each step of the algorithm the Cm that minimizes σm,t is selected and added to the configuration, until all edges are covered, which in general occurs when the most effective atom is the single edge. Our implementation uses the LAD subgraph isomorphism algorithm38. A more detailed description of the inference algorithm is given in Supplementary Note 3.
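The greedy selection step can be sketched schematically as follows. This is not the authors' implementation: it assumes precomputed candidate subgraph sets and uses a user-supplied stand-in cost in place of the full description length change ΔΣt; the toy network and all names are hypothetical.

```python
# Schematic greedy cover: at each step pick the atom with the lowest
# (stand-in) description length per newly covered edge, as in Eq. (17).

def greedy_cover(edges, candidates, delta_sigma):
    """edges: set of edges of G; candidates: atom name -> list of
    subgraphs, each a set of edges; delta_sigma(atom, subgraphs):
    stand-in for the change in description length."""
    covered, chosen = set(), []
    while covered != edges:
        best_atom, best_new, best_score = None, None, float("inf")
        for atom, subgraphs in candidates.items():
            # keep only subgraphs that still cover at least one new edge
            useful = [s for s in subgraphs if s - covered]
            if not useful:
                continue
            new = set.union(*useful) - covered
            score = delta_sigma(atom, useful) / len(new)
            if score < best_score:
                best_atom, best_new, best_score = atom, new, score
        if best_atom is None:
            break  # no candidate covers a new edge
        chosen.append(best_atom)
        covered |= best_new
    return chosen, covered

# Toy network: a triangle a-b-c plus a pendant edge c-d.
edges = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}
candidates = {
    "triangle": [{("a", "b"), ("b", "c"), ("a", "c")}],
    "edge": [{e} for e in edges],
}
# Hypothetical per-subgraph costs standing in for the real description length.
cost = {"triangle": 2.0, "edge": 1.0}
delta_sigma = lambda atom, sgs: cost[atom] * len(sgs)

chosen, covered = greedy_cover(edges, candidates, delta_sigma)
print(chosen)  # -> ['triangle', 'edge']
```

The triangle wins the first round (cost 2.0 over 3 new edges beats 4 single edges at cost 1.0 each), after which the single edge mops up the pendant edge, mirroring the behaviour described above where the single edge closes the cover.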

Synthetic networks

We tested the algorithm on synthetic networks generated from various types of configuration models, covering a wide range of atoms and atomic degree distributions. We found that the method in general correctly recovers the set of atoms in synthetic examples, and that the inferred and ground-truth subgraph configurations agree to a high degree of accuracy with respect to the subgraphs they contain. In all cases the model selection procedure correctly identified the model that was used to generate the data. We also tested the method on networks generated by the edge-only configuration model and found that it correctly identifies that these contain no atoms other than the single edge. Moreover, we found that, when based on non-degree-corrected models, the method sometimes identifies spurious atomic subgraphs in networks with heavy-tailed degree distributions generated using the edge configuration model, highlighting the importance of using degree-corrected models. Details of the results and of the procedures used to generate the synthetic networks can be found in Supplementary Note 4.

Here we should also note that, as with almost any inference procedure, there exist cases where our method will fail to identify all atomic substructures, either due to sub-optimal solutions found by the inference algorithm or due to detection limits for substructures, similar to the detection limits for SBMs39. In the context of subgraph configuration models, detection limits are likely to occur when the model contains only a few copies of irregular/non-symmetric and sparse atoms within an otherwise relatively dense random graph.

Model selection using posterior odds ratio

We compare alternative models using the posterior odds ratio. Since all model variants have parameter spaces of the same type, i.e. atoms, counts and degree distributions, setting up all models with the same priors allows for an objective comparison between them. Given two models M1 and M2 and the corresponding MAP configurations CΣ,1 and CΣ,2, we consider the following posterior odds ratio:

$${\Lambda }_{{C}_{\Sigma ,1},{C}_{\Sigma ,2}}=\frac{P({C}_{\Sigma ,1},{M}_{1}| G)}{P({C}_{\Sigma ,2},{M}_{2}| G)}$$
(18)
$$=\frac{P(G| {C}_{\Sigma ,1})P({C}_{\Sigma ,1}| \theta ({C}_{\Sigma ,1}))P(\theta ({C}_{\Sigma ,1}))P({M}_{1})}{P(G| {C}_{\Sigma ,2})P({C}_{\Sigma ,2}| \theta ({C}_{\Sigma ,2}))P(\theta ({C}_{\Sigma ,2}))P({M}_{2})}$$
(19)
$$=\exp ({\Sigma }_{2}({C}_{\Sigma ,2})-{\Sigma }_{1}({C}_{\Sigma ,1})),$$
(20)

where both models are assumed to have equal prior probability (P(M1) = P(M2) = 1/2) and \(P(G| {C}_{\Sigma ,1})=P(G| {C}_{\Sigma ,2})=1\) as both CΣ,1 and CΣ,2 are covers of G. Hence, \({\Lambda }_{{C}_{\Sigma ,1},{C}_{\Sigma ,2}}\) gives us a measure of which configuration is more likely, under the assumption that both models have equal prior probability. Note that this differs from direct model selection, where the goal is to determine which class of generative models is more likely given the data, which would require the evaluation of \({\Lambda }_{{M}_{1},{M}_{2}}=\frac{P({M}_{1}| G)}{P({M}_{2}| G)}\). In general this requires evaluating a sum over all covers of G for both models. However, when the goal is to select between subgraph configurations and their corresponding generative models, \({\Lambda }_{{C}_{\Sigma ,1},{C}_{\Sigma ,2}}\) is the more informative quantity.
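Once the two description lengths have been computed, Eq. (20) reduces to a single exponential. A minimal sketch (description lengths in nats; the example values are hypothetical):

```python
# Posterior odds ratio between two MAP configurations from their
# description lengths Sigma_1 and Sigma_2, per Eq. (20).
from math import exp

def posterior_odds(sigma_1, sigma_2):
    """Lambda_{C1,C2} = exp(Sigma_2 - Sigma_1); values > 1 favour model 1."""
    return exp(sigma_2 - sigma_1)

# A configuration whose description is 10 nats shorter is favoured by a
# factor of e^10:
print(round(posterior_odds(90.0, 100.0)))  # -> 22026
```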