Gene regulatory network reconstruction: harnessing the power of single-cell multi-omic data

Kim, Daniel; Tran, Andy; Kim, Hani Jieun; Lin, Yingxin; Yang, Jean Yee Hwa; Yang, Pengyi

doi:10.1038/s41540-023-00312-6

Gene regulatory network reconstruction: harnessing the power of single-cell multi-omic data

Review Article
Open access
Published: 19 October 2023

Volume 9, article number 51, (2023)
Cite this article

Download PDF

You have full access to this open access article

npj Systems Biology and Applications

Gene regulatory network reconstruction: harnessing the power of single-cell multi-omic data

Download PDF

Daniel Kim^1,2,3^na1,
Andy Tran^1,3,4^na1,
Hani Jieun Kim^2,3,
Yingxin Lin^1,3,4,
Jean Yee Hwa Yang ORCID: orcid.org/0000-0002-5271-2603^1,3,4 &
…
Pengyi Yang ORCID: orcid.org/0000-0003-1098-3138^1,2,3,4

13k Accesses
9 Citations
2 Altmetric
Explore all metrics

Abstract

Inferring gene regulatory networks (GRNs) is a fundamental challenge in biology that aims to unravel the complex relationships between genes and their regulators. Deciphering these networks plays a critical role in understanding the underlying regulatory crosstalk that drives many cellular processes and diseases. Recent advances in sequencing technology have led to the development of state-of-the-art GRN inference methods that exploit matched single-cell multi-omic data. By employing diverse mathematical and statistical methodologies, these methods aim to reconstruct more comprehensive and precise gene regulatory networks. In this review, we give a brief overview on the statistical and methodological foundations commonly used in GRN inference methods. We then compare and contrast the latest state-of-the-art GRN inference methods for single-cell matched multi-omics data, and discuss their assumptions, limitations and opportunities. Finally, we discuss the challenges and future directions that hold promise for further advancements in this rapidly developing field.

Gene regulatory network inference in the era of single-cell multi-omics

Article 26 June 2023

A review on gene regulatory network reconstruction algorithms based on single cell RNA sequencing

Article 30 November 2023

Towards precise reconstruction of gene regulatory networks by data integration

Article 16 May 2018

Introduction

The transcriptional regulation of genes underpins all essential cellular processes and is orchestrated by the intricate interplay of many molecular regulators¹. At the forefront of gene regulation are transcription factors (TFs), which interact with specific regions of DNA called cis-regulatory elements (CREs), such as promoters and enhancers^2,3. Together, the interactions between TFs, CREs, and genes form gene regulatory networks (GRNs), which govern cell identity and cell fate decisions⁴ and play an important role in the development and progression of various diseases⁵. With the advancement of high-throughput omics technologies, it has become possible to profile the many molecular features involved in gene regulation. However, the reconstruction of these networks possess significant challenges that necessitate the development of powerful and efficient computational tools to unravel the regulatory interactions of GRNs.

The earliest computational GRN inference methods were developed to leverage data from microarray and RNA-sequencing (RNA-seq) technologies, which quantitatively measure the RNA expression of whole cell populations (Fig. 1)⁶. These methods identified potential regulatory relationships by identifying co-expressed genes using measures of association, such as mutual information and correlation^7,8. However, these methods were unable to incorporate information of the epigenetic changes that drive gene regulation, restricting their ability to assess the accessibility of regulatory binding sites, including those of TFs. These limitations were alleviated by the expansion from bulk transcriptomics to bulk multi-omics (Fig. 1) sequencing technologies such as ATAC-seq, which can be employed to identify accessible regions of chromatin that may be bound by TFs either upstream or downstream of target genes;⁹ Hi-C, a technique for measuring genome-wide chromatin conformation to capture structural changes and chromatin interactions;¹⁰ and ChIP-seq, which captures genome-wide protein to DNA interactions, including TF binding sites of enhancers and promoters¹¹. Yet, despite their ability to uncover mechanistic insights to capture regulatory relationships more reliably, bulk sequencing technologies lack the ability to capture cell type and/or state-specific information.

**Fig. 1: Schematic illustration of the parallel development and evolution of GRN inference and sequencing technologies.**

The advent of single-cell omics technologies has revolutionized our ability to uncover cellular heterogeneity at the single-cell resolution (Fig. 1)¹². Data generated by techniques such as single-cell RNA-seq (scRNA-seq)¹³, single-cell ATAC-seq (scATAC-seq)⁹, single-cell Hi-C (scHi-C)¹⁴, and single-cell ChIP-seq (scChIP-seq)¹⁵ have led to a renewed interest in developing a new generation of computational methods that can now infer regulatory relationships between regulators and their target genes at the cell type, cell state, and single-cell level^16,17,18. Additionally, single-cell omics technologies have evolved from profiling single modalities (e.g., scRNA-seq, scATAC-seq) towards capturing multiple modalities at the single-cell resolution (i.e., “single-cell multi-omics”)¹⁹. In particular, a range of novel sequencing platforms have the ability to simultaneously profile RNA and CRE accessibility within a single cell, such as SHARE-seq and 10x Multiome^20,21. Consequently, these technologies have led to the development of new GRN inference methods that exploit these data to further comprehensively recapitulate regulatory networks at the cell type and cell state level^22,23.

However, navigating through the multitude of GRN inference methods and understanding how they infer regulatory connections can be a challenging task, particularly for researchers who may not have a quantitative background. Furthermore, the sheer number of available GRN inference methods can make it difficult to determine the most suitable method for a given research question of interest. To this end, we aim to assist both researchers and method developers by reviewing the methodological underpinning of GRN inference by categorizing the latest GRN inference methods developed for paired scRNA-seq and scATAC-seq data. We start by briefly describing the history of GRN inference methods and their evolution from bulk to single-cell sequencing technologies, including the underlying theoretical foundations for GRN inference that are commonly employed. For a more comprehensive overview, readers are encouraged to read previous reviews that have extensively covered earlier GRN inference methods^5,24,25,26. Thus, we provide a detailed review of recent methods that reconstruct GRNs using single-cell paired multi-omic data, including their strengths and potential limitations. Finally, we discuss the current challenges of GRN inference methods and the potential directions that we hope will inspire future method development in this field.

Methodological foundations of GRN inference

GRN inference relies on statistical and algorithmic principles to uncover regulatory connections between genes and their regulators. By leveraging various techniques such as correlation, regression, probabilistic models, dynamical systems and deep learning (Fig. 2), researchers can effectively model and infer regulatory architectures underlying biological systems. Here, we briefly discuss the frequently used statistical approaches and the underlying assumptions of the current GRN inference methods for paired multi-omic data.

**Fig. 2: The major classes of methods for paired single-cell multi-omics GRN inference methods.**

Correlation-based approaches

One of the most common approaches for reconstructing GRNs is motivated by the concept of “guilt by association”. In other words, genes that are co-expressed are assumed to be functionally related or co-regulated. For example, the co-expression of a TF and its putative target gene may suggest a regulatory relationship between the two. Similarly, CREs and their target genes can be determined by correlating the accessibility of CREs and expression levels of putative target genes. Commonly used measures of association include the parametric Pearson’s correlation and the non-parametric Spearman’s correlation, which can capture linear and nonlinear associations respectively (Fig. 2). Linear correlation can effectively detect relationships where an increase in TF expression or CRE accessibility leads to a proportional change in gene expression. However, nonlinear correlation can capture more complex relationships, which may better recapitulate the regulatory interactions between TFs, CREs and genes²⁷. Other approaches include mutual information, a non-parametric method based on information theory, which measures the dependence between two variables⁸.

While correlation analysis may provide valuable insights into potential regulatory relationships, it is important to note that correlation alone has clear limitations. For example, correlation cannot identify which is the regulator and target if the expression levels of two TFs are correlated, nor exclude the possibility of their regulation by a third TF. Furthermore, the correlation measure will have difficulty in distinguishing direct or indirect relationships, including when confounders may be present. However, incorporating information from other modalities, such as ATAC-seq, holds the potential to alleviate these limitations as they provide additional evidence that a directional relationship between a regulator and downstream target gene exists, i.e., TF must bind to an accessible region of chromatin to regulate its target gene.

Regression models

Regression offers an approach to capture the relationship between a response variable with multiple predictor variables. In the context of GRN inference, the response variable could be the expression of a gene, regressed on the expression or accessibility of multiple TFs and CREs, respectively (Fig. 2). By explicitly estimating the effect of each predictor onto the response (e.g., gene expression), the coefficients (e.g., TFs or CREs) from the regression model may be interpretable as strength of the association, while the sign of the coefficient can be used to infer the direction of the regulatory interactions.

In the context of inferring a GRN with ordinary least squares regression, the data can contain thousands of TFs or CREs, depending on the distance that is searched from the target gene’s transcription start site²⁸. Importantly, the inclusion of a large number of predictors can often lead to overfitting, where the model becomes overly complex and generalizes poorly. Moreover, regression models can become unstable if there are correlated predictors, which is likely in a biological context given TFs can regulate each other. To address these concerns, more modern penalized regression methods such as LASSO introduces an additional penalty term based on the absolute size of the coefficients, that effectively shinks selected coefficients towards zero and thus reduces the complexity of the final estimated regulatory network. Furthermore, non-parametric approaches, such as tree-based regression, do not assume any fixed structure in the data but can be less interpretable and more computationally intensive to construct.

Probabilistic models

Probabilistic models for GRN inference generally take the form of a graphical model, which captures the dependence between variables, such as TFs and their target genes. These approaches generally aim to model the existence and/or strength of a regulatory relationship between each TF and their putative target genes, which is estimated by finding the most probable relationships that could explain the given training data. These probabilistic measures allow for filtering and prioritization of regulatory interactions before downstream analyses, enabling more targeted investigations. However, these methods often assume that gene expression follows a specific distribution, such as a Gaussian distribution, which may not be an appropriate assumption for all genes²⁹.

Dynamical systems

While regression and probabilistic-based approaches model a response variable directly from predictor variables, dynamical systems-based approaches attempt to model the behavior of systems that evolve over time. In the case of GRN inference, one may be interested in estimating the expression of a gene with respect to various factors such as the regulatory effect of TFs, basal transcription, and general stochasticity over time (Fig. 2). These effects can be modeled as parameters in a differential equation which can be estimated from the data or literature³⁰.

Dynamical-systems models carry a distinct advantage compared to previously discussed methods as they capture a diverse range of factors that can affect gene expression and its stochasticity. The estimated models are interpretable, where each parameter corresponds to a specific property. However, the complexity of larger networks and dependence on prior domain specific knowledge can make these models less scalable and prone to publication bias^31,32.

Deep learning models

Deep learning models are a class of machine learning techniques that have gained significant attention in recent years across a wide array of subjects, including bioinformatics³³. These models are based on artificial neural networks which can be used in versatile architectures to perform various tasks (Fig. 2)^33,34,35. For example, a multi-layer perceptron can solve regression-style problems to estimate a function, while an autoencoder can be used for dimension reduction. In particular, autoencoders can have multiple types of inputs and learn the common connections between them, representing potential regulatory relationships³⁶.

However, the flexibility of deep learning approaches comes at a cost, often requiring very large training data sets as they make minimal modeling assumptions. Additionally, the constructed models can often consist of a large number of parameters, which require a substantial amount of computational resources to be estimated. Deep learning approaches are also generally considered less interpretable compared to traditional statistical models, as the fitted coefficients typically do not have a clear interpretation³⁷. However, a range of recent approaches, such as saliency, aim to rectify this by identifying the important features in the overall model, which can be used to identify candidate TF regulators³⁸.

GRN inference in bulk omics era

Bulk transcriptomics

High-throughput profiling methods such as microarray and RNA sequencing (RNA-seq) were among the first experimental methods to capture the global transcriptomic profile of a sample³⁹. In response, computational methods were developed to unravel the potential regulatory connections between transcription factors and their target genes by analyzing the expression patterns of thousands of genes⁴⁰. Notable examples include ARCANE, CLR, and MRNet, which leverage association metrics like mutual information to quantify the relationship between a TF and its target gene^41,42,43. However, a key constraint of these methods lies in their pairwise calculation of association, failing to model gene expression as a function of multiple regulators. Regression-based methods, such as GENIE3, address this constraint by modeling gene expression as a function of multiple regulators, which may model regulatory relationships between regulators and target genes more accurately^44,45. Nevertheless, an important limitation of these methods is their sole reliance on transcriptomics data, thus overlooking epigenetic modifications which are known to play a crucial role in gene regulation.

Bulk multi-omics

The process of gene regulation and transcription has many molecular mechanisms and players, such as epigenetic modifiers, which engage in complex interactions to regulate gene expression. These molecular regulators play important roles in initiating, promoting, enhancing, and modulating gene transcription. Thus, to construct more comprehensive GRNs, it is important to include additional regulatory factors and DNA elements, such as enhancers and silencers, and structural information including chromatin conformation. For example, ATAC-seq can be used to generate more comprehensive GRNs, as used by GRaNIE, PECA, and TimeReg^46,47,48. Methods such as DISTILLER and ChIP-Array 2 integrate both RNA and ChIP-seq data to identify the TFs and regulatory sequences of target genes^{46,47,49,50,51}. Hi-C can also be used to capture the conformation of DNA and be integrated with both ATAC-seq and RNA-seq data to construct multi-omic GRNs^11,52. Overall, the integration of various multi-omic datasets and the use of statistical models have the potential to enhance our understanding of gene regulation and uncover the dynamic interactions between TFs and their target genes in different biological contexts^30,53,54.

Despite their advantages, both bulk transcriptomics and bulk multi-omics GRN inference methods share common limitations. Any analysis based on bulk data alone makes it challenging to infer cell type-specific information, as the omics profiles are averaged across a population of cells, thereby eliminating any signals of cellular heterogeneity⁵⁵. However, it is well-established that various diseases, such as diabetes and cancers, are wholly or partly driven by specific cell type populations^56,57.

GRN inference in the single-cell era

Single-cell omics

Many of the limitations in GRN inference from bulk omics technologies were alleviated by the birth of single-cell omics technologies. These techniques have provided a detailed glimpse into the cellular and molecular composition of diverse tissues, surpassing the capabilities of bulk sequencing methods^13,58,59,60. Transcriptomics was the first to move to the single-cell level with scRNA-sequencing. Many popular GRN methods have been designed to leverage scRNA-seq data including approaches based on regression (SCENIC, scTenifoldNet), dynamical systems (SCODE) and information theory (PIDC)^22,61,62,63.

Today, sequencing technologies enable the quantification of other modalities via scATAC-seq, scHi-C, and scChIP-seq, facilitating a comprehensive capture of the inter-molecular dynamics within cells^9,15,59. Methods, such as DeepTFni, have been developed to independently leverage these additional modalities to provide an alternate approach to GRN inference⁶⁴. Other methods aim to combine information from multiple modalities. For example, CellOracle, MICA, and IReNA use scRNA-seq and scATAC-seq separately in two stages which involves filtering putative regulatory links and then constructing the final GRN or vice versa^65,66,67. Alternatively, separate GRNs can be constructed from different modalities and then combined to produce a single integrated GRN⁶⁸.

A range of other approaches have been developed to integrate multi-omic data profiled from different cells and simultaneously learn the shared relationships between the different modalities to reconstruct regulatory networks. This includes DC3, scREG and scAI that use matrix factorization techniques to project the unmatched multi-omics data into a low-dimensional representation, thus integrating them together^69,70,71. Similarly, GLUE and scTIE integrate multi-omics data by projecting the different modalities to a low-dimensional embedding, but they use an autoencoder, a deep learning-based technique that can infer complex structures from the data^72,73. Once the low-dimensional representation that captures the shared patterns between the omics layers has been learnt, these methods use the mapping to extract multi-omic features to infer interactions (e.g., between CREs and genes), which can be used to reconstruct a GRN. These methods can also be applied to matched scRNA-seq and scATAC-seq data, by treating them as separate cell populations. However, as their main purpose is not for GRN inference, we do not review them in this article.

Towards matched single-cell multi-omics

As the evolution from bulk RNA to bulk multi-omics involved the development and integration of additional modalities, multimodal single-cell omics technologies have led to new a wave of technologies that can profile different modalities within the same cell, often referred to as matched or paired data⁷⁴. These technologies include SNARE-seq, which allows for the joint profiling of the transcriptome and chromatin accessibility⁷⁵; CITE-seq, a method for capturing the transcriptome and cell surface protein markers²¹; Paired-tag, a high-throughput method for the simultaneous profiling of histone modifications and the transcriptomes⁷⁶; and ASAP-seq, which captures the transcriptome, chromatin landscape, and protein marker expression at the single-cell resolution⁷⁷. Importantly, these advances in sequencing technologies provide an opportunity to harness the information embedded in multimodal data that may be unattainable when integrating unmatched multi-omic data. Nevertheless, a range of computational techniques have been developed to match single cells from different modalities, or impute missing modalities, thereby increasing the availability and accessibility of multimodal single-cell data^78,79.

The latest GRN inference methods are designed to exploit these new data to build a more holistic model of gene regulation, thus inferring more robust and sophisticated regulatory networks. However, they vary in their approaches and complexity and not all single-cell multi-omic GRN inference methods reconstruct cell type or state -specific regulatory networks. As a result, it may be difficult to understand their differences and applicability for various contexts. Here, we categorize the latest GRN methods for paired multi-omic data into five main classes (correlation, regression, probabilistic models, dynamical systems, and deep learning) and discuss their common and distinct features. It is important to acknowledge that the categorizations do not fully encapsulate the entire statistical and methodological frameworks employed by each method, as many approaches combine multiple techniques to reconstruct GRNs. Nevertheless, by simplifying the categorizations, we intend to provide readers with a broad and accessible understanding of the underlying principles guiding these methods. A list of the methods is presented in Fig. 3. We hope that this comprehensive overview will aid researchers in navigating the current GRN inference methodological developments and facilitate informed decision-making regarding their applications.

**Fig. 3: Summary of current GRN inference methods for paired multi-omic data included in this review.**

Correlation-based methods

These methods use correlation to infer potential regulatory relationships between pairs of regulatory elements, such as CRE vs genes or TF vs CREs (Fig. 4). Only CREs within a user-specified distance from the TSS of putative target genes are considered and inference of TF-CRE connections often include TF motif enrichment analysis (Fig. 4). While the correlation-based methods may seem similar at a glance, they have some key differences with respect to their choice of correlation metric, and implementation. For example, STREAM and scMEGA use Pearson’s correlation to capture linear relationships, whereas FigR and TRIPOD use Spearman’s correlation to capture non-linear relationships^33,59,80,81.

FigR and STREAM aim to identify regulatory modules, which capture the key processes in a cell type or state. Briefly, FigR filters for genes with domains of regulatory chromatin (DORCs), defined as genes with a user-defined number of significantly associated CREs. Thus, FigR produces GRNs specifically composed of DORCs. Similarly, STREAM constructs networks where the modules are composed of co-expressed genes and co-accessible CREs. The most likely regulatory TF for these modules are then identified via motif enrichment analysis.

Alternatively, scMEGA and TRIPOD aim to identify individual regulatory links that make up the overall GRN. scMEGA uses TF motif enrichment and Pearson’s correlation between CRE accessibility and gene expression, including TF expression and gene expression, to select candidate TF-gene regulatory pairs. TRIPOD however, aims to find regulatory trios of TF-CRE-genes. The trios are determined by calculating the correlation of gene expression with both TF expression and CRE accessibility, while conditioning the identified CRE-gene and TF-gene associations on the other component. More precisely, CRE-gene relationships are conditioned on TF expression by matching pairs of cells with the closest TF expression values, and the differences in CRE accessibility and gene expression are used for the correlation analysis. As a result, the detected CRE-gene links will not be confounded by TF expression. Likewise, TF-gene relationships are conditioned on CRE accessibility to account for the effect that different CRE accessibilities would vary the ability for a TF to bind and thus regulate gene expression⁵².

Regression-based methods

Accounting for the fact that genes may have multiple TF regulators and vice versa, DIRECT-NET, SCENIC + , Pando, scREMOTE, and RENIN utilize regression to model gene expression as a function of multiple regulators. These methods can be further split into those that employ parametric (Pando, scREMOTE, RENIN) and non-parametric regression, such as tree-based regression (DIRECT-NET and SCENIC + ) (Fig. 5).

One approach is ordinary least squares regression, which in its simplest form assumes a linear relationship between genes and their regulators. Pando and scREMOTE model gene expression as a linear function of TF expression and CRE accessibility^23,82. Pando estimates the regulatory effect of each TF on a gene by regressing gene expression directly on the product of CRE accessibility and TF expression while scREMOTE includes a regulation potential as a weight in the regression which is estimated from TF motif enrichment, CRE accessibility and chromatin conformation. Alternatively, RENIN uses two models with an adaptive elastic-net estimator, a regularization technique which penalizes large coefficients, resulting in a sparser regulatory network and fewer false positives⁸³. The first model captures the relationship between CRE accessibility and gene expression to identify CREs that may be regulating target genes. The second models TF expression and gene expression, which incorporates the results of the first model to identify TF-gene links. In all cases, the inferred coefficients of the linear model can be interpreted as the regulatory effect of a TF on a target gene, constituting the GRN. Importantly, a clear disadvantage of Pando, scREMOTE, and RENIN is that they are limited to identifying linear relationships between regulators such as TFs and CREs and their target genes.

DIRECT-NET and SCENIC+ may mitigate this limitation as they can capture non-linear relationships by using a tree-based regression algorithm called gradient tree boosting^17,22,52. DIRECT-NET offers a valuable functionality as it calculates the importance of each CRE’s accessibility in predicting gene expression and subsequently labels them as high, medium, or low confidence CREs before inferring TF-gene links. This allows for greater control, as only CREs of high-confidence may be kept for further downstream analyses. While both DIRECT-NET and SCENIC+ use TF motif enrichment to establish the TF-gene pairs, SCENIC+ uses an in-house generated motif compendium containing over 30,000 unique position weight matrices, where each TF has an average of 5 assigned motifs. This may have a distinct advantage in predicting TF binding sites compared to collapsing them into consensus sequences such as those used in typical motif enrichment analysis, as it may capture a broader range of TFs.

Probabilistic models

Unlike the methods discussed thus far, which consider target genes independently of others, probabilistic models can model the covariance between genes. To this end, Single-cell Multi-Task Network Inference (scMTNI), aims to reconstruct cell type- or condition-specific GRNs by employing a Bayesian framework and incorporating prior knowledge of the regulatory relationships when estimating cell type-specific regulatory networks (Fig. 6)⁸⁴.

scMTNI uses a cell lineage tree to incorporate the assumption that related cell types should have similar GRNs, as well as corresponding scATAC-seq data to prioritize TF regulators that have a motif in an accessible promoter region of the target gene. The TF-gene network is inferred with a probabilistic graphical model, considering the expression of each gene as a random variable, conditioned on a set of TF regulators. The model is estimated by starting with an empty list of TF-gene regulations and iteratively adding in regulatory connections that most likely explains the expression of the target gene. Two tunable parameters allow users to constrain the number of edges in the inferred GRN and weight the importance of a TF motif in the promoter region of the gene. The final output is a GRN for each cell type in the user provided cell lineage tree. It is important to note that scMTNI assumes that gene expressions follow a Gaussian distribution, which may not be representative of biological reality⁸⁵. Furthermore, the output of Bayesian-based approaches can be sensitive to the choice of priors, potentially limiting the robustness of the inferred GRNs⁸⁶.

Dynamical system-based methods

The GRN inference methods discussed so far generally assume that the cell population of interest is sufficiently homogenous, and any variation is due to noise. However, variation among individual cells may be biologically meaningful and influenced by cell cycle and their environment. Incorporating these factors in the GRN inference process could have distinct advantages as it accounts for the dynamic nature of gene regulation and environmental interactions. In this context, Dictys is designed to capture both static and time-resolved GRNs over a trajectory using pseudo-time analysis (Fig. 7)⁸⁰.

As opposed to the previous methods discussed so far, Dictys targets both TF footprints and motifs to establish TF-CRE links⁸⁰, where TF footprints are smaller and thus less prone to being detected as false positives⁸⁰. The potential regulators for each gene are then filtered to the TFs that can bind to a nearby CRE. The relationship between TFs and putative target genes are then modeled by an empirical linear model as a stochastic differential equation, where the final fitted coefficients represent the regulatory effect of the TFs on their target genes. Notably, Dictys recovers both differential regulation (logFC) and differential expression (CPM). Using differential regulation can help model changes in regulatory activity between TFs and their target genes that are not solely dependent on gene expression levels. Consequently, as Dictys models the expression over time, it may be better suited for studying differential regulatory changes within GRNs, particularly in continuous processes like cell differentiation. Additionally, Dictys may be robust to high variability due to low number of observations as it uses kernel smoothing to construct its regulatory models. However, it is important to note that like linear regression, Dictys estimates the total regulatory effect as a linear combination of individual TF expressions, which may be an oversimplification of true biological relationships, which are often more complex⁸¹.

Deep learning-based methods

Deep learning models have gained significant attention due to their ability to learn complex non-linear patterns and shown great success in diverse domains, such as biomedical imaging, protein structure prediction, and protein function prediction^33,34. Recent works employ deep learning models to leverage the recently available single-cell paired multi-omic data to infer regulatory networks, including DeepMAPS, MTLRank and LINGER (Fig. 8)^38,87,88. In contrast to the other reviewed GRN inference methods, DeepMAPS and MTLRank incorporate RNA velocity, defined as the ratio of spliced and unspliced messenger RNA, which estimates the rate of change of gene expression for a given gene at the time of sequencing⁸⁹. The regulatory impact of TFs on their target genes is rarely instantaneous and involves a cascade of regulatory events (recruitment of co-regulatory proteins and chromatin remodeling) that eventually lead to changes in gene expression. Thus, incorporating RNA velocity as a proxy for changes in gene expression over time can provide a more accurate approximation when estimating and establishing the regulatory effect of TFs on their target genes.

DeepMAPS estimates a regulatory potential for each gene in each cell by aggregating the accessibility of CREs and their proximity to the gene’s transcription start site. The regulatory potential and RNA velocity are then summarized into a gene activity matrix that captures the dynamic nature of each gene in each cell. A graph autoencoder is then used to learn a lower-dimensional embedding of both the genes and cells which is used to group clusters of cells and genes with similar gene activities. Regulatory links between genes are then established for each cluster of cells. Like most methods discussed so far, DeepMAPS uses TF motif enrichment across the CREs to infer the regulatory TFs for these clusters. In contrast, MTLRank calculates a TF activity score from ChIP-seq and scATAC-seq data to estimate the regulatory effect between each TF and gene in a cell. The TF activity is then combined with TF expression to predict the RNA velocity using a multi-layer neural network. MTLRank then ranks the TFs based on their impact on its putative target gene’s RNA velocity, which can be used to infer regulatory relationships and thus reconstruct the GRN.

Alternatively, LINGER directly uses TF expression and CRE accessibility to predict gene expression using a multi-layer neural network, incorporating TF motif enrichment. LINGER first trains the network on bulk data, which has the advantage of leveraging knowledge from atlas-scale data across many contexts. The network is then refined using the matched scRNA-seq and scATAC-seq data. Similarly to MTLRank, the regulatory importance of TFs and CREs is estimated by their impact on their putative target gene’s expression levels. Furthermore, the TF-CRE links can be inferred by the correlation between their weights in the first layer of the neural network, which enables LINGER to construct all TF-CRE, CRE-gene and TF-gene links to reconstruct the GRN.

Challenges and opportunities

In spite of the significant advancements in GRN inference algorithms, several key limitations remain. Here, we discuss these challenges and potential opportunities for future improvements.

Data sparsity

Single-cell data are often characterized by pronounced sparsity and noise compared to bulk data, which may impact the construction of robust GRNs⁹⁰. For example, while the proportion of zeros in bulk data has been estimated to be around 10%-40%⁹¹, the proportion of zeros in single-cell data can be as high as 90%⁹². The sparsity in single-cell data can be partially attributed to technical reasons such as inefficient library preparation and sequence amplification⁹³. Additionally, single-cell technologies aim to capture the expression profiles of individual cells which often exhibit low expression levels for many genes, resulting in a limited number of captured RNA transcripts. In contrast, bulk sequencing technologies aggregate the molecular expression profiles of many cells, allowing them to capture more counts but at the expense of losing heterogeneous information at the cell type level. Importantly, the presence of a high proportion of zeros in single-cell data can lead to biased and unstable estimations of gene expression correlations, further complicating the accurate inference of GRNs⁹⁴. Many GRN inference methods aim to address these issues by aggregating multiple similar cells into metacells (averaged expression profile of multiple similar cells). However, this can lead to inflated correlations, potentially resulting in the inference of erroneous regulatory relationships^95,96. Other strategies include imputation, where missing values are estimated using various approaches, including probabilistic models and latent space embeddings. Yet, most existing imputation approaches have been largely designed for imputation of scRNA-seq data, with limited options available for other data modalities⁹⁷. Nevertheless, we expect significant developments in this area with the continued advancement of sequencing technologies, resulting in improved sequencing depths. Additionally, many statistical and bioinformatics methods have emerged specifically designed to handle sparse data, demonstrating the methodological advancements to manage data sparsity in GRN inference^38,98.

Establishing causality

Another significant challenge in GRN inference is establishing causal relationships between regulators and their target genes. A majority of methods infer regulatory relationships by some measure of association, such as correlation⁹⁹. Similarly, regression and probabilistic approaches model the strength and direction of associations between variables¹⁰⁰. Yet these metrics and models alone are insufficient to establish causal regulatory relationships due to possible confounding factors. However, integrating multiple modalities that capture different aspects of gene regulation, such as chromatin accessibility and conformation, can provide further evidence for true regulatory links. For instance, the presence of a chromatin loop between a TF binding site and its target gene suggests a regulatory relationship as it indicates that the TF can physically bind with the target gene’s regulatory regions, such as the promoter or enhancer region⁵⁴. Additionally, experimental methods, such as perturbation or time-series experiments, offer a more direct approach for inferring regulatory links by perturbing regulators and observing changes in their respective target gene expression levels over time^101,102. For example, it is more likely that a regulatory relationship between a TF and its target gene exists if perturbing the TF results in the repression or activation of its target gene’s expression levels. Capturing these signals within the same cells highlights the advantages of matched multi-omic data, as the relationships between the different modalities are drawn from the same biological context, enhancing the quality and accuracy of regulatory connections made.

Validation

The validation of GRNs is a critical and open challenge given that the reconstructed GRNs aim to recapitulate biological processes of interest. Thus, GRN validation requires a thorough investigation of the concordance between the reconstructed GRNs and ‘ground truth’. To achieve this, ground truth regulatory networks inferred from wet lab experiments, such as functional perturbation experiments, are critical¹⁰³. Loss and gain of function experiments are approaches typically used to more confidently establish regulatory connections by observing whether changes in the expression levels of a regulator results in the activation or repression of its putative target gene^101,104. The advent of CRISPR-cas9 technologies has allowed for high-throughput screening of these regulatory interactions, significantly improving the efficiency and output of perturbation experiments¹⁰⁵. Non-coding regions, such as enhancers, can also be targeted to quantify how changes in CREs might impact downstream target genes using CRISPRi enhancer tiled screens, thereby providing a means to establishing true regulatory links between CREs and target genes¹⁰⁶. It is important to note that experimental validation can be costly and time consuming, and this is particularly true for matched profiling technologies. Nevertheless, advances in sequencing technologies, such as ISSAAC-seq, provide more affordable options for the joint profiling of single-cell modalities and pave the way for improved access to matched profiling technologies¹⁰⁷. Thus, we expect the experimental validation of reconstructed GRNs to become more commonplace as the cost of sequencing decreases as a result of improved efficiency and sensitivity.

Benchmarking

In the same vein, there is a need to validate and benchmark GRN inference methods to improve current limitations. GRN inference methods show considerable diversity in their reconstructed regulatory networks which is particularly evident in methods designed for single-cell data. For example, benchmarking studies of single-cell GRN inference methods have highlighted their poor accuracy and consensus on both experimental and in silico (simulated) data, particularly when increasing the number of genes considered in the inference process^24,108,109. Not surprisingly, some methods perform better when applied to in silico compared to experimental datasets, which may be explained by the fact that in silico networks have simpler network architectures compared to true biological GRNs¹¹⁰. However, given the lack of gold standard experiments for establishing the ground truth, the use of in silico GRNs is a good intermediary option and currently a popular strategy for validating and benchmarking GRN inference methods.

The efficacy of in silico GRNs as surrogates for ground truth models is dependent on their ability to accurately model the complex direct and indirect relationships between TFs, CREs, and genes²³. This remains a significant challenge as the underlying assumptions used to generate in silico GRNs are often oversimplifications of the underlying regulatory connections in true biological networks¹¹⁰. In silico multi-omic GRNs are also lacking, with the exception of some recent work by Li and colleagues who proposed a multi-omic GRN simulation method (scMultiSim), which aims to capture regulatory interactions between different omics layers (RNA and ATAC). While this is a significant step towards constructing more biologically accurate in silico GRNs, there are some important limitations, including the lack of output for accessible regions of chromatin. As such, there are no links between genes and regulatory domains that can act as ground truths when benchmarking multi-omic GRN inference methods. Additionally, given the absence of accessible regulatory regions and their respective sequences, it is not possible to perform TF motif enrichment analysis to infer and validate TF-CRE interactions in a reconstructed GRN.

From another perspective, evaluating reconstructed GRNs and benchmarking GRN inference methods are closely intertwined. A reliable model is one that effectively captures the characteristics of the observed data and should thus be able to produce simulated data that closely approximates the ground truth. Thus, in the context of GRN inference, an effective model should be able to generate data that accurately models the regulatory relationships between TFs, CREs, and genes. Put simply, generating a robust in silico GRN hinges on the capacity of GRN inference methods to faithfully model the ground truth, which can also be guided by experimentally validated knowledge. The current inability to achieve this suggests that the assumptions and approaches in GRN inference are not yet adequate for capturing the true complexity of GRNs. While all models inherently entail limitations and assumptions, we recommend researchers consider whether the assumptions driving the inference process of their methods are necessary and make biologically sense. This will not only improve the generalizability and accuracy of future GRN inference methods but enhance our capacity to accurately simulate the structure of single-cell multi-omic data.

Conclusion

The parallel development of single-cell multi-omic technologies and GRN inference methods has resulted in a unique opportunity to comprehensively characterize cell type and cell-state gene regulatory relationships. As the complexity of available data increases, more powerful GRN inference methods have been developed to harness this data. In this review, we have categorized and summarized the latest state-of-the-art GRN inference methods. Correlation-based methods capture linear (scMEGA, STREAM) or nonlinear (FigR, TRIPOD) pairwise regulatory relationships. Similarly, regression-based methods identify the key TFs that explain the expression of a target gene, using linear (Pando, scREMOTE, RENIN) or nonlinear (DIRECT-NET, SCENIC + ) models. Probabilistic models (scMTNI) can incorporate prior information to identify the most likely regulators for each gene. Dynamical systems-based approaches (Dictys) incorporate external factors to model changes in gene expression over time. Finally, deep learning methods use artificial neural networks to discover complex regulatory relationships between different omics layers (DeepMAPS, MTLRank, LINGER).

GRN inference is a dynamic and rapidly evolving research field, as evidenced by the recent surge of new single-cell multi-omic GRN inference methods. Both technological advancements and algorithmic innovations will continue to drive the development of more powerful tools, leading to the discovery of novel regulatory interactions which play a crucial role in understanding the regulatory networks driving cellular identity and disease. However, while the current GRN inference methods are more advanced than previous methods, there is still work that must be done to mitigate the current limitations and improve the robustness and accuracy of inferred GRNs. Nevertheless, it is clear that both single-cell sequencing technologies and GRN inference methods have made great advances and will continue to develop to further accurately reconstruct multi-modal regulatory relationships, which will have implications for broad research areas, including health and disease.

References

Lee, T. I. & Young, R. A. Transcriptional regulation and its misregulation in disease. Cell. 152, 1237–1251 (2013).
Article CAS PubMed PubMed Central Google Scholar
Lambert, S. A. et al. The human transcription factors. Cell. 172, 650–665 (2018).
Article CAS PubMed Google Scholar
Spitz, F. & Furlong, E. E. M. Transcription factors: from enhancer binding to developmental control. Nat. Rev. Genet. 13, 613–626 (2012).
Article CAS PubMed Google Scholar
Almeida, N. et al. Employing core regulatory circuits to define cell identity. EMBO J. (2021). https://onlinelibrary.wiley.com/doi/10.15252/embj.2020106785.
Karlebach, G. & Shamir, R. Modelling and analysis of gene regulatory networks. Nat. Rev. Mol. Cell Biol. 9, 770–780 (2008).
Article CAS PubMed Google Scholar
Bar-Joseph, Z. et al. Computational discovery of gene modules and regulatory networks. Nat. Biotechnol. 21, 1337–1342 (2003).
Article CAS PubMed Google Scholar
Ruan, J., Dean, A. K. & Zhang, W. A general co-expression network-based approach to gene expression analysis: comparison and applications. BMC Syst. Biol. 4, 8 (2010).
Article PubMed PubMed Central Google Scholar
Song, L., Langfelder, P. & Horvath, S. Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinform. 13, 328 (2012).
Article CAS Google Scholar
Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).
Article CAS PubMed PubMed Central Google Scholar
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Article CAS PubMed PubMed Central Google Scholar
Robertson, G. et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat. Methods 4, 651–657 (2007).
Article CAS PubMed Google Scholar
Lee, J., Hyeon, D. Y. & Hwang, D. Single-cell multiomics: technologies and data analysis methods. Exp. Mol. Med. 52, 1428–1442 (2020).
Article CAS PubMed PubMed Central Google Scholar
Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382 (2009).
Article CAS PubMed Google Scholar
Nagano, T. et al. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature 502, 59–64 (2013).
Article CAS PubMed Google Scholar
Rotem, A. et al. Single-cell ChIP-seq reveals cell subpopulations defined by chromatin state. Nat. Biotechnol. 33, 1165–1172 (2015).
Article CAS PubMed PubMed Central Google Scholar
Cha, J. & Lee, I. Single-cell network biology for resolving cellular heterogeneity in human diseases. Exp. Mol. Med. 52, 1798–1808 (2020).
Article CAS PubMed PubMed Central Google Scholar
Zhang, L., Zhang, J. & Nie, Q. DIRECT-NET: An efficient method to discover cis-regulatory elements and construct regulatory networks from single-cell multiomics data. Sci. Adv. 8, eabl7393 (2022).
Article CAS PubMed PubMed Central Google Scholar
Zhang, S. Y. & Stumpf, M. P. H. Learning cell-specific networks from dynamical single cell data. Preprint https://doi.org/10.1101/2023.01.08.523176 (2023).
Ogbeide, S., Giannese, F., Mincarelli, L. & Macaulay, I. C. Into the multiverse: advances in single-cell multiomic profiling. Trends Genet. TIG 38, 831–843 (2022).
Article CAS PubMed Google Scholar
Ma, S. et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell 183, 1103–1116.e20 (2020).
Article CAS PubMed PubMed Central Google Scholar
Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865–868 (2017).
Article CAS PubMed PubMed Central Google Scholar
González-Blas, C. B. et al. SCENIC + : single-cell multiomic inference of enhancers and gene regulatory networks. Nat. Methods 20, 1355–1367 (2023).
Article Google Scholar
Tran, A., Yang, P., Yang, J. Y. H. & Ormerod, J. T. scREMOTE: Using multimodal single cell data to predict regulatory gene relationships and to build a computational cell reprogramming model. NAR Genomics Bioinform. 4, lqac023 (2022).
Article Google Scholar
Chen, S. & Mar, J. C. Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data. BMC Bioinform. 19, 232 (2018).
Article Google Scholar
Mercatelli, D., Scalambra, L., Triboli, L., Ray, F. & Giorgi, F. M. Gene regulatory network inference resources: A practical overview. Biochim. Biophys. Acta BBA - Gene Regul. Mech. 1863, 194430 (2020).
Article CAS Google Scholar
Badia-i-Mompel, P. et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat. Rev. Genet. 24, 739–754 (2023).
Article CAS PubMed Google Scholar
Zuin, J. et al. Nonlinear control of transcription through enhancer–promoter interactions. Nature 604, 571–577 (2022).
Article CAS PubMed PubMed Central Google Scholar
Yang, P., Huang, H. & Liu, C. Feature selection revisited in the single-cell era. Genome Biol. 22, 321 (2021).
Article PubMed PubMed Central Google Scholar
Huynh-Thu, V. A. & Sanguinetti, G. Gene regulatory network inference: an introductory survey. in Gene regulatory networks: Methods and protocols (eds Sanguinetti, G. & Huynh-Thu, V. A.). 1–23 (Springer, 2019). https://doi.org/10.1007/978-1-4939-8882-2_1.
Hecker, M., Lambeck, S., Toepfer, S., Van Someren, E. & Guthke, R. Gene regulatory network inference: Data integration in dynamic models—A review. Biosystems 96, 86–103 (2009).
Article CAS PubMed Google Scholar
Polynikis, A., Hogan, S. J. & Di Bernardo, M. Comparing different ODE modelling approaches for gene regulatory networks. J. Theor. Biol. 261, 511–530 (2009).
Article CAS PubMed Google Scholar
Yaghoobi, H., Haghipour, S., Hamzeiy, H. & Asadi-Khiavi, M. A review of modeling techniques for genetic regulatory networks. J. Med. Signals Sens. 2, 61–70 (2012).
Article PubMed PubMed Central Google Scholar
Min, S., Lee, B. & Yoon, S. Deep learning in bioinformatics. Brief. Bioinform. 18, 851–869 (2016).
Google Scholar
Sapoval, N. et al. Current progress and open challenges for applying deep learning across the biosciences. Nat. Commun. 13, 1728 (2022).
Article CAS PubMed PubMed Central Google Scholar
Cao, Y., Geddes, T. A., Yang, J. Y. H. & Yang, P. Ensemble deep learning in bioinformatics. Nat. Mach. Intell. 2, 500–508 (2020).
Article Google Scholar
Liu, C., Huang, H. & Yang, P. Multi-task learning from multimodal single-cell omics with Matilda. Nucleic Acids Res. 51, e45 (2023).
Article CAS PubMed PubMed Central Google Scholar
Ma, Q. & Xu, D. Deep learning shapes single-cell data analysis. Nat. Rev. Mol. Cell Biol. 23, 303–304 (2022).
Article CAS PubMed PubMed Central Google Scholar
Song, Q., Ruffalo, M. & Bar-Joseph, Z. Using single cell atlas data to reconstruct regulatory networks. Nucleic Acids Res. 51, e38 (2023).
Article CAS PubMed PubMed Central Google Scholar
Schena, M., Shalon, D., Davis, R. W. & Brown, P. O. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–470 (1995).
Article CAS PubMed Google Scholar
Eisen, M. B., Spellman, P. T., Brown, P. O. & Botstein, D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci. 95, 14863–14868 (1998).
Article CAS PubMed PubMed Central Google Scholar
Faith, J. J. et al. Large-scale mapping and validation of escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 5, e8 (2007).
Article PubMed PubMed Central Google Scholar
Margolin, A. A. et al. ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinforma. 7, S7 (2006).
Article Google Scholar
Meyer, P. E., Kontos, K., Lafitte, F. & Bontempi, G. Information-theoretic inference of large transcriptional regulatory networks. EURASIP J. Bioinforma. Syst. Biol. 2007, 1–9 (2007).
Article Google Scholar
Huynh-Thu, V. A., Irrthum, A., Wehenkel, L. & Geurts, P. Inferring regulatory networks from expression data using tree-based methods. PLoS ONE 5, e12776 (2010).
Article PubMed PubMed Central Google Scholar
Wagner, A. Genes regulated cooperatively by one or more transcription factors and their identification in whole eukaryotic genomes. Bioinformatics 15, 776–784 (1999).
Article CAS PubMed Google Scholar
Kamal, A. et al. GRaNIE and GRaNPA: Inference and evaluation of enhancer-mediated gene regulatory networks. Mol. Syst. Biol. 19, e11627 (2023).
Article CAS PubMed PubMed Central Google Scholar
Duren, Z., Chen, X., Jiang, R., Wang, Y. & Wong, W. H. Modeling gene regulation from paired expression and chromatin accessibility data. Proc. Natl. Acad. Sci. 114, E4914–E4923 (2017).
Article CAS PubMed PubMed Central Google Scholar
Duren, Z., Chen, X., Xin, J., Wang, Y. & Wong, W. H. Time course regulatory analysis based on paired expression and chromatin accessibility data. Genome Res. 30, 622–634 (2020).
Article CAS PubMed PubMed Central Google Scholar
Lemmens, K. et al. DISTILLER: a data integration framework to reveal condition dependency of complex regulons in Escherichia coli. Genome Biol. 10, R27 (2009).
Article PubMed PubMed Central Google Scholar
Ouyang, Z., Zhou, Q. & Wong, W. H. ChIP-Seq of transcription factors predicts absolute and differential gene expression in embryonic stem cells. Proc. Natl Acad. Sci. 106, 21521–21526 (2009).
Article CAS PubMed PubMed Central Google Scholar
Wang, P. et al. ChIP-Array 2: integrating multiple omics data to construct gene regulatory networks. Nucleic Acids Res. 43, W264–W269 (2015).
Article CAS PubMed PubMed Central Google Scholar
Jiang, Y. et al. Nonparametric single-cell multiomic characterization of trio relationships between transcription factors, target genes, and cis-regulatory regions. Cell Syst. 13, 737–751.e4 (2022).
Article CAS PubMed PubMed Central Google Scholar
Boix, C. A., James, B. T., Park, Y. P., Meuleman, W. & Kellis, M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature 590, 300–307 (2021).
Article CAS PubMed PubMed Central Google Scholar
Kim, H. J. et al. Transcriptional network dynamics during the progression of pluripotency revealed by integrative statistical learning. Nucleic Acids Res. 48, 1828–1842 (2020).
Article CAS PubMed Google Scholar
Meng, C., Kuster, B., Culhane, A. C. & Gholami, A. M. A multivariate approach to the integration of multi-omics datasets. BMC Bioinforma. 15, 162 (2014).
Article Google Scholar
Jerby-Arnon, L. et al. A cancer cell program promotes T cell exclusion and resistance to checkpoint blockade. Cell 175, 984–997.e24 (2018).
Article CAS PubMed PubMed Central Google Scholar
Segerstolpe, Å. et al. Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes. Cell Metab. 24, 593–607 (2016).
Article CAS PubMed PubMed Central Google Scholar
Aldridge, S. & Teichmann, S. A. Single cell transcriptomics comes of age. Nat. Commun. 11, 4307 (2020).
Article CAS PubMed PubMed Central Google Scholar
Vento-Tormo, R. et al. Single-cell reconstruction of the early maternal–fetal interface in humans. Nature 563, 347–353 (2018).
Article CAS PubMed PubMed Central Google Scholar
Vieira Braga, F. A. et al. A cellular census of human lungs identifies novel cell states in health and in asthma. Nat. Med. 25, 1153–1163 (2019).
Article CAS PubMed Google Scholar
Osorio, D., Zhong, Y., Li, G., Huang, J. Z. & Cai, J. J. scTenifoldNet: A machine learning workflow for constructing and comparing transcriptome-wide gene regulatory networks from single-cell data. Patterns 1, 100139 (2020).
Article CAS PubMed PubMed Central Google Scholar
Matsumoto, H. et al. SCODE: an efficient regulatory network inference algorithm from single-cell RNA-Seq during differentiation. Bioinformatics 33, 2314–2321 (2017).
Article PubMed PubMed Central Google Scholar
Chan, T. E., Stumpf, M. P. H. & Babtie, A. C. Gene Regulatory Network Inference from Single-Cell Data Using Multivariate Information Measures. Cell Syst. 5, 251–267.e3 (2017).
Article CAS PubMed PubMed Central Google Scholar
Li, H. et al. Inferring transcription factor regulatory networks from single-cell ATAC-seq data based on graph neural networks. Nat. Mach. Intell. 4, 389–400 (2022).
Article Google Scholar
Jiang, J. et al. IReNA: Integrated regulatory network analysis of single-cell transcriptomes and chromatin accessibility profiles. iScience 25, 105359 (2022).
Article CAS PubMed PubMed Central Google Scholar
Kamimoto, K. et al. Dissecting cell identity via network inference and in silico gene perturbation. Nature 614, 742–751 (2023).
Article CAS PubMed PubMed Central Google Scholar
Alanis-Lobato, G. et al. MICA: A multi-omics method to predict gene regulatory networks in early human embryos. Preprint at https://doi.org/10.1101/2023.02.03.527081 (2023).
Jansen, C. et al. Building gene regulatory networks from scATAC-seq and scRNA-seq using Linked Self Organizing Maps. PLOS Comput. Biol. 15, e1006555 (2019).
Article PubMed PubMed Central Google Scholar
Duren, Z. et al. Regulatory analysis of single cell multiome gene expression and chromatin accessibility data with scREG. Genome Biol. 23, 114 (2022).
Article CAS PubMed PubMed Central Google Scholar
Jin, S., Zhang, L. & Nie, Q. scAI: an unsupervised approach for the integrative analysis of parallel single-cell transcriptomic and epigenomic profiles. Genome Biol. 21, 25 (2020).
Article PubMed PubMed Central Google Scholar
Zeng, W. et al. DC3 is a method for deconvolution and coupled clustering from bulk and single-cell genomics data. Nat. Commun. 10, 4613 (2019).
Article PubMed PubMed Central Google Scholar
Cao, Z. J. & Gao, G. Multi-omics single-cell data integration and regulatory inference with graph-linked embedding. Nat. Biotechnol. 40, 1458–1466 (2022).
Article CAS PubMed PubMed Central Google Scholar
Lin, Y. et al. scTIE: data integration and inference of gene regulation using single-cell temporal multimodal data. Preprint at https://doi.org/10.1101/2023.05.18.541381 (2023).
Vandereyken, K., Sifrim, A., Thienpont, B. & Voet, T. Methods and applications for single-cell and spatial multi-omics. Nat. Rev. Genet. 24, 494–515 (2023).
Article CAS PubMed Google Scholar
Chen, S., Lake, B. B. & Zhang, K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat. Biotechnol. 37, 1452–1457 (2019).
Article CAS PubMed PubMed Central Google Scholar
Zhu, C. et al. Joint profiling of histone modifications and transcriptome in single cells from mouse brain. Nat. Methods 18, 283–292 (2021).
Article CAS PubMed PubMed Central Google Scholar
Mimitou, E. P. et al. Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nat. Biotechnol. 39, 1246–1258 (2021).
Article CAS PubMed PubMed Central Google Scholar
Yuan, Q. & Duren, Z. Integration of single-cell multi-omics data by regression analysis on unpaired observations. Genome Biol. 23, 160 (2022).
Article CAS PubMed PubMed Central Google Scholar
Kartha, V. K. et al. Functional inference of gene regulation using single-cell multi-omics. Cell Genomics 2, 100166 (2022).
Article CAS PubMed PubMed Central Google Scholar
Wang, L. et al. Dictys: dynamic gene regulatory network dissects developmental continuum with single-cell multi-omics. Nat. Methods 20, 1368–1378 (2023).
Article CAS PubMed Google Scholar
Steinacher, A., Bates, D. G., Akman, O. E. & Soyer, O. S. Nonlinear Dynamics in Gene Regulation Promote Robustness and Evolvability of Gene Expression Levels. PLOS ONE 11, e0153295 (2016).
Article PubMed PubMed Central Google Scholar
Fleck, J. S. et al. Inferring and perturbing cell fate regulomes in human brain organoids. Nature 621, 365–372 (2023).
Article CAS PubMed Google Scholar
Ledru, N. et al. Predicting regulators of epithelial cell state through regularized regression analysis of single cell multiomic sequencing. Preprint at https://doi.org/10.1101/2022.12.29.522232 (2022).
Zhang, S. et al. Inference of cell type-specific gene regulatory networks on cell lineages from single cell omic datasets. Nat. Commun. 14, 3064 (2023).
Article CAS PubMed PubMed Central Google Scholar
De Torrenté, L. et al. The shape of gene expression distributions matter: how incorporating distribution shape improves the interpretation of cancer transcriptomic data. BMC Bioinforma. 21, 562 (2020).
Article Google Scholar
Van Dongen, S. Prior specification in Bayesian statistics: Three cautionary tales. J. Theor. Biol. 242, 90–100 (2006).
Article PubMed Google Scholar
Ma, A. et al. Single-cell biological network inference using a heterogeneous graph transformer. Nat. Commun. 14, 964 (2023).
Article CAS PubMed PubMed Central Google Scholar
Yuan, Q. & Duren, Z. Continuous lifelong learning for modeling of gene regulation from single cell multiome data by leveraging atlas-scale external data. Preprint at https://doi.org/10.1101/2023.08.01.551575 (2023).
La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).
Article PubMed PubMed Central Google Scholar
Hicks, S. C., Townes, F. W., Teng, M. & Irizarry, R. A. Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics 19, 562–578 (2018).
Article PubMed Google Scholar
Deaton, A. M. et al. Cell type–specific DNA methylation at intragenic CpG islands in the immune system. Genome Res. 21, 1074–1086 (2011).
Article CAS PubMed PubMed Central Google Scholar
Ding, J. et al. Systematic comparison of single-cell and single-nucleus RNA-sequencing methods. Nat. Biotechnol. 38, 737–746 (2020).
Article CAS PubMed PubMed Central Google Scholar
Jiang, R., Sun, T., Song, D. & Li, J. J. Statistics or biology: the zero-inflation controversy about scRNA-seq data. Genome Biol. 23, 31 (2022).
Article CAS PubMed PubMed Central Google Scholar
Van Dijk, D. et al. Recovering Gene Interactions from Single-Cell Data Using Data Diffusion. Cell 174, 716–729.e27 (2018).
Article PubMed PubMed Central Google Scholar
Loney, T. & Nagelkerke, N. J. The individualistic fallacy, ecological studies and instrumental variables: a causal interpretation. Emerg. Themes Epidemiol. 11, 18 (2014).
Article PubMed PubMed Central Google Scholar
Steel, D. G. & Holt, D. Analysing and Adjusting Aggregation Effects: The Ecological Fallacy Revisited. Int. Stat. Rev. Rev. Int. Stat. 64, 39 (1996).
Article Google Scholar
Hou, W., Ji, Z., Ji, H. & Hicks, S. C. A systematic evaluation of single-cell RNA-sequencing imputation methods. Genome Biol. 21, 218 (2020).
Article CAS PubMed PubMed Central Google Scholar
Sekula, M., Gaskins, J. & Datta, S. A sparse Bayesian factor model for the construction of gene co-expression networks from single-cell RNA sequencing count data. BMC Bioinforma. 21, 361 (2020).
Article CAS Google Scholar
Altman, N. & Krzywinski, M. Association, correlation and causation. Nat. Methods 12, 899–900 (2015).
Article CAS PubMed Google Scholar
Pearl, J. Statistics and causal inference: A review. Test 12, 281–345 (2003).
Article Google Scholar
Meinshausen, N. et al. Methods for causal inference from gene perturbation experiments and validation. Proc. Natl Acad. Sci. 113, 7361–7368 (2016).
Article CAS PubMed PubMed Central Google Scholar
Qiu, X. et al. Inferring Causal Gene Regulatory Networks from Coupled Single-Cell Expression Dynamics Using Scribe. Cell Syst. 10, 265–274.e11 (2020).
Article CAS PubMed PubMed Central Google Scholar
Streit, A. et al. Experimental approaches for gene regulatory network construction: The chick as a model system: Gene Regulatory Network Construction. genesis 51, 296–310 (2013).
Article CAS PubMed Google Scholar
Tegnér, J., Yeung, M. K. S., Hasty, J. & Collins, J. J. Reverse engineering gene networks: Integrating genetic perturbations with dynamical modeling. Proc. Natl Acad. Sci. 100, 5944–5949 (2003).
Article PubMed PubMed Central Google Scholar
Akinci, E., Hamilton, M. C., Khowpinitchai, B. & Sherwood, R. I. Using CRISPR to understand and manipulate gene regulation. Development 148, dev182667 (2021).
Article CAS PubMed PubMed Central Google Scholar
Xie, S., Duan, J., Li, B., Zhou, P. & Hon, G. C. Multiplexed Engineering and Analysis of Combinatorial Enhancer Activity in Single Cells. Mol. Cell 66, 285–299.e5 (2017).
Article CAS PubMed Google Scholar
Xu, W. et al. ISSAAC-seq enables sensitive and flexible multimodal profiling of chromatin accessibility and gene expression in single cells. Nat. Methods 19, 1243–1249 (2022).
Article CAS PubMed Google Scholar
Kang, Y., Thieffry, D. & Cantini, L. Evaluating the reproducibility of single-cell gene regulatory network inference algorithms. Front. Genet. 12, 617282 (2021).
Article CAS PubMed PubMed Central Google Scholar
Nguyen, H., Tran, D., Tran, B., Pehlivan, B. & Nguyen, T. A comprehensive survey of regulatory network inference methods using single cell RNA sequencing data. Brief. Bioinform. 22, bbaa190 (2021).
Article PubMed Google Scholar
Pratapa, A., Jalihal, A. P., Law, J. N., Bharadwaj, A. & Murali, T. M. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat. Methods 17, 147–154 (2020).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

The following sources of funding for each author, and for the manuscript preparation, are gratefully acknowledged: D.K. is supported by an Australian Commonwealth Government Research Training Program Stipend Scholarship and the Children’s Medical Research Institute Top-up Award; A.T. is supported by an Australian Commonwealth Government Research Training Program Stipend Scholarship. J.Y.H.Y. and P.Y. are supported by the AIR@innoHK programme of the Innovation and Technology Commission of Hong Kong. P.Y. is supported by a National Health and Medical Research Council (NHMRC) Investigator Grant (1173469) and a Metcalf Prize from National Stem Cell Foundation of Australia. The authors thank their colleagues at the Sydney Precision Data Science Centre for their support and intellectual discussions. They would also like to thank Farhan Ameen, Tom Geddes, Carissa Chen, Lijia Yu, Jackson Zhou and Helen Fu for their feedback and edits.

Author information

These authors contributed equally: Daniel Kim, Andy Tran.

Authors and Affiliations

School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
Daniel Kim, Andy Tran, Yingxin Lin, Jean Yee Hwa Yang & Pengyi Yang
Computational Systems Biology Unit, Children’s Medical Research Institute, University of Sydney, Camperdown, NSW, Australia
Daniel Kim, Hani Jieun Kim & Pengyi Yang
Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
Daniel Kim, Andy Tran, Hani Jieun Kim, Yingxin Lin, Jean Yee Hwa Yang & Pengyi Yang
Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia
Andy Tran, Yingxin Lin, Jean Yee Hwa Yang & Pengyi Yang

Authors

Daniel Kim
View author publications
You can also search for this author in PubMed Google Scholar
Andy Tran
View author publications
You can also search for this author in PubMed Google Scholar
Hani Jieun Kim
View author publications
You can also search for this author in PubMed Google Scholar
Yingxin Lin
View author publications
You can also search for this author in PubMed Google Scholar
Jean Yee Hwa Yang
View author publications
You can also search for this author in PubMed Google Scholar
Pengyi Yang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

P.Y. and J.Y. conceived the study. D.K. and A.T. led the review and wrote the manuscript with contributions from H.K. and Y.L. All authors contributed to the completion of this manuscript.

Corresponding authors

Correspondence to Jean Yee Hwa Yang or Pengyi Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Kim, D., Tran, A., Kim, H.J. et al. Gene regulatory network reconstruction: harnessing the power of single-cell multi-omic data. npj Syst Biol Appl 9, 51 (2023). https://doi.org/10.1038/s41540-023-00312-6

Download citation

Received: 14 August 2023
Accepted: 02 October 2023
Published: 19 October 2023
DOI: https://doi.org/10.1038/s41540-023-00312-6
Springer Nature Limited

Gene regulatory network reconstruction: harnessing the power of single-cell multi-omic data

Abstract

Similar content being viewed by others

Gene regulatory network inference in the era of single-cell multi-omics

A review on gene regulatory network reconstruction algorithms based on single cell RNA sequencing

Towards precise reconstruction of gene regulatory networks by data integration

Introduction

Methodological foundations of GRN inference

Correlation-based approaches

Regression models

Probabilistic models

Dynamical systems

Deep learning models

GRN inference in bulk omics era

Bulk transcriptomics

Bulk multi-omics

GRN inference in the single-cell era

Single-cell omics

Towards matched single-cell multi-omics

Correlation-based methods

Regression-based methods

Probabilistic models

Dynamical system-based methods

Deep learning-based methods

Challenges and opportunities

Data sparsity

Establishing causality

Validation

Benchmarking

Conclusion

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Search

Navigation