Abstract
Background
Current gene regulatory network (GRN) inference methods are known to produce predictions that contain many hidden indirect interactions. Separating indirect interactions from direct ones remains an important challenge in the reconstruction of GRNs. To address this issue, we developed a redundancy silencing and network enhancement technique (RSNET) for inferring GRNs.
Results
To assess the performance of RSNET, we conducted experiments on several gold-standard networks, including a simulation study, the DREAM challenge dataset and the Escherichia coli network. The results show that RSNET performed better than the compared methods in sensitivity and accuracy. As a case study, we used RSNET to construct a functional GRN for apple fruit ripening from gene expression data.
Conclusions
In the proposed method, redundant interactions, including weak and indirect connections, are adaptively silenced by recursive optimization, and highly dependent nodes are constrained in the model to retain the true interactions. This study provides a useful tool for inferring clean networks.
Background
A gene regulatory network (GRN), which represents interactions or causal relationships between genes, describes the developmental or regulatory processes in a cellular system [1]. GRN inference is a focal point of systems biology for understanding biological systems [2]. Traditional knock-out or perturbation experiments have been widely used to discover regulations among genes and have achieved some success in understanding biological systems [3]. However, the interactions discovered by these expensive and time-consuming experiments are 'just the tip of the iceberg' of a complex GRN. In contrast, genome-wide inference of GRNs from high-throughput data by computational methods offers an economical way to uncover complex regulatory mechanisms [4, 5]. The challenge for computational methods is to build reasonable models that precisely predict the interactions between regulators and targets from gene expression data [6]. Distinguishing direct interactions from indirect ones remains an important challenge in GRN reconstruction, because inference methods are prone to including indirect interactions in the predicted network [7, 8].
In recent years, various approaches have been developed to address these challenges in GRN inference, and some have achieved a degree of success [9]. According to the techniques involved, these approaches can be divided into two types: dependence-based and equation-based methods [10]. In dependence-based methods, the gene network is predicted by measuring dependences among genes with measures such as the Pearson correlation coefficient [11,12,13], mutual information [14, 15] and Granger causality [16, 17]. These methods can measure linear or nonlinear correlations between gene pairs independently, but their results contain many redundant edges, such as indirect regulations [18,19,20]. In equation-based methods, the regulations and regulatory strengths among genes are described by equations [21]. Representative equation-based methods include multiple linear regression [22], nonnegative matrix factorization [23], network component analysis [24, 25], linear programming [26] and random forest [27, 28]. Equation-based methods can capture interactions based on the dynamic mechanism. However, the high dimensionality of candidate regulators sometimes limits how well the optimization technique can estimate parameters [29, 30].
Despite concurrent advances, most GRN inference methods cannot distinguish direct correlations from indirect ones [31]. Some dependence-based methods have been developed to discriminate direct from indirect connections in GRNs, such as the partial correlation coefficient (PCC) [32], conditional mutual information (CMI) [33], part mutual information (PMI) [34] and conditional mutual inclusive information (CMI2) [35]. Equation-based methods are popular because they allow sparsity control and optimal estimation [36,37,38]. However, these methods are sensitive to the data and have two limitations that can seriously impair GRN inference [39, 40]. First, noisy data, the high dimensionality of genes and small sample sizes affect parameter estimation during optimization. Second, indirect interactions can remain in the results [41, 42]. Improving the accuracy of regression-based methods therefore requires addressing these limitations [43, 44].
We previously proposed a noise and redundancy reduction strategy, NARROMI, based on recursive optimization, which improved the performance of gene network inference [45]. In this strategy, the network was updated by recursive optimization to remove indirect interactions. A limitation of the strategy is that some direct interactions identified in a previous step were not recognized in the next step. In other words, while recursive optimization (RO) removes indirect interactions, it also raises the false negative rate by discarding some true interactions. Balancing the true positive rate (TPR) and the false positive rate (FPR) is the key to improving the performance of a network inference algorithm. Some techniques that incorporate existing network information into the optimization problem have been proposed to improve network inference [46,
http://geneontology.org/) was achieved. Using these GO items, the web tool WEGO2.0 (http://wego.genomics.org.cn/) was used for visualization. Figure 6 shows the results of GO analysis for the identified genes. Of the 313 core genes, 147 were annotated and assigned to the three top-level GO categories (Additional file 3: Table S3). The categories contain 98 items in biological process, 30 in cellular component and 128 in molecular function (Fig. 6a). To show the hierarchical relationships within the gene set, the second- and third-level GO items are shown separately (Fig. 6b, c). The items ranked first and third, catalytic activity (GO:0003824) and binding (GO:0005488), reveal that these genes are involved in catalytic reactions and molecular activities such as redox reactions, hydrolysis, ion binding and organic cyclic compound binding. The items ranked second and fourth, metabolic process (GO:0008152) and cellular process (GO:0009987), indicate that the genes regulate metabolism-related biological processes.
Together, these items confirm that the gene set identified by RSNET is highly correlated with fruit development.
Fig. 6 GO analysis confirms that the predicted genes are correlated with fruit development. a Table of GO analysis results, including the number of genes involved in different GO terms. b Hierarchical relationships of the gene set at the second level of GO items. c Hierarchical relationships of the gene set at the third level of GO items.
To explore whether the genes identified by RSNET correlate with fruit development, we analyzed the dynamic changes in their expression from floral bud to ripe fruit. Using a clustering tool, we grouped the 313 genes into seven sub-clusters. Six of these match the four plant physiological processes, i.e., floral bud/bloom (FB), early fruit development (EDF), mid-development (MD) and ripening (R) (Fig. 7a). Sub-cluster 4 matched FB, sub-cluster 5 matched EDF, sub-clusters 1 and 7 matched MD, and sub-clusters 2 and 3 matched R exactly (Fig. 7b). Our analysis thus provides a list of genes relevant to fruit development. Of these, 30 are highly related and 283 are related. A previous ANOVA-based analysis selected 1955 genes; RSNET captured similar expression dynamics during fruit development with a much smaller gene set. This result shows two advantages of RSNET for network inference. First, RSNET can identify direct causal genes by filtering out indirect and noisy genes. Second, RSNET identifies significant genes rather than a random selection from the whole genome.
Methods
Mutual information between gene pairs
The dependency between a gene pair can be measured by the mutual information (MI) of their expression vectors. MI has been widely used because it can capture nonlinear relationships. For genes A and B, MI can be defined as [33]
\[ MI(A,B) = \sum_{a \in A,\, b \in B} p(a,b) \log \frac{p(a,b)}{p(a)\,p(b)} \]
Assuming that gene expression follows a Gaussian distribution, this formula can be computed as [33]
\[ MI(A,B) = \frac{1}{2} \log \frac{|M(A)|\,|M(B)|}{|M(A,B)|} \]
where M is the covariance matrix and |M| is its determinant. In particular, MI(A,B) = 0 indicates that genes A and B are independent.
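The following sketch illustrates how MI can be computed from a covariance matrix M and its determinant |M|. It assumes jointly Gaussian expression, under which MI(A,B) = ½ log(|M(A)||M(B)|/|M(A,B)|). It is written with NumPy for illustration only; the RSNET software itself is written in MATLAB, and the function name `gaussian_mi` is hypothetical. For two scalar expression profiles, |M(A)| and |M(B)| are simply variances, and the ratio reduces to 1/(1 - r²), where r is the Pearson correlation.

```python
import numpy as np


def gaussian_mi(a: np.ndarray, b: np.ndarray) -> float:
    """Mutual information (in nats) of two expression vectors, assuming Gaussianity.

    Implements MI(A,B) = 0.5 * log(|M(A)| * |M(B)| / |M(A,B)|), where M(.) is a
    covariance matrix and |.| its determinant. For 1-D profiles, |M(A)| and
    |M(B)| are the variances of A and B.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    joint_cov = np.cov(np.vstack([a, b]))  # 2x2 joint covariance matrix M(A,B)
    det_joint = np.linalg.det(joint_cov)  # |M(A,B)|
    var_a, var_b = joint_cov[0, 0], joint_cov[1, 1]  # |M(A)|, |M(B)|
    if det_joint <= 0:
        # Degenerate (perfectly collinear) profiles: MI is unbounded.
        return float("inf")
    return 0.5 * float(np.log(var_a * var_b / det_joint))
```

For example, `gaussian_mi(x, y)` agrees with `-0.5 * np.log(1 - r**2)` up to floating-point error, where `r = np.corrcoef(x, y)[0, 1]`.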
In the first step of the proposed method, mutual information is used to select putative regulators from all candidate genes for a given target gene.
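The preselection step can be sketched as follows. The function name `select_putative_regulators`, the dictionary input format and the default `mi_threshold` are all hypothetical illustration choices; the text does not state how RSNET sets its MI cut-off. MI is computed with the Gaussian closed form, written here through the Pearson correlation as -½ log(1 - r²).

```python
import numpy as np


def select_putative_regulators(expr, target, candidates, mi_threshold=0.05):
    """Rank candidate regulators of `target` by Gaussian MI and keep those above a cut-off.

    expr: dict mapping gene name -> 1-D expression vector (same samples for all genes).
    candidates: iterable of gene names to consider as regulators.
    mi_threshold: hypothetical cut-off in nats (not taken from RSNET).
    Returns (selected gene names sorted by decreasing MI, dict of all MI scores).
    """
    y = np.asarray(expr[target], dtype=float)
    scores = {}
    for gene in candidates:
        if gene == target:
            continue  # a gene is not treated as its own regulator
        r = np.corrcoef(np.asarray(expr[gene], dtype=float), y)[0, 1]
        r = min(abs(r), 1.0 - 1e-12)  # avoid log(0) for identical profiles
        scores[gene] = -0.5 * np.log(1.0 - r ** 2)  # Gaussian MI in nats
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    selected = [gene for gene, score in ranked if score > mi_threshold]
    return selected, scores
```

Ranking the candidates before thresholding also makes it easy to keep only the top k regulators, if a fixed regulator budget per target is preferred.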
Redundancy silencing and network enhancement technique
To quantitatively describe gene regulation during transcription from DNA to RNA, a mathematical model involving transcription factors (TFs) and target genes must be built [45, 54]. Among suitable models, the regression model is the most popular because it can describe the dynamics of transcription. In this work, we propose an update model that silences redundant regulations and enhances high-confidence edges. Redundancy silencing is implemented by the following recursive optimization, whose results are updated until they no longer change.
where \(y\), \(X\) and \(\beta\) represent the target gene, the TFs and the regulatory strengths, respectively. \(\hat{\beta }\) denotes the network enhancement items, which take the value 0 or 1. \(\lambda\) and \(\gamma\) are parameters that balance the fitting error and ensure network sparseness, respectively. The operator \(\otimes\) is the Hadamard product. The parameter \(\hat{\beta }\) is first estimated by mutual information and then updated by the optimizations [
In the reconstruction of GRNs, distinguishing direct interactions from indirect ones is an important challenge, because inference methods are prone to including indirect interactions in the predicted network. In this study, we present RSNET, a network inference method based on a redundancy silencing and network enhancement technique. In the proposed method, redundant interactions, including weak and indirect connections, are adaptively silenced by recursive optimization, while highly confident correlated regulators are constrained to improve the true positive rate of prediction. The results on gold-standard networks, including a simulation study, the DREAM challenge dataset and the Escherichia coli network, demonstrate the good performance of RSNET. The case study of constructing an apple fruit ripening GRN shows that RSNET can construct function-specific GRNs. This study provides a useful bioinformatics tool for inferring clean GRNs from gene expression data.
Conclusion
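The recursive loop described under "Redundancy silencing and network enhancement technique" works as follows: refit, silence redundant edges, and repeat until the network no longer changes, while \(\hat{\beta }\) protects high-confidence regulators. The simplified Python sketch below illustrates that loop, but it is not the RSNET objective. The regularized optimization with \(\lambda\) and \(\gamma\) is replaced here by ordinary least squares plus a hard threshold on coefficient magnitude. The names `recursive_silencing`, `protected` and `threshold` are hypothetical.

```python
import numpy as np


def recursive_silencing(X, y, protected, threshold=0.1, max_iter=50):
    """Hypothetical sketch: recursive redundancy silencing with protected edges.

    Stand-in for RSNET's optimization: each round refits y on the still-active
    regulators by ordinary least squares, then silences regulators whose
    |coefficient| < threshold unless they are protected (the role played by
    beta_hat = 1). Stops when the active set no longer changes.

    X: (n_samples, n_regulators) expression of putative regulators.
    y: (n_samples,) expression of the target gene.
    protected: boolean mask of high-confidence regulators that are never silenced.
    Returns the coefficient vector, with zeros for silenced regulators.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    protected = np.asarray(protected, dtype=bool)
    active = np.ones(X.shape[1], dtype=bool)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        beta = np.zeros(X.shape[1])
        if active.any():
            coef, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
            beta[active] = coef
        # Silence weak (e.g. indirect) edges unless the enhancement mask protects them.
        new_active = active & ((np.abs(beta) >= threshold) | protected)
        if np.array_equal(new_active, active):
            break  # no change in the network: the recursion has converged
        active = new_active
    return beta
```

Consider a regulator that is merely correlated with the true regulator, i.e. an indirect interaction. Once the true regulator is in the model, it receives a near-zero coefficient, so it is silenced in the first pass. Because the active set can only shrink, the loop always terminates within one more iteration than there are regulators.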
Availability of data and materials
The RSNET software and related data are freely accessible at https://github.com/zhanglab-wbgcas/rsnet. The raw data of apple gene expression analyzed in this study are available at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2287172/bin/1471-2229-8-16-S1.xls.
Availability and requirements
Project name: rsnet. Project home page: https://github.com/zhanglab-wbgcas/rsnet. Operating system: Windows. Programming language: MATLAB. Other requirements: MATLAB 7.0 or higher. License: MATLAB. Any restrictions to use by non-academics: None.
Abbreviations
- GRN: Gene regulatory network
- MI: Mutual information
- TF: Transcription factor
- TG: Target gene
- GO: Gene ontology
- EDF: Early fruit development
- MD: Mid-development
- FB: Full bloom
- R: Ripening
- TPR: True positive rate
- FPR: False positive rate
- PPV: Positive predictive value
- ACC: Accuracy
- MCC: Matthews correlation coefficient
- ROC: Receiver operating characteristic
- AUC: Area under ROC curve
References
Li M, Belmonte JCI. Deconstructing the pluripotency gene regulatory network. Nat Cell Biol. 2018;20(4):382.
Hill SM, Heiser LM, Cokelaer T, Unger M, Nesser NK, Carlin DE, Zhang Y, Sokolov A, Paull EO, Wong CK, et al. Inferring causal molecular networks: empirical assessment through a community-based effort. Nat Methods. 2016;13(4):310–8.
Meinshausen N, Hauser A, Mooij JM, Peters J, Versteeg P, Bühlmann P. Methods for causal inference from gene perturbation experiments and validation. Proc Natl Acad Sci USA. 2016;113(27):7361–8.
Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, Kellis M, Collins JJ, Stolovitzky G. Wisdom of crowds for robust gene network inference. Nat Methods. 2012;9(8):796–804.
Parikshak NN, Gandal MJ, Geschwind DH. Systems biology and gene networks in neurodevelopmental and neurodegenerative disorders. Nat Rev Genet. 2015;16(8):441.
Chiribella G, Ebler D. Quantum speedup in the identification of cause–effect relations. Nat Commun. 2019;10(1):1472.
Marbach D, Prill RJ, Schaffter T, Mattiussi C, Floreano D, Stolovitzky G. Revealing strengths and weaknesses of methods for gene network inference. Proc Natl Acad Sci USA. 2010;107(14):6286–91.
Parsana P, Ruberman C, Jaffe AE, Schatz MC, Battle A, Leek JT. Addressing confounding artifacts in reconstruction of gene co-expression networks. Genome Biol. 2019;20(1):94.
De Smet R, Marchal K. Advantages and limitations of current network inference methods. Nat Rev Microbiol. 2010;8(10):717–29.
Camacho DM, Collins KM, Powers RK, Costello JC, Collins JJ. Next-generation machine learning for biological networks. Cell. 2018;173(7):1581–92.
Anderson KM, Krienen FM, Choi EY, Reinen JM, Yeo BT, Holmes AJ. Gene expression links functional networks across cortex and striatum. Nat Commun. 2018;9(1):1428.
Chang Y-M, Lin H-H, Liu W-Y, Yu C-P, Chen H-J, Wartini PP, Kao Y-Y, Wu Y-H, Lin J-J, Lu M-YJ. Comparative transcriptomics method to infer gene coexpression networks and its applications to maize and rice leaf transcriptomes. Proc Natl Acad Sci USA. 2019;116(8):3091–9.
Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 2008;9(1):559.
Khatamian A, Paull EO, Califano A, Yu J. SJARACNe: a scalable software tool for gene network reverse engineering from big data. Bioinformatics. 2018;35(12):2165–6.
Wallace Z, Rosenthal SB, Fisch KM, Ideker T, Sasik R. On entropy and information in gene interaction networks. Bioinformatics. 2018;35(5):815–22.
Sheikhattar A, Miran S, Liu J, Fritz JB, Shamma SA, Kanold PO, Babadi B. Extracting neuronal functional network dynamics via adaptive Granger causality analysis. Proc Natl Acad Sci USA. 2018;115(17):E3869–78.
Stokes PA, Purdon PL. A study of problems encountered in Granger causality analysis from a neuroscience perspective. Proc Natl Acad Sci USA. 2017;114(34):E7063–72.
Barzel B, Barabasi AL. Network link prediction by global silencing of indirect correlations. Nat Biotechnol. 2013;31(8):720–5.
Feizi S, Marbach D, Medard M, Kellis M. Network deconvolution as a general method to distinguish direct dependencies in networks. Nat Biotechnol. 2013;31(8):726–33.
Feizi S, Marbach D, Medard M, Kellis M. Corrigendum: network deconvolution as a general method to distinguish direct dependencies in networks. Nat Biotechnol. 2015;33(4):424.
Castro DM, De Veaux NR, Miraldi ER, Bonneau R. Multi-study inference of regulatory networks for more accurate models of gene regulation. PLoS Comput Biol. 2019;15(1): e1006591.
Sulaimanov N, Kumar S, Burdet F, Ibberson M, Pagni M, Koeppl H. Inferring gene expression networks with hubs using a degree weighted Lasso approach. Bioinformatics. 2018;35(6):987–94.
Wu S, Joseph A, Hammonds AS, Celniker SE, Yu B, Frise E. Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks. Proc Natl Acad Sci USA. 2016;113(16):4290–5.
Liao JC, Boscolo R, Yang YL, Tran LM, Sabatti C, Roychowdhury VP. Network component analysis: reconstruction of regulatory signals in biological systems. Proc Natl Acad Sci USA. 2003;100(26):15522–7.
Yan B, Guan D, Wang C, Wang J, He B, Qin J, Boheler KR, Lu A, Zhang G, Zhu H. An integrative method to decode regulatory logics in gene transcription. Nat Commun. 2017;8(1):1044.
Zhu H, Rao RS, Zeng T, Chen L. Reconstructing dynamic gene regulatory networks from sample-based transcriptional data. Nucleic Acids Res. 2012;40(21):10657–67.
Petralia F, Wang P, Yang J, Tu Z. Integrative random forest for gene regulatory network inference. Bioinformatics. 2015;31(12):i197–205.
Zheng R, Li M, Chen X, Wu F-X, Pan Y, Wang J. BiXGBoost: a scalable, flexible boosting-based method for reconstructing gene regulatory networks. Bioinformatics. 2018;35(11):1893–900.
Aibar S, González-Blas CB, Moerman T, Imrichova H, Hulselmans G, Rambow F, Marine J-C, Geurts P, Aerts J, van den Oord J. SCENIC: single-cell regulatory network inference and clustering. Nat Methods. 2017;14(11):1083.
Magnusson R, Gustafsson M. LiPLike: towards gene regulatory network predictions of high certainty. Bioinformatics. 2020;36(8):2522–9.
Kang T, Moore R, Li Y, Sontag E, Bleris L. Discriminating direct and indirect connectivities in biological networks. Proc Natl Acad Sci USA. 2015;201507168.
Sato T, Yamanishi Y, Horimoto K, Kanehisa M, Toh H. Partial correlation coefficient between distance matrices as a new indicator of protein-protein interactions. Bioinformatics. 2006;22(20):2488–92.
Zhang X, Zhao XM, He K, Lu L, Cao Y, Liu J, Hao JK, Liu ZP, Chen L. Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information. Bioinformatics. 2012;28(1):98–104.
Zhao J, Zhou Y, Zhang X, Chen L. Part mutual information for quantifying direct associations in networks. Proc Natl Acad Sci USA. 2016;113(18):5130–5.
Zhang X, Zhao J, Hao JK, Zhao XM, Chen L. Conditional mutual inclusive information enables accurate quantification of associations in gene regulatory networks. Nucleic Acids Res. 2015;43(5): e31.
Ueno D, Kawabe H, Yamasaki S, Demura T, Kato K. Feature selection for RNA cleavage efficiency at specific sites using the LASSO regression model in Arabidopsis thaliana. BMC Bioinform. 2021;22(1):380.
Ma B, Fang M, Jiao X. Inference of gene regulatory networks based on nonlinear ordinary differential equations. Bioinformatics. 2020;36(19):4885–93.
Cao Z, Grima R. Linear mapping approximation of gene regulatory networks with stochastic dynamics. Nat Commun. 2018;9(1):3305.
Blum C, Heramvand N, Khonsari A, Kollmann M. Experimental noise cutoff boosts inferability of transcriptional networks in large-scale gene-deletion studies. Nat Commun. 2018;9(1):133.
Haehne H, Casadiego J, Peinke J, Timme M. Detecting hidden units and network size from perceptible dynamics. Phys Rev Lett. 2019;122(15): 158301.
Casadiego J, Nitzan M, Hallerberg S, Timme M. Model-free inference of direct network interactions from nonlinear collective dynamics. Nat Commun. 2017;8(1):2192.
Casadiego J, Maoutsa D, Timme M. Inferring network connectivity from event timing patterns. Phys Rev Lett. 2018;121(5): 054101.
Grilli J, Barabás G, Michalska-Smith MJ, Allesina S. Higher-order interactions stabilize dynamics in competitive network models. Nature. 2017;548(7666):210.
Pržulj N, Malod-Dognin N. Network analytics in the age of big data. Science. 2016;353(6295):123–4.
Zhang X, Liu K, Liu ZP, Duval B, Richer JM, Zhao XM, Hao JK, Chen L. NARROMI: a noise and redundancy reduction technique improves accuracy of gene regulatory network inference. Bioinformatics. 2013;29(1):106–13.
Greenfield A, Hafemeister C, Bonneau R. Robust data-driven incorporation of prior knowledge into the inference of dynamic regulatory networks. Bioinformatics. 2013;29(8):1060–7.
Wang L, Xin J, Nie Q. A critical quantity for noise attenuation in feedback systems. PLoS Comput Biol. 2010;6(4): e1000764.
Schaffter T, Marbach D, Floreano D. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics. 2011;27(16):2263–70.
Santos-Zavaleta A, Salgado H, Gama-Castro S, Sánchez-Pérez M, Gómez-Romero L, Ledezma-Tejeida D, García-Sotelo JS, Alquicira-Hernández K, Muñiz-Rascado LJ, Peña-Loredo P. RegulonDB v 10.5: tackling challenges to unify classic and high throughput knowledge of gene regulation in E. coli K-12. Nucleic Acids Res. 2018;47(D1):D212–20.
Faith JJ, Driscoll ME, Fusaro VA, Cosgrove EJ, Hayete B, Juhn FS, Schneider SJ, Gardner TS. Many microbe microarrays database: uniformly normalized affymetrix compendia with structured experimental metadata. Nucleic Acids Res. 2007;36(1):D866–70.
Daccord N, Celton JM, Linsmith G, Becker C, Choisne N, Schijlen E, Geest HVD, Bianco L, Micheletti D, Velasco R. High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development. Nat Genet. 2017;49(7):1099.
Duan N, Bai Y, Sun H, Wang N, Ma Y, Li M, Wang X, Jiao C, Legall N, Mao L. Genome re-sequencing reveals the history of apple and supports a two-stage model for fruit enlargement. Nat Commun. 2017;8(1):1–11.
Janssen BJ, Thodey K, Schaffer RJ, Alba R, Balakrishnan L, Bishop R, Bowen JH, Crowhurst RN, Gleave AP, Ledger S. Global gene expression analysis of apple fruit development from the floral bud to ripe fruit. BMC Plant Biol. 2008;8(1):16.
McGoff KA, Guo X, Deckard A, Kelliher CM, Leman AR, Francey LJ, Hogenesch JB, Haase SB, Harer JL. The local edge machine: inference of dynamic models of gene regulation. Genome Biol. 2016;17(1):214.
Christley S, Nie Q, Xie X. Incorporating existing network information into gene network inference. PLoS ONE. 2009;4(8):06799.
Marbach D, Lamparter D, Quon G, Kellis M, Kutalik Z, Bergmann S. Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases. Nat Methods. 2016;13(4):366.
Wang Y, Joshi T, Zhang XS, Xu D, Chen L. Inferring gene regulatory networks from multiple microarray datasets. Bioinformatics. 2006;22(19):2413–20.
Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P. Inferring regulatory networks from expression data using tree-based methods. PLoS ONE. 2010;5(9):4439–51.
Acknowledgements
We thank the editor and anonymous reviewers for helpful comments and suggestions.
Funding
This work was supported by grants from the National Natural Science Foundation of China [32070682, 61402457], the Technology Innovation Zone Project [1716315XJ00200303, 1816315XJ00100216] and the CAS Pioneer Hundred Talents Program. The funding bodies played no role in the design of the study, the collection, analysis and interpretation of data, or the writing of the manuscript.
Author information
Contributions
XZ conceived and designed the project. XZ proposed the model and implemented the algorithm. XJ and XZ performed the experiments. XJ and XZ analyzed the data. XJ and XZ wrote the manuscript. All authors contributed to the content of this paper and have read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Table S1. Results on the DREAM networks E. coli 1, E. coli 2, Yeast 2 and Yeast 3.
Additional file 2: Table S2. Functions of the identified genes for apple fruit development.
Additional file 3: Table S3. The GO first-level items of the identified genes for apple fruit development.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Jiang, X., Zhang, X. RSNET: inferring gene regulatory networks by a redundancy silencing and network enhancement technique. BMC Bioinformatics 23, 165 (2022). https://doi.org/10.1186/s12859-022-04696-w
DOI: https://doi.org/10.1186/s12859-022-04696-w