Abstract
Biologists have identified a real and growing need to find co-regulated gene clusters, which include both positively and negatively regulated gene clusters. However, existing pattern-based and tendency-based clustering approaches are designed to find only positively regulated gene clusters. This paper proposes a new subspace clustering model for gene expression data, called g-Cluster. The proposed model has the following advantages: 1) it finds both positively and negatively co-regulated genes at once; 2) it removes the restriction that co-regulated genes be related by a magnitude transformation; and 3) it guarantees the quality of clusters and the significance of regulations by means of a novel similarity measure, gCode, and a user-specified regulation threshold δ, respectively. No previous work addresses this task. Moreover, the minimum description length (MDL) technique is introduced to avoid generating insignificant g-Clusters. A tree structure, the GS-tree, is also designed, together with two algorithms that combine efficient pruning and optimization strategies to identify all qualified g-Clusters. Extensive experiments are conducted on real and synthetic datasets. The experimental results show that 1) the algorithms find a number of co-regulated gene clusters missed by previous models, which are potentially of high biological significance, and 2) the algorithms are effective and efficient, and outperform existing approaches.
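To make the distinction between positive and negative co-regulation concrete, the following is a minimal illustrative sketch. It is not the gCode measure or the g-Cluster algorithm, neither of which is defined in this abstract. It assumes a simple stand-in: each step between adjacent conditions is coded as up, down, or insignificant relative to a threshold `delta`, and two genes are compared by their codes. The names `tendency_code` and `relation` are hypothetical.

```python
from typing import List, Sequence


def tendency_code(expr: Sequence[float], delta: float) -> List[int]:
    """Code each step between adjacent conditions.

    +1 = rise of at least delta, -1 = fall of at least delta,
    0 = change smaller than delta (treated as insignificant).
    This coding is an assumption made for illustration only.
    """
    code = []
    for prev, curr in zip(expr, expr[1:]):
        diff = curr - prev
        if abs(diff) < delta:
            code.append(0)
        else:
            code.append(1 if diff > 0 else -1)
    return code


def relation(a: Sequence[float], b: Sequence[float], delta: float) -> str:
    """Classify two expression profiles as 'positive', 'negative' or 'none'."""
    code_a = tendency_code(a, delta)
    code_b = tendency_code(b, delta)
    if not any(code_a):
        return "none"  # no significant regulation to compare
    if code_a == code_b:
        return "positive"  # same direction at every step
    if code_a == [-c for c in code_b]:
        return "negative"  # opposite direction at every step
    return "none"


# Toy profiles over four conditions (made-up values, not real data).
g1 = [1.0, 3.0, 2.0, 5.0]
g2 = [10.0, 40.0, 15.0, 90.0]  # same tendency, unrelated magnitudes
g3 = [9.0, 4.0, 8.0, 1.0]      # opposite tendency
g4 = [1.0, 1.1, 1.2, 1.1]      # all changes below the threshold

print(relation(g1, g2, delta=0.5))
print(relation(g1, g3, delta=0.5))
print(relation(g4, g1, delta=0.5))
```

Here `g2` is neither a shifted nor a scaled copy of `g1`, which is the kind of pairing the abstract says a magnitude-transformation restriction would miss.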
Additional information
This work is supported by the National Grand Fundamental Research 973 Program of China (Grant No. 2006CB303103) and the National Natural Science Foundation of China under Grants No. 60573089, No. 60273079 and No. 60473074.
About this article
Cite this article
Zhao, YH., Wang, GR., Yin, Y. et al. A Novel Approach to Revealing Positive and Negative Co-Regulated Genes. J Comput Sci Technol 22, 261–272 (2007). https://doi.org/10.1007/s11390-007-9033-7
DOI: https://doi.org/10.1007/s11390-007-9033-7