Novel Biclustering Methods for Re-ordering Data Matrices

Optimization and Data Analysis in Biomedical Informatics

Part of the book series: Fields Institute Communications (FIC, volume 63)

Abstract

Clustering of large-scale data sets is an important analysis technique in a variety of fields. However, many clustering methods rely on heuristics to identify the best arrangement of data points. In this chapter, we present rigorous clustering methods based on the iterative optimal re-ordering of data matrices. Distinct mixed-integer linear programming (MILP) models have been implemented to cluster dense data matrices (such as gene expression data) and sparse data matrices (such as those arising in drug discovery and toxicology). We demonstrate the capability of the optimal re-ordering methods on a wide array of data sets from systems biology, molecular discovery and toxicology.
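To make the idea of optimal re-ordering concrete, the following is a minimal sketch, not the chapter's MILP models. It assumes one plausible objective (the sum of squared differences between adjacent rows) and finds the best row order of a tiny dense matrix by exhaustive search. The function names `adjacent_cost` and `best_row_order` and the sample matrix are illustrative assumptions; exhaustive search is only feasible for a handful of rows, which is why an optimization model would be needed for real data.

```python
from itertools import permutations


def adjacent_cost(matrix, order):
    """Sum of squared differences between consecutive rows in the given order."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(matrix[i], matrix[j]))
        for i, j in zip(order, order[1:])
    )


def best_row_order(matrix):
    """Try every row permutation and keep the cheapest one.

    Assumption: brute force stands in for an exact optimization model;
    it is only practical for very small matrices.
    """
    return min(
        permutations(range(len(matrix))),
        key=lambda order: adjacent_cost(matrix, order),
    )


# Hypothetical data: rows 0 and 2 are similar, as are rows 1 and 3.
data = [
    [0.0, 0.1, 5.0],
    [4.9, 5.1, 0.2],
    [0.1, 0.0, 5.1],
    [5.0, 5.0, 0.0],
]

order = best_row_order(data)
reordered = [data[i] for i in order]
print(order)
for row in reordered:
    print(row)
```

In the re-ordered matrix, similar rows end up next to each other, so cluster boundaries show up as the places where adjacent rows differ most.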

Mathematics Subject Classification (2010): Primary 54C40, 14E20, Secondary 46E25, 20C20

Acknowledgements

CAF gratefully acknowledges financial support from the National Science Foundation, National Institutes of Health (R01 GM52032; R24 GM069736) and U.S. Environmental Protection Agency EPA (GAD R 832721-010).

Author information

Corresponding author

Correspondence to Christodoulos A. Floudas.

Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

DiMaggio, P.A., Subramani, A., Floudas, C.A. (2013). Novel Biclustering Methods for Re-ordering Data Matrices. In: Pardalos, P., Coleman, T., Xanthopoulos, P. (eds) Optimization and Data Analysis in Biomedical Informatics. Fields Institute Communications, vol 63. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4133-5_1
