Novel Biclustering Methods for Re-ordering Data Matrices

DiMaggio, Peter A.; Subramani, Ashwin; Floudas, Christodoulos A.

doi:10.1007/978-1-4614-4133-5_1

Part of the book series: Fields Institute Communications (FIC, volume 63)

Abstract

Clustering of large-scale data sets is an important technique used for analysis in a variety of fields. However, many clustering methods rely on heuristics to identify the best arrangement of data points. In this chapter, we present rigorous clustering methods based on the iterative optimal re-ordering of data matrices. Distinct mixed-integer linear programming (MILP) models have been implemented to cluster dense data matrices (such as gene expression data) and sparse data matrices (such as those arising in drug discovery and toxicology). We demonstrate the capability of the optimal re-ordering methods on a wide array of data sets from systems biology, molecular discovery and toxicology.
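
To make the idea of optimal re-ordering concrete, the following is a minimal sketch and not the chapter's MILP formulation. It assumes a simple objective (the sum of squared differences between consecutive rows) and substitutes exhaustive search over row permutations for a MILP solver, which is only feasible for very small matrices. The function names `adjacency_cost` and `optimal_row_order` and the toy matrix are hypothetical, introduced here for illustration.

```python
from itertools import permutations


def adjacency_cost(matrix, order):
    """Sum of squared differences between consecutive rows in the given order."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(matrix[i], matrix[j]))
        for i, j in zip(order, order[1:])
    )


def optimal_row_order(matrix):
    """Exhaustively search all row permutations for the minimum adjacency cost.

    Stand-in for an exact optimization model; runtime grows factorially
    with the number of rows, so this is for illustration only.
    """
    rows = range(len(matrix))
    return min(permutations(rows), key=lambda order: adjacency_cost(matrix, order))


# Toy data (assumed): rows 0 and 2 are similar, as are rows 1 and 3.
data = [
    [9.0, 8.0, 9.0],
    [1.0, 0.0, 1.0],
    [8.0, 9.0, 8.0],
    [0.0, 1.0, 0.0],
]

best = optimal_row_order(data)
reordered = [data[i] for i in best]
```

In the re-ordered matrix, similar rows end up adjacent, so cluster boundaries appear where the difference between neighbouring rows is large. A provably optimal ordering of this kind is what distinguishes such an approach from heuristic arrangements.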

Mathematics Subject Classification (2010): Primary 54C40, 14E20, Secondary 46E25, 20C20

Acknowledgements

CAF gratefully acknowledges financial support from the National Science Foundation, the National Institutes of Health (R01 GM52032; R24 GM069736) and the U.S. Environmental Protection Agency (EPA; GAD R 832721-010).

Author information

Authors and Affiliations

Department of Molecular Biology, Princeton University, Princeton, NJ, 08540, USA
Peter A. DiMaggio Jr.
Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, 08540, USA
Ashwin Subramani & Christodoulos A. Floudas

Corresponding author

Correspondence to Christodoulos A. Floudas.

Editor information

Editors and Affiliations

Department of Industrial & Systems Engin, University of Florida, Weil Hall 401, Gainesville, 32611, Florida, USA
Panos M. Pardalos
Department of Mathematics, University of Waterloo, University Avenue West 200, Waterloo, N2L 3G1, Ontario, Canada
Thomas F. Coleman
Department of Industrial Engineering, University of Central Florida, Central Florida Blvd 4000, Orlando, 32816, Florida, USA
Petros Xanthopoulos

About this chapter

Cite this chapter

DiMaggio, P.A., Subramani, A., Floudas, C.A. (2013). Novel Biclustering Methods for Re-ordering Data Matrices. In: Pardalos, P., Coleman, T., Xanthopoulos, P. (eds) Optimization and Data Analysis in Biomedical Informatics. Fields Institute Communications, vol 63. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4133-5_1

DOI: https://doi.org/10.1007/978-1-4614-4133-5_1
Published: 20 July 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-4132-8
Online ISBN: 978-1-4614-4133-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
