Computational Optimal Transport

Tupitsa, Nazarii; Dvurechensky, Pavel; Dvinskikh, Darina; Gasnikov, Alexander

doi:10.1007/978-3-030-54621-2_861-1

Abstract

The optimal transport (OT) problem is a classical optimization problem that takes the form of a linear program. Machine learning applications pose new computational challenges for its solution. In particular, the OT problem defines a distance between real-world objects such as images, videos, and texts, modeled as probability distributions. In this setting, the dimension of the corresponding optimization problem is too large for classical methods such as the network simplex or interior-point methods. This challenge was overcome by introducing entropic regularization and solving the regularized problem with the efficient Sinkhorn algorithm. A flexible alternative is the accelerated primal–dual gradient method, which can use any strongly convex regularization. This entry discusses these algorithms together with related problems, such as approximating the Wasserstein barycenter, and efficient algorithms for solving them, including decentralized distributed algorithms.
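To make the role of entropic regularization concrete, the following is a minimal sketch of Sinkhorn iterations for two discrete distributions. It assumes the commonly used formulation: minimize the transport cost plus gamma times the negative entropy of the plan, subject to the two marginal constraints. The function name `sinkhorn`, the parameters `cost`, `p`, `q`, `gamma`, `max_iter`, and `tol`, and the toy data are illustrative assumptions and are not taken from the entry itself.

```python
import math


def sinkhorn(cost, p, q, gamma=0.5, max_iter=10_000, tol=1e-9):
    """Return an approximate entropy-regularized transport plan.

    cost  : n x m list of lists with nonnegative transport costs (assumed input)
    p, q  : source and target histograms, positive entries summing to 1
    gamma : regularization strength (hypothetical default; very small values
            may underflow the kernel in this naive, non-log-domain sketch)
    """
    n, m = len(p), len(q)
    # Gibbs kernel K_ij = exp(-C_ij / gamma).
    K = [[math.exp(-cost[i][j] / gamma) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(max_iter):
        # Rescale rows so that the plan's row sums match p.
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the plan's column sums match q.
        v = [q[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        # After the column step the column sums are exact, so only the
        # row-marginal violation needs to be checked.
        err = sum(
            abs(u[i] * sum(K[i][j] * v[j] for j in range(m)) - p[i])
            for i in range(n)
        )
        if err < tol:
            break
    # The plan has the form diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy usage: move mass between two points at unit distance.
cost = [[0.0, 1.0], [1.0, 0.0]]
p = [0.2, 0.8]
q = [0.6, 0.4]
plan = sinkhorn(cost, p, q)
transport_cost = sum(cost[i][j] * plan[i][j] for i in range(2) for j in range(2))
```

Each iteration costs on the order of n times m operations, which is what makes the scheme attractive when n and m are large. Smaller values of `gamma` bring the plan closer to an unregularized optimum but slow convergence, and in practice they call for a log-domain implementation.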

Acknowledgements

The first section of the research is supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye) 075-00337-20-03, project No. 0714-2020-0005.

Author information

Authors and Affiliations

MIPT, IITP RAS, Moscow, Russia
Nazarii Tupitsa
WIAS, Berlin, Germany
Pavel Dvurechensky
HSE University, Moscow, Russia
Darina Dvinskikh
MIPT, Moscow, Russia
Alexander Gasnikov
IITP RAS, Moscow, Russia
Alexander Gasnikov
Caucasus Mathematical Center, Adyghe State University, Maikop, Russia
Alexander Gasnikov

Corresponding author

Correspondence to Nazarii Tupitsa.

Editor information

Editors and Affiliations

Department of Industrial & Systems Engin, University of Florida, Gainesville, FL, USA
Panos M. Pardalos
Department of Industrial Engineering, University of Pittsburgh, Pittsburgh, PA, USA
Oleg A. Prokopyev

About this entry

Cite this entry

Tupitsa, N., Dvurechensky, P., Dvinskikh, D., Gasnikov, A. (2023). Computational Optimal Transport. In: Pardalos, P.M., Prokopyev, O.A. (eds) Encyclopedia of Optimization. Springer, Cham. https://doi.org/10.1007/978-3-030-54621-2_861-1

DOI: https://doi.org/10.1007/978-3-030-54621-2_861-1
Published: 11 July 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-54621-2
Online ISBN: 978-3-030-54621-2
eBook Packages: Springer Reference Mathematics, Reference Module Computer Science and Engineering
