Abstract
The optimal transport (OT) problem is a classical optimization problem that takes the form of a linear program. Machine learning applications pose new computational challenges for its solution. In particular, the OT problem defines a distance between real-world objects such as images, videos, and texts, modeled as probability distributions. In this setting, the large dimension of the corresponding optimization problem rules out classical methods such as the network simplex or interior-point methods. This challenge was overcome by introducing entropic regularization and solving the regularized problem with the efficient Sinkhorn algorithm. A flexible alternative is the accelerated primal–dual gradient method, which can use any strongly convex regularizer. This entry discusses these algorithms and related problems, such as approximating the Wasserstein barycenter, together with efficient algorithms for their solution, including decentralized distributed algorithms.
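As a rough illustration of the entropic approach mentioned in the abstract, the following is a minimal sketch of Sinkhorn iterations in pure Python. It assumes the regularized problem of minimizing ⟨C, X⟩ + γ Σ X_ij ln X_ij over transport plans X with prescribed marginals, whose solution has the form diag(u) K diag(v) with K = exp(−C/γ). The function name `sinkhorn`, the parameters `gamma` and `n_iters`, and the toy three-point example are assumptions made for illustration and are not taken from the entry.

```python
import math


def sinkhorn(cost, r, c, gamma=1.0, n_iters=1000):
    """Approximate the entropy-regularized OT plan between histograms r and c.

    cost  -- n x m list of lists with nonnegative transport costs
    r, c  -- source and target histograms (positive entries, each summing to 1)
    gamma -- entropic regularization strength (hypothetical default)
    """
    n, m = len(r), len(c)
    # Gibbs kernel K_ij = exp(-C_ij / gamma).
    K = [[math.exp(-cost[i][j] / gamma) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that the row sums of diag(u) K diag(v) equal r.
        u = [r[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the column sums equal c.
        v = [c[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Assemble the transport plan X = diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy example: three points on a line with squared-distance cost.
points = [0.0, 1.0, 2.0]
cost = [[(a - b) ** 2 for b in points] for a in points]
r = [0.5, 0.3, 0.2]
c = [0.2, 0.3, 0.5]

plan = sinkhorn(cost, r, c, gamma=1.0, n_iters=1000)
# Regularized estimate of the transport cost <C, X>.
approx_cost = sum(cost[i][j] * plan[i][j] for i in range(3) for j in range(3))
```

Each iteration alternately matches the row and column marginals. For small `gamma` the kernel entries can underflow, so a practical implementation would typically work in the log domain and use vectorized linear algebra; this sketch omits both for brevity.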
Acknowledgements
The first section of the research is supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye) 075-00337-20-03, project No. 0714-2020-0005.
Copyright information
© 2023 Springer Nature Switzerland AG
About this entry
Cite this entry
Tupitsa, N., Dvurechensky, P., Dvinskikh, D., Gasnikov, A. (2023). Computational Optimal Transport. In: Pardalos, P.M., Prokopyev, O.A. (eds) Encyclopedia of Optimization. Springer, Cham. https://doi.org/10.1007/978-3-030-54621-2_861-1
DOI: https://doi.org/10.1007/978-3-030-54621-2_861-1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-54621-2
Online ISBN: 978-3-030-54621-2
eBook Packages: Springer Reference Mathematics, Reference Module Computer Science and Engineering