Abstract
The decentralized optimization paradigm assumes that each term of a finite-sum objective is privately stored by the corresponding agent. Agents are only allowed to communicate with their neighbors in the communication graph. We consider the case when the agents additionally have local affine constraints and the communication graph can change over time. We provide the first linearly convergent decentralized algorithm for time-varying networks by generalizing the optimal decentralized algorithm ADOM to the case of affine constraints. We show that its rate of convergence is optimal for first-order methods by providing the lower bounds for the number of communications and oracle calls.
Notes
Source code: https://github.com/niquepolice/ADOM_affine_constraints.
References
Alghunaim SA, Yuan K, Sayed AH (2018) Dual coupled diffusion for distributed optimization with affine constraints. In: 2018 IEEE conference on decision and control (CDC). IEEE, pp. 829–834
Aybat NS, Hamedani EY (2019) A distributed ADMM-like method for resource sharing over time-varying networks. SIAM J Optim 29(4):3036–3068
Carli R, Dotoli M (2019) Distributed alternating direction method of multipliers for linearly constrained optimization over a network. IEEE Control Syst Lett 4(1):247–252
Chang T-H (2016) A proximal dual consensus ADMM method for multi-agent constrained optimization. IEEE Trans Signal Process 64(14):3719–3734
Gong K, Zhang L (2023) Push-pull based distributed primal-dual algorithm for coupled constrained convex optimization in multi-agent networks. Available at SSRN 4109852
Huang Y, Cheng Y, Bapna A, Firat O, Chen D, Chen M, Lee H, Ngiam J, Le QV, Wu Y et al (2019) GPipe: efficient training of giant neural networks using pipeline parallelism. Advances in neural information processing systems, 32
Hu T-K, Gama F, Chen T, Wang Z, Ribeiro A, Sadler BM (2021) VGAI: end-to-end learning of vision-based decentralized controllers for robot swarms. In: ICASSP 2021-2021 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp. 4900–4904
Kovalev D, Gasanov E, Gasnikov A, Richtarik P (2021) Lower bounds and optimal algorithms for smooth and strongly convex decentralized optimization over time-varying networks. Advances in Neural Information Processing Systems, 34
Kovalev D, Shulgin E, Richtárik P, Rogozin A, Gasnikov A (2021) ADOM: accelerated decentralized optimization method for time-varying networks. arXiv preprint arXiv:2102.09234
Li W, Tang R, Wang S, Zheng Z (2023) An optimal design method for communication topology of wireless sensor networks to implement fully distributed optimal control in IoT-enabled smart buildings. Appl Energy 349:121539
Liang S, Yin G et al (2019) Distributed smooth convex optimization with coupled constraints. IEEE Trans Autom Control 65(1):347–353
Lian X, Zhang C, Zhang H, Hsieh C-J, Zhang W, Liu J (2017) Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent. In: Advances in neural information processing systems, pp. 5330–5340
Molzahn DK, Dörfler F, Sandberg H, Low SH, Chakrabarti S, Baldick R, Lavaei J (2017) A survey of distributed optimization and control algorithms for electric power systems. IEEE Trans Smart Grid 8(6):2941–2962
Necoara I, Nedelcu V, Dumitrache I (2011) Parallel and distributed optimization methods for estimation and control in networks. J Process Control 21(5):756–766
Nedic A, Ozdaglar A, Parrilo PA (2010) Constrained consensus and optimization in multi-agent networks. IEEE Trans Autom Control 55(4):922–938
Nesterov Y (2004) Introductory lectures on convex optimization: a basic course. Kluwer Academic Publishers, Amsterdam
Rogozin A, Yarmoshik D, Kopylova K, Gasnikov A (2022) Decentralized strongly-convex optimization with affine constraints: primal and dual approaches. arXiv preprint arXiv:2207.04555
Salim A, Condat L, Kovalev D, Richtárik P (2022) An optimal algorithm for strongly convex minimization under affine constraints. In: International conference on artificial intelligence and statistics. PMLR, pp. 4482–4498
Scaman K, Bach F, Bubeck S, Lee YT, Massoulié L (2017) Optimal algorithms for smooth and strongly convex distributed optimization in networks. In: Proceedings of the 34th international conference on machine learning, vol 70. JMLR.org, pp. 3027–3036
Scutari G, Sun Y (2019) Distributed nonconvex constrained optimization over time-varying digraphs. Math Program 176(1):497–544
Scutari G, Facchinei F, Lampariello L (2016) Parallel and distributed methods for constrained nonconvex optimization-part i: theory. IEEE Trans Signal Process 65(8):1929–1944
Silva-Rodriguez J, Li X (2023) Privacy-preserving decentralized energy management for networked microgrids via objective-based ADMM. arXiv preprint arXiv:2304.03649
Wang J, Hu G (2022) Distributed optimization with coupling constraints in multi-cluster networks based on dual proximal gradient method. arXiv preprint arXiv:2203.00956
Wu X, Wang H, Lu J (2022) Distributed optimization with coupling constraints. IEEE Trans Autom Control 68(3):1847–1854
Yarmoshik D, Rogozin A, Khamisov O, Dvurechensky P, Gasnikov A et al (2022) Decentralized convex optimization under affine constraints for power systems control. arXiv preprint arXiv:2203.16686
Zhou H, Lange K (2013) A path algorithm for constrained estimation. J Comput Graph Stat 22(2):261–283
Zhu M, Martinez S (2011) On distributed convex optimization under inequality and equality constraints. IEEE Trans Autom Control 57(1):151–164
Zhu F, Ren Y, Kong F, Wu H, Liang S, Chen N, Xu W, Zhang F (2023) Swarm-LIO: decentralized swarm LiDAR-inertial odometry. In: 2023 IEEE international conference on robotics and automation (ICRA). IEEE, pp. 3254–3260
Acknowledgements
This work was supported by a grant for research centers in the field of artificial intelligence, provided by the Analytical Center for the Government of the Russian Federation in accordance with the subsidy agreement (agreement identifier 000000D730321P5Q0002) and the agreement with the Moscow Institute of Physics and Technology dated November 1, 2021 No. 70-2021-00138.
Appendix
1.1 Proof of Theorem 1
Proof
Let the affine constraint in problem 3 be \(A_i x_i = 0\) with \(A_i = A = \sqrt{W' \otimes I_{d/m}}\). Then the affine-constrained decentralized problem can be viewed as a two-level decentralized problem, as explained above.
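As a numerical sanity check, the constraint matrix \(A = \sqrt{W' \otimes I_{d/m}}\) can be formed as the symmetric PSD square root of the Kronecker product. A minimal sketch (the path-graph choice of \(W'\), the block size, and the function names are illustrative, not the paper's code):

```python
import numpy as np

def psd_sqrt(M):
    """Symmetric square root of a PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative round-off
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

# Illustrative inner gossip matrix W': Laplacian of the path graph 1-2-3
W_inner = np.array([[ 1., -1.,  0.],
                    [-1.,  2., -1.],
                    [ 0., -1.,  1.]])
d_over_m = 2  # block size d/m (illustrative)
A = psd_sqrt(np.kron(W_inner, np.eye(d_over_m)))

# A is symmetric PSD, so A^T A = A^2 recovers W' (x) I_{d/m}
print(np.allclose(A.T @ A, np.kron(W_inner, np.eye(d_over_m))))  # True
```

Since \(A\) is symmetric, \(A^\top A = A^2 = W' \otimes I_{d/m}\), so \(A_i x_i = 0\) enforces consensus across each connected component of the inner graph.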
Select sets of subnodes \(S_1\), \(S_2\) and \(S_3\) such that \(S_1\) and \(S_2\) are at distance \(\ge \Delta _A\) in the inner graph, and \(S_2\) and \(S_3\) are at distance \(\ge \Delta _W\) in the outer graph. Consider the following splitting of Nesterov’s “bad” function
where
Then increasing the number of nonzero components of \(x^k\) on any subnode by three requires one local computation on a node in \(S_1\), \(\Delta _A\) inner communications (i.e., multiplications by \(A^\top A\)), one local computation on a node in \(S_2\), \(\Delta _W\) communications in the outer graph, and one local computation on a node in \(S_3\). Denote by \(\kappa _g = \frac{\beta }{\alpha }\) the “global” condition number of \(\sum _{ij}^{mn}f_{ij}\). Since the solution satisfies \(x^*_k = \left( \frac{\sqrt{\kappa _g} - 1}{\sqrt{\kappa _g} + 1}\right) ^k\), we have
where N is the number of iterations, each comprising 3 sequential computational steps, \(\Delta _A\) multiplications by \(A^\top A\), and \(\Delta _W\) communications.
To finish the proof we need to construct communication graphs G and \(G'\) in which the distances between \(S_1\), \(S_2\) and between \(S_2\), \(S_3\) are close to \(\Delta _A\) and \(\Delta _W\), respectively, and to equip the graphs with gossip matrices having the given condition numbers \(\chi _A\), \(\chi _W\).
We also need to choose \(\alpha\) and \(\beta\) such that the \(f_i\) are \(L_F\)-smooth and \(\mu _F\)-strongly convex, and to choose \(S_1\), \(S_2\), \(S_3\) so that \(\kappa _g\) is of the same order as \(\frac{L_F}{\mu _F}\).
Denote \(\gamma (M) = \sigma _{\min }^{+}(M) / \sigma _{\max }(M)\), \(\gamma _W = 1/\chi _W\), \(\gamma _A = 1/\chi _A\). Let \(\gamma _n=\frac{1-\cos \left( \frac{\pi }{n}\right) }{1+\cos \left( \frac{\pi }{n}\right) }\), a decreasing sequence of positive numbers. Since \(\gamma _2=1\) and \(\lim _n \gamma _n = 0\), there exists \(n \ge 2\) such that \(\gamma _n \ge \gamma >\gamma _{n+1}\) and \(m \ge 2\) such that \(\gamma _m \ge \gamma >\gamma _{m+1}\).
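The existence argument for n is constructive: one can walk down the decreasing sequence \(\gamma_n\) until it drops below \(\gamma\). A small sketch (function names are ours):

```python
import math

def gamma_seq(n: int) -> float:
    """gamma_n = (1 - cos(pi/n)) / (1 + cos(pi/n)); decreasing in n."""
    c = math.cos(math.pi / n)
    return (1 - c) / (1 + c)

def find_n(gamma: float) -> int:
    """Return n >= 2 with gamma_seq(n) >= gamma > gamma_seq(n + 1)."""
    n = 2
    while gamma_seq(n + 1) >= gamma:
        n += 1
    return n

print(abs(gamma_seq(3) - 1 / 3) < 1e-12)  # True: gamma_3 = 1/3
print(find_n(0.2))                        # gamma_3 = 1/3 >= 0.2 > gamma_4 ~ 0.17, so 3
```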
First, construct graph G. The cases \(n=2\) and \(n \ge 3\) are treated separately. If \(n \ge 3\), let G be the linear graph of size n ordered from node 1 to n, weighted with \(w_{i, i+1}={\left\{ \begin{array}{ll}1-a, &{} i=1\\ 1, &{} \text {otherwise}\end{array}\right. }\) Then set \(S_{2G}=\left\{ 1, \ldots , \lceil n / 32\rceil \right\}\) and \(\Delta _W=(1-1 / 16) n-1\), so that \(S_{3G} = \{\lceil n / 32\rceil + \lceil \Delta _W\rceil , \ldots , n\}\).
Take \(W_a\) as the Laplacian of the weighted graph G. A simple calculation gives that, if \(a=0\), \(\gamma \left( W_a\right) =\gamma _n\) and, if \(a=1\), the network is disconnected and \(\gamma \left( W_a\right) =0\). Thus, by continuity of the eigenvalues of a matrix, there exists a value \(a \in [0,1]\) such that \(\gamma \left( W_a\right) =\gamma _W\). Finally, by definition of n, one has \(\gamma _W>\gamma _{n+1} \ge \frac{2}{(n+1)^2}\), and \(\Delta _W \ge \frac{15}{16}\left( \sqrt{\frac{2}{\gamma _W}}-1\right) -1 \ge \frac{1}{5 \sqrt{\gamma _W}}\) when \(\gamma _W \le \gamma _3=\frac{1}{3}\).
For the case \(n=2\), we consider the totally connected network of 3 nodes, reweight only the edge \(\left( 1, 3\right)\) by \(a \in [0,1]\), and let \(W_a\) be its Laplacian matrix. If \(a=1\), then the network is totally connected and \(\gamma \left( W_a\right) =1\). If, on the contrary, \(a=0\), then the network is a linear graph and \(\gamma \left( W_a\right) =\gamma _3\). Thus, there exists a value \(a \in [0,1]\) such that \(\gamma \left( W_a\right) =\gamma\). Set \(S_{2G}=\left\{ 1\right\} , S_{3G}=\left\{ 2\right\}\), then \(\Delta _W=1 \ge \frac{1}{\sqrt{3 \gamma _W}}\).
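The continuity argument above can be illustrated numerically: for the weighted path graph, \(\gamma(W_a)\) moves continuously from \(\gamma_n\) at \(a=0\) to 0 at \(a=1\), so any intermediate target is attained, e.g. by bisection. A sketch (here \(\gamma\) is computed as the second-smallest over the largest Laplacian eigenvalue, which matches the convention that a disconnected graph gives \(\gamma = 0\); the choice \(n = 4\) is illustrative):

```python
import numpy as np

def path_laplacian(n: int, a: float) -> np.ndarray:
    """Laplacian of the path 1-...-n with the edge (1, 2) reweighted to 1 - a."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        w = 1.0 - a if i == 0 else 1.0
        L[i, i] += w
        L[i + 1, i + 1] += w
        L[i, i + 1] -= w
        L[i + 1, i] -= w
    return L

def gamma_of(n: int, a: float) -> float:
    """Second-smallest over largest Laplacian eigenvalue (0 iff disconnected)."""
    eigs = np.linalg.eigvalsh(path_laplacian(n, a))
    return eigs[1] / eigs[-1]

n = 4
gamma_n = (1 - np.cos(np.pi / n)) / (1 + np.cos(np.pi / n))
print(np.isclose(gamma_of(n, 0.0), gamma_n))  # True: unweighted path attains gamma_n
print(np.isclose(gamma_of(n, 1.0), 0.0))      # True: a = 1 disconnects the graph

# Bisection on a: gamma_of is continuous in a and the endpoints bracket the target
target = gamma_n / 2
lo, hi = 0.0, 1.0
for _ in range(100):
    mid = (lo + hi) / 2
    if gamma_of(n, mid) > target:
        lo = mid
    else:
        hi = mid
print(abs(gamma_of(n, lo) - target) < 1e-9)   # True
```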
Second, do the same for graph \(G'\), obtaining m, \(S_{1G'}\), \(S_{2G'}\) and \(\Delta _A \ge \frac{1}{5\sqrt{\gamma _A}}\).
Define \(S_1 = S_{2G} \times S_{1G'}\), \(S_2 = S_{2G} \times S_{2G'}\) and \(S_3 = S_{3G} \times S_{2G'}\), see Fig. 4. In all cases we have \(|S_k| \ge |S_2| \ge \lceil \frac{n}{32}\rceil \lceil \frac{m}{32} \rceil\) for \(k\in \{1,3\}\).
Because \(\mu _F = \frac{\alpha }{n}\), we set \(\alpha = \mu _F n\). Since \(0 \preceq M_k \preceq 2I\) for \(k\in \{1,2,3\}\), we have \(L_F = \frac{\alpha }{n} + \frac{(\beta -\alpha )m}{2|S_2|}\), so we set \(\beta =2|S_2|(L_F-\mu _F)/m + \mu _F n\) to make all \(f_i\) \(L_F\)-smooth and \(\mu _F\)-strongly convex. Then \(\kappa _g = \frac{\beta }{\alpha } = 1 + \frac{2|S_2|(L_F - \mu _F)}{\mu _F mn} \ge \frac{L_F}{512 \mu _F}\). Combining this with (13) and the inequalities between \(\Delta _A, \gamma _A\) and \(\Delta _W, \gamma _W\) concludes the proof. \(\square\)
1.2 Proof of Theorem 2
Proof
As in the proof of Theorem 1, we set the affine constraint in problem 3 to be \(A_i x_i = 0\), with \(A_i=A=\sqrt{W' \otimes I_{d/m}}\), where \(W'\) is a gossip matrix of some inner communication graph \(G'\). Let the sequence of outer communication graphs G(k) be the same as in the proof of Theorem 1 in Kovalev et al. (2021b): \(n= 3 \left\lfloor {\chi _W/3}\right\rfloor\) nodes are split into three disjoint sets \(V_1, V_2, V_3\) of equal size, and \(G(k) = (V, E(k))\) are star graphs with the center nodes cycling through \(V_2\). Choose the inner communication graph \(G' = (V', E')\) as in the proof of Theorem 1. Use the Nesterov function splitting given by (12), and choose \(S_{1G'}\) and \(S_{2G'}\) as in the proof of Theorem 1. Set \(S_1 = V_1 \times S_{1G'}\), \(S_2 = V_1 \times S_{2G'}\) and \(S_3 = V_3 \times E'\). Setting W(k) to be the Laplacian of the star graph G(k), we have \(\frac{\lambda _{\max }(W(k))}{\lambda _{\min }^{+}(W(k))} = n \le \chi _W\). Also (by Lemma 2 of Kovalev et al. (2021b) and the proof of Theorem 1), increasing the number of nonzero components of \(x_k\) on any subnode requires one local computation on a node in \(S_1\), \(\Theta \left( \sqrt{\chi _A} \right)\) communications in the inner graph \(G'\) (i.e., multiplications by \(A^\top A\)), one local computation on a node in \(S_2\), \(\Theta \left( \chi _W \right)\) communications in the outer graph G, and one local computation on a node in \(S_3\). The same reasoning as in the proof of the previous theorem gives \(\kappa _g = \Theta \left( \frac{L_F}{\mu _F} \right)\); then using (13) we conclude the proof. \(\square\)
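The condition-number claim for the star Laplacians can be checked directly: a star on n nodes has Laplacian spectrum \(\{0, 1, \ldots, 1, n\}\), so \(\lambda_{\max}/\lambda_{\min}^{+} = n\). A quick sketch with the centers cycling through a set of nodes (node labels and sizes are illustrative):

```python
import numpy as np

def star_laplacian(n: int, center: int) -> np.ndarray:
    """Laplacian of the star graph on n nodes with the given center node."""
    L = np.zeros((n, n))
    for i in range(n):
        if i == center:
            continue
        L[i, i] += 1.0
        L[center, center] += 1.0
        L[i, center] -= 1.0
        L[center, i] -= 1.0
    return L

n = 9                 # n = 3 * floor(chi_W / 3) in the proof; 9 is illustrative
V2 = [3, 4, 5]        # center nodes cycle through V_2
for k in range(6):
    W_k = star_laplacian(n, V2[k % len(V2)])
    eigs = np.linalg.eigvalsh(W_k)
    lam_min_plus = min(e for e in eigs if e > 1e-9)
    print(np.isclose(eigs[-1] / lam_min_plus, n))  # True: spectrum {0, 1, ..., 1, n}
```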
1.3 Auxiliary lemmas for Theorem 3
Lemma 1
For \(\theta \le \frac{1}{L_{H}\lambda _{\max }}\) we have the inequality
Proof
We start with \(L_{H}\)-smoothness of H on \(\text {Im}\textbf{P}\textbf{B}\):
Using line 7 of Algorithm 1 together with (10) we get
Using condition \(\theta \le \frac{1}{L_{H}\lambda _{\max }}\) we get
\(\square\)
Lemma 2
For \(\sigma \le \frac{1}{\lambda _{\max }}\) we have the inequality
Proof
Using \(\textbf{P}= \textbf{P}^2\) and \(\textbf{P}\textbf{W}(k) = \textbf{W}(k) \textbf{P}= \textbf{W}(k)\)
together with lines 4 and 5 of Algorithm 1 we obtain
Using (10) we obtain
Using condition \(\sigma \le \frac{1}{\lambda _{\max }}\) we get
Using Young’s inequality we get
Rearranging concludes the proof. \(\square\)
Lemma 3
Let
Define the Lyapunov function
where \(\hat{\textbf{z}}^k\) is defined as
Then the following inequality holds:
Proof
Using (22) together with lines 5 and 6 of Algorithm 1, we get
From line 4 of Algorithm 1 and \(\textbf{P}\textbf{W}(k) = \textbf{W}(k)\) it follows that \(\textbf{P}\Delta ^k = \Delta ^k\), which implies
Hence,
Using inequality \(\left\| a + b \right\| ^2 \le (1+\gamma ) \left\| a \right\| ^2 + (1 + \frac{1}{\gamma }) \left\| b \right\| ^2,~\gamma > 0\) with \(\gamma = \frac{\eta \alpha }{1- \eta \alpha }\) we get
One can observe that \(\textbf{z}^k,\textbf{z}_g^k,\textbf{z}^* \in \text {Im}\textbf{P}\textbf{B}\). Hence,
Using line 3 of Algorithm 1 we get
Using convexity and \(\mu _{H}\)-strong convexity of \(H(\textbf{z})\) on \(\text {Im}\textbf{P}\textbf{B}\) we get
Using \(\alpha\) defined by (16) we get
Since \(H(\textbf{z}_g^k) \ge H(\textbf{z}^*)\), we get
Using (14) and \(\theta\) defined by (18) we get
Using Young’s inequality we get
Using (17) and (16), that imply \(\eta \alpha \le \frac{\lambda _{\min }^{+}}{4\lambda _{\max }}\), we obtain
Using (15) and \(\sigma\) defined by (19) we get
Using \(\eta\) defined by (17) and \(\tau\) defined by (20) we get
Rearranging and using (21) concludes the proof. \(\square\)
1.4 Proof of Theorem 3
Proof
From the derivation of the reformulated problem and the Demyanov–Danskin theorem (for strongly convex F, \(\nabla F^*(y) = \arg \max _x \{\langle y, x\rangle - F(x)\}\)) it follows that \(\nabla F^*(\textbf{B}^\top \textbf{z}^*) = \textbf{x}^*\). Therefore \(\nabla H(\textbf{z}^*) = (0_d, \textbf{x}^*)^\top\). Using \(L_{H}\)-smoothness of H on \(\text {Im}\textbf{P}\textbf{B}\) we get
Using line 3 of Algorithm 1 and inequality \(\left\| a + b \right\| ^2 \le (1+\gamma ) \left\| a \right\| ^2 + (1 + \frac{1}{\gamma }) \left\| b \right\| ^2,~\gamma > 0\) with \(\gamma = \frac{1}{\tau } - 1\) we get
Using \(\mu _{H}\)-strong convexity of H on \(\text {Im}\textbf{P}\textbf{B}\) we get
Using (22) we get
Using the definition of \(\Psi ^k\) (21) and denoting \(C = \Psi ^0 \max \left\{ 2\tau L_{H}^2,\frac{\tau (1-\tau )L_{H}^2}{\eta (1-\eta \alpha )\mu _{H}} \right\}\) we get
Applying Lemma 3 concludes the proof. \(\square\)
Cite this article
Yarmoshik, D., Rogozin, A. & Gasnikov, A. Decentralized optimization with affine constraints over time-varying networks. Comput Manag Sci 21, 10 (2024). https://doi.org/10.1007/s10287-023-00492-w