Decentralized optimization with affine constraints over time-varying networks

  • Original Paper
  • Published:
Computational Management Science

Abstract

The decentralized optimization paradigm assumes that each term of a finite-sum objective is privately stored by the corresponding agent. Agents are only allowed to communicate with their neighbors in the communication graph. We consider the case when the agents additionally have local affine constraints and the communication graph can change over time. We provide the first linearly convergent decentralized algorithm for time-varying networks by generalizing the optimal decentralized algorithm ADOM to the case of affine constraints. We show that its rate of convergence is optimal for first-order methods by providing the lower bounds for the number of communications and oracle calls.

Notes

  1. Source code: https://github.com/niquepolice/ADOM_affine_constraints.

References

  • Alghunaim S.A, Yuan K, Sayed A.H (2018) Dual coupled diffusion for distributed optimization with affine constraints. In: 2018 IEEE Conference on decision and control (CDC). IEEE, pp. 829–834

  • Aybat NS, Hamedani EY (2019) A distributed admm-like method for resource sharing over time-varying networks. SIAM J Optim 29(4):3036–3068

  • Carli R, Dotoli M (2019) Distributed alternating direction method of multipliers for linearly constrained optimization over a network. IEEE Control Syst Lett 4(1):247–252

  • Chang T-H (2016) A proximal dual consensus admm method for multi-agent constrained optimization. IEEE Trans Signal Process 64(14):3719–3734

  • Gong K, Zhang L (2023) Push-pull based distributed primal-dual algorithm for coupled constrained convex optimization in multi-agent networks. Available at SSRN 4109852

  • Huang Y, Cheng Y, Bapna A, Firat O, Chen D, Chen M, Lee H, Ngiam J, Le Q.V, Wu Y, et al (2019) GPipe: efficient training of giant neural networks using pipeline parallelism. Advances in Neural Information Processing Systems, 32

  • Hu T.-K, Gama F, Chen T, Wang Z, Ribeiro A, Sadler B.M (2021) Vgai: End-to-end learning of vision-based decentralized controllers for robot swarms. In: ICASSP 2021-2021 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp. 4900–4904

  • Kovalev D, Gasanov E, Gasnikov A, Richtarik P (2021) Lower bounds and optimal algorithms for smooth and strongly convex decentralized optimization over time-varying networks. Advances in Neural Information Processing Systems, 34

  • Kovalev D, Shulgin E, Richtárik P, Rogozin A, Gasnikov A (2021) ADOM: accelerated decentralized optimization method for time-varying networks. arXiv preprint arXiv:2102.09234

  • Li W, Tang R, Wang S, Zheng Z (2023) An optimal design method for communication topology of wireless sensor networks to implement fully distributed optimal control in iot-enabled smart buildings. Appl Energy 349:121539

  • Liang S, Yin G et al (2019) Distributed smooth convex optimization with coupled constraints. IEEE Trans Autom Control 65(1):347–353

  • Lian X, Zhang C, Zhang H, Hsieh C.-J, Zhang W, Liu J (2017) Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent. In: Advances in neural information processing systems, pp. 5330–5340

  • Molzahn DK, Dörfler F, Sandberg H, Low SH, Chakrabarti S, Baldick R, Lavaei J (2017) A survey of distributed optimization and control algorithms for electric power systems. IEEE Trans Smart Grid 8(6):2941–2962

  • Necoara I, Nedelcu V, Dumitrache I (2011) Parallel and distributed optimization methods for estimation and control in networks. J Process Control 21(5):756–766

  • Nedic A, Ozdaglar A, Parrilo PA (2010) Constrained consensus and optimization in multi-agent networks. IEEE Trans Autom Control 55(4):922–938

  • Nesterov Y (2004) Introductory lectures on convex optimization: a basic course. Kluwer Academic Publishers, Amsterdam

  • Rogozin A, Yarmoshik D, Kopylova K, Gasnikov A (2022) Decentralized strongly-convex optimization with affine constraints: Primal and dual approaches. arXiv preprint arXiv:2207.04555

  • Salim A, Condat L, Kovalev D, Richtárik P (2022) An optimal algorithm for strongly convex minimization under affine constraints. In: International conference on artificial intelligence and statistics. PMLR, pp. 4482–4498

  • Scaman K, Bach F, Bubeck S, Lee Y.T, Massoulié L (2017) Optimal algorithms for smooth and strongly convex distributed optimization in networks. In: Proceedings of the 34th international conference on machine learning-Volume 70 JMLR. org, pp. 3027–3036

  • Scutari G, Sun Y (2019) Distributed nonconvex constrained optimization over time-varying digraphs. Math Program 176(1):497–544

  • Scutari G, Facchinei F, Lampariello L (2016) Parallel and distributed methods for constrained nonconvex optimization-part i: theory. IEEE Trans Signal Process 65(8):1929–1944

  • Silva-Rodriguez J, Li X (2023) Privacy-preserving decentralized energy management for networked microgrids via objective-based admm. arXiv preprint arXiv:2304.03649

  • Wang J, Hu G (2022) Distributed optimization with coupling constraints in multi-cluster networks based on dual proximal gradient method. arXiv preprint arXiv:2203.00956

  • Wu X, Wang H, Lu J (2022) Distributed optimization with coupling constraints. IEEE Trans Autom Control 8(3):1847–1854

  • Yarmoshik D, Rogozin A, Khamisov O, Dvurechensky P, Gasnikov A, et al (2022) Decentralized convex optimization under affine constraints for power systems control. arXiv preprint arXiv:2203.16686

  • Zhou H, Lange K (2013) A path algorithm for constrained estimation. J Comput Graph Stat 22(2):261–283

  • Zhu M, Martinez S (2011) On distributed convex optimization under inequality and equality constraints. IEEE Trans Autom Control 57(1):151–164

  • Zhu F, Ren Y, Kong F, Wu H, Liang S, Chen N, Xu W, Zhang F (2023) Swarm-lio: Decentralized swarm lidar-inertial odometry. In: 2023 IEEE International conference on robotics and automation (ICRA). IEEE, pp. 3254–3260

Acknowledgements

This work was supported by a grant for research centers in the field of artificial intelligence, provided by the Analytical Center for the Government of the Russian Federation in accordance with the subsidy agreement (agreement identifier 000000D730321P5Q0002) and the agreement with the Moscow Institute of Physics and Technology dated November 1, 2021 No. 70-2021-00138.

Author information

Corresponding author

Correspondence to Demyan Yarmoshik.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Proof of theorem 1

Proof

Let the affine constraint in problem (3) be \(A_i x_i = 0\), with \(A_i=A=\sqrt{W' \otimes I_{d/m}}\). Then the affine constrained decentralized problem can be seen as a two-level decentralized problem, as explained above.

Select sets of subnodes \(S_1\), \(S_2\) and \(S_3\) such that \(S_1\), \(S_2\) are at distance \(\ge \Delta _A\) through the inner graph, and \(S_2\), \(S_3\) are at distance \(\ge \Delta _W\) through the outer graph. Consider the following splitting of Nesterov’s “bad” function

$$\begin{aligned} f_{ij}(x) = \frac{\alpha }{2mn} \left\| x \right\| ^2 + \frac{\beta - \alpha }{8} \cdot {\left\{ \begin{array}{ll} \frac{1}{|S_1|}\left( x^\top M_1 x - 2x_{[1]} \right) ,~ (i,j) \in S_1,\\ \frac{1}{|S_2|}x^\top M_2 x,~ (i,j) \in S_2,\\ \frac{1}{|S_3|}x^\top M_3 x,~ (i,j) \in S_3,\\ 0,~ \text {otherwise,} \end{array}\right. } \end{aligned}$$
(12)

where

$$\begin{aligned} M_1&= \text {diag}{(1, 0, M_0, 0, M_0, \ldots )},\\ M_2&= \text {diag}{(M_0, 0, M_0, 0, \ldots )},\\ M_3&= \text {diag}{(0, M_0, 0, M_0, \ldots )},\\ M_0&= \begin{pmatrix} 1 &{}-1\\ -1&{} 1 \end{pmatrix}. \end{aligned}$$
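The block structure of \(M_1\), \(M_2\), \(M_3\) can be checked numerically on a finite truncation. The sketch below is illustrative only: the helper `coupling_matrix` and the truncation dimension `d` are hypothetical names introduced here, not taken from the paper. It builds the three matrices, so that one can confirm that each has spectrum in \([0, 2]\) and that their sum is the usual tridiagonal matrix of Nesterov's function (up to a boundary entry caused by the truncation).

```python
import numpy as np

M0 = np.array([[1.0, -1.0], [-1.0, 1.0]])

def coupling_matrix(d, first_pair):
    """Place M0 on coordinate pairs (i, i+1) for i = first_pair, first_pair + 3, ..."""
    M = np.zeros((d, d))
    for i in range(first_pair, d - 1, 3):
        M[i:i + 2, i:i + 2] += M0
    return M

d = 11  # hypothetical truncation dimension (0-based indices below)
M2 = coupling_matrix(d, 0)   # diag(M0, 0, M0, 0, ...)
M3 = coupling_matrix(d, 1)   # diag(0, M0, 0, M0, ...)
M1 = coupling_matrix(d, 2)   # diag(1, 0, M0, 0, M0, ...) without the leading 1
M1[0, 0] += 1.0              # the leading 1 of M1

# Tridiagonal matrix with 2 on the diagonal and -1 next to it.
T = 2.0 * np.eye(d) - np.eye(d, k=1) - np.eye(d, k=-1)
T[-1, -1] = 1.0  # boundary effect of truncating to d coordinates

spectra = [np.linalg.eigvalsh(M) for M in (M1, M2, M3)]
```

Every consecutive coordinate pair is coupled by exactly one of the three matrices, which is why information has to pass through \(S_1\), \(S_2\) and \(S_3\) in turn to make new coordinates nonzero.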

Then increasing the number of nonzero components in \(x^k\) on any subnode by three requires one local computation on a node in \(S_1\), \(\Delta _A\) inner communications (that is, multiplications by \(A^\top A\)), one local computation on a node in \(S_2\), \(\Delta _W\) communications in the outer graph and one local computation on a node in \(S_3\). Denote by \(\kappa _g = \frac{\beta }{\alpha }\) the “global” condition number of \(\sum _{i,j} f_{ij}\). Since the solution is \(x^*_k = \left( \frac{\sqrt{\kappa _g} - 1}{\sqrt{\kappa _g} +1 }\right) ^k\), we have

$$\begin{aligned} \left\| x^N - x^* \right\| ^2 \ge \sum _{k=N+2}^\infty (x^*_k)^2 \ge \left( \frac{\sqrt{\kappa _g} - 1}{\sqrt{\kappa _g} +1 }\right) ^{2(N+2)}, \end{aligned}$$
(13)

where N is the number of iterations, each including 3 sequential computational steps, \(\Delta _A\) multiplications by \(A^\top A\) and \(\Delta _W\) communications.
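The closed form of the minimizer used above can be sanity-checked numerically. The following sketch assumes that the sum of all \(f_{ij}\) is Nesterov's function \(\frac{\alpha }{2}\Vert x\Vert ^2 + \frac{\beta -\alpha }{8}(x^\top M x - 2x_{[1]})\) with tridiagonal \(M\), and verifies that the geometric sequence satisfies the resulting stationarity equations. The value of `kappa_g` is a hypothetical example.

```python
import numpy as np

kappa_g = 25.0  # hypothetical global condition number beta / alpha
q = (np.sqrt(kappa_g) - 1.0) / (np.sqrt(kappa_g) + 1.0)

k = np.arange(1, 40)
x = q ** k  # candidate solution x*_k = q^k

# Setting the gradient to zero gives, with c = 4 alpha / (beta - alpha),
#   (2 + c) x_1 - x_2 = 1                  (first coordinate)
#   (2 + c) x_k - x_{k-1} - x_{k+1} = 0    (all other coordinates)
c = 4.0 / (kappa_g - 1.0)
first_row = (2.0 + c) * x[0] - x[1] - 1.0
interior = (2.0 + c) * x[1:-1] - x[:-2] - x[2:]
```

Both residuals vanish up to rounding, because \(q + 1/q = 2(\kappa _g+1)/(\kappa _g-1)\).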

To finish the proof we need to construct communication graphs G, \(G'\), where distances between \(S_1\), \(S_2\) and \(S_2\), \(S_3\) are close to \(\Delta _A, \Delta _W\), and equip the graphs with gossip matrices with given condition numbers \(\chi _A, \chi _W\).

We should also choose \(\alpha\) and \(\beta\) such that \(f_i\) are \(L_F\)-smooth and \(\mu _F\)-strongly convex, and choose \(S_1\), \(S_2\), \(S_3\) so that \(\kappa _g\) is similar to \(\frac{L_F}{\mu _F}\).

Denote \(\gamma (M) = \sigma _{\min }^{+}(M) / \sigma _{\max }(M)\), \(\gamma _W = 1/\chi _W\), \(\gamma _A = 1/\chi _A\). The sequence \(\gamma _n=\frac{1-\cos \left( \frac{\pi }{n}\right) }{1+\cos \left( \frac{\pi }{n}\right) }\) is positive and decreasing. Since \(\gamma _2=1\) and \(\lim _{n \rightarrow \infty } \gamma _n=0\), there exists \(n \ge 2\) such that \(\gamma _n \ge \gamma _W >\gamma _{n+1}\) and \(m \ge 2\) such that \(\gamma _m \ge \gamma _A >\gamma _{m+1}\).

First, construct graph G. The cases \(n=2\) and \(n \ge 3\) are treated separately. If \(n \ge 3\), let G be the linear graph of size n ordered from node 1 to n, and weighted with \(w_{i, i+1}={\left\{ \begin{array}{ll}1-a, i=1\\ 1, \text {otherwise}\end{array}\right. }.\) Then set \(S_{2G}=\left\{ 1, \ldots , \lceil n / 32\rceil \right\}\) and \(\Delta _W=(1-1 / 16) n-1\), so that \(S_{3\,G} = \{\lceil n / 32\rceil + \lceil \Delta _W\rceil , \ldots , n\}\).

Take \(W_a\) as the Laplacian of the weighted graph G. A simple calculation gives that, if \(a=0\), \(\gamma \left( W_a\right) =\gamma _n\) and, if \(a=1\), the network is disconnected and \(\gamma \left( W_a\right) =0\). Thus, by continuity of the eigenvalues of a matrix, there exists a value \(a \in [0,1]\) such that \(\gamma \left( W_a\right) =\gamma _W\). Finally, by definition of n, one has \(\gamma _W>\gamma _{n+1} \ge \frac{2}{(n+1)^2}\), and \(\Delta _W \ge \frac{15}{16}\left( \sqrt{\frac{2}{\gamma _W}}-1\right) -1 \ge \frac{1}{5 \sqrt{\gamma _W}}\) when \(\gamma _W \le \gamma _3=\frac{1}{3}\).
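The construction for \(n \ge 3\) can be reproduced numerically. The sketch below is an illustration under stated assumptions: the target `gamma_W` is a hypothetical value, the helper names are introduced here, and the ratio is computed as second-smallest over largest Laplacian eigenvalue so that a disconnected graph yields 0, as in the text. It picks \(n\) from the sequence \(\gamma _n\), confirms \(\gamma (W_0)=\gamma _n\), and locates a weight \(a\) with \(\gamma (W_a)=\gamma _W\) by bisection, which relies only on continuity.

```python
import numpy as np

def gamma(W):
    """Ratio lambda_2 / lambda_max of a graph Laplacian (0 if disconnected)."""
    ev = np.linalg.eigvalsh(W)
    return max(ev[1], 0.0) / ev[-1]

def path_laplacian(n, a):
    """Laplacian of the path 1-2-...-n with edge (1, 2) weighted 1 - a."""
    w = np.ones(n - 1)
    w[0] = 1.0 - a
    W = np.zeros((n, n))
    for i, wi in enumerate(w):
        W[i, i] += wi
        W[i + 1, i + 1] += wi
        W[i, i + 1] -= wi
        W[i + 1, i] -= wi
    return W

def gamma_n(n):
    return (1 - np.cos(np.pi / n)) / (1 + np.cos(np.pi / n))

gamma_W = 0.01  # hypothetical target 1 / chi_W
n = 2
while gamma_n(n + 1) >= gamma_W:
    n += 1  # ends with gamma_n(n) >= gamma_W > gamma_n(n + 1)

lo, hi = 0.0, 1.0  # invariant: gamma(W_lo) >= gamma_W > gamma(W_hi)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if gamma(path_laplacian(n, mid)) >= gamma_W:
        lo = mid
    else:
        hi = mid
a_star = lo
delta_W = (1 - 1 / 16) * n - 1
```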

For the case \(n=2\), we consider the totally connected network of 3 nodes, reweight only the edge \(\left( 1, 3\right)\) by \(a \in [0,1]\), and let \(W_a\) be its Laplacian matrix. If \(a=1\), then the network is totally connected and \(\gamma \left( W_a\right) =1\). If, on the contrary, \(a=0\), then the network is a linear graph and \(\gamma \left( W_a\right) =\gamma _3\). Thus, there exists a value \(a \in [0,1]\) such that \(\gamma \left( W_a\right) =\gamma _W\). Set \(S_{2G}=\left\{ 1\right\} , S_{3G}=\left\{ 2\right\}\), then \(\Delta _W=1 \ge \frac{1}{\sqrt{3 \gamma _W}}\).
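For the three-node case the two endpoint values of \(\gamma (W_a)\) are easy to confirm. The sketch below uses a hypothetical helper `triangle_laplacian`. A direct computation (not stated in the paper) shows that the eigenvalues of this Laplacian are \(0\), \(1+2a\) and \(3\), so that \(\gamma (W_a) = (1+2a)/3\) interpolates linearly between \(\gamma _3 = 1/3\) and \(1\).

```python
import numpy as np

def triangle_laplacian(a):
    """Laplacian of the complete graph on nodes 1, 2, 3 with edge (1, 3) weighted a."""
    return np.array([
        [1.0 + a, -1.0, -a],
        [-1.0, 2.0, -1.0],
        [-a, -1.0, 1.0 + a],
    ])

def gamma(W):
    """Second-smallest over largest eigenvalue of a connected graph Laplacian."""
    ev = np.linalg.eigvalsh(W)
    return ev[1] / ev[-1]

g_full = gamma(triangle_laplacian(1.0))  # complete graph
g_path = gamma(triangle_laplacian(0.0))  # path 1-2-3
g_half = gamma(triangle_laplacian(0.5))  # intermediate weight
```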

Second, do the same for graph \(G'\), obtaining m, \(S_{1\,G'}, S_{2\,G'}\) and \(\Delta _A \ge \frac{1}{5\sqrt{\gamma _A}}\).

Define \(S_1 = S_{2\,G} \times S_{1\,G'}\), \(S_2 = S_{2\,G} \times S_{2\,G'}\) and \(S_3 = S_{3G} \times S_{2G'}\), see Fig. 4. In all cases we have \(|S_k| \ge |S_2| \ge \lceil \frac{n}{32}\rceil \lceil \frac{m}{32} \rceil\) for \(k\in \{1,3\}\).

Because \(\mu _F = \frac{\alpha }{n}\), we set \(\alpha = \mu _F n\). Since \(0 \preceq M_k \preceq 2I\) for \(k\in \{1,2,3\}\), \(L_F = \frac{\alpha }{n} + \frac{(\beta -\alpha )m}{2|S_2|}\), thus set \(\beta =2|S_2|(L_F-\mu _F)/m + \mu _F n\) to make all \(f_i\) be \(L_F\)-smooth and \(\mu _F\)-strongly convex. Then \(\kappa _g = \frac{\beta }{\alpha } = 1 + \frac{2|S_2|(L_F - \mu _F)}{\mu _F mn} \ge \frac{L_F}{512 \mu _F}\). Combining this with (13) and the inequalities between \(\Delta _A, \gamma _A\) and \(\Delta _W, \gamma _W\) we conclude the proof. \(\square\)
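The choice of \(\alpha\) and \(\beta\) and the resulting bound on \(\kappa _g\) can be evaluated for concrete numbers. In the sketch below the function name `kappa_global` and all problem sizes are hypothetical, and \(|S_2|\) is set equal to its lower bound \(\lceil n/32\rceil \lceil m/32\rceil\), which is an assumption made only for illustration.

```python
import math

def kappa_global(n, m, L_F, mu_F):
    """kappa_g = beta / alpha for the constants chosen in the proof."""
    S2 = math.ceil(n / 32) * math.ceil(m / 32)  # assumed size of S_2 (its lower bound)
    alpha = mu_F * n
    beta = 2 * S2 * (L_F - mu_F) / m + mu_F * n
    return beta / alpha

# Hypothetical (n, m, L_F, mu_F) combinations.
cases = [(80, 20, 100.0, 1.0), (3, 3, 10.0, 0.5), (1000, 64, 1e4, 1e-2)]
ratios = [kappa_global(n, m, L, mu) / (L / mu) for n, m, L, mu in cases]
```

Each entry of `ratios` is \(\kappa _g \mu _F / L_F\), which the argument above bounds from below by \(1/512\).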

Fig. 4

Splitting of Nesterov’s “bad” function in the main case \(m,n \ge 3\), namely \(m=20, n=80\). The vertical dimension corresponds to the inner graph \(G'\), and the horizontal dimension corresponds to the outer graph G

1.2 Proof of theorem 2

Proof

As in the proof of Theorem 1, we set the affine constraint in problem (3) to be \(A_i x_i = 0\), with \(A_i=A=\sqrt{W' \otimes I_{d/m}}\), where \(W'\) is a gossip matrix of some inner communication graph \(G'\). Let the sequence of outer communication graphs G(k) be the same as in the proof of Theorem 1 in Kovalev et al. (2021b): \(n= 3 \left\lfloor {\chi _W/3}\right\rfloor\) nodes are split into three disjoint sets \(V_1, V_2, V_3\) of equal size, and \(G(k) = (V, E(k))\) are star graphs with the center nodes cycling through \(V_2\). Choose the inner communication graph \(G' = (V', E')\) as in the proof of Theorem 1. Use the splitting of Nesterov’s function given by (12), and choose \(S_{1G'}\) and \(S_{2G'}\) as in the proof of Theorem 1. Set \(S_1 = V_1 \times S_{1\,G'}\), \(S_2 = V_1 \times S_{2\,G'}\) and \(S_3 = V_3 \times E'\). Setting W(k) to be the Laplacian of the star graph G(k), we have \(\frac{\lambda _{\max }(W(k))}{\lambda _{\min }^{+}(W(k))} =n \le \chi _W\). Also, by Lemma 2 of Kovalev et al. (2021b) and the proof of Theorem 1, increasing the number of nonzero components of \(x^k\) on any subnode requires one local computation on a node in \(S_1\), \(\Theta \left( \sqrt{\chi _A} \right)\) communications in the inner graph \(G'\) (i.e. multiplications by \(A^\top A\)), one local computation on a node in \(S_2\), \(\Theta \left( \chi _W \right)\) communications in the outer graph G and one local computation on a node in \(S_3\). The same reasoning as in the proof of the previous theorem gives \(\kappa _g = \Theta \left( \frac{L_F}{\mu _F} \right)\); using (13), we conclude the proof. \(\square\)
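The condition number of the star-graph Laplacians used above can be checked directly. The sketch below is illustrative: `star_laplacian` is a hypothetical helper, `chi_W` is an example value, and taking \(V_2\) to be the middle third of the node indices is an assumption made here. It confirms that the ratio of the largest to the smallest positive eigenvalue equals \(n\) for every choice of center.

```python
import numpy as np

def star_laplacian(n, center):
    """Laplacian of the star graph on n nodes with the given center node."""
    W = np.zeros((n, n))
    for v in range(n):
        if v != center:
            W[v, v] = 1.0
            W[center, center] += 1.0
            W[v, center] = W[center, v] = -1.0
    return W

chi_W = 20.0             # hypothetical target condition number
n = 3 * int(chi_W // 3)  # n = 3 * floor(chi_W / 3)

ratios = []
for center in range(n // 3, 2 * n // 3):  # centers cycling through V_2 (assumed middle third)
    ev = np.linalg.eigvalsh(star_laplacian(n, center))
    ratios.append(ev[-1] / ev[1])  # lambda_max / lambda_min^+
```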

1.3 Auxiliary lemmas for theorem 3

Lemma 1

For \(\theta \le \frac{1}{L_{H}\lambda _{\max }}\) we have the inequality

$$\begin{aligned} H(\textbf{z}_f^{k+1}) \le H(\textbf{z}_g^k) - \frac{\theta \lambda _{\min }^{+}}{2}\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_{\textbf{P}}. \end{aligned}$$
(14)

Proof

We start with \(L_{H}\)-smoothness of H on \(\text {Im}\textbf{P}\textbf{B}\):

$$\begin{aligned} H(\textbf{z}_f^{k+1}) \le H(\textbf{z}_g^k) + \langle \nabla H(\textbf{z}_g^k), \textbf{z}_f^{k+1} - \textbf{z}_g^k\rangle + \frac{L_{H}}{2}\left\| \textbf{z}_f^{k+1} - \textbf{z}_g^k \right\| ^2. \end{aligned}$$

Using line 7 of Algorithm 1 together with (10) we get

$$\begin{aligned} H(\textbf{z}_f^{k+1})&\le H(\textbf{z}_g^k) - \theta \left\| \nabla H(\textbf{z}_g^k) \right\| ^2_{\textbf{W}(k)} + \frac{L_{H}\theta ^2}{2}\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_{\textbf{W}^2(k)}\\&\le H(\textbf{z}_g^k) - \frac{\theta \lambda _{\min }^{+}}{2}\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_{\textbf{P}} - \frac{\theta }{2}\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_{\textbf{W}(k)} + \frac{L_{H}\theta ^2\lambda _{\max }}{2}\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_{\textbf{W}(k)} \\&= H(\textbf{z}_g^k) - \frac{\theta \lambda _{\min }^{+}}{2}\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_{\textbf{P}} +\frac{\theta }{2}\left( \theta L_{H}\lambda _{\max }- 1\right) \left\| \nabla H(\textbf{z}_g^k) \right\| ^2_{\textbf{W}(k)}. \end{aligned}$$

Using condition \(\theta \le \frac{1}{L_{H}\lambda _{\max }}\) we get

$$\begin{aligned} H(\textbf{z}_f^{k+1})&\le H(\textbf{z}_g^k) - \frac{\theta \lambda _{\min }^{+}}{2}\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_{\textbf{P}}. \end{aligned}$$

\(\square\)
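Lemma 1 can be illustrated on a small example. The sketch below is a numerical check under stated assumptions, not the authors' code: a ring-graph Laplacian stands in for \(\textbf{W}(k)\), \(\textbf{P}\) is the projector onto its range, \(H\) is a hypothetical strongly convex quadratic, and line 7 of Algorithm 1 is assumed to be the step \(\textbf{z}_f^{k+1} = \textbf{z}_g^k - \theta \textbf{W}(k)\nabla H(\textbf{z}_g^k)\), as the derivation above suggests.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Ring-graph Laplacian standing in for W(k); P projects onto its range.
W = 2.0 * np.eye(n) - np.roll(np.eye(n), 1, axis=0) - np.roll(np.eye(n), -1, axis=0)
P = np.eye(n) - np.ones((n, n)) / n
ev = np.linalg.eigvalsh(W)
lam_min_plus, lam_max = ev[1], ev[-1]

# Hypothetical quadratic H(z) = 0.5 z^T Q z - b^T z with smoothness constant L_H.
A = rng.standard_normal((n, n))
Q = A @ A.T + np.eye(n)
b = rng.standard_normal(n)
L_H = np.linalg.eigvalsh(Q)[-1]

def H(z):
    return 0.5 * z @ Q @ z - b @ z

def grad_H(z):
    return Q @ z - b

theta = 1.0 / (L_H * lam_max)
gaps = []
for _ in range(100):
    z_g = rng.standard_normal(n)
    g = grad_H(z_g)
    z_f = z_g - theta * (W @ g)  # assumed form of line 7
    bound = H(z_g) - 0.5 * theta * lam_min_plus * (g @ P @ g)
    gaps.append(bound - H(z_f))  # right-hand side of (14) minus left-hand side
```

All entries of `gaps` come out nonnegative, in line with (14).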

Lemma 2

For \(\sigma \le \frac{1}{\lambda _{\max }}\) we have the inequality

$$\begin{aligned} \begin{aligned} \left\| \textbf{m}^k \right\| ^2_{\textbf{P}} \le&\left( 1 - \frac{\sigma \lambda _{\min }^{+}}{4}\right) \frac{4}{\sigma \lambda _{\min }^{+}}\left\| \textbf{m}^k \right\| ^2_{\textbf{P}} \\&- \frac{4}{\sigma \lambda _{\min }^{+}}\left\| \textbf{m}^{k+1} \right\| ^2_\textbf{P}+ \frac{8\eta ^2}{(\sigma \lambda _{\min }^{+})^2}\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}. \end{aligned} \end{aligned}$$
(15)

Proof

Using \(\textbf{P}= \textbf{P}^2\) and \(\textbf{P}\textbf{W}(k) = \textbf{W}(k) \textbf{P}= \textbf{W}(k)\) together with lines 4 and 5 of Algorithm 1 we obtain

$$\begin{aligned} \left\| \textbf{m}^{k+1} \right\| ^2_\textbf{P}&= \left\| \textbf{m}^{k} - \eta \nabla H(\textbf{z}_g^k) - \Delta ^k \right\| ^2_\textbf{P}\\&= \left\| (\textbf{P}-\sigma \textbf{W}(k))(\textbf{m}^{k} - \eta \nabla H(\textbf{z}_g^k)) \right\| ^2 \\&= \left\| \textbf{m}^{k} - \eta \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}- 2\sigma \left\| \textbf{m}^{k} - \eta \nabla H(\textbf{z}_g^k) \right\| ^2_{\textbf{W}(k)} + \sigma ^2\left\| \textbf{m}^{k} - \eta \nabla H(\textbf{z}_g^k) \right\| ^2_{\textbf{W}^2(k)}. \end{aligned}$$

Using (10) we obtain

$$\begin{aligned} \left\| \textbf{m}^{k+1} \right\| ^2_\textbf{P}&\le \left\| \textbf{m}^{k} - \eta \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}- \sigma \lambda _{\min }^{+}\left\| \textbf{m}^{k} - \eta \nabla H(\textbf{z}_g^k) \right\| ^2_{\textbf{P}} \\&\quad - \sigma \left\| \textbf{m}^{k} - \eta \nabla H(\textbf{z}_g^k) \right\| ^2_{\textbf{W}(k)} + \sigma ^2\lambda _{\max }\left\| \textbf{m}^{k} - \eta \nabla H(\textbf{z}_g^k) \right\| ^2_{\textbf{W}(k)} \\&= \left\| \textbf{m}^{k} - \eta \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}- \sigma \lambda _{\min }^{+}\left\| \textbf{m}^{k} - \eta \nabla H(\textbf{z}_g^k) \right\| ^2_{\textbf{P}} \\&\quad + \sigma (\sigma \lambda _{\max }- 1)\left\| \textbf{m}^{k} - \eta \nabla H(\textbf{z}_g^k) \right\| ^2_{\textbf{W}(k)}. \end{aligned}$$

Using condition \(\sigma \le \frac{1}{\lambda _{\max }}\) we get

$$\begin{aligned} \left\| \textbf{m}^{k+1} \right\| ^2_\textbf{P}&\le (1-\sigma \lambda _{\min }^{+})\left\| \textbf{m}^{k} - \eta \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}. \end{aligned}$$

Using Young’s inequality we get

$$\begin{aligned} \left\| \textbf{m}^{k+1} \right\| ^2_\textbf{P}&\le (1-\sigma \lambda _{\min }^{+})\left( \left( 1 + \frac{\sigma \lambda _{\min }^{+}}{2(1-\sigma \lambda _{\min }^{+})}\right) \left\| \textbf{m}^{k} \right\| ^2_\textbf{P}+ \left( 1 + \frac{2(1-\sigma \lambda _{\min }^{+})}{\sigma \lambda _{\min }^{+}}\right) \left\| \eta \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\right) \\&= \left( 1 - \frac{\sigma \lambda _{\min }^{+}}{2}\right) \left\| \textbf{m}^k \right\| ^2_{\textbf{P}} + \eta ^2\frac{(1-\sigma \lambda _{\min }^{+})(2-\sigma \lambda _{\min }^{+})}{\sigma \lambda _{\min }^{+}}\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\\&\le \left( 1 - \frac{\sigma \lambda _{\min }^{+}}{2}\right) \left\| \textbf{m}^k \right\| ^2_{\textbf{P}} + \frac{2\eta ^2}{\sigma \lambda _{\min }^{+}}\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}. \end{aligned}$$

Rearranging concludes the proof. \(\square\)
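The last inequality in the proof of Lemma 2 can also be tested numerically. The sketch below assumes, following the derivation above, that lines 4 and 5 of Algorithm 1 read \(\Delta ^k = \sigma \textbf{W}(k)(\textbf{m}^k - \eta \nabla H(\textbf{z}_g^k))\) and \(\textbf{m}^{k+1} = \textbf{m}^k - \eta \nabla H(\textbf{z}_g^k) - \Delta ^k\). The ring graph, the step size `eta` and the random vectors are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
W = 2.0 * np.eye(n) - np.roll(np.eye(n), 1, axis=0) - np.roll(np.eye(n), -1, axis=0)
P = np.eye(n) - np.ones((n, n)) / n
ev = np.linalg.eigvalsh(W)
lam_min_plus, lam_max = ev[1], ev[-1]

sigma = 1.0 / lam_max
eta = 0.3  # hypothetical step size; the bound does not depend on its value

def sq_norm_P(v):
    return v @ P @ v

slack = []
for _ in range(100):
    m = rng.standard_normal(n)
    g = rng.standard_normal(n)   # stands in for grad H(z_g^k)
    v = m - eta * g
    delta = sigma * (W @ v)      # assumed form of line 4
    m_next = v - delta           # assumed form of line 5
    rhs = ((1 - 0.5 * sigma * lam_min_plus) * sq_norm_P(m)
           + 2 * eta**2 / (sigma * lam_min_plus) * sq_norm_P(g))
    slack.append(rhs - sq_norm_P(m_next))
```

Multiplying the verified inequality by \(4/(\sigma \lambda _{\min }^{+})\) and moving terms gives exactly (15).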

Lemma 3

Let

$$\begin{aligned} \alpha&= \frac{\mu _{H}}{2}, \end{aligned}$$
(16)
$$\begin{aligned} \eta&= \frac{2\lambda _{\min }^{+}}{7\lambda _{\max }\sqrt{\mu _{H}L_{H}}}, \end{aligned}$$
(17)
$$\begin{aligned} \theta&= \frac{1}{L_{H}\lambda _{\max }}, \end{aligned}$$
(18)
$$\begin{aligned} \sigma&= \frac{1}{\lambda _{\max }}, \end{aligned}$$
(19)
$$\begin{aligned} \tau&= \frac{\lambda _{\min }^{+}}{7\lambda _{\max }}\sqrt{\frac{\mu _{H}}{L_{H}}}. \end{aligned}$$
(20)

Define the Lyapunov function

$$\begin{aligned} \Psi ^k {:}{=}\left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + \frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^k) - H(\textbf{z}^*) )+6\left\| \textbf{m}^k \right\| ^2_{\textbf{P}}, \end{aligned}$$
(21)

where \(\hat{\textbf{z}}^k\) is defined as

$$\begin{aligned} \hat{\textbf{z}}^k = \textbf{z}^k + \textbf{P}\textbf{m}^k. \end{aligned}$$
(22)

Then the following inequality holds:

$$\begin{aligned} \Psi ^{k+1} \le \left( 1-\frac{\lambda _{\min }^{+}}{7\lambda _{\max }}\sqrt{\frac{\mu _{H}}{L_{H}}}\right) \Psi ^k. \end{aligned}$$
(23)
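The parameter choices (16)-(20) satisfy a few identities that the proof relies on, and these can be checked for concrete numbers. In the sketch below the function name `adom_parameters` and the numerical constants are hypothetical. It confirms that \(\eta \alpha = \tau\), that \(\eta \alpha \le \lambda _{\min }^{+}/(4\lambda _{\max })\), and that the coefficient of \(\Vert \nabla H(\textbf{z}_g^k)\Vert ^2_{\textbf{P}}\) arising in the proof is nonpositive. It also evaluates the iteration count implied by the contraction factor in (23).

```python
import math

def adom_parameters(lam_min_plus, lam_max, mu_H, L_H):
    """Parameters (16)-(20) as functions of the spectral and conditioning constants."""
    alpha = mu_H / 2
    eta = 2 * lam_min_plus / (7 * lam_max * math.sqrt(mu_H * L_H))
    theta = 1 / (L_H * lam_max)
    sigma = 1 / lam_max
    tau = lam_min_plus / (7 * lam_max) * math.sqrt(mu_H / L_H)
    return alpha, eta, theta, sigma, tau

# Hypothetical constants.
lam_min_plus, lam_max, mu_H, L_H = 0.5, 4.0, 0.1, 10.0
alpha, eta, theta, sigma, tau = adom_parameters(lam_min_plus, lam_max, mu_H, L_H)

# Coefficient of ||grad H||_P^2 after applying (15); the proof needs it to be <= 0.
grad_coeff = (14 * eta**2 * lam_max / lam_min_plus
              - 3 * eta * lam_min_plus / (4 * tau * lam_max * L_H))

# Iterations N with (1 - tau)^N <= eps, so that Psi^N <= eps * Psi^0 by (23).
eps = 1e-6
iterations = math.ceil(math.log(1 / eps) / -math.log(1 - tau))
```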

Proof

Using (22) together with lines 5 and 6 of Algorithm 1, we get

$$\begin{aligned} \hat{\textbf{z}}^{k+1}&= \textbf{z}^{k+1} + \textbf{P}\textbf{m}^{k+1}\\&= \textbf{z}^k + \eta \alpha (\textbf{z}_g^k - \textbf{z}^k) + \Delta ^k + \textbf{P}( \textbf{m}^k - \eta \nabla H(\textbf{z}_g^k) - \Delta ^k) \\&= \textbf{z}^k + \textbf{P}\textbf{m}^k + \eta \alpha (\textbf{z}_g^k - \textbf{z}^k) - \eta \textbf{P}\nabla H(\textbf{z}_g^k) + \Delta ^k - \textbf{P}\Delta ^k. \end{aligned}$$

From line 4 of Algorithm 1 and \(\textbf{P}\textbf{W}(k) = \textbf{W}(k)\) it follows that \(\textbf{P}\Delta ^k = \Delta ^k\), which implies

$$\begin{aligned} \hat{\textbf{z}}^{k+1}&= \textbf{z}^k + \textbf{P}\textbf{m}^k + \eta \alpha (\textbf{z}_g^k - \textbf{z}^k) - \eta \textbf{P}\nabla H(\textbf{z}_g^k) \\&= \hat{\textbf{z}}^k + \eta \alpha (\textbf{z}_g^k - \textbf{z}^k) - \eta \textbf{P}\nabla H(\textbf{z}_g^k). \end{aligned}$$

Hence,

$$\begin{aligned} \left\| \hat{\textbf{z}}^{k+1} - \textbf{z}^* \right\| ^2&= \left\| \hat{\textbf{z}}^k - \textbf{z}^* + \eta \alpha (\textbf{z}_g^k - \textbf{z}^k) - \eta \textbf{P}\nabla H(\textbf{z}_g^k) \right\| ^2 \\&= \left\| (1 - \eta \alpha )(\hat{\textbf{z}}^k - \textbf{z}^* )+ \eta \alpha (\textbf{z}_g^k + \textbf{P}\textbf{m}^k - \textbf{z}^*) \right\| ^2 + \eta ^2\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\\&\quad - 2\eta \langle \textbf{P}\nabla H(\textbf{z}_g^k), \textbf{z}^k + \textbf{P}\textbf{m}^k- \textbf{z}^* + \eta \alpha (\textbf{z}_g^k - \textbf{z}^k)\rangle . \end{aligned}$$

Using inequality \(\left\| a + b \right\| ^2 \le (1+\gamma ) \left\| a \right\| ^2 + (1 + \frac{1}{\gamma }) \left\| b \right\| ^2,~\gamma > 0\) with \(\gamma = \frac{\eta \alpha }{1- \eta \alpha }\) we get

$$\begin{aligned} \left\| \hat{\textbf{z}}^{k+1} - \textbf{z}^* \right\| ^2&\le (1-\eta \alpha )\left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + \eta \alpha \left\| \textbf{z}_g^k + \textbf{P}\textbf{m}^k - \textbf{z}^* \right\| ^2 + \eta ^2\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\\&\quad - 2\eta \langle \nabla H(\textbf{z}_g^k),\textbf{P}(\textbf{z}_g^k - \textbf{z}^*)\rangle +2\eta (1-\eta \alpha )\langle \nabla H(\textbf{z}_g^k), \textbf{P}(\textbf{z}_g^k - \textbf{z}^k)\rangle \\&\quad -2\eta \langle \textbf{P}\nabla H(\textbf{z}_g^k),\textbf{m}^k\rangle \\&\le (1-\eta \alpha )\left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + 2\eta \alpha \left\| \textbf{z}_g^k - \textbf{z}^* \right\| ^2 + 2\eta \alpha \left\| \textbf{m}^k \right\| ^2_\textbf{P}+ \eta ^2\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\\&\quad - 2\eta \langle \nabla H(\textbf{z}_g^k),\textbf{P}(\textbf{z}_g^k - \textbf{z}^*)\rangle +2\eta (1-\eta \alpha )\langle \nabla H(\textbf{z}_g^k), \textbf{P}(\textbf{z}_g^k - \textbf{z}^k)\rangle \\&\quad -2\eta \langle \textbf{P}\nabla H(\textbf{z}_g^k),\textbf{m}^k\rangle . \end{aligned}$$

One can observe that \(\textbf{z}^k,\textbf{z}_g^k,\textbf{z}^* \in \text {Im}\textbf{P}\textbf{B}\). Hence,

$$\begin{aligned} \left\| \hat{\textbf{z}}^{k+1} - \textbf{z}^* \right\| ^2 \le&(1-\eta \alpha )\left\| \hat{\textbf{z}}^k -\textbf{z}^* \right\| ^2 + 2\eta \alpha \left\| \textbf{z}_g^k - \textbf{z}^* \right\| ^2 +2\eta \alpha \left\| \textbf{m}^k \right\| ^2_\textbf{P}+ \eta ^2\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\\&- 2\eta \langle \nabla H(\textbf{z}_g^k),\textbf{z}_g^k - \textbf{z}^*\rangle + 2\eta (1-\eta \alpha )\langle \nabla H(\textbf{z}_g^k), \textbf{z}_g^k - \textbf{z}^k\rangle - 2\eta \langle \textbf{P}\nabla H(\textbf{z}_g^k),\textbf{m}^k\rangle . \end{aligned}$$

Using line 3 of Algorithm 1 we get

$$\begin{aligned} \left\| \hat{\textbf{z}}^{k+1} - \textbf{z}^* \right\| ^2&\le (1-\eta \alpha )\left\| \hat{\textbf{z}}^k -\textbf{z}^* \right\| ^2 + 2\eta \alpha \left\| \textbf{z}_g^k - \textbf{z}^* \right\| ^2 +2\eta \alpha \left\| \textbf{m}^k \right\| ^2_\textbf{P}+ \eta ^2\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\\&\quad - 2\eta \langle \nabla H(\textbf{z}_g^k),\textbf{z}_g^k - \textbf{z}^*\rangle +2\eta (1-\eta \alpha )\frac{(1-\tau )}{\tau }\langle \nabla H(\textbf{z}_g^k), \textbf{z}_f^k - \textbf{z}_g^k\rangle - 2\eta \langle \textbf{P}\nabla H(\textbf{z}_g^k),\textbf{m}^k\rangle . \end{aligned}$$

Using convexity and \(\mu _{H}\)-strong convexity of \(H(\textbf{z})\) on \(\text {Im}\textbf{P}\textbf{B}\) we get

$$\begin{aligned} \left\| \hat{\textbf{z}}^{k+1} - \textbf{z}^* \right\| ^2&\le (1-\eta \alpha )\left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + 2\eta \alpha \left\| \textbf{z}_g^k - \textbf{z}^* \right\| ^2 + 2\eta \alpha \left\| \textbf{m}^k \right\| ^2_\textbf{P}+ \eta ^2\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\\&\quad - 2\eta (H(\textbf{z}_g^k) - H(\textbf{z}^*)) - \eta \mu _{H}\left\| \textbf{z}_g^k - \textbf{z}^* \right\| ^2 \\&\quad + 2\eta (1-\eta \alpha )\frac{(1-\tau )}{\tau }(H(\textbf{z}_f^k) - H(\textbf{z}_g^k))- 2\eta \langle \textbf{P}\nabla H(\textbf{z}_g^k),\textbf{m}^k\rangle \\&= (1-\eta \alpha )\left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + \left( 2\eta \alpha - \eta \mu _{H}\right) \left\| \textbf{z}_g^k - \textbf{z}^* \right\| ^2 + \eta ^2\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\\&\quad - 2\eta (H(\textbf{z}_g^k) - H(\textbf{z}^*)) +2\eta (1-\eta \alpha )\frac{(1-\tau )}{\tau }(H(\textbf{z}_f^k) - H(\textbf{z}_g^k)) \\&\quad - 2\eta \langle \textbf{P}\nabla H(\textbf{z}_g^k),\textbf{m}^k\rangle + 2\eta \alpha \left\| \textbf{m}^k \right\| ^2_\textbf{P}. \end{aligned}$$

Using \(\alpha\) defined by (16) we get

$$\begin{aligned} \left\| \hat{\textbf{z}}^{k+1} - \textbf{z}^* \right\| ^2&\le \left( 1-\frac{\eta \mu _{H}}{2}\right) \left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + \eta ^2\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\\&\quad - 2\eta (H(\textbf{z}_g^k) - H(\textbf{z}^*)) + 2\eta (1-\eta \alpha )\frac{(1-\tau )}{\tau }(H(\textbf{z}_f^k) - H(\textbf{z}_g^k)) \\&\quad - 2\eta \langle \textbf{P}\nabla H(\textbf{z}_g^k),\textbf{m}^k\rangle + 2\eta \alpha \left\| \textbf{m}^k \right\| ^2_\textbf{P}. \end{aligned}$$

Since \(H(\textbf{z}_g^k) \ge H(\textbf{z}^*)\), we get

$$\begin{aligned} \left\| \hat{\textbf{z}}^{k+1} - \textbf{z}^* \right\| ^2&\le \left( 1-\frac{\eta \mu _{H}}{2}\right) \left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + \eta ^2\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\\&\quad - 2\eta (1-\eta \alpha )(H(\textbf{z}_g^k) - H(\textbf{z}^*)) + 2\eta (1-\eta \alpha )\frac{(1-\tau )}{\tau }(H(\textbf{z}_f^k) - H(\textbf{z}_g^k)) \\&\quad - 2\eta \langle \textbf{P}\nabla H(\textbf{z}_g^k),\textbf{m}^k\rangle + 2\eta \alpha \left\| \textbf{m}^k \right\| ^2_\textbf{P}\\&= \left( 1-\frac{\eta \mu _{H}}{2}\right) \left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + \eta ^2\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\\&\quad + 2\eta (1-\eta \alpha )\left( \frac{(1-\tau )}{\tau }H(\textbf{z}_f^k) + H(\textbf{z}^*) - \frac{1}{\tau }H(\textbf{z}_g^k) \right) \\&\quad - 2\eta \langle \textbf{P}\nabla H(\textbf{z}_g^k),\textbf{m}^k\rangle + 2\eta \alpha \left\| \textbf{m}^k \right\| ^2_\textbf{P}. \end{aligned}$$

Using (14) and \(\theta\) defined by (18) we get

$$\begin{aligned} \left\| \hat{\textbf{z}}^{k+1} - \textbf{z}^* \right\| ^2 \le&\left( 1-\frac{\eta \mu _{H}}{2}\right) \left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + \left( \eta ^2-\frac{(1-\eta \alpha )\eta \lambda _{\min }^{+}}{\tau \lambda _{\max }L_{H}}\right) \left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\\&+ (1-\tau )\frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^k) - H(\textbf{z}^*) ) - \frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^{k+1}) - H(\textbf{z}^*) ) \\&- 2\eta \langle \textbf{P}\nabla H(\textbf{z}_g^k),\textbf{m}^k\rangle + 2\eta \alpha \left\| \textbf{m}^k \right\| ^2_\textbf{P}. \end{aligned}$$

Using Young’s inequality we get

$$\begin{aligned} \left\| \hat{\textbf{z}}^{k+1} - \textbf{z}^* \right\| ^2&\le \left( 1-\frac{\eta \mu _{H}}{2}\right) \left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + \left( \eta ^2-\frac{(1-\eta \alpha )\eta \lambda _{\min }^{+}}{\tau \lambda _{\max }L_{H}}\right) \left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\\&\quad+ (1-\tau )\frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^k) - H(\textbf{z}^*) ) - \frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^{k+1}) - H(\textbf{z}^*) ) \\&\quad+ \frac{\eta ^2\lambda _{\max }}{\lambda _{\min }^{+}}\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}+ \frac{\lambda _{\min }^{+}}{\lambda _{\max }}\left\| \textbf{m}^k \right\| ^2_\textbf{P}+ 2\eta \alpha \left\| \textbf{m}^k \right\| ^2_\textbf{P}\\&= \left( 1-\frac{\eta \mu _{H}}{2}\right) \left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + \left( \eta ^2+\frac{\eta ^2\lambda _{\max }}{\lambda _{\min }^{+}}-\frac{(1-\eta \alpha )\eta \lambda _{\min }^{+}}{\tau \lambda _{\max }L_{H}}\right) \left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\\&\quad+ (1-\tau )\frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^k) - H(\textbf{z}^*) ) - \frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^{k+1}) - H(\textbf{z}^*) ) \\&\quad+ \left( \frac{\lambda _{\min }^{+}}{\lambda _{\max }}+2\eta \alpha \right) \left\| \textbf{m}^k \right\| ^2_\textbf{P}. \end{aligned}$$

Using (17) and (16), which imply \(\eta \alpha \le \frac{\lambda _{\min }^{+}}{4\lambda _{\max }}\), we obtain

$$\begin{aligned} \left\| \hat{\textbf{z}}^{k+1} - \textbf{z}^* \right\| ^2&\le \left( 1-\frac{\eta \mu _{H}}{2}\right) \left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + \left( \eta ^2+\frac{\eta ^2\lambda _{\max }}{\lambda _{\min }^{+}}-\frac{3\eta \lambda _{\min }^{+}}{4\tau \lambda _{\max }L_{H}}\right) \left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\\&\quad+ (1-\tau )\frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^k) - H(\textbf{z}^*) ) - \frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^{k+1}) - H(\textbf{z}^*) ) \\&\quad+ \frac{3\lambda _{\min }^{+}}{2\lambda _{\max }}\left\| \textbf{m}^k \right\| ^2_\textbf{P}. \end{aligned}$$

Using (15) and \(\sigma\) defined by (19) we get

$$\begin{aligned} \left\| \hat{\textbf{z}}^{k+1} - \textbf{z}^* \right\| ^2&\le \left( 1-\frac{\eta \mu _{H}}{2}\right) \left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + \left( \eta ^2+\frac{\eta ^2\lambda _{\max }}{\lambda _{\min }^{+}}-\frac{3\eta \lambda _{\min }^{+}}{4\tau \lambda _{\max }L_{H}}\right) \left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\\&\quad+ (1-\tau )\frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^k) - H(\textbf{z}^*) ) - \frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^{k+1}) - H(\textbf{z}^*) ) \\&\quad+ \left( 1 - \frac{\lambda _{\min }^{+}}{4\lambda _{\max }}\right) 6\left\| \textbf{m}^k \right\| ^2_{\textbf{P}} - 6\left\| \textbf{m}^{k+1} \right\| ^2_\textbf{P}+ \frac{12\eta ^2\lambda _{\max }}{\lambda _{\min }^{+}}\left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\\&\le \left( 1-\frac{\eta \mu _{H}}{2}\right) \left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + \left( \frac{14\eta ^2\lambda _{\max }}{\lambda _{\min }^{+}}-\frac{3\eta \lambda _{\min }^{+}}{4\tau \lambda _{\max }L_{H}}\right) \left\| \nabla H(\textbf{z}_g^k) \right\| ^2_\textbf{P}\\&\quad+ (1-\tau )\frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^k) - H(\textbf{z}^*) ) - \frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^{k+1}) - H(\textbf{z}^*) ) \\&\quad+ \left( 1 - \frac{\lambda _{\min }^{+}}{4\lambda _{\max }}\right) 6\left\| \textbf{m}^k \right\| ^2_{\textbf{P}} - 6\left\| \textbf{m}^{k+1} \right\| ^2_\textbf{P}. \end{aligned}$$

Using \(\eta\) defined by (17) and \(\tau\) defined by (20) we get

$$\begin{aligned} \left\| \hat{\textbf{z}}^{k+1} - \textbf{z}^* \right\| ^2&\le \left( 1-\frac{\lambda _{\min }^{+}}{7\lambda _{\max }}\sqrt{\frac{\mu _{H}}{L_{H}}}\right) \left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + \left( 1 - \frac{\lambda _{\min }^{+}}{4\lambda _{\max }}\right) 6\left\| \textbf{m}^k \right\| ^2_{\textbf{P}} - 6\left\| \textbf{m}^{k+1} \right\| ^2_\textbf{P}\\&\quad+ \left( 1-\frac{\lambda _{\min }^{+}}{7\lambda _{\max }}\sqrt{\frac{\mu _{H}}{L_{H}}}\right) \frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^k) - H(\textbf{z}^*) ) - \frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^{k+1}) - H(\textbf{z}^*) ) \\&\le \left( 1-\frac{\lambda _{\min }^{+}}{7\lambda _{\max }}\sqrt{\frac{\mu _{H}}{L_{H}}}\right) \left( \left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + \frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^k) - H(\textbf{z}^*) )+6\left\| \textbf{m}^k \right\| ^2_{\textbf{P}}\right) \\&\quad- \frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^{k+1}) - H(\textbf{z}^*) ) - 6\left\| \textbf{m}^{k+1} \right\| ^2_\textbf{P}. \end{aligned}$$

Rearranging and using (21) concludes the proof. \(\square\)
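
The rearrangement can be written out explicitly. This is a sketch under the assumption that (21) defines \(\Psi ^k\) as the bracketed quantity in the last inequality, which matches how \(\Psi ^k\) is used in the proof of Theorem 3. Moving the two negative terms to the left-hand side gives a one-step contraction, which unrolls into a linear rate:

$$\begin{aligned} \Psi ^{k+1}&\le \left( 1-\frac{\lambda _{\min }^{+}}{7\lambda _{\max }}\sqrt{\frac{\mu _{H}}{L_{H}}}\right) \Psi ^{k} \quad \Longrightarrow \quad \Psi ^{k} \le \left( 1-\frac{\lambda _{\min }^{+}}{7\lambda _{\max }}\sqrt{\frac{\mu _{H}}{L_{H}}}\right) ^{k} \Psi ^{0}, \\ \Psi ^{k}&= \left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + \frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^k) - H(\textbf{z}^*) )+6\left\| \textbf{m}^k \right\| ^2_{\textbf{P}}. \end{aligned}$$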

1.4 Proof of Theorem 3

Proof

From the derivation of the reformulated problem and the Demyanov-Danskin theorem it follows that \(\nabla F^*(\textbf{B}^\top \textbf{z}^*) = \textbf{x}^*\). Therefore \(\nabla H(\textbf{z}^*) = (0_d, \textbf{x}^*)^\top\). Using the \(L_{H}\)-smoothness of H on \(\text {Im}\textbf{P}\textbf{B}\), we get

$$\begin{aligned} \left\| \nabla F^*(\textbf{B}^\top \textbf{z}_g^k) - \textbf{x}^* \right\| ^2&= \left\| \nabla F^*(\textbf{B}^\top \textbf{z}_g^k) - \nabla F^*(\textbf{B}^\top \textbf{z}^*) \right\| ^2 \le \left\| \nabla H(\textbf{z}_g^k) - \nabla H(\textbf{z}^*) \right\| ^2 \le L_{H}^2 \left\| \textbf{z}_g^k - \textbf{z}^* \right\| ^2. \end{aligned}$$

Using line 3 of Algorithm 1 and the inequality \(\left\| a + b \right\| ^2 \le (1+\gamma ) \left\| a \right\| ^2 + (1 + \frac{1}{\gamma }) \left\| b \right\| ^2,~\gamma > 0\), with \(\gamma = \frac{1}{\tau } - 1\), we get

$$\begin{aligned} \left\| \nabla F^*(\textbf{B}^\top \textbf{z}_g^k) - \textbf{x}^* \right\| ^2&\le \tau L_{H}^2\left\| \textbf{z}^k - \textbf{z}^* \right\| ^2 + (1-\tau )L_{H}^2\left\| \textbf{z}_f^k - \textbf{z}^* \right\| ^2. \end{aligned}$$
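
To see where the coefficients \(\tau\) and \(1-\tau\) come from, assume that line 3 of Algorithm 1 is the convex combination \(\textbf{z}_g^k = \tau \textbf{z}^k + (1-\tau )\textbf{z}_f^k\) with \(0< \tau < 1\) (this form is an assumption here, since Algorithm 1 is not reproduced in this appendix). Taking \(a = \tau (\textbf{z}^k - \textbf{z}^*)\) and \(b = (1-\tau )(\textbf{z}_f^k - \textbf{z}^*)\), the choice \(\gamma = \frac{1}{\tau } - 1\) gives \(1+\gamma = \frac{1}{\tau }\) and \(1+\frac{1}{\gamma } = \frac{1}{1-\tau }\), so that

$$\begin{aligned} \left\| \textbf{z}_g^k - \textbf{z}^* \right\| ^2&= \left\| \tau (\textbf{z}^k - \textbf{z}^*) + (1-\tau )(\textbf{z}_f^k - \textbf{z}^*) \right\| ^2 \\&\le \frac{1}{\tau }\,\tau ^2\left\| \textbf{z}^k - \textbf{z}^* \right\| ^2 + \frac{1}{1-\tau }\,(1-\tau )^2\left\| \textbf{z}_f^k - \textbf{z}^* \right\| ^2 \\&= \tau \left\| \textbf{z}^k - \textbf{z}^* \right\| ^2 + (1-\tau )\left\| \textbf{z}_f^k - \textbf{z}^* \right\| ^2. \end{aligned}$$

Multiplying by \(L_{H}^2\) and combining with the smoothness bound yields the displayed estimate.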

Using \(\mu _{H}\)-strong convexity of H on \(\text {Im}\textbf{P}\textbf{B}\) we get

$$\begin{aligned} \left\| \nabla F^*(\textbf{B}^\top \textbf{z}_g^k) - \textbf{x}^* \right\| ^2&\le \tau L_{H}^2\left\| \textbf{z}^k - \textbf{z}^* \right\| ^2 + \frac{2(1-\tau ) L_{H}^2}{\mu _{H}}(H(\textbf{z}_f^k) - H(\textbf{z}^*)). \end{aligned}$$
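
The step above replaces the distance term by a function-value gap. Written out, the bound being used is the quadratic growth inequality below; it is the form that \(\mu _{H}\)-strong convexity takes when the first-order term at \(\textbf{z}^*\) does not contribute on the relevant subspace, which is assumed in this sketch rather than re-derived:

$$\begin{aligned} \frac{\mu _{H}}{2}\left\| \textbf{z}_f^k - \textbf{z}^* \right\| ^2 \le H(\textbf{z}_f^k) - H(\textbf{z}^*) \quad \Longrightarrow \quad (1-\tau )L_{H}^2\left\| \textbf{z}_f^k - \textbf{z}^* \right\| ^2 \le \frac{2(1-\tau ) L_{H}^2}{\mu _{H}}(H(\textbf{z}_f^k) - H(\textbf{z}^*)). \end{aligned}$$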

Using (22) we get

$$\begin{aligned}&\left\| \nabla F^*(\textbf{B}^\top \textbf{z}_g^k) - \textbf{x}^* \right\| ^2 \\&\le 2\tau L_{H}^2\left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + 2\tau L_{H}^2\left\| \textbf{m}^k \right\| ^2_\textbf{P}+ \frac{2(1-\tau ) L_{H}^2}{\mu _{H}}(H(\textbf{z}_f^k) - H(\textbf{z}^*)) \\&= 2\tau L_{H}^2\left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + \frac{\tau (1-\tau )L_{H}^2}{\eta (1-\eta \alpha )\mu _{H}}\frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^k) - H(\textbf{z}^*)) + \frac{\tau L_{H}^2}{3}6\left\| \textbf{m}^k \right\| ^2_\textbf{P}\\&\le \max \left\{ 2\tau L_{H}^2,\frac{\tau (1-\tau )L_{H}^2}{\eta (1-\eta \alpha )\mu _{H}}, \frac{\tau L_{H}^2}{3}\right\} \left( \left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + \frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^k) - H(\textbf{z}^*) )+6\left\| \textbf{m}^k \right\| ^2_{\textbf{P}}\right) \\&= \max \left\{ 2\tau L_{H}^2,\frac{\tau (1-\tau )L_{H}^2}{\eta (1-\eta \alpha )\mu _{H}}\right\} \left( \left\| \hat{\textbf{z}}^k - \textbf{z}^* \right\| ^2 + \frac{2\eta (1-\eta \alpha )}{\tau }(H(\textbf{z}_f^k) - H(\textbf{z}^*) )+6\left\| \textbf{m}^k \right\| ^2_{\textbf{P}}\right) . \end{aligned}$$

Using the definition of \(\Psi ^k\) (21) and denoting \(C = \Psi ^0 \max \left\{ 2\tau L_{H}^2,\frac{\tau (1-\tau )L_{H}^2}{\eta (1-\eta \alpha )\mu _{H}} \right\}\) we get

$$\begin{aligned} \left\| \nabla F^*(\textbf{B}^\top \textbf{z}_g^k) - \textbf{x}^* \right\| ^2 \le \frac{C}{\Psi ^0}\Psi ^k. \end{aligned}$$

Applying Lemma 3 concludes the proof. \(\square\)
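
Assuming Lemma 3 supplies the contraction \(\Psi ^k \le (1-\rho )^k \Psi ^0\) with \(\rho = \frac{\lambda _{\min }^{+}}{7\lambda _{\max }}\sqrt{\frac{\mu _{H}}{L_{H}}}\), the last display gives \(\left\| \nabla F^*(\textbf{B}^\top \textbf{z}_g^k) - \textbf{x}^* \right\| ^2 \le C(1-\rho )^k\). The following minimal Python sketch turns this bound into an iteration count for a target accuracy. The function name and its arguments (`chi`, `kappa`, `C`, `eps`) are hypothetical and are not taken from the authors' source code; the result is only an upper estimate implied by the bound, not a measured iteration count.

```python
import math


def iterations_to_accuracy(chi, kappa, C, eps):
    """Smallest k such that C * (1 - rho)**k <= eps.

    chi   : assumed ratio lambda_max / lambda_min^+ (must be >= 1)
    kappa : assumed ratio L_H / mu_H (must be >= 1)
    C     : constant from the bound above (must be > 0)
    eps   : target accuracy for the squared distance (must be > 0)
    """
    if chi < 1 or kappa < 1 or C <= 0 or eps <= 0:
        raise ValueError("expected chi >= 1, kappa >= 1, C > 0, eps > 0")
    # Contraction parameter rho from the rate in the text; here rho <= 1/7.
    rho = 1.0 / (7.0 * chi * math.sqrt(kappa))
    if eps >= C:
        return 0  # the bound already holds at k = 0
    # Solve C * (1 - rho)**k <= eps for k; log1p(-rho) = log(1 - rho) < 0.
    return math.ceil(math.log(eps / C) / math.log1p(-rho))


if __name__ == "__main__":
    k = iterations_to_accuracy(chi=2.0, kappa=4.0, C=1.0, eps=1e-6)
    print("iterations implied by the bound:", k)
```

The count grows proportionally to \(\frac{\lambda _{\max }}{\lambda _{\min }^{+}}\sqrt{\frac{L_{H}}{\mu _{H}}}\log \frac{C}{\varepsilon }\), which is the scaling implied by the rate above.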

Cite this article

Yarmoshik, D., Rogozin, A. & Gasnikov, A. Decentralized optimization with affine constraints over time-varying networks. Comput Manag Sci 21, 10 (2024). https://doi.org/10.1007/s10287-023-00492-w


  • DOI: https://doi.org/10.1007/s10287-023-00492-w
