Abstract
The decentralized optimization paradigm assumes that each term of a finite-sum objective is privately stored by the corresponding agent. Agents are only allowed to communicate with their neighbors in the communication graph. We consider the case when the agents additionally have local affine constraints and the communication graph can change over time. We provide the first linearly convergent decentralized algorithm for time-varying networks by generalizing the optimal decentralized algorithm ADOM to the case of affine constraints. We show that its rate of convergence is optimal for first-order methods by providing the lower bounds for the number of communications and oracle calls.
Notes
Source code: https://github.com/niquepolice/ADOM_affine_constraints.
References
Alghunaim SA, Yuan K, Sayed AH (2018) Dual coupled diffusion for distributed optimization with affine constraints. In: 2018 IEEE conference on decision and control (CDC). IEEE, pp. 829–834
Aybat NS, Hamedani EY (2019) A distributed ADMM-like method for resource sharing over time-varying networks. SIAM J Optim 29(4):3036–3068
Carli R, Dotoli M (2019) Distributed alternating direction method of multipliers for linearly constrained optimization over a network. IEEE Control Syst Lett 4(1):247–252
Chang T-H (2016) A proximal dual consensus ADMM method for multi-agent constrained optimization. IEEE Trans Signal Process 64(14):3719–3734
Gong K, Zhang L (2023) Push-pull based distributed primal-dual algorithm for coupled constrained convex optimization in multi-agent networks. Available at SSRN 4109852
Huang Y, Cheng Y, Bapna A, Firat O, Chen D, Chen M, Lee H, Ngiam J, Le QV, Wu Y et al (2019) GPipe: efficient training of giant neural networks using pipeline parallelism. Advances in neural information processing systems, 32
Hu T-K, Gama F, Chen T, Wang Z, Ribeiro A, Sadler BM (2021) VGAI: end-to-end learning of vision-based decentralized controllers for robot swarms. In: ICASSP 2021-2021 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp. 4900–4904
Kovalev D, Gasanov E, Gasnikov A, Richtarik P (2021) Lower bounds and optimal algorithms for smooth and strongly convex decentralized optimization over time-varying networks. Advances in Neural Information Processing Systems, 34
Kovalev D, Shulgin E, Richtárik P, Rogozin A, Gasnikov A (2021) ADOM: accelerated decentralized optimization method for time-varying networks. arXiv preprint arXiv:2102.09234
Li W, Tang R, Wang S, Zheng Z (2023) An optimal design method for communication topology of wireless sensor networks to implement fully distributed optimal control in IoT-enabled smart buildings. Appl Energy 349:121539
Liang S, Yin G et al (2019) Distributed smooth convex optimization with coupled constraints. IEEE Trans Autom Control 65(1):347–353
Lian X, Zhang C, Zhang H, Hsieh C-J, Zhang W, Liu J (2017) Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent. In: Advances in neural information processing systems, pp. 5330–5340
Molzahn DK, Dörfler F, Sandberg H, Low SH, Chakrabarti S, Baldick R, Lavaei J (2017) A survey of distributed optimization and control algorithms for electric power systems. IEEE Trans Smart Grid 8(6):2941–2962
Necoara I, Nedelcu V, Dumitrache I (2011) Parallel and distributed optimization methods for estimation and control in networks. J Process Control 21(5):756–766
Nedic A, Ozdaglar A, Parrilo PA (2010) Constrained consensus and optimization in multi-agent networks. IEEE Trans Autom Control 55(4):922–938
Nesterov Y (2004) Introductory lectures on convex optimization: a basic course. Kluwer Academic Publishers, Amsterdam
Rogozin A, Yarmoshik D, Kopylova K, Gasnikov A (2022) Decentralized strongly-convex optimization with affine constraints: primal and dual approaches. arXiv preprint arXiv:2207.04555
Salim A, Condat L, Kovalev D, Richtárik P (2022) An optimal algorithm for strongly convex minimization under affine constraints. In: International conference on artificial intelligence and statistics. PMLR, pp. 4482–4498
Scaman K, Bach F, Bubeck S, Lee YT, Massoulié L (2017) Optimal algorithms for smooth and strongly convex distributed optimization in networks. In: Proceedings of the 34th international conference on machine learning, vol 70. JMLR.org, pp. 3027–3036
Scutari G, Sun Y (2019) Distributed nonconvex constrained optimization over time-varying digraphs. Math Program 176(1):497–544
Scutari G, Facchinei F, Lampariello L (2016) Parallel and distributed methods for constrained nonconvex optimization-part i: theory. IEEE Trans Signal Process 65(8):1929–1944
Silva-Rodriguez J, Li X (2023) Privacy-preserving decentralized energy management for networked microgrids via objective-based ADMM. arXiv preprint arXiv:2304.03649
Wang J, Hu G (2022) Distributed optimization with coupling constraints in multi-cluster networks based on dual proximal gradient method. arXiv preprint arXiv:2203.00956
Wu X, Wang H, Lu J (2022) Distributed optimization with coupling constraints. IEEE Trans Autom Control 68(3):1847–1854
Yarmoshik D, Rogozin A, Khamisov O, Dvurechensky P, Gasnikov A et al (2022) Decentralized convex optimization under affine constraints for power systems control. arXiv preprint arXiv:2203.16686
Zhou H, Lange K (2013) A path algorithm for constrained estimation. J Comput Graph Stat 22(2):261–283
Zhu M, Martinez S (2011) On distributed convex optimization under inequality and equality constraints. IEEE Trans Autom Control 57(1):151–164
Zhu F, Ren Y, Kong F, Wu H, Liang S, Chen N, Xu W, Zhang F (2023) Swarm-LIO: decentralized swarm LiDAR-inertial odometry. In: 2023 IEEE international conference on robotics and automation (ICRA). IEEE, pp. 3254–3260
Acknowledgements
This work was supported by a grant for research centers in the field of artificial intelligence, provided by the Analytical Center for the Government of the Russian Federation in accordance with the subsidy agreement (agreement identifier 000000D730321P5Q0002) and the agreement with the Moscow Institute of Physics and Technology dated November 1, 2021 No. 70-2021-00138.
Appendix
1.1 Proof of Theorem 1
Proof
Let the affine constraint in problem 3 be \(A_i x_i = 0\) with \(A_i = A = \sqrt{W' \otimes I_{d/m}}\). Then the affine-constrained decentralized problem can be viewed as a two-level decentralized problem, as explained above.
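As a numerical sanity check, the constraint matrix \(A = \sqrt{W' \otimes I_{d/m}}\) can be formed as the symmetric PSD square root of the Kronecker product. A minimal sketch (the path-graph choice of \(W'\), the block size, and the function names are illustrative, not the paper's code):

```python
import numpy as np

def psd_sqrt(M):
    """Symmetric square root of a PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative round-off
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

# Illustrative inner gossip matrix W': Laplacian of the path graph 1-2-3
W_inner = np.array([[ 1., -1.,  0.],
                    [-1.,  2., -1.],
                    [ 0., -1.,  1.]])
d_over_m = 2  # block size d/m (illustrative)
A = psd_sqrt(np.kron(W_inner, np.eye(d_over_m)))

# A is symmetric PSD, so A^T A = A^2 recovers W' (x) I_{d/m}
print(np.allclose(A.T @ A, np.kron(W_inner, np.eye(d_over_m))))  # True
```

Since \(A\) is symmetric, \(A^\top A = A^2 = W' \otimes I_{d/m}\), so \(A_i x_i = 0\) enforces consensus across each connected component of the inner graph.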
Select sets of subnodes \(S_1\), \(S_2\) and \(S_3\) such that \(S_1\) and \(S_2\) are at distance \(\ge \Delta _A\) in the inner graph, and \(S_2\) and \(S_3\) are at distance \(\ge \Delta _W\) in the outer graph. Consider the following splitting of Nesterov’s “bad” function
where
Then increasing the number of nonzero components of \(x^k\) on any subnode by three requires one local computation on a node in \(S_1\), \(\Delta _A\) inner communications (i.e., multiplications by \(A^\top A\)), one local computation on a node in \(S_2\), \(\Delta _W\) communications in the outer graph, and one local computation on a node in \(S_3\). Denote by \(\kappa _g = \frac{\beta }{\alpha }\) the “global” condition number of \(\sum _{ij}^{mn}f_{ij}\). Since the solution satisfies \(x^*_k = \left( \frac{\sqrt{\kappa _g} - 1}{\sqrt{\kappa _g} + 1}\right) ^k\), we have
where N is the number of iterations, each comprising 3 sequential computational steps, \(\Delta _A\) multiplications by \(A^\top A\), and \(\Delta _W\) communications.
To finish the proof we need to construct communication graphs G and \(G'\) in which the distances between \(S_1\), \(S_2\) and between \(S_2\), \(S_3\) are close to \(\Delta _A\) and \(\Delta _W\), respectively, and to equip the graphs with gossip matrices having the given condition numbers \(\chi _A\), \(\chi _W\).
We also need to choose \(\alpha\) and \(\beta\) such that the \(f_i\) are \(L_F\)-smooth and \(\mu _F\)-strongly convex, and to choose \(S_1\), \(S_2\), \(S_3\) so that \(\kappa _g\) is of the same order as \(\frac{L_F}{\mu _F}\).
Denote \(\gamma (M) = \sigma _{\min }^{+}(M) / \sigma _{\max }(M)\), \(\gamma _W = 1/\chi _W\), \(\gamma _A = 1/\chi _A\). Let \(\gamma _n=\frac{1-\cos \left( \frac{\pi }{n}\right) }{1+\cos \left( \frac{\pi }{n}\right) }\), a decreasing sequence of positive numbers. Since \(\gamma _2=1\) and \(\lim _n \gamma _n = 0\), there exists \(n \ge 2\) such that \(\gamma _n \ge \gamma >\gamma _{n+1}\) and \(m \ge 2\) such that \(\gamma _m \ge \gamma >\gamma _{m+1}\).
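The existence argument for n is constructive: one can walk down the decreasing sequence \(\gamma_n\) until it drops below \(\gamma\). A small sketch (function names are ours):

```python
import math

def gamma_seq(n: int) -> float:
    """gamma_n = (1 - cos(pi/n)) / (1 + cos(pi/n)); decreasing in n."""
    c = math.cos(math.pi / n)
    return (1 - c) / (1 + c)

def find_n(gamma: float) -> int:
    """Return n >= 2 with gamma_seq(n) >= gamma > gamma_seq(n + 1)."""
    n = 2
    while gamma_seq(n + 1) >= gamma:
        n += 1
    return n

print(abs(gamma_seq(3) - 1 / 3) < 1e-12)  # True: gamma_3 = 1/3
print(find_n(0.2))                        # gamma_3 = 1/3 >= 0.2 > gamma_4 ~ 0.17, so 3
```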
First, construct graph G. The cases \(n=2\) and \(n \ge 3\) are treated separately. If \(n \ge 3\), let G be the linear graph of size n ordered from node 1 to n, weighted with \(w_{i, i+1}={\left\{ \begin{array}{ll}1-a, &{} i=1\\ 1, &{} \text {otherwise}\end{array}\right. }\) Then set \(S_{2G}=\left\{ 1, \ldots , \lceil n / 32\rceil \right\}\) and \(\Delta _W=(1-1 / 16) n-1\), so that \(S_{3G} = \{\lceil n / 32\rceil + \lceil \Delta _W\rceil , \ldots , n\}\).
Take \(W_a\) as the Laplacian of the weighted graph G. A simple calculation gives that, if \(a=0\), \(\gamma \left( W_a\right) =\gamma _n\) and, if \(a=1\), the network is disconnected and \(\gamma \left( W_a\right) =0\). Thus, by continuity of the eigenvalues of a matrix, there exists a value \(a \in [0,1]\) such that \(\gamma \left( W_a\right) =\gamma _W\). Finally, by definition of n, one has \(\gamma _W>\gamma _{n+1} \ge \frac{2}{(n+1)^2}\), and \(\Delta _W \ge \frac{15}{16}\left( \sqrt{\frac{2}{\gamma _W}}-1\right) -1 \ge \frac{1}{5 \sqrt{\gamma _W}}\) when \(\gamma _W \le \gamma _3=\frac{1}{3}\).
For the case \(n=2\), we consider the totally connected network of 3 nodes, reweight only the edge \(\left( 1, 3\right)\) by \(a \in [0,1]\), and let \(W_a\) be its Laplacian matrix. If \(a=1\), then the network is totally connected and \(\gamma \left( W_a\right) =1\). If, on the contrary, \(a=0\), then the network is a linear graph and \(\gamma \left( W_a\right) =\gamma _3\). Thus, there exists a value \(a \in [0,1]\) such that \(\gamma \left( W_a\right) =\gamma\). Set \(S_{2G}=\left\{ 1\right\} , S_{3G}=\left\{ 2\right\}\), then \(\Delta _W=1 \ge \frac{1}{\sqrt{3 \gamma _W}}\).
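The continuity argument above can be illustrated numerically: for the weighted path graph, \(\gamma(W_a)\) moves continuously from \(\gamma_n\) at \(a=0\) to 0 at \(a=1\), so any intermediate target is attained, e.g. by bisection. A sketch (here \(\gamma\) is computed as the second-smallest over the largest Laplacian eigenvalue, which matches the convention that a disconnected graph gives \(\gamma = 0\); the choice \(n = 4\) is illustrative):

```python
import numpy as np

def path_laplacian(n: int, a: float) -> np.ndarray:
    """Laplacian of the path 1-...-n with the edge (1, 2) reweighted to 1 - a."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        w = 1.0 - a if i == 0 else 1.0
        L[i, i] += w
        L[i + 1, i + 1] += w
        L[i, i + 1] -= w
        L[i + 1, i] -= w
    return L

def gamma_of(n: int, a: float) -> float:
    """Second-smallest over largest Laplacian eigenvalue (0 iff disconnected)."""
    eigs = np.linalg.eigvalsh(path_laplacian(n, a))
    return eigs[1] / eigs[-1]

n = 4
gamma_n = (1 - np.cos(np.pi / n)) / (1 + np.cos(np.pi / n))
print(np.isclose(gamma_of(n, 0.0), gamma_n))  # True: unweighted path attains gamma_n
print(np.isclose(gamma_of(n, 1.0), 0.0))      # True: a = 1 disconnects the graph

# Bisection on a: gamma_of is continuous in a and the endpoints bracket the target
target = gamma_n / 2
lo, hi = 0.0, 1.0
for _ in range(100):
    mid = (lo + hi) / 2
    if gamma_of(n, mid) > target:
        lo = mid
    else:
        hi = mid
print(abs(gamma_of(n, lo) - target) < 1e-9)   # True
```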
Second, do the same for graph \(G'\), obtaining m, \(S_{1G'}\), \(S_{2G'}\) and \(\Delta _A \ge \frac{1}{5\sqrt{\gamma _A}}\).
Define \(S_1 = S_{2G} \times S_{1G'}\), \(S_2 = S_{2G} \times S_{2G'}\) and \(S_3 = S_{3G} \times S_{2G'}\), see Fig. 4. In all cases we have \(|S_k| \ge |S_2| \ge \lceil \frac{n}{32}\rceil \lceil \frac{m}{32} \rceil\) for \(k\in \{1,3\}\).
Because \(\mu _F = \frac{\alpha }{n}\), we set \(\alpha = \mu _F n\). Since \(0 \preceq M_k \preceq 2I\) for \(k\in \{1,2,3\}\), we have \(L_F = \frac{\alpha }{n} + \frac{(\beta -\alpha )m}{2|S_2|}\), so we set \(\beta =2|S_2|(L_F-\mu _F)/m + \mu _F n\) to make all \(f_i\) \(L_F\)-smooth and \(\mu _F\)-strongly convex. Then \(\kappa _g = \frac{\beta }{\alpha } = 1 + \frac{2|S_2|(L_F - \mu _F)}{\mu _F mn} \ge \frac{L_F}{512 \mu _F}\). Combining this with (13) and the inequalities between \(\Delta _A, \gamma _A\) and \(\Delta _W, \gamma _W\) concludes the proof. \(\square\)
1.2 Proof of Theorem 2
Proof
As in the proof of Theorem 1, we set the affine constraint in problem 3 to be \(A_i x_i = 0\), with \(A_i=A=\sqrt{W' \otimes I_{d/m}}\), where \(W'\) is a gossip matrix of some inner communication graph \(G'\). Let the sequence of outer communication graphs G(k) be the same as in the proof of Theorem 1 in Kovalev et al. (2021b): \(n= 3 \left\lfloor {\chi _W/3}\right\rfloor\) nodes are split into three disjoint sets \(V_1, V_2, V_3\) of equal size, and \(G(k) = (V, E(k))\) are star graphs with the center nodes cycling through \(V_2\). Choose the inner communication graph \(G' = (V', E')\) as in the proof of Theorem 1. Use the Nesterov function splitting given by (12), and choose \(S_{1G'}\) and \(S_{2G'}\) as in the proof of Theorem 1. Set \(S_1 = V_1 \times S_{1G'}\), \(S_2 = V_1 \times S_{2G'}\) and \(S_3 = V_3 \times E'\). Setting W(k) to be the Laplacian of the star graph G(k), we have \(\frac{\lambda _{\max }(W(k))}{\lambda _{\min }^{+}(W(k))} = n \le \chi _W\). Also (by Lemma 2 of Kovalev et al. (2021b) and the proof of Theorem 1), increasing the number of nonzero components of \(x_k\) on any subnode requires one local computation on a node in \(S_1\), \(\Theta \left( \sqrt{\chi _A} \right)\) communications in the inner graph \(G'\) (i.e., multiplications by \(A^\top A\)), one local computation on a node in \(S_2\), \(\Theta \left( \chi _W \right)\) communications in the outer graph G, and one local computation on a node in \(S_3\). The same reasoning as in the proof of the previous theorem gives \(\kappa _g = \Theta \left( \frac{L_F}{\mu _F} \right)\); then using (13) we conclude the proof. \(\square\)
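The condition-number claim for the star Laplacians can be checked directly: a star on n nodes has Laplacian spectrum \(\{0, 1, \ldots, 1, n\}\), so \(\lambda_{\max}/\lambda_{\min}^{+} = n\). A quick sketch with the centers cycling through a set of nodes (node labels and sizes are illustrative):

```python
import numpy as np

def star_laplacian(n: int, center: int) -> np.ndarray:
    """Laplacian of the star graph on n nodes with the given center node."""
    L = np.zeros((n, n))
    for i in range(n):
        if i == center:
            continue
        L[i, i] += 1.0
        L[center, center] += 1.0
        L[i, center] -= 1.0
        L[center, i] -= 1.0
    return L

n = 9                 # n = 3 * floor(chi_W / 3) in the proof; 9 is illustrative
V2 = [3, 4, 5]        # center nodes cycle through V_2
for k in range(6):
    W_k = star_laplacian(n, V2[k % len(V2)])
    eigs = np.linalg.eigvalsh(W_k)
    lam_min_plus = min(e for e in eigs if e > 1e-9)
    print(np.isclose(eigs[-1] / lam_min_plus, n))  # True: spectrum {0, 1, ..., 1, n}
```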
1.3 Auxiliary lemmas for Theorem 3
Lemma 1
For \(\theta \le \frac{1}{L_{H}\lambda _{\max }}\) we have the inequality
Proof
We start with \(L_{H}\)-smoothness of H on \(\text {Im}\textbf{P}\textbf{B}\):
Using line 7 of Algorithm 1 together with (10) we get
Using condition \(\theta \le \frac{1}{L_{H}\lambda _{\max }}\) we get
\(\square\)
Lemma 2
For \(\sigma \le \frac{1}{\lambda _{\max }}\) we have the inequality
Proof
Using \(\textbf{P}= \textbf{P}^2\) and \(\textbf{P}\textbf{W}(k) = \textbf{W}(k) \textbf{P}= \textbf{W}(k)\)
together with lines 4 and 5 of Algorithm 1 we obtain
Using (10) we obtain
Using condition \(\sigma \le \frac{1}{\lambda _{\max }}\) we get
Using Young’s inequality we get
Rearranging concludes the proof. \(\square\)
Lemma 3
Let
Define the Lyapunov function
where \(\hat{\textbf{z}}^k\) is defined as
Then the following inequality holds:
Proof
Using (22) together with lines 5 and 6 of Algorithm 1, we get
From line 4 of Algorithm 1 and \(\textbf{P}\textbf{W}(k) = \textbf{W}(k)\) it follows that \(\textbf{P}\Delta ^k = \Delta ^k\), which implies
Hence,
Using inequality \(\left\| a + b \right\| ^2 \le (1+\gamma ) \left\| a \right\| ^2 + (1 + \frac{1}{\gamma }) \left\| b \right\| ^2,~\gamma > 0\) with \(\gamma = \frac{\eta \alpha }{1- \eta \alpha }\) we get
One can observe that \(\textbf{z}^k,\textbf{z}_g^k,\textbf{z}^* \in \text {Im}\textbf{P}\textbf{B}\). Hence,
Using line 3 of Algorithm 1 we get
Using convexity and \(\mu _{H}\)-strong convexity of \(H(\textbf{z})\) on \(\text {Im}\textbf{P}\textbf{B}\) we get
Using \(\alpha\) defined by (16) we get
Since \(H(\textbf{z}_g^k) \ge H(\textbf{z}^*)\), we get
Using (14) and \(\theta\) defined by (18) we get
Using Young’s inequality we get
Using (17) and (16), that imply \(\eta \alpha \le \frac{\lambda _{\min }^{+}}{4\lambda _{\max }}\), we obtain
Using (15) and \(\sigma\) defined by (19) we get
Using \(\eta\) defined by (17) and \(\tau\) defined by (20) we get
Rearranging and using (21) concludes the proof. \(\square\)
1.4 Proof of Theorem 3
Proof
From the derivation of the reformulated problem and the Demyanov–Danskin theorem (for strongly convex F, \(\nabla F^*(y) = \arg \max _x \{\langle y, x\rangle - F(x)\}\)) it follows that \(\nabla F^*(\textbf{B}^\top \textbf{z}^*) = \textbf{x}^*\). Therefore \(\nabla H(\textbf{z}^*) = (0_d, \textbf{x}^*)^\top\). Using \(L_{H}\)-smoothness of H on \(\text {Im}\textbf{P}\textbf{B}\) we get
Using line 3 of Algorithm 1 and inequality \(\left\| a + b \right\| ^2 \le (1+\gamma ) \left\| a \right\| ^2 + (1 + \frac{1}{\gamma }) \left\| b \right\| ^2,~\gamma > 0\) with \(\gamma = \frac{1}{\tau } - 1\) we get
Using \(\mu _{H}\)-strong convexity of H on \(\text {Im}\textbf{P}\textbf{B}\) we get
Using (22) we get
Using the definition of \(\Psi ^k\) (21) and denoting \(C = \Psi ^0 \max \left\{ 2\tau L_{H}^2,\frac{\tau (1-\tau )L_{H}^2}{\eta (1-\eta \alpha )\mu _{H}} \right\}\) we get
Applying Lemma 3 concludes the proof. \(\square\)
Cite this article
Yarmoshik, D., Rogozin, A. & Gasnikov, A. Decentralized optimization with affine constraints over time-varying networks. Comput Manag Sci 21, 10 (2024). https://doi.org/10.1007/s10287-023-00492-w