Sequential inertial linear ADMM algorithm for nonconvex and nonsmooth multiblock problems with nonseparable structure

Xue, Zhonghui; Yang, Kaiyuan; Ma, Qianfeng; Dang, Yazheng

doi:10.1186/s13660-024-03141-1

Research
Open access
Published: 08 May 2024

Volume 2024, article number 65, (2024)
Journal of Inequalities and Applications

Abstract

The alternating direction method of multipliers (ADMM) has been widely used to solve linearly constrained problems in signal processing, matrix decomposition, machine learning, and many other fields. This paper introduces two linearized ADMM algorithms, namely sequential partial linear inertial ADMM (SPLI-ADMM) and sequential complete linear inertial ADMM (SCLI-ADMM), which integrate the linearized ADMM approach with the inertial technique in a fully nonconvex framework with nonseparable structure. The iterative schemes are formulated using either partial or full linearization while also incorporating the sequential gradient of the composite term in each subproblem’s update. This adaptation ensures that each iteration uses the latest information, which improves the efficiency of the algorithms. Under some mild conditions, we prove, with the help of the KŁ property, that the sequences generated by the two proposed algorithms converge to critical points of the problem. Finally, some numerical results are reported to show the effectiveness of the proposed algorithms.

1 Introduction

In this paper, we consider the following linearly constrained nonconvex optimization problem with multiple block variables:

$$\begin{aligned} \begin{aligned}& \underset{x_{i}, y}{\min } \sum_{i = 1}^{n} {{f_{i}} ( {{x_{i}}} )} + g ( {{x_{1}}, {x_{2}},\ldots , {x_{n}} ,y} ), \\ &\quad \text{s.t. }\sum_{i = 1}^{n} {{A_{i}} {x_{i}}} + By =b, \end{aligned} \end{aligned}$$

(1.1)

where ${x_{i}} \in {\mathbb{R} ^{{p_{i}}}}$ $( {i = 1,2, \ldots , n} )$ and $y \in {\mathbb{R} ^{q}}$ are variables, each ${f_{i}}:{\mathbb{R} ^{{p_{i}}}} \to \mathbb{R} \cup \{ { + \infty } \}$ $( {i = 1,2, \ldots , n} )$ is a proper lower semicontinuous function that is nonconvex and possibly nonsmooth, $g:{\mathbb{R} ^{p_{1}}}\times{\mathbb{R} ^{p_{2}}}\times \cdots \times{\mathbb{R} ^{p_{n}}}\times{\mathbb{R} ^{q}} \to \mathbb{R}$ is continuously differentiable with $\nabla g$ Lipschitz continuous with modulus $l_{g}>0$, ${A_{i}} \in {\mathbb{R}^{m \times {p_{i}}}}$ $( {i = 1,2, \ldots , n} )$ and $B \in {\mathbb{R}^{m \times {q}}}$ are given matrices, and $b \in {\mathbb{R} ^{m}}$ is a given vector. Denote $\mathbf{x}_{[i,j]} = (x_{i},x_{i+1}, \ldots , x_{j-1},x_{j})$ and $\mathbf{Ax}_{[j,k]} = \sum_{i=j}^{k}A_{i}x_{i}$.

The Augmented Lagrangian Function (ALF) of (1.1) is defined as

$$\begin{aligned} \begin{aligned} L_{\beta} ( {\mathbf{x}_{[1,n]} ,y,\lambda } ) = \sum _{i=1}^{n}{f_{i}} ( {{x_{i}}} ) +g ( \mathbf{x}_{[1,n]},y ) - \langle {\lambda , \mathbf{Ax}_{[1,n]} + By - b} \rangle +\frac{\beta}{2} \Vert \mathbf{Ax}_{[1,n]} + By - b \Vert ^{2}, \end{aligned} \end{aligned}$$

where $\lambda \in \mathbb{R}^{m}$ is the Lagrangian dual variable, and $\beta >0$ is a penalty parameter.
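
As a minimal sketch of how the ALF above can be evaluated numerically, the following Python function computes the separable terms, the mixed term, the multiplier term, and the quadratic penalty. The function name, the argument layout, and the toy instance (identity $A_{i}$, $B=-I$, $\ell_{1}$ terms, a quadratic coupling g) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def augmented_lagrangian(xs, y, lam, fs, g, As, B, b, beta):
    """Evaluate the ALF of problem (1.1) at (x_1..x_n, y, lambda)."""
    # Constraint residual r = sum_i A_i x_i + B y - b.
    r = sum(A @ x for A, x in zip(As, xs)) + B @ y - b
    separable = sum(f(x) for f, x in zip(fs, xs))
    return separable + g(xs, y) - lam @ r + 0.5 * beta * (r @ r)

# Hypothetical toy instance: two blocks, l1 terms, quadratic coupling.
rng = np.random.default_rng(0)
As = [np.eye(3), np.eye(3)]
B = -np.eye(3)
b = np.zeros(3)
fs = [lambda x: np.abs(x).sum(), lambda x: np.abs(x).sum()]
g = lambda xs, y: 0.5 * np.sum((xs[0] - xs[1]) ** 2) + 0.5 * (y @ y)

xs = [rng.standard_normal(3), rng.standard_normal(3)]
y_feasible = xs[0] + xs[1]   # makes A_1 x_1 + A_2 x_2 + B y = b
lam = rng.standard_normal(3)

objective = sum(f(x) for f, x in zip(fs, xs)) + g(xs, y_feasible)
value = augmented_lagrangian(xs, y_feasible, lam, fs, g, As, B, b, beta=10.0)
```

At a feasible point the residual vanishes, so the ALF value coincides with the objective of (1.1) regardless of the multiplier and the penalty parameter.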

Problem (1.1) encapsulates a multitude of nonconvex optimization problems across various domains, including signal processing, image reconstruction, matrix decomposition, and machine learning [1–3]. When the number of blocks n equals 2 and $g(\cdot )$ is identically zero, the problem reduces to a two-block separable problem. If the problem contains merely a mixed term, it becomes similar to the problem in [4]. On the other hand, if the variable y is absent, the problem becomes the one studied in [5]. Hence, problem (1.1) extends the scope of the objective functions found in the literature [4–6], encompassing a broader range of scenarios with additional variables and potential mixed terms.

Indeed, ADMM has been established as a powerful tool for solving two-block separable convex optimization problems [7, 8]. However, its effectiveness and convergence guarantees become much more intricate when dealing with nonconvex problems, especially when the number of blocks exceeds two. Zhang et al. [9] tackled this challenge by proposing a proximal ADMM for solving three-block nonconvex optimization tasks, building upon the groundwork laid by Sun et al. [10]. Meanwhile, Wang et al. [11] proposed an inertial proximal partially symmetric ADMM, suitable for handling multiblock separable nonconvex optimization problems. Hien et al. [12] developed an inertial version of ADMM, referred to as iADMM, which integrated the majorization-minimization principle within each block update step to address a specific class of nonconvex low-rank representation problems. Chao et al. [13] contributed to this area with a linear Bregman ADMM algorithm for nonconvex multiblock optimization problems featuring nonseparable structures.

Linearized Alternating Direction Method of Multipliers (LADMM) simplifies the problem-solving process and significantly decreases the computational overhead associated with traditional ADMM. By linearizing certain components of the optimization problem at each iteration, LADMM allows for more straightforward and efficient updates. Li et al. [14] effectively utilized LADMM in the context of the least absolute shrinkage and selection operator (LASSO) problem, demonstrating that this linearized approach is simple and highly efficient. Ling et al. [15] further extended the application of LADMM by introducing a decentralized linearized ADMM algorithm, which solely linearizes the objective functions at each iterative step. This method facilitates distributed computation and can handle large-scale problems more effectively. Specifically addressing nonconvex and nonsmooth scenarios, Liu et al. [16] proposed a two-block linearized ADMM. This variant linearizes the mixed term and the quadratic penalty term in the Augmented Lagrangian Function (ALF), thereby providing a viable solution strategy for such challenging optimization problems. Chao et al. [13] presented a linear Bregman ADMM, which only linearized the mixed term for solving three-block nonseparable problems. This approach maintains the efficiency gains of LADMM while adapting it to accommodate the complexities inherent in multiblock and nonseparable optimizations.

Inertial technique, initially conceived by Polyak [17], serves as an acceleration strategy that takes into account the dynamics of the optimization process by incorporating information from the last two iterations, thereby mitigating substantial differences between consecutive points. Subsequently, Zavriv et al. [18] expanded the use of the inertial technique to tackle nonconvex optimization problems, marking a significant milestone in broadening the applicability of this methodology. Recently, the inertial technique has seen widespread adoption in conjunction with various optimization algorithms to enhance their performance in solving nonconvex optimization problems. Bot et al. [19] proposed an inertial forward-backward algorithm for the minimization of the sum of two non-convex functions. Attouch et al. [20] introduced an inertial proximal method and a proximal alternating projection method for maximal-monotone problems and minimization problems, respectively. Pock et al. [21] went on to propose a linear Inertial Proximal Alternating Minimization Algorithm (IPAMA) for a diverse range of nonconvex and nonsmooth optimization problems. Building upon these advancements, researchers have successfully integrated the inertial technique with the Alternating Direction Method of Multipliers (ADMM). Hien et al. [22] developed an Inertial Alternating Direction Method of Multipliers (iADMM) specifically designed for a class of nonconvex multiblock optimization problems with nonlinear coupling constraints. Wang et al. [11] also introduced an Inertial Proximal Partially Symmetric ADMM, tailored for nonconvex settings, further highlighting the versatility and efficacy of combining inertial techniques with ADMM in modern optimization methodologies.

Inspired by the previous works [11, 13, 16, 23], in this paper, we construct two new variant linear inertial ADMM algorithms, sequential partial linear inertial ADMM (SPLI-ADMM) and sequential complete linear inertial ADMM (SCLI-ADMM) for problem (1.1).

The novelty of this paper can be summarized as follows:

(I) The proposed algorithms combine the inertial effect with the linearization technique. The former improves the feasibility of the algorithms, while the latter contributes to fast convergence.

(II) Unlike conventional approaches such as those in [13], during the linearization phase, the gradient of the mixed term in the $x_{j}$-subproblem is calculated as ${\nabla _{{x_{j}}}}g( {\mathbf{x}_{[1,j-1]}^{k+1} ,\mathbf{x}_{[j,n]}^{k} ,{y^{k}}}) $ rather than ${\nabla _{{x_{j}}}}g( {\mathbf{x}_{[1,n]}^{k} ,{y^{k}}} ) $. This enables us to linearize the mixed term dynamically based on the progress of the iterate sequence: each update depends on the most recent values of the blocks that have already been updated. Consequently, it is referred to as a sequential gradient iteration scheme.

The rest of this paper is organized as follows: In Sect. 2, some necessary preliminaries for further analysis are summarized. Then, we establish the convergence of the two algorithms in Sect. 3. Section 4 shows the validity of the algorithms by some numerical experiments. Finally, some conclusions are drawn in Sect. 5.

2 Preliminaries

In this section, we recall some basic notations and preliminary results, which will be used in this paper. Throughout, ${\mathbb{R}^{n}}$ denotes the n-dimensional Euclidean space, $\mathbb{R} \cup \{ { + \infty } \}$ denotes the extended real number set, and $\mathbb{N}$ denotes the natural number set. The image space of a matrix $Q \in {\mathbb{R} ^{m \times n}}$ is defined as ${\mathop{\mathrm{Im}} } Q: = \{ {Qx:x \in { \mathbb{R}^{n}}} \}$. If matrix $Q \ne 0$, let ${\rho _{\min (Q^{\mathrm{T}}Q)}}$ denote the smallest positive singular value of the matrix ${Q^{\mathrm{T}}Q}$. $\Vert \cdot \Vert $ represents the Euclidean norm. $\operatorname{dom} f: = \{ {x \in {\mathbb{R} ^{n}}:f ( x ) < + \infty } \}$ is the domain of a function $f:{\mathbb{R} ^{n}} \to \mathbb{R} \cup \{ { + \infty } \}$, $\langle {x,y} \rangle = {x^{\mathrm{T}}}y = \sum_{i = 1}^{n} {{x_{i}}{y_{i}}} $.

Definition 2.1

([24])

Let $f:\mathbb{R}^{n}\to \mathbb{R}\cup \{+\infty \}$ be a proper lower semicontinuous function.

(I) The Fréchet subdifferential, or regular subdifferential, of f at $x\in {\mathrm{dom}} f $, written $\hat{\partial} f(x) $, is defined as

$$\begin{aligned} \hat{\partial} f(x)= \biggl\{ x^{*}\in \mathbb{R}^{n}:\liminf_{y\to x,\, y \neq x}\frac{f(y)-f(x)-\langle x^{*},y-x\rangle}{ \Vert y-x \Vert }\geq 0 \biggr\} , \end{aligned}$$

when $x\notin \operatorname{dom}f $, we set $\hat{\partial} f( x) = \emptyset $.

(II) The limiting-subdifferential, or simply the subdifferential, of f at $x\in {\mathrm{dom}}f$, written $\partial f(x)$, is defined as

$$\begin{aligned} \partial f(x)= \bigl\{ x^{*}\in \mathbb{R}^{n}:\exists x_{k}\to x \text{ s.t. } f(x_{k}) \to f(x),\ x_{k}^{*} \in \hat{\partial}f(x_{k}),\ x_{k}^{*}\to x^{*} \bigr\} . \end{aligned}$$

(III) A point that satisfies

$$\begin{aligned} 0\in \partial f(x) \end{aligned}$$

is called a critical point or a stationary point of the function f. The set of critical points of f is denoted by crit f.

Proposition 2.1

We collect some basic properties of the subdifferential [24].

(I) $\hat{\partial} f(x) \subseteq \partial f(x) $ for each $x\in \mathbb{R}^{n}$, where the first set is closed and convex, while the second set is only closed.

(II) Let $x_{k}^{*}\in \partial f(x_{k})$ and $\lim_{k\to \infty}(x_{k},x_{k}^{*})=(x,x^{*})$, then, $x^{*}\in \partial f(x)$.

(III) If $f: \mathbb{R}^{n}\to \mathbb{R}\cup \{ + \infty \} $ is proper lower semicontinuous, and $g:\mathbb{R}^{n}\to \mathbb{R}$ is continuously differentiable, then $\partial (f+g)(x)=\partial f(x)+\nabla g(x)$ for any $x\in \operatorname{dom}f$.

Definition 2.2

If ${\omega ^{*}} = { ( {x_{1}^{*}, \ldots , x_{n}^{*},{y^{*}},{ \lambda ^{*}}} )^{\mathrm{T}}}$ satisfies

$$\begin{aligned} \textstyle\begin{cases} A_{i}^{\mathrm{T}}{\lambda ^{*}} \in \partial {f_{i}} ( {x_{i}^{*}} ) + {\nabla _{{x_{i}}}}g ( {x_{1}^{*}, \ldots x_{n}^{*},{y^{*}}} ),\quad i = 1,2, \ldots n, \\ {B^{\mathrm{T}}}{\lambda ^{*}} = {\nabla _{y}}g ( {x_{1}^{*}, \ldots x_{n}^{*},{y^{*}}} ), \\ {A_{1}}x_{1}^{*} + \cdots + {A_{n}}x_{n}^{*}+B{y^{*}} = b, \end{cases}\displaystyle \end{aligned}$$

(2.1)

then ${\omega ^{*}}$ is called a critical point or stationary point of the Lagrangian function $L ( {x_{1}}, \ldots, {x_{n}},y,\lambda )$.

A very important technique to prove the convergence of ADMM for nonconvex optimization problems is the assumption that the Lagrangian function satisfies the Kurdyka-Łojasiewicz property (KŁ property) [19, 25]. For notational simplicity, we use ${\Phi _{\eta }} ( {\eta > 0} )$ to denote the set of concave functions $\varphi : [ 0, \eta ) \to [ 0, \infty ) $ such that

(I) $\varphi ( 0 ) = 0$;

(II) φ is continuously differentiable on $( {0,\eta } )$ and continuous at 0;

(III) $\varphi ' ( s ) > 0$ for all $s \in ( {0,\eta } )$.

The KŁ property can be described as follows.

Definition 2.3

(see [19, 26]) (KŁ property) Let $f:{\mathbb{R}^{n}} \to \mathbb{R} \cup \{ { + \infty } \}$ be a proper lower semicontinuous function. If there exist $\eta \in ( 0 , { + \infty } ]$, a neighborhood U of a point ${x^{*}}$, and a function $\varphi \in {\Phi _{\eta }}$ such that for all $x \in U \cap \{ {x \in {\mathbb{R}^{n}}:f ( {{x^{*}}} ) < f ( x ) < f ( {{x^{*}}} ) + \eta } \}$, it holds that

$$\begin{aligned} \varphi ' \bigl( {f ( x ) - f \bigl( {{x^{*}}} \bigr)} \bigr)\operatorname{dist} \bigl( {0,\partial f ( x )} \bigr) \ge 1, \end{aligned}$$

(2.2)

where the distance from a point x to a set S is defined by $\operatorname{dist}(x,S):=\inf \{\|y-x\|:y\in S\}$, then f is said to have the KŁ property at ${x^{*}}$.

Lemma 2.1

(see [25]) (Uniformized KŁ property) Suppose that $f:{\mathbb{R}^{n}} \to \mathbb{R} \cup \{ { + \infty } \}$ is a proper lower semicontinuous function, and Ω is a compact set. If $f ( x ) \equiv {f^{*}}$ for all $x \in \Omega $ and satisfies the KŁ property at each point of Ω, then there exist $\varepsilon > 0,\eta > 0$ and $\varphi \in {\Phi _{\eta }}$ such that

$$\begin{aligned} \varphi ' \bigl( {f ( x ) - {f^{*}}} \bigr) \operatorname{dist} \bigl( {0, \partial f ( x )} \bigr) \ge 1, \end{aligned}$$

(2.3)

for all $x \in \{ {x \in {\mathbb{R}^{n}}:\operatorname{dist} ( {x,\Omega } ) < \varepsilon } \} \cap \{ {{f^{*}} < f ( x ) < {f^{*}} + \eta } \}$.

Lemma 2.2

(see [25]) (Descent lemma) Let $h:{\mathbb{R}^{n}} \to \mathbb{R}$ be a continuously differentiable function whose gradient ∇h is Lipschitz continuous with modulus ${l_{h}} > 0$. Then, for any $x,y \in {\mathbb{R}^{n}}$, we have

$$\begin{aligned} \bigl\vert {h ( y ) - h ( x ) - \bigl\langle { \nabla h ( x ),y - x} \bigr\rangle } \bigr\vert \le \frac{{{l_{h}}}}{2}{ \Vert {y - x} \Vert ^{2}}. \end{aligned}$$

(2.4)
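
As a quick numerical illustration of the descent lemma, the sketch below checks the bound $\vert h(y)-h(x)-\langle \nabla h(x),y-x\rangle \vert \le \frac{l_{h}}{2}\Vert y-x\Vert ^{2}$ on a nonconvex quadratic. The test function $h(x)=\frac{1}{2}x^{\mathrm{T}}Mx$ with a random symmetric M is an assumption chosen because its Lipschitz modulus is simply the largest absolute eigenvalue of M.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
M = 0.5 * (M + M.T)                         # symmetric, generally indefinite

h = lambda x: 0.5 * x @ M @ x               # smooth, generally nonconvex
grad_h = lambda x: M @ x
l_h = np.abs(np.linalg.eigvalsh(M)).max()   # Lipschitz modulus of grad_h

def descent_gap(x, y):
    """Return both sides of |h(y) - h(x) - <grad h(x), y - x>| <= (l_h/2)||y - x||^2."""
    lhs = abs(h(y) - h(x) - grad_h(x) @ (y - x))
    rhs = 0.5 * l_h * np.dot(y - x, y - x)
    return lhs, rhs

pairs = [descent_gap(rng.standard_normal(n), rng.standard_normal(n))
         for _ in range(100)]
all_hold = all(lhs <= rhs + 1e-10 for lhs, rhs in pairs)
```

For this quadratic the left-hand side equals $\frac{1}{2}\vert (y-x)^{\mathrm{T}}M(y-x)\vert $, so the bound is tight along the eigenvector of M with the largest absolute eigenvalue.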

Lemma 2.3

(see [27]) Let $Q \in {\mathbb{R}^{m \times n}}$ be a nonzero matrix, and let ${\rho _{\min (Q^{\mathrm{T}}Q)}}$ denote the smallest positive eigenvalue of ${Q^{\mathrm{T}}Q}$. Then, for every $u \in {\mathbb{R}^{m}}$, it holds that

$$\begin{aligned} \sqrt {\rho _{\min (Q^{\mathrm{T}}Q)}} \Vert {P_{Q}u} \Vert \le \bigl\Vert {Q^{\mathrm{T}}u} \bigr\Vert , \end{aligned}$$

(2.5)

where ${P_{Q}}$ denotes the Euclidean projection onto ${\mathrm{Im}}(Q)$.
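
The sketch below checks this projection bound numerically in the form in which it is applied later in (3.11), namely $\sqrt{\rho _{\min (Q^{\mathrm{T}}Q)}}\Vert P_{Q}u\Vert \le \Vert Q^{\mathrm{T}}u\Vert $ for $u\in \mathbb{R}^{m}$. The random tall matrix Q and the use of the pseudoinverse to form the projector are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 3
Q = rng.standard_normal((m, n))        # Im(Q) is a proper subspace of R^m

eigs = np.linalg.eigvalsh(Q.T @ Q)
rho_min = eigs[eigs > 1e-12].min()     # smallest positive eigenvalue of Q^T Q
P_Q = Q @ np.linalg.pinv(Q)            # orthogonal projector onto Im(Q)

def lemma_sides(u):
    """Return (sqrt(rho_min) * ||P_Q u||, ||Q^T u||)."""
    return np.sqrt(rho_min) * np.linalg.norm(P_Q @ u), np.linalg.norm(Q.T @ u)

checks = [lemma_sides(rng.standard_normal(m)) for _ in range(100)]
bound_holds = all(lhs <= rhs + 1e-10 for lhs, rhs in checks)
```

When u already lies in $\mathrm{Im}(Q)$, the projection is the identity on u, which is how the bound on $\Vert \Delta \lambda ^{k+1}\Vert $ in (3.11) arises.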

3 Algorithms and their convergence

In this section, we propose two linear inertial ADMM algorithms, sequential partial linear inertial ADMM (SPLI-ADMM) and sequential complete linear inertial ADMM (SCLI-ADMM), and prove their convergence under suitable conditions. Furthermore, we prove the boundedness of the generated sequence.

3.1 Two linear inertial algorithms

First, we present Algorithm 1 for (1.1).

In every iteration of the subproblems, our approach uses the sequential gradient to update the variables. Specifically, for the $(k+1)$th update of $x_{i}$ $(i=1,\ldots ,n)$, the mixed term $g(\mathbf{x}_{[1,i-1]}^{k+1} ,x_{i},\mathbf{x}_{[i+1,n]}^{k} ,y^{k})$ is replaced with a linearized approximation that includes an inertial proximal term: $g(\mathbf{x}_{[1,i-1]}^{k+1} ,\mathbf{x}_{[i,n]}^{k} ,y^{k}) + \langle x_{i}-x_{i}^{k}, \nabla _{x_{i}} g(\mathbf{x}_{[1,i-1]}^{k+1} ,\mathbf{x}_{[i,n]}^{k} ,y^{k}) \rangle + \frac{\tau}{2}\| x_{i}-z_{i}^{k} \|^{2}$. Here, the sequential gradient $\nabla _{x_{i}} g(\mathbf{x}_{[1,i-1]}^{k+1} ,\mathbf{x}_{[i,n]}^{k} ,y^{k})$ is refreshed for each subproblem, reflecting the most recent variable updates. Note that the y-subproblem remains unlinearized, so we call the method sequential partial linear inertial ADMM.

For the $x_{j}$-subproblems $(j=1,\ldots ,n)$ and the y-subproblem, respectively, we get the following auxiliary functions:

$$\begin{aligned} &\begin{aligned} \hat{f}_{j}^{k}(x_{j})={}&{f_{j}} ( {{x_{j}}} ) + \bigl\langle x_{j}-x_{j}^{k}, \nabla _{x_{j}} g \bigl(\mathbf{x}_{[1,j-1]}^{k+1} , \mathbf{x}_{[j,n]}^{k} ,y^{k} \bigr) \bigr\rangle \\ &{}+ \frac{\beta }{2}{ \biggl\Vert {\mathbf{Ax}_{[1,j-1]}^{k+1} + {A_{j}}x_{j} +\mathbf{Ax}_{[j+1,n]}^{k} + B{y^{k}} - b - \frac{{{\lambda ^{k}}}}{\beta }} \biggr\Vert ^{2}} + \frac{{{\tau }}}{2}{ \bigl\Vert {{x_{j}} - z_{j}^{k}} \bigr\Vert ^{2}}, \end{aligned} \end{aligned}$$

(3.1)

$$\begin{aligned} &\begin{aligned} \hat{h}^{k}(y)=g \bigl( \mathbf{x}_{[1,n]}^{k+1} ,y \bigr) + \frac{\beta }{2} \biggl\Vert \mathbf{Ax}_{[1,n]}^{k+1} + By - b - \frac{\lambda ^{k}}{\beta } \biggr\Vert ^{2} + \frac{{{\tau }}}{2}{ \bigl\Vert {y - y^{k}} \bigr\Vert ^{2}}, \end{aligned} \end{aligned}$$

(3.2)

where

$$\begin{aligned} \textstyle\begin{cases} z_{1}^{k} = x_{1}^{k} + {\theta _{k}} ( {x_{1}^{k-1} - x_{1}^{k}} ), \\ z_{2}^{k} = x_{2}^{k} + {\theta _{k}} ( {x_{2}^{k-1} - x_{2}^{k}} ), \\ \vdots \\ z_{n}^{k} = x_{n}^{k} + {\theta _{k}} ( {x_{n}^{k-1} - x_{n}^{k}} ), \end{cases}\displaystyle \end{aligned}$$

(3.3)

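and $\theta _{k}\in [0,\frac{1}{2})$. Using these auxiliary functions, Algorithm 1 proceeds as follows: for $j=1,\ldots ,n$ in turn, $x_{j}^{k+1}$ is taken as a minimizer of $\hat{f}_{j}^{k}$ (3.4); then $y^{k+1}$ is taken as a minimizer of $\hat{h}^{k}$ (3.5); finally, the multiplier is updated by $\lambda ^{k+1}=\lambda ^{k}-\beta (\mathbf{Ax}_{[1,n]}^{k+1}+By^{k+1}-b)$.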

Remark 1

(I) The auxiliary functions defined in (3.1) contain the inertial terms $\frac{\tau}{2}\|x_{i}-z_{i}^{k}\|^{2}$, $i=1,2,\ldots ,n $. An inertial scheme computes the new iterate from the two previous iterates. By adding the inertial term to the $x_{i} $-subproblems, the iteration tends toward the direction $x_{i}^{k}-x_{i}^{k-1}$.

(II) The purpose of linearizing the mixed term in the $x_{i}$-subproblem is to exploit the differentiability of g with respect to each block and to simplify the computation in each iteration.

(III) The initial point $\mathbf{x}_{[1,n]}^{-1} =\mathbf{x}_{[1,n]}^{0} = 0, y^{-1}=y^{0}=0$ is chosen to facilitate the proof of boundedness of the sequence $\{\omega ^{k}\}$ generated by the algorithm.
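
To make the update order concrete, here is a minimal Python sketch of one SPLI-ADMM sweep built from (3.1), (3.2), and (3.3): the x-blocks are updated in sequence with a refreshed gradient, then y, then the multiplier. It rests on toy assumptions that are not from the paper: $A_{j}=I$ (so that the minimizer of (3.1) is a single proximal step), $f_{j}=\mu \Vert \cdot \Vert _{1}$ as a stand-in for a generic $f_{j}$, and $g(\mathbf{x},y)=\frac{1}{2}\Vert \sum_{j}C_{j}x_{j}+Dy-d\Vert ^{2}$. The names `spli_admm_step`, `C`, `D`, `d`, and `mu` are hypothetical, and the parameter values are illustrative and not checked against Assumption A(IV).

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal map of t*||.||_1 (stand-in for the prox of a generic f_j)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def spli_admm_step(xs, xs_prev, y, lam, b, B, C, D, d, mu, beta, tau, theta):
    """One sweep: x_1..x_n with sequential gradients, then y, then lambda."""
    n = len(xs)
    xs_new = [x.copy() for x in xs]
    for j in range(n):
        # Sequential gradient: blocks before j are already at iteration k+1.
        coupling = sum(Cj @ xj for Cj, xj in zip(C, xs_new)) + D @ y - d
        grad_j = C[j].T @ coupling
        z_j = xs[j] + theta * (xs_prev[j] - xs[j])       # inertial point (3.3)
        others = sum(xs_new[i] for i in range(n) if i != j) + B @ y - b
        # With A_j = I, the smooth part of (3.1) is (beta+tau)/2 * ||x - v||^2 + const.
        v = (lam - beta * others + tau * z_j - grad_j) / (beta + tau)
        xs_new[j] = soft_threshold(v, mu / (beta + tau))
    # The y-subproblem (3.2) is an unlinearized quadratic here: solve it exactly.
    cx = sum(Cj @ xj for Cj, xj in zip(C, xs_new)) - d
    ax = sum(xs_new) - b
    H = D.T @ D + beta * B.T @ B + tau * np.eye(len(y))
    rhs = -D.T @ cx - beta * B.T @ ax + B.T @ lam + tau * y
    y_new = np.linalg.solve(H, rhs)
    lam_new = lam - beta * (ax + B @ y_new)              # multiplier update
    return xs_new, y_new, lam_new

# Hypothetical data; the zero starting point follows Remark 1 (III).
rng = np.random.default_rng(3)
m = 4
C = [0.3 * rng.standard_normal((m, m)) for _ in range(2)]
D = 0.3 * rng.standard_normal((m, m))
B = -np.eye(m)
b = rng.standard_normal(m)
d = rng.standard_normal(m)
xs = [np.zeros(m), np.zeros(m)]
xs_prev = [np.zeros(m), np.zeros(m)]
y = np.zeros(m)
lam = np.zeros(m)
xs_new, y_new, lam_new = spli_admm_step(
    xs, xs_prev, y, lam, b, B, C, D, d, mu=0.1, beta=50.0, tau=5.0, theta=0.2)
```

For a general $A_{j}$ the x-subproblem has no closed form and would need an inner solver; the identity choice is only meant to keep the sketch short.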

The update rules of Algorithm 2 can be written as follows:

Algorithm 2 is obtained from Algorithm 1 by further linearization. The $x_{i}$-subproblems $(i=1,\ldots ,n)$ are the same as those of Algorithm 1, so their iterative scheme is again (3.4). During the $(k+1)$th update of y, we replace $g(\mathbf{x}_{[1,n]}^{k+1} ,y)$ with a linearized approximation plus a regularization term: $g(\mathbf{x}_{[1,n]}^{k+1},y^{k}) + \langle y-y^{k}, \nabla _{y} g(\mathbf{x}_{[1,n]}^{k+1},y^{k}) \rangle + \frac{\tau}{2}\|y-y^{k} \|^{2}$. In Algorithm 2, all the subproblems are linearized and sequentially updated; hence, we call it the sequential complete linear inertial ADMM.

The auxiliary function of y-subproblem is as follows

$$\begin{aligned} \begin{aligned} \bar{{h}}^{k}(y)= \bigl\langle y-y^{k}, \nabla _{y} g \bigl( \mathbf{x}_{[1,n]}^{k+1},y^{k} \bigr) \bigr\rangle + \frac{\beta }{2}{ \biggl\Vert {\mathbf{Ax}_{[1,n]}^{k+1} + By - b - \frac{{{\lambda ^{k}}}}{\beta }} \biggr\Vert ^{2}} + \frac{{{\tau }}}{2}{ \bigl\Vert {y - y^{k}} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}$$

(3.8)
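
Because (3.8) is a strongly convex quadratic in y, its minimizer solves a linear system. The sketch below, under the assumption that the y-update of Algorithm 2 minimizes (3.8) exactly, forms that system and confirms that the gradient of (3.8) vanishes at the computed point. The function names and the random test data are hypothetical.

```python
import numpy as np

def scli_y_update(y_k, grad_y_g, Ax_new, lam, B, b, beta, tau):
    """Minimize the linearized y-model (3.8) by solving its normal equations."""
    q = B.shape[1]
    H = beta * B.T @ B + tau * np.eye(q)
    rhs = tau * y_k - grad_y_g - beta * B.T @ (Ax_new - b) + B.T @ lam
    return np.linalg.solve(H, rhs)

def model_gradient(y, y_k, grad_y_g, Ax_new, lam, B, b, beta, tau):
    """Gradient of (3.8) with respect to y; it is zero at the minimizer."""
    return (grad_y_g
            + beta * B.T @ (Ax_new + B @ y - b - lam / beta)
            + tau * (y - y_k))

# Hypothetical data standing in for quantities available at iteration k.
rng = np.random.default_rng(4)
m, q = 4, 5
B = rng.standard_normal((m, q))
b = rng.standard_normal(m)
Ax_new = rng.standard_normal(m)      # plays the role of sum_i A_i x_i^{k+1}
lam = rng.standard_normal(m)
y_k = rng.standard_normal(q)
grad_y_g = rng.standard_normal(q)    # plays the role of grad_y g(x^{k+1}, y^k)

y_next = scli_y_update(y_k, grad_y_g, Ax_new, lam, B, b, beta=20.0, tau=3.0)
residual = model_gradient(y_next, y_k, grad_y_g, Ax_new, lam, B, b,
                          beta=20.0, tau=3.0)
```

The system matrix $\beta B^{\mathrm{T}}B+\tau I$ does not depend on k, so in practice it could be factorized once and reused across iterations.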

3.2 A descent inequality

A crucial element in establishing the convergence of these algorithms is to verify the descent property of the regularized augmented Lagrangian function sequence. To facilitate our analysis, the following notations are introduced throughout this paper. For $k\ge 1$,

$$\begin{aligned} \begin{aligned}&\Delta x_{i}^{k+1} = x_{i}^{k+1}-x_{i}^{k}, \qquad\Delta y^{k+1}=y^{k+1}-y^{k},\qquad \Delta \lambda ^{k+1}=\lambda ^{k+1}-\lambda ^{k}. \\ & \Delta \mathbf{x}_{[i,j]}^{k+1} = \bigl(\Delta x_{i}^{k+1},\ldots , \Delta x_{j}^{k+1} \bigr), \qquad \theta \bigl\Vert \Delta \mathbf{x}_{[i,j]}^{k+1} \bigr\Vert =\sum_{s=i}^{j}\theta \bigl\Vert \Delta x_{s}^{k+1} \bigr\Vert . \end{aligned} \end{aligned}$$

The convergence analysis relies on the following assumptions:

Assumption A

(I) g is bounded from below, and ∇g is $l_{g}$-Lipschitz continuous, i.e., $\Vert { \nabla g(u) - \nabla g(v)} \Vert \le {l_{g}} \Vert {u - v} \Vert $ for all $u,v \in {\mathbb{R} ^{p_{1}}}\times{\mathbb{R} ^{p_{2}}}\times \cdots \times{\mathbb{R} ^{p_{n}}}\times{\mathbb{R} ^{q}}$;

(II) $f_{i}$, $i=1,\ldots ,n$ are proper lower semicontinuous, and $f_{i} $ are bounded from below;

(III) The linear operator B is surjective, i.e., $B\neq 0$ and $\{b\}\bigcup \{\bigcup_{i=1}^{n} \mathop{\mathrm{Im}}A_{i} \} \subset \mathop{\mathrm{Im}}B $;

(IV) For Algorithm 1 and Algorithm 2, $\theta _{k} \in [0,\frac{1}{2} )$, $\tau > 0$, and β is large enough such that $\tau > \frac{2+l_{g}}{1-2\theta _{k}}$, $\beta > \max \{ \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}}, \frac{6 (\tau ^{2}+l_{g}^{2} )}{\tau \theta _{k}\rho _{\min (B^{\mathrm{T}}B)}} \} $;

(V) Let $X:= {\mathbb{R}}^{p_{1}}\times \cdots \times{\mathbb{R}}^{p_{n}} \times{\mathbb{R}}^{q}\times{\mathbb{R}}^{m}$. The set $\{\omega \in X:L_{\beta}(\omega )\leq L_{\beta}({\omega}^{0}) \}$ is bounded.

For showing the descent property, the following lemmas are necessary.

Lemma 3.1

For Algorithm 1, for each $k \in \mathbb{N}$, we have

$$\begin{aligned} \begin{aligned} { \bigl\Vert {\Delta \lambda ^{k + 1}} \bigr\Vert ^{2}} \le \frac{3l_{g}^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} + \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}} \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2} + \frac{3\tau ^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}$$

(3.9)

For Algorithm 2, for each $k \in \mathbb{N}$, we have

$$\begin{aligned} \begin{aligned} { \bigl\Vert {\Delta \lambda ^{k + 1}} \bigr\Vert ^{2}} \le \frac{3l_{g}^{2}}{\rho _{\min (B^{\mathrm{T}}B)}} \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2} + \frac{3\tau ^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2}} + \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}} \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$

(3.10)

Proof

Using Assumption A(III) and Lemma 2.3, we have

$$\begin{aligned} \bigl\Vert {{\Delta \lambda ^{k + 1}}} \bigr\Vert \le \frac{1}{{\sqrt {\rho _{\min (B^{\mathrm{T}}B)}} }} \bigl\Vert {B^{\mathrm{T}}} { \Delta \lambda ^{k + 1}} \bigr\Vert . \end{aligned}$$

(3.11)

For Algorithm 1, the optimality condition of the y-subproblem in (3.2) yields

$$\begin{aligned} \begin{aligned} 0 = {\nabla _{y}}g \bigl( \mathbf{x}_{[1,n]}^{k+1},{y^{k + 1}} \bigr) - {B^{\mathrm{T}}} {\lambda ^{k}} + \beta {B^{\mathrm{T}}} \bigl( \mathbf{Ax}_{[1,n]}^{k+1} + B{y^{k + 1}} - b \bigr) + {\tau } \bigl({\Delta y^{k + 1}} \bigr) . \end{aligned} \end{aligned}$$

Since ${\lambda ^{k + 1}} = {\lambda ^{k}} - \beta ( {\mathbf{Ax}_{[1,n]}^{k+1} + B{y^{k + 1}} - b}) $, we have

$$\begin{aligned} \begin{aligned} {B^{\mathrm{T}}} {\lambda ^{k + 1}} = {\nabla _{y}}g \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k + 1}} \bigr)+\tau \bigl(\Delta y^{k+1} \bigr) . \end{aligned} \end{aligned}$$

(3.12)

Let ${u^{k}}=(\mathbf{x}_{[1,n]}^{k},{y^{k}})$. Using Assumption A (I) and (3.12), we have

$$\begin{aligned} \begin{aligned} &{ \bigl\Vert {{B^{\mathrm{T}}} { \lambda ^{k + 1}} - {B^{\mathrm{T}}} {\lambda ^{k}}} \bigr\Vert ^{2}} \\ &\quad={ \bigl\Vert {\nabla _{y}}g \bigl(u^{k+1} \bigr) - { \nabla _{y}}g \bigl(u^{k} \bigr) + \tau \Delta y^{k+1} - \tau \Delta y^{k} \bigr\Vert ^{2}} \\ &\quad= \bigl\Vert {\nabla _{y}}g \bigl(u^{k+1} \bigr) - { \nabla _{y}}g \bigl(u^{k} \bigr) \bigr\Vert ^{2} + \bigl\Vert \tau \Delta y^{k+1} \bigr\Vert ^{2} + \bigl\Vert \tau \Delta y^{k} \bigr\Vert ^{2} - 2 \bigl\langle \tau \Delta y^{k+1} , \tau \Delta y^{k} \bigr\rangle \\ &\qquad{} - 2 \bigl\langle {\nabla _{y}}g \bigl(u^{k+1} \bigr) - { \nabla _{y}}g \bigl(u^{k} \bigr) , \tau \Delta y^{k} \bigr\rangle + 2 \bigl\langle {\nabla _{y}}g \bigl(u^{k+1} \bigr) - {\nabla _{y}}g \bigl(u^{k} \bigr) , \tau \Delta y^{k+1} \bigr\rangle \\ &\quad\le 3l_{g}^{2}{ \bigl\Vert {\Delta u^{k+1}} \bigr\Vert ^{2}}+3\tau ^{2} \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2}+3\tau ^{2} \bigl\Vert \Delta y^{k} \bigr\Vert ^{2} \\ &\quad\le 3l_{g}^{2} \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2} + 3 \bigl(l_{g}^{2}+ \tau ^{2} \bigr) \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2} +3\tau ^{2} \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$

(3.13)

It follows from (3.13) and (3.11) that

$$\begin{aligned} \begin{aligned} { \bigl\Vert \Delta \lambda ^{k + 1} \bigr\Vert ^{2}} \le \frac{3l_{g}^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} + \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}} \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2} + \frac{3\tau ^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}$$

For Algorithm 2, similarly, we get

$$\begin{aligned} \begin{aligned} \bigl\Vert \Delta{\lambda ^{k + 1}} \bigr\Vert ^{2} \le \frac{3l_{g}^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} + \frac{3\tau ^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2}} + \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}} \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$

The proof is completed. □

To simplify the analysis, we introduce some notation. Let ${w^{k}} = (\mathbf{x}_{[1,n]}^{k},{y^{k}},{\lambda ^{k}}),{u^{k}}=( \mathbf{x}_{[1,n]}^{k},y^{k})$, ${r_{k}}=\mathbf{Ax}_{[1,n]}^{k} + B{y^{k}} - b $. The following lemma is important for proving the monotonicity of the sequence $\{\hat{L}_{\beta }(\hat{w}^{k+1})\}$ defined in (3.20).

Lemma 3.2

For Algorithm 1 and Algorithm 2, select $\theta _{k} \in [0,\frac{1}{2} )$ and ${\tau},{\beta} $ large enough to assure $\tau > \frac{2+l_{g}}{1-2\theta _{k}} $, $\beta > \max \{ \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}}, \frac{6(\tau ^{2}+l_{g}^{2})}{\tau \theta _{k}\rho _{\min (B^{\mathrm{T}}B)}} \} $.

Then, for each $k \in \mathbb{N}$, we have

$$\begin{aligned} \begin{aligned} {L_{\beta }} \bigl({w^{k + 1}} \bigr) + {\delta _{2}} \bigl( \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}+ { \bigl\Vert \Delta{{y^{k + 1}}} \bigr\Vert ^{2}} \bigr) \le {L_{\beta }} \bigl({w^{k}} \bigr) + { \delta _{1}} \bigl( \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert ^{2} + { \bigl\Vert \Delta{{y^{k}} } \bigr\Vert ^{2}} \bigr), \end{aligned} \end{aligned}$$

(3.14)

where $\delta _{2} >\delta _{1}>0 $.

Proof

We first give the proof for Algorithm 1.

From (3.1) and (3.4), for $j=1,\ldots ,n$, we have

$$\begin{aligned} \begin{aligned} &{f_{j}} \bigl(x_{j}^{k + 1} \bigr) + \bigl\langle {\Delta x_{j}^{k + 1},{ \nabla _{{x_{j}}}}g \bigl( \mathbf{x}_{[1,j-1]}^{k+1}, \mathbf{x}_{[j,n]}^{k}, y^{k} \bigr)} \bigr\rangle \\ &\qquad{}- \bigl\langle {{\lambda ^{k}},\mathbf{Ax}_{[1,j]}^{k+1} + \mathbf{Ax}_{[j+1,n]}^{k} + B{y^{k}} - b} \bigr\rangle + \frac{\beta }{2}{ \bigl\Vert {\mathbf{Ax}_{[1,j]}^{k+1} +\mathbf{Ax}_{[j+1,n]}^{k} + B{y^{k}} - b} \bigr\Vert ^{2}} \\ &\quad\le{f_{j}} \bigl(x_{j}^{k} \bigr) - \bigl\langle {{\lambda ^{k}},\mathbf{Ax}_{[1,j-1]}^{k+1} + \mathbf{Ax}_{[j,n]}^{k} + B{y^{k}} - b} \bigr\rangle \\ &\qquad{}+\frac{\beta }{2}{ \bigl\Vert \mathbf{Ax}_{[1,j-1]}^{k+1} + \mathbf{Ax}_{[j,n]}^{k} + B{y^{k}} - b \bigr\Vert ^{2}} + \frac{{{\tau }}}{{\mathrm{{2}}}}{ \bigl\Vert {x_{j}^{k} - z_{j}^{k}} \bigr\Vert ^{2}} - \frac{{{\tau }}}{{\mathrm{{2}}}}{ \bigl\Vert {x_{j}^{k + 1} - z_{j}^{k}} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}$$

From (3.2) and (3.5), we have

$$\begin{aligned} \begin{aligned} &g \bigl({u^{k + 1}} \bigr) - \bigl\langle {{ \lambda ^{k}},{r_{k + 1}}} \bigr\rangle + \frac{\beta }{2}{ \Vert {{r_{k + 1}}} \Vert ^{2}} \\ &\quad\le g \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k}} \bigr) - \bigl\langle {{\lambda ^{k}}, \mathbf{Ax}_{[1,n]}^{k+1}+ B{y^{k}} - b} \bigr\rangle + \frac{\beta }{2}{ \bigl\Vert \mathbf{Ax}_{[1,n]}^{k+1} + B{y^{k}} - b \bigr\Vert ^{2}} - \frac{{{\tau }}}{{\mathrm{{2}}}}{ \bigl\Vert \Delta y^{k + 1} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}$$

Summing the first inequality over $j=1,\ldots ,n$ and adding the second, we have

$$\begin{aligned} &\sum_{i=1}^{n}{f_{i}} \bigl(x_{i}^{k + 1} \bigr) + g \bigl({u^{k + 1}} \bigr) - \bigl\langle {{\lambda ^{k}},{r_{k + 1}}} \bigr\rangle + \frac{\beta }{2}{ \Vert {{r_{k + 1}}} \Vert ^{2}} \\ &\quad\le\sum_{i=1}^{n}{f_{i}} \bigl(x_{i}^{k} \bigr) + g \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k}} \bigr) - \bigl\langle {{\lambda ^{k}},{r_{k}}} \bigr\rangle + \frac{\beta }{2}{ \Vert {{r_{k}}} \Vert ^{2}} \\ &\qquad{} - \sum_{i=1}^{n}{ \bigl\langle {\Delta x_{i}^{k + 1} ,{ \nabla _{{x_{i}}}}g \bigl( \mathbf{x}_{[1,i-1]}^{k+1}, \mathbf{x}_{[i,n]}^{k}, y^{k} \bigr)} \bigr\rangle } + \frac{{{\tau }}}{{\mathrm{{2}}}}\sum _{i=1}^{n}{ \bigl\Vert {x_{i}^{k} - z_{i}^{k}} \bigr\Vert ^{2}} \\ &\qquad{} - \frac{{{\tau }}}{{\mathrm{{2}}}}\sum_{i=1}^{n}{ \bigl\Vert {x_{i}^{k + 1} - z_{i}^{k}} \bigr\Vert ^{2}} - \frac{{{\tau }}}{{\mathrm{{2}}}} \bigl\Vert {y^{k + 1} - y^{k}} \bigr\Vert ^{2}, \end{aligned}$$

hence

$$\begin{aligned} \begin{aligned} &\sum_{i=1}^{n}{f_{i}} \bigl(x_{i}^{k + 1} \bigr) + g \bigl({u^{k + 1}} \bigr) - \bigl\langle {{\lambda ^{k}},{r_{k + 1}}} \bigr\rangle + \frac{\beta }{2}{ \Vert {{r_{k + 1}}} \Vert ^{2}} \\ &\quad\le\sum_{i=1}^{n}{f_{i}} \bigl(x_{i}^{k} \bigr) +g \bigl(u^{k} \bigr) - \bigl\langle {{\lambda ^{k}},{r_{k}}} \bigr\rangle + \frac{\beta }{2}{ \Vert {{r_{k}}} \Vert ^{2}} \\ &\qquad{}+ \underbrace{g \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k}} \bigr)- g \bigl(u^{k} \bigr) - \sum_{i=1}^{n} \bigl\langle {\Delta x_{i}^{k + 1}, {\nabla _{{x_{i}}}}g \bigl( \mathbf{x}_{[1,i-1]}^{k+1}, \mathbf{x}_{[i,n]}^{k}, y^{k} \bigr)} \bigr\rangle }_{ \mathcal{A}} \\ & \qquad{}\underbrace{+\frac{{{\tau }}}{{\mathrm{{2}}}}\sum_{i=1}^{n}{ \bigl\Vert {x_{i}^{k} - z_{i}^{k}} \bigr\Vert ^{2}} - \frac{\tau}{2}\sum _{i=1}^{n}{ \bigl\Vert {x_{i}^{k + 1} - z_{i}^{k}} \bigr\Vert ^{2}}}_{ \mathcal{B}} - \frac{\tau }{2} \bigl\Vert {\Delta y^{k + 1}} \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$

On the one hand, from Lemma 2.2, part $\mathcal{A}$ can be estimated as

$$\begin{aligned} \begin{aligned} &g \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k}} \bigr)- g \bigl(u^{k} \bigr) - \sum_{i=1}^{n} \bigl\langle {\Delta x_{i}^{k + 1}, {\nabla _{{x_{i}}}}g \bigl( \mathbf{x}_{[1,i-1]}^{k+1}, \mathbf{x}_{[i,n]}^{k}, y^{k} \bigr)} \bigr\rangle \\ &\quad=\sum_{i=1}^{n} \bigl\lbrace g \bigl( \mathbf{x}_{[1,i]}^{k+1}, \mathbf{x}_{[i+1,n]}^{k},{y^{k}} \bigr)\\ &\qquad{}- g \bigl(\mathbf{x}_{[1,i-1]}^{k+1}, \mathbf{x}_{[i,n]}^{k},y^{k} \bigr) - \bigl\langle {\Delta x_{i}^{k + 1},{\nabla _{{x_{i}}}}g \bigl(\mathbf{x}_{[1,i-1]}^{k+1}, \mathbf{x}_{[i,n]}^{k},y^{k} \bigr)} \bigr\rangle \bigr\rbrace \\ &\quad\le\frac{l_{g}}{2} \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$

(3.15)
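The telescoping argument behind (3.15) can be checked numerically. The following is a minimal sketch, assuming a hypothetical quadratic coupling term $g(x)=\frac{1}{2}x^{\mathrm{T}}Qx$ with symmetric (possibly indefinite) $Q$, with $y^{k}$ held fixed and therefore dropped, and with $l_{g}$ taken as the spectral norm of $Q$. The names `g`, `part_A`, and `bound` are illustrative and do not come from the paper.

```python
import numpy as np

def g(Q, x):
    """Hypothetical smooth coupling term g(x) = 0.5 * x^T Q x (Q symmetric)."""
    return 0.5 * x @ Q @ x

def part_A(Q, blocks, x_old, x_new):
    """Evaluate part A: g(x_new) - g(x_old) minus the sequential
    linearization terms, where block i uses the gradient taken at
    (x_new for blocks before i, x_old for blocks i and after)."""
    total = g(Q, x_new) - g(Q, x_old)
    cur = x_old.copy()
    for idx in blocks:
        step = x_new[idx] - x_old[idx]
        partial_grad = (Q @ cur)[idx]   # gradient of g w.r.t. block i at the mixed point
        total -= step @ partial_grad
        cur[idx] = x_new[idx]           # block i now holds its updated value
    return total

def bound(Q, x_old, x_new):
    """Right-hand side of (3.15): (l_g / 2) * ||x_new - x_old||^2."""
    l_g = np.linalg.norm(Q, 2)          # spectral norm, a Lipschitz constant of grad g
    return 0.5 * l_g * np.sum((x_new - x_old) ** 2)
```

For this quadratic model each summand reduces to $\frac{1}{2}(\Delta x_{i})^{\mathrm{T}}Q_{ii}\Delta x_{i}$, so `part_A(...) <= bound(...)` holds for any choice of blocks and iterates.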

On the other hand, by the definitions of $z_{i}^{k}, i=1,2,\ldots ,n$, we have

$$\begin{aligned} \begin{aligned} &{ \bigl\Vert {x_{i}^{k} - z_{i}^{k}} \bigr\Vert ^{2}} - { \bigl\Vert {x_{i}^{k + 1} - z_{i}^{k}} \bigr\Vert ^{2}} \\ &\quad= \theta _{k}^{2}{ \bigl\Vert {x_{i}^{k-1} - x_{i}^{k}} \bigr\Vert ^{2}} - \bigl\Vert {x_{i}^{k+1} - x_{i}^{k} - {\theta _{k}} \bigl(x_{i}^{k} - x_{i}^{k - 1} \bigr)} \bigr\Vert ^{2} \\ &\quad= - { \bigl\Vert {x_{i}^{k+1} - x_{i}^{k}} \bigr\Vert ^{2}} - 2{\theta _{k}} \bigl\langle {x_{i}^{k} - x_{i}^{k + 1},x_{i}^{k} - x_{i}^{k - 1}} \bigr\rangle \\ &\quad\le - { \bigl\Vert {x_{i}^{k+1} - x_{i}^{k}} \bigr\Vert ^{2}} + {\theta _{k}} { \bigl\Vert {x_{i}^{k+1} - x_{i}^{k}} \bigr\Vert ^{2}} + {\theta _{k}} { \bigl\Vert {x_{i}^{k} - x_{i}^{k - 1}} \bigr\Vert ^{2}} \\ &\quad=- (1 - {\theta _{k}}){ \bigl\Vert {x_{i}^{k+1}- x_{i}^{k}} \bigr\Vert ^{2}} + { \theta _{k}} { \bigl\Vert {x_{i}^{k} - x_{i}^{k - 1}} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}$$

Thus, it can be inferred from part $\mathcal{B}$ that

$$\begin{aligned} \begin{aligned} {\sum_{i=1}^{n} \bigl\Vert {x_{i}^{k} - z_{i}^{k}} \bigr\Vert ^{2}}-{ \sum_{i=1}^{n} \bigl\Vert {x_{i}^{k+1} - z_{i}^{k}} \bigr\Vert ^{2}} \le - (1 - {\theta _{k}}){ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} + { \theta _{k}} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}$$

(3.16)
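The per-block estimate leading to (3.16) can be sanity-checked on random data. This is a minimal sketch, assuming the usual inertial extrapolation $z_{i}^{k}=x_{i}^{k}+\theta _{k}(x_{i}^{k}-x_{i}^{k-1})$ with $\theta _{k}\in [0,1)$. The function name `inertial_gap` is hypothetical.

```python
import numpy as np

def inertial_gap(x_prev, x_cur, x_next, theta):
    """Return (right-hand side) - (left-hand side) of the per-block
    inertial estimate; a nonnegative value means the estimate holds."""
    z = x_cur + theta * (x_cur - x_prev)      # assumed extrapolated point z^k
    lhs = np.sum((x_cur - z) ** 2) - np.sum((x_next - z) ** 2)
    rhs = (-(1.0 - theta) * np.sum((x_next - x_cur) ** 2)
           + theta * np.sum((x_cur - x_prev) ** 2))
    return rhs - lhs
```

Under this assumption the gap equals $\theta _{k}\Vert (x_{i}^{k+1}-x_{i}^{k})-(x_{i}^{k}-x_{i}^{k-1})\Vert ^{2}$, which is the slack introduced by the inequality $2\langle a,b\rangle \le \Vert a\Vert ^{2}+\Vert b\Vert ^{2}$.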

From Lemma 2.2, (3.15) and (3.16), we obtain

$$\begin{aligned} \begin{aligned} {L_{\beta }} \bigl( \mathbf{x}_{[1,n]}^{k+1},y^{k+1},\lambda ^{k} \bigr) \le {}& {L_{ \beta }} \bigl({w^{k}} \bigr) + \frac{{{l_{g}}}}{2}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} - \frac{{{\tau (1-{\theta _{k}}) }}}{{\mathrm{{2}}}}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} \\ &{}-\frac{\tau}{2} { \bigl\Vert \Delta y^{k + 1} \bigr\Vert ^{2}} + \frac{{\tau \theta _{k} }}{\mathrm{{2}}}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert ^{2}} . \end{aligned} \end{aligned}$$

(3.17)

Recall that

$$\begin{aligned} \begin{aligned} {L_{\beta }} \bigl({w^{k + 1}} \bigr) &= {L_{\beta }} \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k + 1}},{ \lambda ^{k}} \bigr) + \bigl\langle {\Delta{\lambda ^{k+1}} ,{r_{k + 1}}} \bigr\rangle \\ &= {L_{\beta }} \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k + 1}},{ \lambda ^{k}} \bigr) + \frac{1}{\beta} \bigl\langle \Delta{\lambda ^{k+1}}, \Delta{\lambda ^{k+1}} \bigr\rangle \\ &\le {L_{\beta }} \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k + 1}},{ \lambda ^{k}} \bigr) +\frac{1}{\beta} \bigl\Vert {\Delta{\lambda ^{k+1}}} \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$

(3.18)

Substituting (3.9) and (3.17) into (3.18), we have

$$\begin{aligned} \begin{aligned} &{L_{\beta }} \bigl({w^{k + 1}} \bigr) \\ &\quad\le{L_{\beta }} \bigl({w^{k}} \bigr) + \frac{{{l_{g}}}}{2}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} - \frac{{{\tau (1-{\theta _{k}}) }}}{{\mathrm{{2}}}}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} - \frac{\tau}{2}{ \bigl\Vert \Delta y^{k + 1} \bigr\Vert ^{2}} + \frac{{{\tau }}}{{\mathrm{{2}}}}{\theta _{k}} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert ^{2}} \\ &\qquad{}+ \frac{3l_{g}^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2} + \frac{3(l_{g}^{2}+\tau ^{2})}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} { \bigl\Vert \Delta y^{k + 1} \bigr\Vert ^{2}}+ \frac{3\tau ^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}} } { \bigl\Vert \Delta{y^{k}} \bigr\Vert ^{2}} \\ &\quad={L_{\beta }} \bigl({w^{k}} \bigr) - \biggl( { \frac{\tau (1-\theta _{k})}{2}} - \frac{l_{g}}{2} - \frac{3l_{g}^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \biggr){ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2} \\ &\qquad{}- \biggl( \frac{\tau}{2} - \frac{3(\tau ^{2}+l_{g}^{2})}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \biggr) \bigl\Vert \Delta{y^{k + 1}} \bigr\Vert ^{2} + \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{3\tau ^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \bigl\Vert \Delta{y^{k}} \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$

Hence,

$$\begin{aligned} \begin{aligned} &{L_{\beta }} \bigl({w^{k + 1}} \bigr) + \biggl( { \frac{\tau (1-\theta _{k})}{2}} - \frac{l_{g}}{2} - \frac{3l_{g}^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \biggr){ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2} + \biggl( \frac{\tau}{2} - \frac{3(\tau ^{2}+l_{g}^{2})}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \biggr){ \bigl\Vert \Delta y^{k + 1} \bigr\Vert }^{2} \\ &\quad\le{L_{\beta }} \bigl({w^{k}} \bigr) + \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{3\tau ^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \bigl\Vert \Delta y^{k} \bigr\Vert ^{2} \\ &\quad\le{L_{\beta }} \bigl({w^{k}} \bigr) + \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{3(l_{g}^{2}+\tau ^{2})}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \bigl\Vert \Delta{y^{k}} \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$

Since $\beta > \max \{ \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}}, \frac{3 (\tau ^{2}+l_{g}^{2} )}{\tau \theta _{k}\rho _{\min (B^{\mathrm{T}}B)}} \} $, which further implies $\frac{6(l_{g}^{2}+\tau ^{2})}{\beta \rho _{\min (B^{\mathrm{T}}B)}} < 1$ and $\frac{\tau \theta _{k}}{2} > \frac{3(\tau ^{2}+l_{g}^{2})}{\beta \rho _{\min (B^{\mathrm{T}}B)}}$, we have

$$\begin{aligned} \begin{aligned} &{L_{\beta }} \bigl({w^{k + 1}} \bigr)+ \biggl( { \frac{\tau (1-\theta _{k} )}{2}} - \frac{l_{g}}{2} - 1 \biggr){ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2} + \biggl( \frac{\tau}{2} - 1 \biggr){ \bigl\Vert \Delta y^{k + 1} \bigr\Vert }^{2} \\ &\quad\le{L_{\beta }} \bigl({w^{k}} \bigr) + \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{3(\tau ^{2}+l_{g}^{2})}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \bigl\Vert \Delta{y^{k}} \bigr\Vert ^{2} \\ &\quad\le {L_{\beta }} \bigl({w^{k}} \bigr) + \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{\tau \theta _{k}}{2} \bigl\Vert \Delta{y^{k} } \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$

Let $\delta _{2}={\frac{\tau (1-\theta _{k} )}{2}} - \frac{l_{g}}{2} - 1 , \delta _{1}=\frac{\tau}{2}\theta _{k}$. We get

$$\begin{aligned} \begin{aligned} {L_{\beta }} \bigl({w^{k + 1}} \bigr) + {\delta _{2}} \bigl({ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} + { \bigl\Vert \Delta{{y^{k + 1}}} \bigr\Vert ^{2}} \bigr) \le {L_{\beta }} \bigl({w^{k}} \bigr) + {\delta _{1}} \bigl({ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert ^{2}} + { \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}} \bigr). \end{aligned} \end{aligned}$$

(3.19)

By the parameter conditions in Assumption A, which imply that ${ \frac{\tau (1-\theta _{k} )}{2}} - \frac{l_{g}}{2} - 1 > \frac{\tau \theta _{k}}{2} $, we obtain $\delta _{2} >\delta _{1}>0 $. That is, (3.14) holds.

Similarly, for Algorithm 2, we obtain

$$\begin{aligned} \begin{aligned} &{L_{\beta }} \bigl({w^{k + 1}} \bigr) + \biggl( { \frac{\tau (1 - \theta _{k} )}{2}} - \frac{l_{g}}{2} - \frac{3l_{g}^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \biggr){ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2} \\ &\qquad{}+ \biggl( \frac{\tau}{2} - \frac{l_{g}}{2} - \frac{3\tau ^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \biggr){ \bigl\Vert \Delta y^{k + 1} \bigr\Vert }^{2} \\ &\quad\le{L_{\beta }} \bigl({w^{k}} \bigr)+ \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{3(l_{g}^{2}+\tau ^{2})}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \bigl\Vert {\Delta y^{k} } \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$

By the corresponding parameter conditions in Assumption A, it follows that

$$\begin{aligned} \begin{aligned} &{L_{\beta }} \bigl({w^{k + 1}} \bigr)+ \biggl( { \frac{\tau (1-\theta _{k} )}{2}} - \frac{l_{g}}{2} - 1 \biggr){ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2} + \biggl( { \frac{\tau (1-\theta _{k})}{2}} - \frac{l_{g}}{2} - 1 \biggr){ \bigl\Vert \Delta y^{k + 1} \bigr\Vert }^{2} \\ &\quad\le{L_{\beta }} \bigl({w^{k}} \bigr) + \frac{\tau \theta _{k}}{2}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{\tau \theta _{k}}{2} \bigl\Vert \Delta{y^{k}} \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$

Let $\delta _{2}={\frac{\tau (1-\theta _{k} )}{2}} - \frac{l_{g}}{2}-1 , \delta _{1}=\frac{\tau}{2}\theta _{k}$. We have

$$\begin{aligned} \begin{aligned} {L_{\beta }} \bigl({w^{k + 1}} \bigr) + {\delta _{2}} \bigl({ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} + { \bigl\Vert {\Delta{y^{k + 1}}} \bigr\Vert ^{2}} \bigr) \le {L_{\beta }} \bigl({w^{k}} \bigr) + {\delta _{1}} \bigl({ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert ^{2}} + { \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}} \bigr). \end{aligned} \end{aligned}$$

By the parameter conditions in Assumption A, which imply that ${ \frac{\tau (1-\theta _{k} )}{2}} - \frac{l_{g}}{2} - 1 > \frac{\tau \theta _{k}}{2} $, we get $\delta _{2}>\delta _{1}>0$. That is, (3.14) holds. The lemma is proved. □
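The requirement $\delta _{2}>\delta _{1}>0$ can be checked for concrete parameter values. The sketch below assumes a fixed inertial parameter $\theta \in (0,\frac{1}{2})$ in place of $\theta _{k}$. The threshold $\tau >(l_{g}+2)/(1-2\theta )$ is obtained here only by rearranging the displayed inequality $\frac{\tau (1-\theta )}{2}-\frac{l_{g}}{2}-1>\frac{\tau \theta }{2}$ and is not quoted from Assumption A. Both function names are hypothetical.

```python
def descent_constants(tau, theta, l_g):
    """Return (delta_1, delta_2) as defined in the proof of Lemma 3.2."""
    delta_1 = tau * theta / 2.0
    delta_2 = tau * (1.0 - theta) / 2.0 - l_g / 2.0 - 1.0
    return delta_1, delta_2

def tau_threshold(theta, l_g):
    """Smallest tau (exclusive) for which delta_2 > delta_1, obtained by
    rearranging tau*(1-theta)/2 - l_g/2 - 1 > tau*theta/2.
    Only meaningful for 0 <= theta < 1/2."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5) for the threshold to exist")
    return (l_g + 2.0) / (1.0 - 2.0 * theta)
```

For example, with $\tau =10$, $\theta =0.2$, and $l_{g}=1$ the constants are $\delta _{1}=1$ and $\delta _{2}=2.5$, and the threshold for $\tau$ is 5.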

Remark 2

Based on Lemma 3.2, we can define the following function

$$\begin{aligned} {\hat{L}_{\beta }} ( \hat{w} ) = {\hat{L}_{\beta }} ( {u,\lambda ,v} ) = {L_{\beta }} ( {u,\lambda } ) + {\delta _{\mathrm{{1}}}} { \Vert {u - v} \Vert ^{2}}, \end{aligned}$$

(3.20)

where

$$\begin{aligned} \begin{aligned} u= ( \mathbf{x}_{[1,n]},y ), v =( \tilde{ \mathbf{x}}_{[1,n]}, \tilde{y} ), \hat{w} = (u,\lambda ,v) = ( \mathbf{x}_{[1,n]},y , \lambda , \tilde{\mathbf{x}}_{[1,n]}, \tilde{y} ) \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} { \Vert {u - v} \Vert ^{2}} = { \Vert \mathbf{x}_{[1,n]} - \tilde{\mathbf{x}}_{[1,n]} \Vert ^{2}} + \Vert y - \tilde{y} \Vert ^{2} . \end{aligned} \end{aligned}$$

Set ${\hat{\omega} ^{k + 1}} = (\mathbf{x}_{[1,n]}^{k+1},y^{k+1},{ \lambda ^{{k + 1}}}, \mathbf{x}_{[1,n]}^{k},y^{k} ), u^{k+1} = (\mathbf{x}_{[1,n]}^{k+1},y^{k+1} ) $. Thus,

$$\begin{aligned} \begin{aligned} {\hat{L}_{\beta }} \bigl(\hat{ \omega}^{k+1} \bigr)={\hat{L}_{\beta }} \bigl( {{u^{{k + 1}}},{ \lambda ^{{k + 1}}},{u^{k}}} \bigr) = {L_{\beta }} \bigl( u^{k+1},{\lambda ^{{k + 1}}} \bigr) + {\delta _{\mathrm{{1}}}} \bigl({{{ \bigl\Vert \Delta u^{k + 1} \bigr\Vert }^{2}}} \bigr). \end{aligned} \end{aligned}$$

(3.21)

The following lemma implies that the sequence ${\hat{L}_{\beta }} ( {{u^{k}},{\lambda ^{k}},{u^{k-1}}} )$ is decreasing monotonically.

Lemma 3.3

Suppose ${\hat{L}_{\beta }}( {\hat{\omega}^{k+1}} )$ is defined as (3.20). Then, under Assumption A, for Algorithm 1 and Algorithm 2, we have:

$$\begin{aligned} \hat{ L }_{\beta} \bigl(\hat{\omega}^{k+1} \bigr)+ \delta \bigl( \bigl\Vert \Delta u^{k+1} \bigr\Vert ^{2} \bigr) \le {\hat{L}_{\beta }} \bigl(\hat{w}^{k} \bigr). \end{aligned}$$

(3.22)

That is, the sequence $\{ {{{\hat{L}}_{\beta }}(\hat{\omega}^{k+1})}\}$ is decreasing.

Proof

Set $\delta = {\delta _{2}} - {\delta _{1}} > 0$. Then the result follows directly from Lemma 3.2. □
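The step from Lemma 3.2 to (3.22) can be spelled out as follows; this is a sketch that uses only the definition (3.21), the estimate (3.19), and $\delta =\delta _{2}-\delta _{1}$, with $\Vert \Delta u^{k}\Vert ^{2}=\Vert \Delta \mathbf{x}_{[1,n]}^{k}\Vert ^{2}+\Vert \Delta y^{k}\Vert ^{2}$:

$$\begin{aligned} \hat{L}_{\beta} \bigl(\hat{\omega}^{k+1} \bigr)+ \delta \bigl\Vert \Delta u^{k+1} \bigr\Vert ^{2} &= L_{\beta} \bigl(w^{k+1} \bigr) + \delta _{1} \bigl\Vert \Delta u^{k+1} \bigr\Vert ^{2} + (\delta _{2}-\delta _{1}) \bigl\Vert \Delta u^{k+1} \bigr\Vert ^{2} \\ &= L_{\beta} \bigl(w^{k+1} \bigr) + \delta _{2} \bigl\Vert \Delta u^{k+1} \bigr\Vert ^{2} \\ &\le L_{\beta} \bigl(w^{k} \bigr) + \delta _{1} \bigl\Vert \Delta u^{k} \bigr\Vert ^{2} = \hat{L}_{\beta} \bigl(\hat{w}^{k} \bigr). \end{aligned}$$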

3.3 The cluster points of $\{\omega ^{k}\}$ are contained in $critL$

In this subsection, using the closedness of the limiting subdifferential mentioned above, we prove the subsequential convergence of the sequence $\{\omega ^{k}\}$. The proof for Algorithm 2 is similar to that for Algorithm 1, so we omit it here.

Lemma 3.4

Suppose $\lbrace{\omega ^{k}}\rbrace $ is the sequence generated by Algorithm 1. If Assumption A holds, then the following statements are true:

(I) The sequence $\{\omega ^{k}\} $ is bounded. (II) $\hat{L}_{\beta}(\hat{\omega}^{k})$ is bounded from below and convergent, additionally,

$$\begin{aligned} \sum_{k\ge 0} \bigl\Vert \omega ^{k+1}- \omega ^{k} \bigr\Vert ^{2} < +\infty . \end{aligned}$$

(3.23)

(III) The sequences $\hat{L}_{\beta}(\hat{\omega}^{k})$ and ${L}_{\beta}({\omega}^{k})$ have the same limit $\hat{L}_{*}$.

Proof

(I) Because of the decreasing property of $\{\hat{L}_{\beta}(\hat{\omega}^{k})\} $, we get

$$\begin{aligned} \begin{aligned} L_{\beta} \bigl(\omega ^{k} \bigr) \le \hat{L}_{\beta} \bigl(\hat{\omega}^{k} \bigr)\le \hat{L}_{\beta} \bigl(\hat{\omega}^{0} \bigr) = L_{\beta} \bigl(\omega ^{0} \bigr) + \delta _{1} \bigl( \bigl\Vert u^{0}-u^{-1} \bigr\Vert ^{2} \bigr)=L_{\beta} \bigl(\omega ^{0} \bigr), \end{aligned} \end{aligned}$$

where $\|u^{0}-u^{-1}\|^{2}=0$ follows from the initialization $x_{i}^{0}=x_{i}^{-1}, i=1,\ldots ,n$, and $y^{0}=y^{-1}$ in Algorithm 1. Hence, $\{\omega ^{k}\}\subseteq \{\omega \in X:L_{\beta}(\omega )\leq L_{ \beta}({\omega}^{0})\} $. By Assumption A(V), the sequence $\{\omega ^{k}\} $ is bounded.

(II) Since $\lbrace{\omega ^{k}}\rbrace $ is bounded, $\lbrace{\hat{\omega}^{k}}\rbrace $ is also bounded, and it has at least one cluster point. Let $\hat{\omega}^{*}$ be a cluster point of $\lbrace{\hat{\omega}^{k}}\rbrace $, and $\lim_{j\rightarrow +\infty}\hat{\omega}^{k_{j}}={\hat{\omega}^{*}}$. Since $f_{i} (i=1,2,\ldots ,n)$ are proper lower semicontinuous and g is continuously differentiable, $\hat{L}_{\beta} (\cdot )$ is proper lower semicontinuous. Hence, we have

$$\begin{aligned} \begin{aligned} \liminf_{j \to +\infty} \hat{L}_{\beta} \bigl(\hat{\omega}^{k_{j}} \bigr) \ge \hat{L}_{\beta} \bigl(\hat{\omega}^{*} \bigr). \end{aligned} \end{aligned}$$

According to the boundedness of $f_{i}$, g, and $\{\omega ^{k}\}_{k\ge 0}$ and the definition of $\hat{L}_{\beta}(\hat{\omega}^{k})$, the sequence $\hat{L}_{\beta}(\hat{\omega}^{k})$ is bounded from below. Thus, $\hat{L}_{\beta}(\hat{\omega}^{k_{j}})$ is also bounded from below. From Lemma 3.3, $\hat{L}_{\beta}(\hat{\omega}^{k})$ is monotonically decreasing, so the subsequence $\hat{L}_{\beta}(\hat{\omega}^{k_{j}})$ is convergent. Since the whole sequence $\hat{L}_{\beta}(\hat{\omega}^{k})$ is monotonically decreasing, it is also convergent and $\hat{L}_{\beta}(\hat{\omega}^{*}) \le \hat{L}_{\beta}(\hat{\omega}^{k})$. It follows from (3.22) that

$$\begin{aligned} \delta \bigl( { {{ \bigl\Vert {\Delta{u^{k+1}} } \bigr\Vert }^{2}}} \bigr) \le {\hat{L}_{\beta }} \bigl(\hat{w}^{k} \bigr)-\hat{ L_{\beta }} \bigl(\hat{w}^{k+1} \bigr). \end{aligned}$$

Summing up the above inequality for $k =0,\ldots ,N$ and letting $N \to \infty $, we have

$$\begin{aligned} \begin{aligned} \delta \sum_{k=1}^{+\infty} \bigl( { {{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2}}}+{ {{ \bigl\Vert { \Delta{y^{k{+}1}} } \bigr\Vert }^{2}}} \bigr) \le {\hat{L}_{\beta }} \bigl( \hat{w}^{0} \bigr)-\hat{ L_{\beta }} \bigl(\hat{w}^{*} \bigr) < +\infty . \end{aligned} \end{aligned}$$

Since $\delta > 0$, it follows that

$$\begin{aligned} \begin{aligned} \sum_{k=1}^{+\infty} \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2} < +\infty , \qquad\sum_{k=1}^{+\infty} \bigl\Vert {\Delta y^{k + 1}} \bigr\Vert ^{2} < +\infty . \end{aligned} \end{aligned}$$

(3.24)

Consequently, due to (3.9), we have

$$\begin{aligned} \begin{aligned} \sum_{k=1}^{+\infty} \bigl\Vert {\lambda ^{k + 1}} - {\lambda ^{{k}}} \bigr\Vert ^{2} < + \infty . \end{aligned} \end{aligned}$$

(3.25)

Then (3.23) follows from (3.24) and (3.25).

(III) From (3.24), we have $\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \Vert ^{2} \to 0$ and $\Vert {\Delta y^{k + 1}} \Vert ^{2} \to 0$. Combining with the definition of ${\hat{L}_{\beta }}(\hat{w}^{k})$ in (3.21) yields $\hat{L}_{*} = \lim_{k\to +\infty}\hat{L}_{\beta}(\hat{\omega}^{k}) = \lim_{k\to +\infty}{L}_{\beta}({\omega}^{k}) $. The lemma is proved. □

The following lemma provides upper estimates for the limiting subgradients of $\hat{L}_{\beta}(\cdot )$, which is important for the convergence analysis of the sequence generated by Algorithm 1 and Algorithm 2.

Lemma 3.5

Let $\{ {{\omega ^{k}}} \}$ be a sequence generated by Algorithm 1. Then, there exists $C > 0$ such that

$$\begin{aligned} \begin{aligned} d \bigl( {0,\partial {L_{\beta }} \bigl( {{\omega ^{k + 1}}} \bigr)} \bigr) \le C \Biggl( \sum_{i=1}^{n} \bigl\Vert \Delta x_{i}^{k+1} \bigr\Vert + \bigl\Vert \Delta{y^{k + 1}} \bigr\Vert + \sum_{i=1}^{n} \bigl\Vert \Delta x_{i}^{k} \bigr\Vert + \bigl\Vert \Delta{y^{k }} \bigr\Vert \Biggr). \end{aligned} \end{aligned}$$

(3.26)

Proof

By the definition of the augmented Lagrangian function ${L_{\beta }} ( \cdot )$, we have

$$\begin{aligned} \textstyle\begin{cases} {\partial _{{x_{j}}}}{L_{\beta }}(u^{k+1},{\lambda ^{k + 1}} ) = \partial {f_{j}}( {x_{j}^{{k + 1}}} ) + {\nabla _{{x_{j}}}}g ( \mathbf{x}_{[1,n]}^{k+1},y^{k+1} )- A_{j}^{T}({\lambda ^{k + 1}} - \beta{r^{k + 1}}), \\ {\partial _{y}}{L_{\beta }}( u^{k+1},{\lambda ^{k + 1}} ) = {\nabla _{y}}g (\mathbf{x}_{[1,n]}^{k+1},y^{k+1} ) - {B^{T}}{\lambda ^{k + 1}} + \beta {B^{T}}{r^{k + 1}}, \\ {\partial _{\lambda }}{L_{\beta }}( {u^{k+1},{\lambda ^{k + 1}}}) = \frac{1}{\beta }({\lambda ^{k }} - {\lambda ^{k+1}}). \end{cases}\displaystyle \end{aligned}$$

(3.27)

From the optimality conditions of (3.1)–(3.2), we have

$$\begin{aligned} \textstyle\begin{cases} - {\nabla _{{x_{j}}}}g (\mathbf{x}_{[1,j-1]}^{k+1},\mathbf{x}_{[j,n]}^{k},y^{k} ) + A_{j}^{T}{\lambda ^{k + 1}} - \beta A_{j}^{T} \Delta \mathbf{Ax}_{[j+1,n]}^{k+1} - \beta A_{j}^{T}B({y^{k}} - {y^{k + 1}}) \\ \quad{} - {\tau}(x_{j}^{k+1} - z_{j}^{k}) \in \partial {f_{j}}( {x_{j}^{k+1}} ), \\ {B^{\mathrm{T}}}{\lambda ^{k + 1}} - {\tau}({y^{k + 1}} - y^{k}) = { \nabla _{y}}g( {{u^{k + 1}}} ), \\ {\lambda ^{k + 1}} = {\lambda ^{k}} - \beta ( \mathbf{Ax}_{[1,n]}^{k+1} + B{y^{k+1}} - b), \end{cases}\displaystyle \end{aligned}$$

(3.28)

where $\Delta \mathbf{Ax}_{[j+1,n]}^{k+1} =\mathbf{Ax}_{[j+1,n]}^{k+1} - \mathbf{Ax}_{[j+1,n]}^{k} $. Putting (3.28) into (3.27), we have

$$\begin{aligned} { \bigl( {\rho _{1}^{k + 1},\rho _{2}^{k + 1}, \ldots ,\rho _{n}^{k + 1}}, \rho _{n+1}^{k + 1}, \rho _{n+2}^{k + 1} \bigr)^{T}} \in \partial {L_{\beta }} \bigl( {x_{1}^{k + 1},x_{2}^{k + 1}, \ldots ,x_{n}^{k+1},{y^{k + 1}},{ \lambda ^{k + 1}}} \bigr), \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} \textstyle\begin{cases} \rho _{j}^{{k + 1}} = {\nabla _{{x_{j}}}}g (\mathbf{x}_{[1,n]}^{k+1},y^{k+1} ) - {\nabla _{{x_{j}}}}g (\mathbf{x}_{[1,j-1]}^{k+1}, \mathbf{x}_{[j,n]}^{k},y^{k} ) + A_{j}^{T}({\lambda ^{k}} - {\lambda ^{k + 1}}) \\ \phantom{\rho _{j}^{{k + 1}} =}{} + \beta A_{j}^{T}\Delta \mathbf{Ax}_{[j+1,n]}^{k+1} + \beta A_{j}^{T}B({y^{k}} - {y^{k + 1}}) - {\tau}(x_{j}^{k+1} - z_{j}^{k}), (j=1,\ldots ,n), \\ \rho _{n+1}^{k+1} = \beta {B^{\mathrm{T}}}({\lambda ^{k}} - { \lambda ^{k + 1}}) - {\tau}({y^{k + 1}} - y^{\mathrm{{k}}}), \\ \rho _{n+2}^{k+1} = \frac{1}{\beta }({\lambda ^{k }} - { \lambda ^{k+1}}). \end{cases}\displaystyle \end{aligned} \end{aligned}$$

(3.29)

Since ∇g is Lipschitz continuous on bounded subsets and $\{ \omega ^{k} \}$ is bounded, by (III) of Assumption A, combining (3.14), there exists $C > 0$ such that

$$\begin{aligned} \begin{aligned} d \bigl(0,\partial L_{\beta} \bigl({\omega ^{k + 1}} \bigr) \bigr) \le C \Biggl( \sum_{i=1}^{n} \bigl\Vert \Delta x_{i}^{k+1} \bigr\Vert + \bigl\Vert \Delta{y^{k + 1}} \bigr\Vert + \sum_{i=1}^{n} \bigl\Vert \Delta x_{i}^{k} \bigr\Vert + \bigl\Vert \Delta{y^{k }} \bigr\Vert \Biggr). \end{aligned} \end{aligned}$$

Similarly, we can derive the same conclusion for Algorithm 2. We omit the proof here. □

Theorem 3.1

Denote the set of the cluster points of the sequence $\{ {{\omega ^{k}}} \}$ and $\{ {{{\hat{\omega}}^{k}}}\} $ by Ω and Ω̂, respectively. We have that:

(I) If $\omega ^{*}$ is a cluster point of $\{\omega ^{k}\}$ and $\{\omega ^{k_{j}}\}_{j\ge 0}$ is a subsequence such that $\lim_{j\to +\infty}\omega ^{k_{j}} = \omega ^{*} $, then

$$\begin{aligned} \begin{aligned} \lim_{j\to \infty} L_{\beta} \bigl( \omega ^{k_{j}} \bigr) = L_{\beta} \bigl(\omega ^{*} \bigr). \end{aligned} \end{aligned}$$

(II) $\Omega \subseteq critL_{\beta}$.

(III) $\lim_{k\to +\infty}d(\omega ^{k},\Omega ) = 0$.

(IV) Ω is a nonempty, compact, and connected set.

Proof

(I) Since $x_{i}^{k_{j}+1}$ is the minimizer of $x_{i}$-subproblem, we have

$$\begin{aligned} &{f_{i}} \bigl(x_{i}^{k_{j} + 1} \bigr) + \bigl\langle {x_{i}^{k_{j} + 1} - x_{i}^{k_{j}},{ \nabla _{{x_{i}}}}g \bigl(\mathbf{x}_{[1,i-1]}^{k_{j}+1}, \mathbf{x}_{[i,n]}^{k_{j}},y^{k_{j}} \bigr)} \bigr\rangle - \bigl\langle {{\lambda ^{k_{j}}}, \mathbf{Ax}_{[1,i]}^{k_{j}+1} +\mathbf{Ax}_{[i+1,n]}^{k_{j}}+ B{y^{k_{j}}} - b} \bigr\rangle \\ &\qquad{}+ \frac{\beta }{2}{ \bigl\Vert {\mathbf{Ax}_{[1,i]}^{k_{j}+1} + \mathbf{Ax}_{[i+1,n]}^{k_{j}}+ B{y^{k_{j}}} - b} \bigr\Vert ^{2}} \\ &\quad\le {f_{i}} \bigl(x_{i}^{*} \bigr) + \bigl\langle {x_{i}^{*} - x_{i}^{k_{j}},{ \nabla _{{x_{i}}}}g \bigl(\mathbf{x}_{[1,i-1]}^{k_{j}+1}, \mathbf{x}_{[i,n]}^{k_{j}},y^{k_{j}} \bigr)} \bigr\rangle \\ &\qquad{} - \bigl\langle {{\lambda ^{k_{j}}}, \mathbf{Ax}_{[1,i-1]}^{k_{j}+1} +A_{i}x_{i}^{*} +\mathbf{Ax}_{[i+1,n]}^{k_{j}} + B{y^{k_{j}}} - b} \bigr\rangle \\ &\qquad{}+\frac{\beta }{2}{ \bigl\Vert \mathbf{Ax}_{[1,i-1]}^{k_{j}+1} +A_{i}x_{i}^{*} +\mathbf{Ax}_{[i+1,n]}^{k_{j}} + B{y^{k_{j}}} - b \bigr\Vert ^{2}} + \frac{{{\tau }}}{{\mathrm{{2}}}}{ \bigl\Vert {x_{i}^{*} - z_{i}^{k_{j}}} \bigr\Vert ^{2}} - \frac{{{\tau }}}{{\mathrm{{2}}}}{ \bigl\Vert {x_{i}^{{k_{j}} + 1} - z_{i}^{k_{j}}} \bigr\Vert ^{2}}. \end{aligned}$$

Combining the inequality above with $\lim_{j\to \infty}{\omega}^{k_{j}+1}=\omega ^{*}$, we have

$$\begin{aligned} \begin{aligned} \limsup_{j\to \infty}f_{i} \bigl(x_{i}^{k_{j}+1} \bigr)\le f_{i} \bigl(x_{i}^{*} \bigr). \end{aligned} \end{aligned}$$

Since $f_{i} ( i=1,\ldots ,n)$ is lower semicontinuous, $f_{i}(x_{i}^{*})\le \liminf_{j\to \infty}f_{i}(x_{i}^{k_{j}+1})$. It follows that

$$\begin{aligned} \lim_{j\to \infty}f_{i} \bigl(x_{i}^{k_{j}+1} \bigr)=f_{i} \bigl(x_{i}^{*} \bigr). \end{aligned}$$

Since g is continuous, we further obtain

$$\begin{aligned} \begin{aligned} &\lim_{j\to +\infty} L_{\beta} \bigl( \omega ^{k_{j}} \bigr) \\ &\quad=\lim_{j\to +\infty} \Biggl( \sum_{i=1}^{n}{f_{i}} \bigl( {{x_{i}}}^{k_{j}} \bigr) +g \bigl( \mathbf{x}_{[1,n]}^{k_{j}},y^{k_{j}} \bigr) - \bigl\langle {\lambda ^{k_{j}} , \mathbf{Ax}_{[1,n]}^{k_{j}} + By^{k_{j}} - b} \bigr\rangle \\ &\qquad{} +\frac{\beta}{2} \bigl\Vert \mathbf{Ax}_{[1,n]}^{k_{j}} + By^{k_{j}} - b \bigr\Vert ^{2} \Biggr) \\ &\quad= \sum_{i=1}^{n}{f_{i}} \bigl( {{x_{i}}}^{*} \bigr) +g \bigl( \mathbf{x}_{[1,n]}^{*},y^{*} \bigr) - \bigl\langle { \lambda ^{*} , \mathbf{Ax}_{[1,n]}^{*} + By^{*} - b} \bigr\rangle + \frac{\beta}{2} \bigl\Vert \mathbf{Ax}_{[1,n]}^{*} + By^{*} - b \bigr\Vert ^{2} \\ &\quad=L_{\beta} \bigl(\omega ^{*} \bigr). \end{aligned} \end{aligned}$$

(II) From Lemma 3.4, we have that $x_{i}^{k+1} - x_{i}^{k} \to 0, y^{k+1} - y^{k} \to 0 $ and $\lambda ^{k+1} - \lambda ^{k} \to 0$. Thus, according to Lemma 3.5, it follows that $d(0,\partial L_{\beta}(\omega ^{k_{j}})) \to 0$ as $j\to \infty $, while $\omega ^{k_{j}} \to \omega ^{*}$ and $L_{\beta}(\omega ^{k_{j}}) \to L_{\beta}(\omega ^{*}) $ as $j\to \infty $. Because of the closedness of $\partial f_{i}$, the continuity of ∇g and the relations above, we take the limit $k=k_{j}\to \infty $ in (3.28) and obtain

$$\begin{aligned} \textstyle\begin{cases} - {\nabla _{{x_{j}}}}g( \mathbf{x}_{[1,n]}^{{{*}}},{y^{*}}) + A_{j}^{ \mathrm{T}}{\lambda ^{*}} \in \partial {f_{j}} ( {x_{j}^{*}} ), \quad j = 1,\ldots ,n, \\ {\nabla _{y}}g( {\mathbf{x}_{[1,n]}^{{{*}}},{y^{*}}}) = {B^{\mathrm{T}}}{ \lambda ^{*}}, \\ \mathbf{Ax}_{[1,n]}^{*} + B{y^{*}} - b = 0, \end{cases}\displaystyle \end{aligned}$$

which implies that $\omega ^{*}$ is a critical point of $L_{\beta} (\cdot )$. Since $\omega ^{*}$ is an arbitrary cluster point of $\{\omega ^{k}\}$, we conclude that $\Omega \subseteq critL_{\beta}$.

(III), (IV) The proof follows a similar approach to that of [Theorems 5(ii) and (iii) in Bolte et al. [19]], while incorporating the insights from Remark 5 within the same reference. This remark establishes that the properties detailed in (III) and (IV) are inherent to sequences satisfying the convergence condition $w^{k+1}-w^{k} \to 0$ as $k\to +\infty $. Such generic nature is indeed applicable in our context, as demonstrated by (3.23). □

3.4 Global convergence under the Kurdyka–Łojasiewicz property

In this subsection, we prove the global convergence of $\{(\mathbf{x}_{[1,n]}^{k} , y^{k}, \lambda ^{k})\}$ generated by Algorithm 1 and Algorithm 2 with the help of the Kurdyka–Łojasiewicz property. Since the proofs for the two algorithms are identical, we only prove the global convergence of Algorithm 1.

Theorem 3.2

(Global convergence)

Suppose that Assumption A holds, and $\hat{L} ( {\hat{\omega}} )$ satisfies the KŁ property at each point of Ω̂, then

(I) $\sum_{k = 1}^{\infty }{\| {{\omega ^{k}} - {\omega ^{k - 1}}}\|} < \infty $.

(II) $\{ {{\omega ^{k}}} \}$ converges to a critical point of $L ( \cdot )$.

Proof

From Theorem 3.1, we have $\mathop {\lim }_{k \to + \infty } \hat{L}( {{{\hat{\omega}}^{k}}} ) = \hat{L} ( {{{\hat{\omega}}^{*}}} )$ for all ${\hat{\omega}^{*}} \in \hat{\Omega}$. We consider two cases.

(i) Suppose there exists an integer ${k_{0}}$ such that ${\hat{L}_{\beta }}( {{{\hat{\omega}}^{{k_{0}}}}}) = {\hat{L}_{\beta }} ( {{{\hat{\omega}}^{*}}} )$. Then, from Lemma 3.3, for all $k > {k_{0}}$, we have

$$\begin{aligned} \begin{aligned} \delta \bigl( \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2} + \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2} \bigr) \le {{\hat{L}}_{\beta }} \bigl( {{{ \hat{\omega}}^{k}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k + 1}}} \bigr) \le {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{{k_{0}}}}} \bigr) - {{\hat{L}}_{ \beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr) = 0 . \end{aligned} \end{aligned}$$

(3.30)

Thus, for any $k > {k_{0}}$, we have $x_{i}^{k + 1} = x_{i}^{k}, i=1,2,\ldots ,n, {y^{k + 1}} = {y^{k}}$. Hence, for any $k > {k_{0}} + 1$, one has ${\hat{\omega}^{k + 1}} = {\hat{\omega}^{k}}$, and the assertion holds.

(ii) Otherwise, since $\{ \hat{L}_{\beta}(\hat{\omega}^{k})\}$ is nonincreasing, it holds that ${\hat{L}_{\beta }}( {{{\hat{\omega}}^{k}}} ) > {\hat{L}_{\beta }} ( {{{\hat{\omega}}^{*}}} )$ for all $k >1$. Since $\mathop {\lim }_{{k} \to + \infty }d( {{{\hat{\omega}}^{k}}, \hat{\Omega}} )= 0$, for any given $\varepsilon > 0$, there exists ${k_{1}} > 0$ such that $d( {{{\hat{\omega}}^{k}},\hat{\Omega}}) < \varepsilon $ for all $k > {k_{1}}$. Since $\mathop {\lim }_{{k} \to + \infty } {\hat{L}_{\beta }}( {{{ \hat{\omega}}^{k}}} ) = {\hat{L}_{\beta }} ( {{{\hat{\omega}}^{*}}} )$, for any given $\eta > 0$, there exists ${k_{2}} > 0$ such that ${ \hat{L}_{\beta }}( {{{\hat{\omega}}^{k}}}) < {\hat{L}_{\beta }} ( {{{ \hat{\omega}}^{*}}} ) + \eta $ for all $k > {k_{2}}$. Consequently, when $k > \tilde{k}: = \max \{ {{k_{1}},{k_{2}}} \}$,

$$\begin{aligned} d \bigl( {{{\hat{\omega}}^{k}},\hat{\Omega}} \bigr) < \varepsilon , { \hat{L}_{ \beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr) < {\hat{L}_{\beta }} \bigl( {{{ \hat{\omega}}^{k}}} \bigr) < {\hat{L}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr) + \eta . \end{aligned}$$

(3.31)

Since Ω̂ is a nonempty compact set, and ${\hat{L}_{\beta }} ( \cdot )$ is constant on Ω̂, applying Lemma 2.1, we have

$$\begin{aligned} \varphi ' \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr) - {{ \hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr) } \bigr) d \bigl( {0, \partial {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr)} \bigr) \ge 1, \quad\forall k > \tilde{k}. \end{aligned}$$

(3.32)

For all $k > \tilde{k}$, let ${a_{k}}:= \sum_{i=1}^{n}\|\Delta x_{i}^{k} \| + \|\Delta y^{k} \|$. From Lemma 3.5, one has

$$\begin{aligned} \frac{1}{{\varphi '( {{{\hat{L}}_{\beta }}( {{{\hat{\omega}}^{k}}}) - {{\hat{L}}_{\beta }}( {{{\hat{\omega}}^{*}}})} )}} \le d \bigl( {0,\partial {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr)} \bigr) \le C ( {a_{k}}+ {a_{k+1}} ). \end{aligned}$$

(3.33)

From the concavity of φ, we have

$$\begin{aligned} \begin{aligned} &\varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr) - {{\hat{L}}_{ \beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) - \varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k + 1}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) \\ &\quad\ge \varphi ' \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr) - {{ \hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{ \hat{\omega}}^{k}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{ \hat{\omega}}^{k + 1}}} \bigr)} \bigr) \\ &\quad\ge \frac{{ {{{\hat{L}}_{\beta }} ( {{{\hat{\omega}}^{k}}} ) - {{\hat{L}}_{\beta }} ( {{{\hat{\omega}}^{k + 1}}} )} }}{{d ( {0,\partial {{\hat{L}}_{\beta }} ( {{{\hat{\omega}}^{k}}} )} )}} \ge \frac{{{{\hat{L}}_{\beta }}( {{{\hat{\omega}}^{k}}} ) - {{\hat{L}}_{\beta }}( {{{\hat{\omega}}^{k + 1}}})}}{{C( {a_{k}}+ {a_{k+1}} )}}. \end{aligned} \end{aligned}$$

(3.34)

From Lemma 3.3, we have

$$\begin{aligned} & {{\delta \bigl( { {{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2}}}+{ {{ \bigl\Vert {\Delta{y^{k{+}1}} } \bigr\Vert }^{2}}} \bigr)}} \\ &\quad\le \bigl(\varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{ \hat{\omega}}^{k}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) - \varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k + 1}}} \bigr) - {{ \hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) \bigr){{C({a_{k}}+ {a_{k+1}} )}}. \end{aligned}$$

From the inequalities $\sum_{i=1}^{n}a_{i}\le \sqrt{n\sum_{i=1}^{n}a_{i}^{2}}$ and $\sqrt{ab}\le a+\frac{1}{4}b$, we obtain

$$\begin{aligned} \begin{aligned} a_{k+1} \le{}& \bigl( { {{(n+1) \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2}}}+{ {{(n+1) \bigl\Vert {\Delta{y^{k{+}1}} } \bigr\Vert }^{2}}} \bigr)^{\frac{1}{2}} \\ \le{} &\sqrt{\frac{C(n+1)}{\delta} \bigl(\varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) - \varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k + 1}}} \bigr) - {{ \hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) \bigr){{({a_{k}}+ {a_{k+1}} )}}} \\ \le{} & \underbrace{\sqrt{\frac{C(n+1)}{\delta}} \bigl( {\varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) - \varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k+1}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr)} \bigr)}_{a} + \frac{1}{4}{\underbrace{({a_{k}}+ {a_{k+1}} )}_{b}}. \end{aligned} \end{aligned}$$
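For completeness, the two elementary inequalities invoked above can be verified directly. In this sketch $a,b\ge 0$ and $a_{1},\ldots ,a_{n}$ denote generic real numbers (not the quantities $a_{k}$ of the proof), and $\mathbf{1}$ is the all-ones vector in $\mathbb{R}^{n}$:

$$\begin{aligned} 0 \le \biggl( \sqrt{a} - \frac{1}{2}\sqrt{b} \biggr)^{2} = a - \sqrt{ab} + \frac{1}{4}b \quad &\Longrightarrow \quad \sqrt{ab} \le a + \frac{1}{4}b, \\ \sum_{i=1}^{n} a_{i} = \langle \mathbf{1}, a \rangle \le \Vert \mathbf{1} \Vert \Vert a \Vert \quad &\Longrightarrow \quad \sum_{i=1}^{n} a_{i} \le \sqrt{n \sum_{i=1}^{n} a_{i}^{2}}. \end{aligned}$$

The second line is the Cauchy–Schwarz inequality.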

Summing up the above inequality over $k=k'+2,\ldots ,M$ yields

$$\begin{aligned} \begin{aligned} \sum_{k=k'+2}^{M} {a_{k+1}} \le {}& \sqrt{\frac{C(n+1)}{\delta}} \bigl( { \varphi \bigl( {{{ \hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k'+2}}} \bigr) - {{\hat{L}}_{ \beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) - \varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{M+1}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr)} \bigr) \\ &{}+ \frac{1}{4}\sum_{k=k'+2}^{M}{{( {a_{k}}+ {a_{k+1}} )}}. \end{aligned} \end{aligned}$$

Since $\varphi \ge 0$ and $\sum_{k=k'+2}^{M} a_{k}\le a_{k'+2}+\sum_{k=k'+2}^{M} a_{k+1}$, rearranging and letting $M\to \infty $, we get

$$\begin{aligned} \sum_{k=k'+2}^{\infty} {a_{k+1}} \le 2 \sqrt{\frac{C(n+1)}{\delta}} \bigl( \varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k'+2}}} \bigr) - {{\hat{L}}_{ \beta }} \bigl( {{{ \hat{\omega}}^{*}}} \bigr)} \bigr) \bigr) + \frac{1}{2} {a_{k'+2}}. \end{aligned}$$

Since $\delta ,C>0$ and ${a_{k'+2}}$ is a constant, $\sum_{k=k'+2}^{\infty} {a_{k+1}} < \infty $. Therefore, $\sum_{k=1}^{\infty} \| \omega ^{k+1}-\omega ^{k}\| < \infty $. (I) is proved.

(II) $\{\omega ^{k}\}$ is a Cauchy sequence, and thus it is convergent. Combining (I) with Theorem 3.1, we obtain that $\{ {{\omega ^{k}}} \}$ converges to a critical point of $L_{\beta} ( \cdot )$. □

4 Numerical experiments

This section reports the numerical results of applying Algorithm 1 and Algorithm 2 to an $l_{\frac{1}{2}}$-regularization problem and a matrix decomposition problem. All computations were run in Matlab 2020b on a Windows 11 laptop with an AMD Ryzen 5 3550H CPU operating at 3.5 GHz and 16 GB of RAM.

4.1 $l_{\frac{1}{2}}$-regularization problem

In compressed sensing, we consider the following optimization problem

$$\begin{aligned} \begin{aligned} \min_{x} \Vert Mx-b \Vert ^{2} + \varphi \Vert x \Vert _{0}, \end{aligned} \end{aligned}$$

(4.1)

where $M\in \mathbb{R}^{m\times n}$ is the measurement matrix, $b\in \mathbb{R}^{m}$ is the observation vector, φ is the regularization parameter, and $\| x\|_{0}$ denotes the number of nonzero components of x. Since problem (4.1) is NP-hard, the $l_{0}$ norm is often relaxed to the $l_{\frac{1}{2}}$ norm in practical applications [28], which leads to the following nonconvex problem:

$$\begin{aligned} \begin{aligned} &\min \varphi \Vert x \Vert _{(1/2)}^{(1/2)}+\frac{1}{2}{{ \Vert y \Vert }^{2}} \\ &\quad\text{s.t}\text{. }Mx-y=b, \end{aligned} \end{aligned}$$

(4.2)

where $\|x\|_{\frac{1}{2}}=(\sum_{i=1}^{n} \vert x_{i}\vert ^{\frac{1}{2}})^{2}$.
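To make the relaxation concrete, the following minimal Python sketch compares $\|x\|_{0}$, the penalty $\|x\|_{(1/2)}^{(1/2)}=\sum_{i}\vert x_{i}\vert ^{\frac{1}{2}}$ appearing in (4.2), and the $l_{1}$ norm on a small vector. The function names and the sample vector are illustrative assumptions and are not part of the reported experiments.

```python
import math

def l0_count(x):
    """Number of nonzero components, i.e. ||x||_0."""
    return sum(1 for xi in x if xi != 0)

def l_half_penalty(x):
    """||x||_{1/2}^{1/2} = sum_i |x_i|^{1/2}, the penalty term in (4.2)."""
    return sum(math.sqrt(abs(xi)) for xi in x)

def l1_norm(x):
    """Sum of absolute values, shown only for comparison."""
    return sum(abs(xi) for xi in x)

x = [4.0, 0.0, -9.0]  # hypothetical sample vector
print(l0_count(x), l_half_penalty(x), l1_norm(x))
```

On this vector the $l_{\frac{1}{2}}$ penalty grows more slowly with the magnitude of the entries than the $l_{1}$ norm does, which is why it is regarded as a closer surrogate for the count $\|x\|_{0}$.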

Based on (4.2), we construct the following problem:

$$\begin{aligned} \begin{aligned} &\min_{x_{1},x_{2},y} c \Vert x_{1} \Vert _{(1/2)}^{(1/2)}+ \frac{1}{2}{{ \Vert x_{2} \Vert }^{2}}+ \frac{1}{2}{{ \Vert {{B}_{1}}x_{1}+{{B}_{2}}x_{2}+y \Vert }^{2}} \\ &\quad\text{s.t.} A_{1}x_{1}+A_{2}x_{2}+y=b. \end{aligned} \end{aligned}$$

(4.3)

To verify the validity of Algorithm 1 and Algorithm 2, we test them and compare them with LADMM, the special case of SPLI-ADMM in which the inertial parameter $\theta _{k} = 0$.

Applying Algorithm 1 to problem (4.3) yields

$$\begin{aligned} \textstyle\begin{cases} x_{1}^{k+1}=H \biggl(\frac{1}{\mu _{1}} \biggl[ \tau z_{1}^{k}-B_{1}^{T} \bigl(B_{1}x_{1}^{k}+B_{2}x_{2}^{k}+y^{k} \bigr)\\ \phantom{x_{1}^{k+1}=}{}- \beta A_{1}^{T} \biggl(A_{2}x_{2}^{k} +y^{k} -b- \frac{\lambda ^{k}}{\beta } \biggr) \biggr] , \frac{2c}{\mu _{1}} \biggr), \\ x_{2}^{k+1}=\frac{1}{\mu _{2}} \biggl[\tau z_{2}^{k}-B_{2}^{T} \bigl(B_{1}x_{1}^{k}+B_{2}x_{2}^{k}+y^{k} \bigr) -\beta A_{2}^{T} \biggl( A_{1}x_{1}^{k+1}+y^{k}-b- \frac{\lambda ^{k}}{\beta} \biggr) \biggr] , \\ y^{k+1}=\frac{1}{\mu _{3}} \biggl[\tau y^{k}- \bigl(B_{1}x_{1}^{k+1}+B_{2}x_{2}^{k+1} \bigr)- \beta \biggl( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b- \frac{\lambda ^{k}}{\beta } \biggr) \biggr], \\ \lambda ^{k+1}=\lambda ^{k}-\beta \bigl(A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+y^{k+1}-b \bigr), \end{cases}\displaystyle \end{aligned}$$

where $\mu _{1}=\tau +\beta \rho _{\max} ( A_{1}^{\mathrm{T}}A_{1} )$, $\mu _{2}=1+\tau +\beta \rho _{\max} ( A_{2}^{\mathrm{T}}A_{2} )$, $\mu _{3}=1+\tau +\beta $, and $H(\cdot ,\cdot )$ is the half shrinkage operator [29] defined as $H ( x,\alpha ) = \{ h_{\alpha}^{1}, h_{\alpha}^{2},\ldots , h_{ \alpha}^{n} \} $ with

$$\begin{aligned} h_{\alpha}^{i}= \textstyle\begin{cases} \frac{2x_{i}}{3} \bigl(1+\cos \bigl(\frac{2}{3} \bigl(\pi -\phi ( \lvert x_{i} \rvert ) \bigr) \bigr) \bigr), & \lvert x_{i} \rvert >\frac{\sqrt[3]{54}}{4}\alpha ^{2/3}, \\ 0, & \text{otherwise}, \end{cases}\displaystyle \end{aligned}$$

(4.4)

where

$$\begin{aligned} \begin{aligned} \phi \bigl( \lvert x_{i} \rvert \bigr)=\arccos \biggl( \frac{\alpha}{8} \biggl( \frac{\lvert x_{i} \rvert }{3} \biggr)^{-(3/2)} \biggr) . \end{aligned} \end{aligned}$$
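The half shrinkage operator can be sketched in a few lines of Python. This is a minimal illustration of the componentwise rule in (4.4), written against a plain list of floats; the function name `half_threshold` is an assumption made here and is not the authors' Matlab code.

```python
import math

def half_threshold(x, alpha):
    """Componentwise half shrinkage H(x, alpha), following (4.4)."""
    # Components with magnitude at or below this threshold are set to zero.
    thresh = (54 ** (1 / 3) / 4) * alpha ** (2 / 3)
    out = []
    for xi in x:
        if abs(xi) > thresh:
            # phi = arccos( (alpha/8) * (|x_i|/3)^(-3/2) ); the argument is
            # below 1/sqrt(2) whenever |x_i| exceeds the threshold.
            phi = math.acos((alpha / 8) * (abs(xi) / 3) ** (-1.5))
            out.append((2 * xi / 3) * (1 + math.cos((2 / 3) * (math.pi - phi))))
        else:
            out.append(0.0)
    return out

# Hypothetical input: two small entries are zeroed, the large one is shrunk slightly.
print(half_threshold([0.1, -0.5, 10.0], 1.0))
```

In the $x_{1}$-updates of this section the operator is called with the threshold parameter $\frac{2c}{\mu _{1}}$, so a larger $\mu _{1}$ means milder shrinkage.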

Applying Algorithm 2 to problem (4.3) yields

$$\begin{aligned} \textstyle\begin{cases} x_{1}^{k+1}=H \bigl(\frac{1}{\mu _{1}} \bigl[ \tau z_{1}^{k}-B_{1}^{T} (B_{1}x_{1}^{k}+B_{2}x_{2}^{k}+y^{k} )- \beta A_{1}^{T} \bigl(A_{2}x_{2}^{k}+y^{k}-b- \frac{\lambda ^{k}}{\beta } \bigr) \bigr] , \frac{2c}{\mu _{1}} \bigr), \\ x_{2}^{k+1}=\frac{1}{\mu _{2}} \bigl[\tau z_{2}^{k}-B_{2}^{T} (B_{1}x_{1}^{k}+B_{2}x_{2}^{k}+y^{k} ) -\beta A_{2}^{T} \bigl( A_{1}x_{1}^{k+1}+y^{k}-b-\frac{\lambda ^{k}}{\beta} \bigr) \bigr] , \\ y^{k+1}=\frac{1}{\mu _{4}} \bigl[\tau y^{k}- (B_{1}x_{1}^{k+1}+B_{2}x_{2}^{k+1}+y^{k} )- \beta \bigl( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b- \frac{\lambda ^{k}}{\beta } \bigr) \bigr], \\ \lambda ^{k+1}=\lambda ^{k}-\beta (A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+y^{k+1}-b ), \end{cases}\displaystyle \end{aligned}$$

where $\mu _{4}=\tau +\beta $. Applying LADMM to problem (4.3), we obtain

$$\begin{aligned} \textstyle\begin{cases} x_{1}^{k+1}=H \bigl(\frac{1}{\mu _{1}} \bigl[ \tau x_{1}^{k}-B_{1}^{T} (B_{1}x_{1}^{k}+B_{2}x_{2}^{k}+y^{k} )- \beta A_{1}^{T} \bigl(A_{2}x_{2}^{k}+y^{k}-b- \frac{\lambda ^{k}}{\beta } \bigr) \bigr] , \frac{2c}{\mu _{1}} \bigr), \\ x_{2}^{k+1}=\frac{1}{\mu _{2}} \bigl[\tau x_{2}^{k}-B_{2}^{T} (B_{1}x_{1}^{k}+B_{2}x_{2}^{k}+y^{k} ) -\beta A_{2}^{T} \bigl( A_{1}x_{1}^{k+1}+y^{k}-b- \frac{\lambda ^{k}}{\beta} \bigr) \bigr] , \\ y^{k+1}=\frac{1}{\mu _{3}} \bigl[\tau y^{k}- (B_{1}x_{1}^{k+1}+B_{2}x_{2}^{k+1} )- \beta \bigl( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b- \frac{\lambda ^{k}}{\beta } \bigr) \bigr], \\ \lambda ^{k+1}=\lambda ^{k}-\beta (A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+y^{k+1}-b ). \end{cases}\displaystyle \end{aligned}$$
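The step constants $\mu _{1}$ and $\mu _{2}$ in the three schemes above depend on $\rho _{\max}(A_{i}^{\mathrm{T}}A_{i})$, read here as the largest eigenvalue of $A_{i}^{\mathrm{T}}A_{i}$. The text does not say how this quantity was computed; one common option, assumed purely for illustration, is power iteration. The sketch below uses plain Python lists, and the function name, the iteration count and the small matrix are hypothetical.

```python
import math

def rho_max_gram(A, iters=200):
    """Estimate the largest eigenvalue of A^T A by power iteration."""
    m, n = len(A), len(A[0])
    v = [1.0 / math.sqrt(n)] * n  # unit-norm starting vector
    lam = 0.0
    for _ in range(iters):
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(A[i][j] * Av[i] for i in range(m)) for j in range(n)]  # A^T (A v)
        lam = math.sqrt(sum(t * t for t in w))  # ||A^T A v|| with ||v|| = 1
        if lam == 0.0:
            return 0.0
        v = [t / lam for t in w]
    return lam

tau, beta = 30.0, 1000.0        # one of the parameter settings reported below
A1 = [[2.0, 0.0], [0.0, 1.0]]   # hypothetical small stand-in for A_1
mu1 = tau + beta * rho_max_gram(A1)
print(mu1)
```

Any upper bound on the largest eigenvalue would also keep $\mu _{1}$ and $\mu _{2}$ large enough, at the cost of smaller effective steps.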

In the experiment, the parameters are configured as follows: the dimensions are $m=5000$ and $n=1000$, the penalty parameter is $\beta =1000$, $b=0$, $c=1$, and the inertial parameter is fixed at $\theta =0.15$. The initial points are $x_{1}^{-1}= x_{1}^{0}=0$, $x_{2}^{-1}= x_{2}^{0}=0$, $y^{0}=0$, and $\lambda ^{0}=0$. $A_{1}, A_{2}, B_{1}, B_{2}$ are random matrices. The stopping criterion for all three methods is

$$\begin{aligned} \begin{aligned} \Vert r_{k} \Vert = \bigl\Vert A_{1}x_{1}^{k}+A_{2}x_{2}^{k} +y^{k}-b \bigr\Vert \le 10^{-8}. \end{aligned} \end{aligned}$$

Throughout the testing phase, we conduct experiments with four cases: $\tau =30$, $\tau =35$, $\tau =40$, and $\tau =45$. The numerical results of the three algorithms are reported in Table 1. We report the number of iterations required to satisfy the stopping criterion (“Iter”), the total computing time in seconds (“times”), and the value of the stopping criterion (“log(Crit)”). Moreover, to visually illustrate the convergence behavior, the curves of the objective value and $\log (\|r_{k}\|)$ at $\tau =45$ are presented in Fig. 1.

Table 1 Numerical results under different τ


From Table 1, we can see that the two proposed algorithms require less computing time and fewer iterations than LADMM. Figure 1(a) shows the trends of the objective value over the same number of iterations, indicating that SPLI-ADMM and SCLI-ADMM converge faster than LADMM. Figure 1(b) again demonstrates the time efficiency of our two algorithms, especially when “log(Crit)” is less than −4.

4.2 Matrix decomposition

Now, we consider the matrix decomposition problem, which has the following form:

$$\begin{aligned} \min \Vert L \Vert _{*}+\alpha \Vert S \Vert _{1}+ \frac{\omega}{2} \Vert T-M \Vert ^{2}\quad \text{s.t. } L+S=T, \end{aligned}$$

(4.5)

where $M\in \mathbb{R}^{p\times n}$ is the observed matrix, and $L,S,T \in \mathbb{R}^{p\times n}$ are the decision variables. The nuclear norm is $\|L\|_{*}:=\sum_{i=1}^{\min(p,n)}\vert \sigma _{i}(L)\vert ^{ \frac{1}{2}}$, the sparse term is $\|S\|_{1}:=\sum_{i=1}^{p}\sum_{j=1}^{n}\vert S_{ij}\vert $, ω is the penalty factor, and α is the trade-off parameter between the nuclear norm $\|L\|_{*}$ and the $l_{1}$-norm $\|S\|_{1}$. The ALF of problem (4.5) is defined as

$$\begin{aligned} \begin{aligned} L_{\beta} (L,S,T,\lambda )= \Vert L \Vert _{*}+\alpha \Vert S \Vert _{1}+ \frac{\omega}{2} \Vert T-M \Vert ^{2} -\langle \lambda , L+S-T\rangle + \frac{\beta}{2} \Vert L+S-T \Vert ^{2}, \end{aligned} \end{aligned}$$

where λ is the Lagrange multiplier.

Applying SPLI-ADMM to problem (4.5), we get the closed-form iterative formulas:

$$\begin{aligned} \textstyle\begin{cases} z_{L}^{k}=L^{k}+\theta (L^{k}-{L}^{k-1} ), z_{S}^{k}=S^{k}+ \theta (S^{k}-{S}^{k-1} ), \\ L^{k+1}=V( \frac{\beta (T^{k}-S^{k} )+\lambda ^{k}+\tau z_{L}^{k}}{\beta +\tau}, \frac{1}{\beta +\tau}), \\ S^{k+1}=S( \frac{\beta (T^{k}-L^{k+1} )+\lambda ^{k}+\tau z_{S}^{k}}{\beta +\tau}, \frac{\alpha}{\beta +\tau}), \\ T^{k+1}= \frac{\tau T^{k}+\beta (L^{k+1}+S^{k+1} )+\omega M-\lambda ^{k}}{\beta +\omega +\tau}, \\ \lambda ^{k+1}=\lambda ^{k}- \beta (L^{k+1}+S^{k+1}-T^{k+1} ), \end{cases}\displaystyle \end{aligned}$$

where $V(\cdot ,\mu )$ is the singular value thresholding operator [30] and $S(\cdot ,\mu ) $ is the soft shrinkage operator [31]. Applying SCLI-ADMM to problem (4.5), we get

$$\begin{aligned} \textstyle\begin{cases} z_{L}^{k}=L^{k}+\theta (L^{k}-{L}^{k-1} ), z_{S}^{k}=S^{k}+ \theta (S^{k}-{S}^{k-1} ), \\ L^{k+1}=V( \frac{\beta (T^{k}-S^{k} )+\lambda ^{k}+\tau z_{L}^{k}}{\beta +\tau}, \frac{1}{\beta +\tau}), \\ S^{k+1}=S( \frac{\beta (T^{k}-L^{k+1} )+\lambda ^{k}+\tau z_{S}^{k}}{\beta +\tau}, \frac{\alpha}{\beta +\tau}), \\ T^{k+1}= \frac{\tau T^{k}+\beta (L^{k+1}+S^{k+1} )+\omega (M-T^{k})-\lambda ^{k}}{\beta +\tau}, \\ \lambda ^{k+1}=\lambda ^{k}- \beta (L^{k+1}+S^{k+1}-T^{k+1} ), \end{cases}\displaystyle \end{aligned}$$
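The soft shrinkage operator $S(\cdot ,\mu )$ used in the $S$-updates above acts entrywise. A minimal Python sketch, assuming matrices stored as lists of lists and using the hypothetical name `soft_shrink`, is:

```python
import math

def soft_shrink(X, mu):
    """Entrywise soft shrinkage: sign(x) * max(|x| - mu, 0)."""
    return [[math.copysign(max(abs(v) - mu, 0.0), v) for v in row] for row in X]

# Hypothetical 2x2 input with threshold mu = 1.
S_new = soft_shrink([[3.0, -0.5], [0.0, -2.0]], 1.0)
print(S_new)
```

The singular value thresholding operator $V(\cdot ,\mu )$ applies the same scalar rule to the singular values of its argument and then recomposes the matrix from the unchanged singular vectors, so it additionally needs a singular value decomposition, which is omitted here.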

Applying LADMM to problem (4.5), we have

$$\begin{aligned} \textstyle\begin{cases} L^{k+1}=V( \frac{\beta (T^{k}-S^{k} )+\lambda ^{k}+\tau{L}^{k}}{\beta +\tau}, \frac{1}{\beta +\tau}), \\ S^{k+1}=S( \frac{\beta (T^{k}-L^{k+1} )+\lambda ^{k}+\tau{S}^{k}}{\beta +\tau}, \frac{\alpha}{\beta +\tau}), \\ T^{k+1}= \frac{\beta (L^{k+1}+S^{k+1} )+\omega M-\lambda ^{k}}{\beta +\omega}, \\ \lambda ^{k+1}=\lambda ^{k}- \beta (L^{k+1}+S^{k+1}-T^{k+1} ). \end{cases}\displaystyle \end{aligned}$$
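The three $T$-updates differ only in how the smooth term $\frac{\omega}{2}\|T-M\|^{2}$ and the proximal term are handled. As a hedged reading, assuming that the $T$-subproblem of the two inertial schemes carries the proximal term $\frac{\tau}{2}\|T-T^{k}\|^{2}$ (an assumption inferred from the formulas rather than restated from the algorithm definitions), setting the gradient of the ALF with respect to $T$ to zero reproduces each formula:

$$\begin{aligned} &\text{SPLI-ADMM:}\quad \omega (T-M)+\lambda ^{k}-\beta \bigl(L^{k+1}+S^{k+1}-T \bigr)+\tau \bigl(T-T^{k} \bigr)=0 \\ &\qquad \Longrightarrow \quad T^{k+1}=\frac{\tau T^{k}+\beta (L^{k+1}+S^{k+1} )+\omega M-\lambda ^{k}}{\beta +\omega +\tau}, \\ &\text{SCLI-ADMM:}\quad \omega \bigl(T^{k}-M \bigr)+\lambda ^{k}-\beta \bigl(L^{k+1}+S^{k+1}-T \bigr)+\tau \bigl(T-T^{k} \bigr)=0 \\ &\qquad \Longrightarrow \quad T^{k+1}=\frac{\tau T^{k}+\beta (L^{k+1}+S^{k+1} )+\omega (M-T^{k} )-\lambda ^{k}}{\beta +\tau}, \\ &\text{LADMM:}\quad \omega (T-M)+\lambda ^{k}-\beta \bigl(L^{k+1}+S^{k+1}-T \bigr)=0 \\ &\qquad \Longrightarrow \quad T^{k+1}=\frac{\beta (L^{k+1}+S^{k+1} )+\omega M-\lambda ^{k}}{\beta +\omega}. \end{aligned}$$

In the SCLI-ADMM line the gradient of $\frac{\omega}{2}\|T-M\|^{2}$ is frozen at $T^{k}$, which is why ω drops out of the denominator.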

We set $p=n=100$ and consider 8 different pairs $(r.,spr.)$. Besides, we choose $\alpha =\frac{0.2}{\sqrt{m}},\theta =0.3,\omega =1000$, and the matrices $L,S$ and T are initialized to zero. We take $\beta =5, \tau =1$, and M is generated randomly in MATLAB. The stopping criterion is defined as

$$\begin{aligned} \begin{aligned} \operatorname { RelChg }:= \frac{ \Vert (L^{k+1}, S^{k+1}, T^{k+1} )- (L^{k}, S^{k}, T^{k} ) \Vert _{F}}{ \Vert (L^{k}, S^{k}, T^{k} ) \Vert _{F}+1} \leqslant 10^{-8} \quad\text{or}\quad k>3000. \end{aligned} \end{aligned}$$

Let $(\hat{L},\hat{S},\hat{T})$ be a numerical solution of problem (4.5). We measure the quality of the recovery by the relative error, which is defined by

$$\begin{aligned} \begin{aligned} \operatorname{RelErr}:= \frac{ \Vert (\hat{L},\hat{S}, \hat{T})- (L^{*},S^{*}, T^{*} ) \Vert _{F}}{ \Vert (L^{*},S^{*}, T^{*} ) \Vert _{F}+1} . \end{aligned} \end{aligned}$$
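Both RelChg and RelErr have the same form $\|X-Y\|_{F}/(\|Y\|_{F}+1)$ for triples of matrices, so one helper covers both. The sketch below is a minimal Python illustration with matrices stored as lists of lists; the names `fro_norm` and `rel_diff` and the tiny $1\times 1$ example are assumptions made here.

```python
import math

def fro_norm(mats):
    """Frobenius norm of a tuple of matrices treated as one stacked block."""
    return math.sqrt(sum(v * v for M in mats for row in M for v in row))

def rel_diff(new, old):
    """|| new - old ||_F / (|| old ||_F + 1) for tuples such as (L, S, T)."""
    diff = [[[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
            for A, B in zip(new, old)]
    return fro_norm(diff) / (fro_norm(old) + 1.0)

# Hypothetical 1x1 iterates (L, S, T) at steps k and k+1.
old = ([[3.0]], [[4.0]], [[0.0]])
new = ([[3.0]], [[4.0]], [[6.0]])
rel_chg = rel_diff(new, old)
print(rel_chg)
```

RelChg is obtained with `new` and `old` set to consecutive iterates, and RelErr with `new` set to the numerical solution and `old` set to $(L^{*},S^{*},T^{*})$. The added 1 in the denominator keeps the ratio well defined when the reference triple is zero, as it is at the zero initialization.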

Table 2 compares the results for different $(r.,spr.)$, where “r.” represents the rank of the matrix L, “$spr.$” represents the sparsity of the sparse matrix S, and “Iter” represents the number of iterations. $\|S\|_{0}$ denotes the number of nonzero elements of S. Besides, the iterative curves of the stopping criterion and the relative error of the three algorithms are plotted in Fig. 2.

Table 2 Summary of three algorithms for eight different (r., $spr$)


Table 2 shows that SPLI-ADMM and SCLI-ADMM take less time and fewer iterations under the same conditions, which demonstrates that the two proposed algorithms are more efficient than LADMM across different ranks and sparsity ratios. In Fig. 2, the curves of the stopping criterion (see Fig. 2(a) and (c)) in two trials show that SPLI-ADMM and SCLI-ADMM converge faster than LADMM. Figure 2(b) and (d) indicate that the matrices L and S are better recovered by SPLI-ADMM and SCLI-ADMM, because the “RelErr” of LADMM is greater than that of SPLI-ADMM for the same “Iter”.

5 Conclusion

This paper develops two linearized ADMM algorithms for nonconvex optimization, SPLI-ADMM and SCLI-ADMM, and establishes their convergence. By integrating an inertial strategy within a linearized framework, these algorithms improve the efficiency of solving linearly constrained problems with nonseparable structure. A key novelty lies in the use of sequential gradients of the mixed term, which is not typically found in conventional ADMM approaches and which enables the proposed algorithms to use the latest information when updating each variable. The KŁ property is used to guarantee the convergence of the generated sequences. Finally, the numerical experiments show that the proposed algorithms need less computing time and fewer iterations than LADMM on the tested problems.

Data Availability

No datasets were generated or analysed during the current study.

Notes

LADMM is the special case of SPLI-ADMM in which the inertial parameter $\theta _{k} = 0$.

References

[1] Yang, J., Zhang, Y.: Alternating direction algorithms for $\ell_1$-problems in compressive sensing. SIAM J. Sci. Comput. 33(1), 250–278 (2011)
[2] Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
[3] Ding, J., Zhang, X., Chen, M., Xue, K., Zhang, C., Pan, M.: Differentially private robust ADMM for distributed machine learning. In: 2019 IEEE International Conference on Big Data (Big Data), pp. 1302–1311. IEEE, Los Angeles (2019)
[4] Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78, 29–63 (2019)
[5] Jiang, B., Lin, T., Ma, S., Zhang, S.: Structured nonconvex and nonsmooth optimization: algorithms and iteration complexity analysis. Comput. Optim. Appl. 72(1), 115–157 (2019)
[6] Peng, Z., Xu, Y., Yan, M., Yin, W.: ARock: an algorithmic framework for asynchronous parallel coordinate updates. SIAM J. Sci. Comput. 38(5), A2851–A2879 (2016)
[7] Chen, L., Sun, D., Toh, K.-C.: A note on the convergence of ADMM for linearly constrained convex optimization problems. Comput. Optim. Appl. 66(2), 327–343 (2017)
[8] Chang, X., Liu, S., Zhao, P., Song, D.: A generalization of linearized alternating direction method of multipliers for solving two-block separable convex programming. J. Comput. Appl. Math. 357, 251–272 (2019)
[9] Zhang, C., Song, Y., Cai, X., Han, D.: An extended proximal ADMM algorithm for three-block nonconvex optimization problems. J. Comput. Appl. Math. 398, 113681 (2021)
[10] Sun, D., Toh, K.-C., Yang, L.: A convergent 3-block semiproximal alternating direction method of multipliers for conic programming with 4-type constraints. SIAM J. Optim. 25(2), 882–915 (2015)
[11] Wang, X., Shao, H., Liu, P., Wu, T.: An inertial proximal partially symmetric ADMM-based algorithm for linearly constrained multi-block nonconvex optimization problems with applications. J. Comput. Appl. Math. 420, 114821 (2023)
[12] Hien, L.T.K., Phan, D.N., Gillis, N.: Inertial alternating direction method of multipliers for non-convex non-smooth optimization. Comput. Optim. Appl. 83(1), 247–285 (2022)
[13] Chao, M., Deng, Z., Jian, J.: Convergence of linear Bregman ADMM for nonconvex and nonsmooth problems with nonseparable structure. Complexity 2020, 1–14 (2020)
[14] Li, X., Mo, L., Yuan, X., Zhang, J.: Linearized alternating direction method of multipliers for sparse group and fused lasso models. Comput. Stat. Data Anal. 79, 203–221 (2014)
[15] Ling, Q., Shi, W., Wu, G., Ribeiro, A.: DLM: decentralized linearized alternating direction method of multipliers. IEEE Trans. Signal Process. 63(15), 4051–4064 (2015)
[16] Liu, Q., Shen, X., Gu, Y.: Linearized ADMM for nonconvex nonsmooth optimization with convergence analysis. IEEE Access 7, 76131–76144 (2019)
[17] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1–17 (1964)
[18] Zavriev, S.K., Kostyuk, F.V.: Heavy-ball method in nonconvex optimization problems. Comput. Math. Model. 4(4), 336–341 (1993)
[19] Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)
[20] Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 9(1), 3–11 (2001)
[21] Pock, T., Sabach, S.: Inertial proximal alternating linearized minimization (iPALM) for nonconvex and nonsmooth problems. SIAM J. Imaging Sci. 9(4), 1756–1787 (2016)
[22] Hien, L.T.K., Papadimitriou, D.: An inertial ADMM for a class of nonconvex composite optimization with nonlinear coupling constraints (2022). arXiv preprint. arXiv:2212.11336
[23] Boţ, R.I., Nguyen, D.-K.: The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates. Math. Oper. Res. 45(2), 682–712 (2020)
[24] Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Grundlehren der Mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998)
[25] Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013)
[26] Boţ, R.I., Csetnek, E.R., László, S.C.: An inertial forward–backward algorithm for the minimization of the sum of two nonconvex functions. EURO J. Comput. Optim. 4(1), 3–25 (2016)
[27] Goncalves, M.L.N., Melo, J.G., Monteiro, R.D.C.: Convergence rate bounds for a proximal ADMM with over-relaxation stepsize parameter for solving nonconvex linearly constrained problems (2017). arXiv preprint. arXiv:1702.01850
[28] Zeng, J., Lin, S., Wang, Y., Xu, Z.: $l_{1/2}$ regularization: convergence of iterative half thresholding algorithm. IEEE Trans. Signal Process. 62(9), 2317–2329 (2014)
[29] Xu, Z., Chang, X., Xu, F., Zhang, H.: $l_{1/2}$ regularization: a thresholding representation theory and a fast solver. IEEE Trans. Neural Netw. Learn. Syst. 23(7), 1013–1027 (2012)
[30] Cai, J.-F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)
[31] Bonesky, T., Maass, P.: Iterated soft shrinkage with adaptive operator evaluations. J. Inverse Ill-Posed Probl. 17(4), 337–358 (2009)

Funding

This work is supported by the National Natural Science Foundation of China under grants 72071130, 71901145 and 12371308; the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (No. TP2022126); and the Key Lab of Intelligent and Green Flexographic Printing (No. KLIGFP-01).

Author information

Authors and Affiliations

Shanghai Publishing and Printing College, Shuifeng Road 100, Yangpu District, Shanghai, 200093, Shanghai, China
Zhonghui Xue & Qianfeng Ma
School of Management, University of Shanghai for Science and Technology, Jungong Road 516, Yangpu District, Shanghai, 200093, Shanghai, China
Kaiyuan Yang & Yazheng Dang


Contributions

Z. wrote the introduction. Z. and K. finished the theoretical framework and established the convergence analysis for the entire study. K. and Q. assisted in the numerical experiments and in preparing the figures and tables. Y. finalized the manuscript content and structure, ensuring consistency and coherence. Z. and Y. acquired the financial support for the project leading to this publication. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yazheng Dang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Xue, Z., Yang, K., Ma, Q. et al. Sequential inertial linear ADMM algorithm for nonconvex and nonsmooth multiblock problems with nonseparable structure. J Inequal Appl 2024, 65 (2024). https://doi.org/10.1186/s13660-024-03141-1


Received: 16 January 2024
Accepted: 24 April 2024
Published: 08 May 2024
DOI: https://doi.org/10.1186/s13660-024-03141-1
