1 Introduction

In this paper, we consider the following linearly constrained nonconvex optimization problem with multiple block variables:

$$\begin{aligned} \begin{aligned}& \underset{x_{i}, y}{\min } \sum_{i = 1}^{n} {{f_{i}} ( {{x_{i}}} )} + g ( {{x_{1}}, {x_{2}},\ldots , {x_{n}} ,y} ), \\ &\quad \text{s.t. }\sum_{i = 1}^{n} {{A_{i}} {x_{i}}} + By =b, \end{aligned} \end{aligned}$$
(1.1)

where \({x_{i}} \in {\mathbb{R} ^{{p_{i}}}} ( {i = 1,2, \ldots , n} )\) and \(y \in {\mathbb{R} ^{q}}\) are variables; each \({f_{i}}:{\mathbb{R} ^{{p_{i}}}} \to \mathbb{R} \cup \{ { + \infty } \} ( {i = 1,2, \ldots , n} )\) is a proper lower semicontinuous function, possibly nonconvex and nonsmooth; \(g:{\mathbb{R} ^{p_{1}}}\times{\mathbb{R} ^{p_{2}}}\times \cdots \times{\mathbb{R} ^{p_{n}}}\times{\mathbb{R} ^{q}} \to \mathbb{R}\) is continuously differentiable, and ∇g is Lipschitz continuous with modulus \(l_{g}>0\); \({A_{i}} \in {\mathbb{R}^{m \times {p_{i}}}} ( {i = 1,2, \ldots , n} ), B \in {\mathbb{R}^{m \times {q}}}\) are given matrices; and \(b \in {\mathbb{R} ^{m}}\) is a given vector. We write \(\mathbf{x}_{[i,j]} = (x_{i},x_{i+1}, \ldots , x_{j-1},x_{j})\) and \(\mathbf{Ax}_{[j,k]} = \sum_{i=j}^{k}A_{i}x_{i}\).

The Augmented Lagrangian Function (ALF) of (1.1) is defined as

$$\begin{aligned} \begin{aligned} L ( {\mathbf{x}_{[1,n]} ,y,\lambda } ) = \sum _{i=1}^{n}{f_{i}} ( {{x_{i}}} ) +g ( \mathbf{x}_{[1,n]},y ) - \langle {\lambda , \mathbf{Ax}_{[1,n]} + By - b} \rangle +\frac{\beta}{2} \Vert \mathbf{Ax}_{[1,n]} + By - b \Vert ^{2}, \end{aligned} \end{aligned}$$

where \(\lambda \in \mathbb{R}^{m}\) is the Lagrangian dual variable, and \(\beta >0\) is a penalty parameter.
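To make the definition concrete, the following minimal sketch evaluates the ALF for small dense data. It is an illustration only: the function names `matvec` and `augmented_lagrangian`, the list-of-lists matrix layout, and the call signature `g(xs, y)` are assumptions made here, not notation from the paper.

```python
def matvec(M, v):
    """Multiply a dense matrix (list of rows) by a vector (list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def augmented_lagrangian(f_list, g, A_list, B, b, xs, y, lam, beta):
    """Evaluate sum_i f_i(x_i) + g(x, y) - <lam, r> + (beta / 2) * ||r||^2,
    where r = sum_i A_i x_i + B y - b is the constraint residual."""
    r = [-b_k for b_k in b]
    for A_i, x_i in zip(A_list, xs):
        r = [r_k + t for r_k, t in zip(r, matvec(A_i, x_i))]
    r = [r_k + t for r_k, t in zip(r, matvec(B, y))]
    separable = sum(f_i(x_i) for f_i, x_i in zip(f_list, xs))
    inner = sum(l_k * r_k for l_k, r_k in zip(lam, r))
    penalty = 0.5 * beta * sum(r_k * r_k for r_k in r)
    return separable + g(xs, y) - inner + penalty


# Hypothetical one-block, one-constraint instance (not from the paper).
value = augmented_lagrangian(
    f_list=[lambda x: abs(x[0])],
    g=lambda xs, y: xs[0][0] * y[0],
    A_list=[[[1.0]]], B=[[1.0]], b=[1.0],
    xs=[[2.0]], y=[1.0], lam=[0.5], beta=2.0,
)
print(value)
```

In this instance the residual is 2, so the four terms are 2, 2, -1 and 4.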

Problem (1.1) encapsulates a multitude of nonconvex optimization problems across various domains, including signal processing, image reconstruction, matrix decomposition, and machine learning [13]. When the number of blocks n equals 2 and \(g(\cdot )\) is identically zero, it reduces to a two-block separable problem. If the objective contains only the mixed term, it becomes similar to the problem in [4]. If the variable y is absent, it reduces to the problem studied in [5]. Hence, problem (1.1) extends the objective functions considered in the literature [4–6], covering a broader range of scenarios with additional variables and possible mixed terms.

The alternating direction method of multipliers (ADMM) is well established as a powerful tool for solving two-block separable convex optimization problems [7, 8]. However, its convergence analysis becomes much more intricate for nonconvex problems, especially when the number of blocks exceeds two. Zhang et al. [9] tackled this challenge by proposing a proximal ADMM for solving three-block nonconvex optimization tasks, building upon the groundwork laid by Sun et al. [10]. Meanwhile, Wang et al. [11] proposed an inertial proximal partially symmetric ADMM, suitable for handling multiblock separable nonconvex optimization problems. Hien et al. [12] developed an inertial version of ADMM, referred to as iADMM, which integrated the majorization-minimization principle within each block update step to address a specific class of nonconvex low-rank representation problems. Chao et al. [13] contributed to this area with a linear Bregman ADMM algorithm for nonconvex multiblock optimization problems featuring nonseparable structures.

Linearized Alternating Direction Method of Multipliers (LADMM) simplifies the problem-solving process and significantly decreases the computational overhead associated with traditional ADMM. By linearizing certain components of the optimization problem at each iteration, LADMM allows for more straightforward and efficient updates. Li et al. [14] effectively utilized LADMM in the context of the least absolute shrinkage and selection operator (LASSO) problem, demonstrating that this linearized approach is simple and highly efficient. Ling et al. [15] further extended the application of LADMM by introducing a decentralized linearized ADMM algorithm, which solely linearizes the objective functions at each iterative step. This method facilitates distributed computation and can handle large-scale problems more effectively. Specifically addressing nonconvex and nonsmooth scenarios, Liu et al. [16] proposed a two-block linearized ADMM. This variant linearizes the mixed term and the quadratic penalty term in the Augmented Lagrangian Function (ALF), thereby providing a viable solution strategy for such challenging optimization problems. Chao et al. [13] presented a linear Bregman ADMM, which only linearized the mixed term for solving three-block nonseparable problems. This approach maintains the efficiency gains of LADMM while adapting it to accommodate the complexities inherent in multiblock and nonseparable optimizations.

The inertial technique, initially conceived by Polyak [17], serves as an acceleration strategy that takes into account the dynamics of the optimization process by incorporating information from the last two iterations, thereby mitigating substantial differences between consecutive points. Subsequently, Zavriev et al. [18] expanded the use of the inertial technique to tackle nonconvex optimization problems, broadening the applicability of this methodology. Recently, the inertial technique has been widely combined with various optimization algorithms to enhance their performance on nonconvex optimization problems. Bot et al. [19] proposed an inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions. Attouch et al. [20] introduced an inertial proximal method and a proximal alternating projection method for maximal-monotone problems and minimization problems, respectively. Pock et al. [21] went on to propose a linear Inertial Proximal Alternating Minimization Algorithm (IPAMA) for a diverse range of nonconvex and nonsmooth optimization problems. Building upon these advancements, researchers have integrated the inertial technique with ADMM. Hien et al. [22] developed an Inertial Alternating Direction Method of Multipliers (iADMM) specifically designed for a class of nonconvex multiblock optimization problems with nonlinear coupling constraints. Wang et al. [11] also introduced an Inertial Proximal Partially Symmetric ADMM tailored for nonconvex settings.

Inspired by the previous works [11, 13, 16, 23], in this paper, we construct two new variant linear inertial ADMM algorithms, sequential partial linear inertial ADMM (SPLI-ADMM) and sequential complete linear inertial ADMM (SCLI-ADMM) for problem (1.1).

The novelty of this paper can be summarized as follows:

(I) The proposed algorithms combine the inertial effect with the linearization technique. The former accelerates convergence, while the latter makes the subproblems easier to solve.

(II) Unlike conventional approaches such as those in [13], during the linearization phase, the gradient of the mixed term in the \(x_{j}\)-subproblem is calculated as \({\nabla _{{x_{j}}}}g( {\mathbf{x}_{[1,j-1]}^{k+1} ,\mathbf{x}_{[j,n]}^{k} ,{y^{k}}}) \) rather than \({\nabla _{{x_{j}}}}g( {\mathbf{x}_{[1,n]}^{k} ,{y^{k}}} ) \). The mixed term is therefore linearized dynamically as the iterate sequence progresses: each update uses the blocks that have already been updated in the current iteration. Consequently, we refer to it as a sequential gradient iteration scheme.

The rest of this paper is organized as follows: In Sect. 2, some necessary preliminaries for further analysis are summarized. Then, we establish the convergence of the two algorithms in Sect. 3. Section 4 shows the validity of the algorithms by some numerical experiments. Finally, some conclusions are drawn in Sect. 5.

2 Preliminaries

In this section, we recall some basic notations and preliminary results, which will be used in this paper. Throughout, \({\mathbb{R}^{n}}\) denotes the n-dimensional Euclidean space, \(\mathbb{R} \cup \{ { + \infty } \}\) denotes the extended real number set, and \(\mathbb{N}\) denotes the natural number set. The image space of a matrix \(Q \in {\mathbb{R} ^{m \times n}}\) is defined as \({\mathop{\mathrm{Im}} } Q: = \{ {Qx:x \in { \mathbb{R}^{n}}} \}\). If matrix \(Q \ne 0\), let \({\rho _{\min (Q^{\mathrm{T}}Q)}}\) denote the smallest positive eigenvalue of the matrix \({Q^{\mathrm{T}}Q}\). \(\Vert \cdot \Vert \) represents the Euclidean norm. \(\operatorname{dom} f: = \{ {x \in {\mathbb{R} ^{n}}:f ( x ) < + \infty } \}\) is the domain of a function \(f:{\mathbb{R} ^{n}} \to \mathbb{R} \cup \{ { + \infty } \}\), and \(\langle {x,y} \rangle = {x^{\mathrm{T}}}y = \sum_{i = 1}^{n} {{x_{i}}{y_{i}}} \) is the Euclidean inner product.

Definition 2.1

([24])

Let \(f:\mathbb{R}^{n}\to \mathbb{R}\bigcup \{+\infty \}\) be a proper lower semicontinuous function.

(I) The Fréchet subdifferential, or regular subdifferential, of f at \(x\in {\mathrm{dom}} f \), written \(\hat{\partial} f(x) \), is defined as

$$\begin{aligned} \hat{\partial} f(x)= \biggl\{ x^{*}\in \mathbb{R}^{n}:\liminf_{y\to x,\, y \neq x}\frac{f(y)-f(x)-\langle x^{*},y-x\rangle}{ \Vert y-x \Vert }\geq 0 \biggr\} , \end{aligned}$$

when \(x\notin \operatorname{dom}f \), we set \(\hat{\partial} f( x) = \emptyset \).

(II) The limiting-subdifferential, or simply the subdifferential, of f at \(x\in {\mathrm{dom}}f\), written \(\partial f(x)\), is defined as

$$\begin{aligned} \partial f(x)= \bigl\{ x^{*}\in \mathbb{R}^{n}:\exists\, x_{k}\to x \text{ s.t. } f(x_{k}) \to f(x),\ x_{k}^{*} \in \hat{\partial}f(x_{k}),\ x_{k}^{*}\to x^{*} \bigr\} . \end{aligned}$$

(III) A point that satisfies

$$\begin{aligned} 0\in \partial f(x) \end{aligned}$$

is called a critical point or a stationary point of the function f. The set of critical points of f is denoted by crit f.

Proposition 2.1

We collect some basic properties of the subdifferential [24].

(I) \(\hat{\partial} f(x) \subseteq \partial f(x) \) for each \(x\in \mathbb{R}^{n}\), where the first set is closed and convex, and the second set is only closed.

(II) Let \(x_{k}^{*}\in \partial f(x_{k})\), \(\lim_{k\to \infty}(x_{k},x_{k}^{*})=(x,x^{*})\), and \(f(x_{k})\to f(x)\). Then \(x^{*}\in \partial f(x)\).

(III) If \(f: \mathbb{R}^{n}\to \mathbb{R}\cup \{ + \infty \} \) is proper lower semicontinuous, and \(g:\mathbb{R}^{n}\to \mathbb{R}\) is continuously differentiable, then \(\partial (f+g)(x)=\partial f(x)+\nabla g(x)\) for any \(x\in \operatorname{dom}f\).

Definition 2.2

If \({\omega ^{*}} = { ( {x_{1}^{*}, \ldots , x_{n}^{*},{y^{*}},{ \lambda ^{*}}} )^{T}}\) satisfies

$$\begin{aligned} \textstyle\begin{cases} A_{i}^{\mathrm{T}}{\lambda ^{*}} \in \partial {f_{i}} ( {x_{i}^{*}} ) + {\nabla _{{x_{i}}}}g ( {x_{1}^{*}, \ldots x_{n}^{*},{y^{*}}} ),\quad i = 1,2, \ldots n, \\ {B^{\mathrm{T}}}{\lambda ^{*}} = {\nabla _{y}}g ( {x_{1}^{*}, \ldots x_{n}^{*},{y^{*}}} ), \\ {A_{1}}x_{1}^{*} + \cdots + {A_{n}}x_{n}^{*}+B{y^{*}} = b, \end{cases}\displaystyle \end{aligned}$$
(2.1)

then \({\omega ^{*}}\) is called a critical point or stationary point of the Lagrangian function \(L ( {x_{1}}, \ldots, {x_{n}},y,\lambda )\).

A very important technique to prove the convergence of ADMM for nonconvex optimization problems is the assumption that the Lagrangian function satisfies the Kurdyka-Łojasiewicz property (KŁ property) [19, 25]. For notational simplicity, we use \({\Phi _{\eta }} ( {\eta > 0} )\) to denote the set of concave functions \(\varphi : [ 0, \eta ) \to [ 0, \infty ) \) such that

(I) \(\varphi ( 0 ) = 0\);

(II) φ is continuously differentiable on \(( {0,\eta } )\) and continuous at 0;

(III) \(\varphi ' ( s ) > 0\) for all \(s \in ( {0,\eta } )\).

The KŁ property can be described as follows.

Definition 2.3

(see [19, 26]) (KŁ property) Let \(f:{\mathbb{R}^{n}} \to \mathbb{R} \cup \{ { + \infty } \}\) be a proper lower semicontinuous function. If there exist \(\eta \in ( 0 , { + \infty } ]\), a neighborhood U of \({x^{*}}\), and a continuous concave function \(\varphi \in {\Phi _{\eta }}\) such that for all \(x \in U \cap \{ {x \in {\mathbb{R}^{n}}:f ( {{x^{*}}} ) < f ( x ) < f ( {{x^{*}}} ) + \eta } \}\), it holds that

$$\begin{aligned} \varphi ' \bigl( {f ( x ) - f \bigl( {{x^{*}}} \bigr)} \bigr)\operatorname{dist} \bigl( {0,\partial f ( x )} \bigr) \ge 1, \end{aligned}$$
(2.2)

then f is said to have the KŁ property at \({x^{*}}\). Here, the distance from a point x to a set S is defined by \(\operatorname{dist}(x,S):=\inf \{\|y-x\|:y\in S\}\).

Lemma 2.1

(see [25]) (Uniformized KŁ property) Suppose that \(f:{\mathbb{R}^{n}} \to \mathbb{R} \cup \{ { + \infty } \}\) is a proper lower semicontinuous function, and Ω is a compact set. If \(f ( x ) \equiv {f^{*}}\) for all \(x \in \Omega \) and satisfies the KŁ property at each point of Ω, then there exist \(\varepsilon > 0,\eta > 0\) and \(\varphi \in {\Phi _{\eta }}\) such that

$$\begin{aligned} \varphi ' \bigl( {f ( x ) - {f^{*}}} \bigr) \operatorname{dist} \bigl( {0, \partial f ( x )} \bigr) \ge 1, \end{aligned}$$
(2.3)

for all \(x \in \{ {x \in {\mathbb{R}^{n}}:\operatorname{dist} ( {x,\Omega } ) < \varepsilon } \} \cap \{ {{f^{*}} < f ( x ) < {f^{*}} + \eta } \}\).

Lemma 2.2

(see [25]) (Descent lemma) Let \(h:{\mathbb{R}^{n}} \to \mathbb{R}\) be a continuously differentiable function whose gradient \(\nabla h\) is Lipschitz continuous with modulus \({l_{h}} > 0\). Then, for any \(x,y \in {\mathbb{R}^{n}}\), we have

$$\begin{aligned} \bigl\vert {h ( y ) - h ( x ) - \bigl\langle { \nabla h ( x ),y - x} \bigr\rangle } \bigr\vert \le \frac{{{l_{h}}}}{2}{ \Vert {y - x} \Vert ^{2}}. \end{aligned}$$
(2.4)
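As a quick numerical illustration of the descent lemma, consider the scalar function \(h(x)=\sin x\), whose derivative \(\cos x\) is Lipschitz continuous with modulus 1. This example and the helper name `descent_gap` are assumptions chosen here for illustration; they do not appear in the paper.

```python
import math


def descent_gap(h, grad_h, x, y):
    """Return |h(y) - h(x) - h'(x) * (y - x)| for scalar x and y."""
    return abs(h(y) - h(x) - grad_h(x) * (y - x))


l_h = 1.0  # Lipschitz modulus of cos, the derivative of sin
for x, y in [(-2.0, 1.5), (0.0, 3.0), (0.7, 0.71)]:
    gap = descent_gap(math.sin, math.cos, x, y)
    bound = 0.5 * l_h * (y - x) ** 2
    # The lemma states that gap never exceeds bound.
    print(f"x={x}, y={y}, gap={gap:.6f}, bound={bound:.6f}")
```

The bound is quadratic in the step \(y-x\), which is why it is used to control the linearization error of the mixed term in the analysis that follows.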

Lemma 2.3

(see [27]) Let \(Q \in {\mathbb{R}^{m \times n}}\) be a nonzero matrix, and let \({\rho _{\min (Q^{\mathrm{T}}Q)}}\) denote the smallest positive eigenvalue of \({Q^{\mathrm{T}}Q}\). Then, for every \(u \in {\mathbb{R}^{m}}\), it holds that

$$\begin{aligned} \sqrt {\rho _{\min (Q^{\mathrm{T}}Q)}} \Vert {P_{Q}u} \Vert \le \bigl\Vert {Q^{\mathrm{T}}u} \bigr\Vert , \end{aligned}$$
(2.5)

where \({P_{Q}}\) denotes the Euclidean projection onto \({\mathrm{Im}}(Q)\).

3 Algorithms and their convergence

In this section, we propose two linear inertial ADMM algorithms, sequential partial linear inertial ADMM (SPLI-ADMM) and sequential complete linear inertial ADMM (SCLI-ADMM), and prove their convergence under suitable conditions. Furthermore, we prove the boundedness of the generated sequence.

3.1 Two linear inertial algorithms

First, we present Algorithm 1 for (1.1).

In every iteration of the subproblems, our approach uses a sequential gradient to update the variables. Specifically, in the \((k+1)\)th iteration of \(x_{i}\) \((i=1,\ldots ,n)\), the mixed term \(g(\mathbf{x}_{[1,i-1]}^{k+1} ,x_{i},\mathbf{x}_{[i+1,n]}^{k} ,y^{k})\) is replaced with a linearized approximation plus an inertial proximal term: \(g(\mathbf{x}_{[1,i-1]}^{k+1} ,\mathbf{x}_{[i,n]}^{k} ,y^{k}) + \langle x_{i}-x_{i}^{k}, \nabla _{x_{i}} g(\mathbf{x}_{[1,i-1]}^{k+1} ,\mathbf{x}_{[i,n]}^{k} ,y^{k}) \rangle + \frac{\tau}{2}\| x_{i}-z_{i}^{k} \|^{2}\). Here, the sequential gradient \(\nabla _{x_{i}} g(\mathbf{x}_{[1,i-1]}^{k+1} ,\mathbf{x}_{[i,n]}^{k} ,y^{k})\) is re-evaluated for each subproblem, so it reflects the most recent variable updates. Since the y-subproblem is not linearized, we call the method sequential partial linear inertial ADMM.

For the \(x_{j}\)-subproblems \((j=1,\ldots ,n)\) and the y-subproblem, respectively, we get the following auxiliary functions:

$$\begin{aligned} &\begin{aligned} \hat{f}_{j}^{k}(x_{j})={}&{f_{j}} ( {{x_{j}}} ) + \bigl\langle x_{j}-x_{j}^{k}, \nabla _{x_{j}} g \bigl(\mathbf{x}_{[1,j-1]}^{k+1} , \mathbf{x}_{[j,n]}^{k} ,y^{k} \bigr) \bigr\rangle \\ &{}+ \frac{\beta }{2}{ \biggl\Vert {\mathbf{Ax}_{[1,j-1]}^{k+1} + {A_{j}}x_{j} +\mathbf{Ax}_{[j+1,n]}^{k} + B{y^{k}} - b - \frac{{{\lambda ^{k}}}}{\beta }} \biggr\Vert ^{2}} + \frac{{{\tau }}}{2}{ \bigl\Vert {{x_{j}} - z_{j}^{k}} \bigr\Vert ^{2}}, \end{aligned} \end{aligned}$$
(3.1)
$$\begin{aligned} &\begin{aligned} \hat{h}^{k}(y)=g \bigl( \mathbf{x}_{[1,n]}^{k+1} ,y \bigr) + \frac{\beta }{2} \biggl\Vert \mathbf{Ax}_{[1,n]}^{k+1} + By - b - \frac{\lambda ^{k}}{\beta } \biggr\Vert ^{2} + \frac{{{\tau }}}{2}{ \bigl\Vert {y - y^{k}} \bigr\Vert ^{2}}, \end{aligned} \end{aligned}$$
(3.2)

where

$$\begin{aligned} \textstyle\begin{cases} z_{1}^{k} = x_{1}^{k} + {\theta _{k}} ( {x_{1}^{k-1} - x_{1}^{k}} ), \\ z_{2}^{k} = x_{2}^{k} + {\theta _{k}} ( {x_{2}^{k-1} - x_{2}^{k}} ), \\ \vdots \\ z_{n}^{k} = x_{n}^{k} + {\theta _{k}} ( {x_{n}^{k-1} - x_{n}^{k}} ), \end{cases}\displaystyle \end{aligned}$$
(3.3)

and \(\theta _{k}\in [0,\frac{1}{2})\). Utilizing the auxiliary functions above, the update rules are summarized in Algorithm 1 as follows:

Algorithm 1: SPLI-ADMM (Sequential Partial Linear Inertial ADMM) [algorithm listing, figure a]
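The structure of one SPLI-ADMM sweep can be sketched on a scalar toy problem. Everything specific in this sketch is an assumption made for illustration and is not taken from the paper: every block is a scalar, \(f_{i}(x_{i})=c|x_{i}|\) (convex, chosen only because the x-update then has a closed form via soft-thresholding), \(g(x,y)=\frac{1}{2}(\sum_{i}x_{i}-y)^{2}\), and the names `soft_threshold` and `spli_admm_sweep` are hypothetical. The x-updates minimize (3.1), the y-update minimizes (3.2), and the multiplier update is \(\lambda ^{k+1}=\lambda ^{k}-\beta (\mathbf{Ax}_{[1,n]}^{k+1}+By^{k+1}-b)\).

```python
def soft_threshold(v, c):
    """Return sign(v) * max(|v| - c, 0)."""
    if v > c:
        return v - c
    if v < -c:
        return v + c
    return 0.0


def spli_admm_sweep(x, x_prev, y, lam, a, B, b, c, beta, tau, theta):
    """One sweep on the toy problem: f_i = c*|x_i|, g = 0.5*(sum(x) - y)**2,
    constraint sum_i a[i]*x_i + B*y = b, all blocks scalar (assumptions)."""
    n = len(x)
    x_new = list(x)  # slots < j hold x^{k+1}, slots >= j still hold x^k
    for j in range(n):
        z_j = x[j] + theta * (x_prev[j] - x[j])   # inertial point, as in (3.3)
        grad_j = sum(x_new) - y                   # sequential gradient of g in x_j
        rest = sum(a[i] * x_new[i] for i in range(n) if i != j) + B * y - b
        q = beta * a[j] ** 2 + tau                # curvature of the smooth part
        v = tau * z_j - grad_j - beta * a[j] * rest + a[j] * lam
        x_new[j] = soft_threshold(v, c) / q       # exact minimizer of (3.1) here
    s = sum(a[i] * x_new[i] for i in range(n))
    total = sum(x_new)
    # The y-subproblem (3.2) is not linearized; for this g it is a 1-D quadratic.
    y_new = (total - beta * B * (s - b) + B * lam + tau * y) / (1.0 + beta * B ** 2 + tau)
    lam_new = lam - beta * (s + B * y_new - b)    # multiplier update
    return x_new, y_new, lam_new


# Hypothetical data; beta and tau are NOT tuned to satisfy the paper's assumptions.
x_prev, x, y, lam = [0.0, 0.0], [0.5, -0.2], 0.3, 0.4
x_new, y_new, lam_new = spli_admm_sweep(
    x, x_prev, y, lam, a=[1.0, 2.0], B=1.0, b=1.0, c=0.1,
    beta=2.0, tau=3.0, theta=0.25)
print(x_new, y_new, lam_new)
```

Note how `grad_j` is recomputed inside the loop from `x_new`, so block j sees the already-updated blocks 1, ..., j-1; this is the sequential gradient described above.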

Remark 1

(I) The auxiliary functions defined in (3.1) contain the inertial terms \(\frac{\tau}{2}\|x_{i}-z_{i}^{k}\|^{2}\), \(i=1,2,\ldots ,n \). An inertial scheme computes the new iterate from the two previous iterates. By adding the inertial term to the \(x_{i} \)-subproblems, the iteration tends toward the direction \(x_{i}^{k}-x_{i}^{k-1}\).

(II) The purpose of linearizing the mixed term in the \(x_{i}\)-subproblem is to exploit the differentiability of g and to simplify the computation in each iteration.

(III) The initial point \(\mathbf{x}_{[1,n]}^{-1} =\mathbf{x}_{[1,n]}^{0} = 0, y^{-1}=y^{0}=0\) is chosen to facilitate the proof of the boundedness of the sequence \(\{\omega ^{k}\}\) generated by the algorithm.

The update rules of Algorithm 2 can be written as follows:

Algorithm 2: SCLI-ADMM (Sequential Complete Linear Inertial ADMM) [algorithm listing, figure b]

Algorithm 2 is obtained from Algorithm 1 by further linearization. The \(x_{i}\)-subproblems \((i=1,\ldots ,n)\) are the same as those of Algorithm 1, and their iterative scheme is given by (3.4). In the \((k+1)\)th iteration for updating y, we replace \(g(\mathbf{x}_{[1,n]}^{k+1} ,y)\) with a linearized approximation plus a regularization term: \(g(\mathbf{x}_{[1,n]}^{k+1},y^{k}) + \langle y-y^{k}, \nabla _{y} g(\mathbf{x}_{[1,n]}^{k+1},y^{k}) \rangle + \frac{\tau}{2}\|y-y^{k} \|^{2}\). In Algorithm 2, all the subproblems are linearized and sequentially updated; hence we call it the Sequential Complete Linear Inertial ADMM.

The auxiliary function of y-subproblem is as follows

$$\begin{aligned} \begin{aligned} \bar{{h}}^{k}(y)= \bigl\langle y-y^{k}, \nabla _{y} g \bigl( \mathbf{x}_{[1,n]}^{k+1},y^{k} \bigr) \bigr\rangle + \frac{\beta }{2}{ \biggl\Vert {\mathbf{Ax}_{[1,n]}^{k+1} + By - b - \frac{{{\lambda ^{k}}}}{\beta }} \biggr\Vert ^{2}} + \frac{{{\tau }}}{2}{ \bigl\Vert {y - y^{k}} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}$$
(3.8)
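As a worked step, note that \(\bar{h}^{k}\) in (3.8) is a strongly convex quadratic in y because \(\tau >0\), so its minimizer is unique and available in closed form. The following short derivation is a sketch based only on the definition (3.8); it is not quoted from the algorithm listing. Setting the gradient of \(\bar{h}^{k}\) to zero and solving for y gives

$$\begin{aligned} \begin{aligned} 0 &= \nabla _{y} g \bigl( \mathbf{x}_{[1,n]}^{k+1},y^{k} \bigr) + \beta B^{\mathrm{T}} \biggl( \mathbf{Ax}_{[1,n]}^{k+1} + By - b - \frac{\lambda ^{k}}{\beta } \biggr) + \tau \bigl( y - y^{k} \bigr), \\ y^{k+1} &= \bigl( \beta B^{\mathrm{T}}B + \tau I \bigr)^{-1} \bigl( \tau y^{k} + B^{\mathrm{T}}\lambda ^{k} - \nabla _{y} g \bigl( \mathbf{x}_{[1,n]}^{k+1},y^{k} \bigr) - \beta B^{\mathrm{T}} \bigl( \mathbf{Ax}_{[1,n]}^{k+1} - b \bigr) \bigr). \end{aligned} \end{aligned}$$

Hence, under this reading, the y-update of Algorithm 2 requires one gradient evaluation of g and the solution of one linear system with the fixed matrix \(\beta B^{\mathrm{T}}B + \tau I\).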

3.2 A descent inequality

A crucial element in establishing the convergence of these algorithms is to verify the descent property of the regularized augmented Lagrangian function sequence. To facilitate our analysis, the following notations are introduced throughout this paper. For \(k\ge 1\),

$$\begin{aligned} \begin{aligned}&\Delta x_{i}^{k+1} = x_{i}^{k+1}-x_{i}^{k}, \qquad\Delta y^{k+1}=y^{k+1}-y^{k},\qquad \Delta \lambda ^{k+1}=\lambda ^{k+1}-\lambda ^{k}. \\ & \Delta \mathbf{x}_{[i,j]}^{k+1} = \bigl(\Delta x_{i}^{k+1},\ldots , \Delta x_{j}^{k+1} \bigr), \qquad \theta \bigl\Vert \Delta \mathbf{x}_{[i,j]}^{k+1} \bigr\Vert =\sum_{s=i}^{j}\theta \bigl\Vert \Delta x_{s}^{k+1} \bigr\Vert . \end{aligned} \end{aligned}$$

The convergence analysis relies on the following assumptions:

Assumption A

(I) g is bounded from below, and ∇g is \(l_{g}\)-Lipschitz continuous, i.e., \(\Vert { \nabla g(u) - \nabla g(v)} \Vert \le {l_{g}} \Vert {u - v} \Vert \) for all \(u,v \in {\mathbb{R} ^{p_{1}}}\times{\mathbb{R} ^{p_{2}}}\times \cdots \times{\mathbb{R} ^{p_{n}}}\times{\mathbb{R} ^{q}}\);

(II) \(f_{i}\), \(i=1,\ldots ,n\) are proper lower semicontinuous, and \(f_{i} \) are bounded from below;

(III) The linear operator B is surjective, i.e., \(B\neq 0\) and \(\{b\}\bigcup \{\bigcup_{i=1}^{n} \mathop{\mathrm{Im}}A_{i} \} \subset \mathop{\mathrm{Im}}B \);

(IV) For Algorithm 1 and Algorithm 2, \(\theta _{k} \in [0,\frac{1}{2} )\), τ>0 and β is large enough such that \(\tau > \frac{2+l_{g}}{1-2\theta _{k}}\), \(\beta > \max \{ \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}}, \frac{6 (\tau ^{2}+l_{g}^{2} )}{\tau \theta _{k}\rho _{\min (B^{\mathrm{T}}B)}} \} \);

(V) Let \(X:= {\mathbb{R}}^{p_{1}}\times \cdots \times{\mathbb{R}}^{p_{n}} \times{\mathbb{R}}^{q}\times{\mathbb{R}}^{m}\). The set \(\{\omega \in X:L_{\beta}(\omega )\leq L_{\beta}({\omega}^{0}) \}\) is bounded.

For showing the descent property, the following lemmas are necessary.

Lemma 3.1

For Algorithm 1, for each \(k \in { N}\), we have

$$\begin{aligned} \begin{aligned} { \bigl\Vert {\Delta \lambda ^{k + 1}} \bigr\Vert ^{2}} \le \frac{3l_{g}^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} + \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}} \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2} + \frac{3\tau ^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}$$
(3.9)

For Algorithm 2, for each \(k \in { N}\), we have

$$\begin{aligned} \begin{aligned} { \bigl\Vert {\Delta \lambda ^{k + 1}} \bigr\Vert ^{2}} \le \frac{3l_{g}^{2}}{\rho _{\min (B^{\mathrm{T}}B)}} \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2} + \frac{3\tau ^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2}} + \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}} \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$
(3.10)

Proof

Using Assumption A(III) and Lemma 2.3, we have

$$\begin{aligned} \bigl\Vert {{\Delta \lambda ^{k + 1}}} \bigr\Vert \le \frac{1}{{\sqrt {\rho _{\min (B^{\mathrm{T}}B)}} }} \bigl\Vert {B^{\mathrm{T}}} { \Delta \lambda ^{k + 1}} \bigr\Vert . \end{aligned}$$
(3.11)

For Algorithm 1, the optimality condition of the y-subproblem in (3.2) yields

$$\begin{aligned} \begin{aligned} 0 = {\nabla _{y}}g \bigl( \mathbf{x}_{[1,n]}^{k+1},{y^{k + 1}} \bigr) - {B^{\mathrm{T}}} {\lambda ^{k}} + \beta {B^{\mathrm{T}}} \bigl( \mathbf{Ax}_{[1,n]}^{k+1} + B{y^{k + 1}} - b \bigr) + {\tau } \bigl({\Delta y^{k + 1}} \bigr) . \end{aligned} \end{aligned}$$

Since \({\lambda ^{k + 1}} = {\lambda ^{k}} - \beta ( {\mathbf{Ax}_{[1,n]}^{k+1} + B{y^{k + 1}} - b}) \), we have

$$\begin{aligned} \begin{aligned} {B^{\mathrm{T}}} {\lambda ^{k + 1}} = {\nabla _{y}}g \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k + 1}} \bigr)+\tau \bigl(\Delta y^{k+1} \bigr) . \end{aligned} \end{aligned}$$
(3.12)

Let \({u^{k}}=(\mathbf{x}_{[1,n]}^{k},{y^{k}})\). Using Assumption A (I) and (3.12), we have

$$\begin{aligned} \begin{aligned} &{ \bigl\Vert {{B^{\mathrm{T}}} { \lambda ^{k + 1}} - {B^{\mathrm{T}}} {\lambda ^{k}}} \bigr\Vert ^{2}} \\ &\quad={ \bigl\Vert {\nabla _{y}}g \bigl(u^{k+1} \bigr) - { \nabla _{y}}g \bigl(u^{k} \bigr) + \tau \Delta y^{k+1} - \tau \Delta y^{k} \bigr\Vert ^{2}} \\ &\quad= \bigl\Vert {\nabla _{y}}g \bigl(u^{k+1} \bigr) - { \nabla _{y}}g \bigl(u^{k} \bigr) \bigr\Vert ^{2} + \bigl\Vert \tau \Delta y^{k+1} \bigr\Vert ^{2} + \bigl\Vert \tau \Delta y^{k} \bigr\Vert ^{2} - 2 \bigl\langle \tau \Delta y^{k+1} , \tau \Delta y^{k} \bigr\rangle \\ &\qquad{} - 2 \bigl\langle {\nabla _{y}}g \bigl(u^{k+1} \bigr) - { \nabla _{y}}g \bigl(u^{k} \bigr) , \tau \Delta y^{k} \bigr\rangle + 2 \bigl\langle {\nabla _{y}}g \bigl(u^{k+1} \bigr) - {\nabla _{y}}g \bigl(u^{k} \bigr) , \tau \Delta y^{k+1} \bigr\rangle \\ &\quad\le 3l_{g}^{2}{ \bigl\Vert {\Delta u^{k+1}} \bigr\Vert ^{2}}+3\tau ^{2} \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2}+3\tau ^{2} \bigl\Vert \Delta y^{k} \bigr\Vert ^{2} \\ &\quad\le 3l_{g}^{2} \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2} + 3 \bigl(l_{g}^{2}+ \tau ^{2} \bigr) \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2} +3\tau ^{2} \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$
(3.13)

It follows from the above mentioned formula and (3.11) that

$$\begin{aligned} \begin{aligned} { \bigl\Vert \Delta \lambda ^{k + 1} \bigr\Vert ^{2}} \le \frac{3l_{g}^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} + \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}} \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2} + \frac{3\tau ^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}$$

For Algorithm 2, similarly, we get

$$\begin{aligned} \begin{aligned} \bigl\Vert \Delta{\lambda ^{k + 1}} \bigr\Vert ^{2} \le \frac{3l_{g}^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} + \frac{3\tau ^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2}} + \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}} \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$

The proof is completed. □

To simplify the analysis, we introduce some notation. Let \({w^{k}} = (\mathbf{x}_{[1,n]}^{k},{y^{k}},{\lambda ^{k}}),{u^{k}}=( \mathbf{x}_{[1,n]}^{k},y^{k})\), \({r_{k}}=\mathbf{Ax}_{[1,n]}^{k} + B{y^{k}} - b \). The following lemma is important for proving the monotonicity of the sequence \(\{\hat{L}_{\beta }(\hat{w}^{k+1})\}\) defined in (3.20).

Lemma 3.2

For Algorithm 1 and Algorithm 2, select \(\theta _{k} \in [0,\frac{1}{2} )\) and \({\tau},{\beta} \) large enough to assure \(\tau > \frac{2+l_{g}}{1-2\theta _{k}} \), \(\beta > \max \{ \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}}, \frac{6(\tau ^{2}+l_{g}^{2})}{\tau \theta _{k}\rho _{\min (B^{\mathrm{T}}B)}} \} \).

Then, for each \(k \in {\mathrm{N}}\), we have

$$\begin{aligned} \begin{aligned} {L_{\beta }} \bigl({w^{k + 1}} \bigr) + {\delta _{2}} \bigl( \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}+ { \bigl\Vert \Delta{{y^{k + 1}}} \bigr\Vert ^{2}} \bigr) \le {L_{\beta }} \bigl({w^{k}} \bigr) + { \delta _{1}} \bigl( \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert ^{2} + { \bigl\Vert \Delta{{y^{k}} } \bigr\Vert ^{2}} \bigr), \end{aligned} \end{aligned}$$
(3.14)

where \(\delta _{2} >\delta _{1}>0 \).

Proof

We first give the proof for Algorithm 1.

From (3.1) and (3.4), for \(j=1,\ldots ,n\), we have

$$\begin{aligned} \begin{aligned} &{f_{j}} \bigl(x_{j}^{k + 1} \bigr) + \bigl\langle {\Delta x_{j}^{k + 1},{ \nabla _{{x_{j}}}}g \bigl( \mathbf{x}_{[1,j-1]}^{k+1}, \mathbf{x}_{[j,n]}^{k}, y^{k} \bigr)} \bigr\rangle \\ &\qquad{}- \bigl\langle {{\lambda ^{k}},\mathbf{Ax}_{[1,j]}^{k+1} + \mathbf{Ax}_{[j+1,n]}^{k} + B{y^{k}} - b} \bigr\rangle + \frac{\beta }{2}{ \bigl\Vert {\mathbf{Ax}_{[1,j]}^{k+1} +\mathbf{Ax}_{[j+1,n]}^{k} + B{y^{k}} - b} \bigr\Vert ^{2}} \\ &\quad\le{f_{j}} \bigl(x_{j}^{k} \bigr) - \bigl\langle {{\lambda ^{k}},\mathbf{Ax}_{[1,j-1]}^{k+1} + \mathbf{Ax}_{[j,n]}^{k} + B{y^{k}} - b} \bigr\rangle \\ &\qquad{}+\frac{\beta }{2}{ \bigl\Vert \mathbf{Ax}_{[1,j-1]}^{k+1} + \mathbf{Ax}_{[j,n]}^{k} + B{y^{k}} - b \bigr\Vert ^{2}} + \frac{{{\tau }}}{{\mathrm{{2}}}}{ \bigl\Vert {x_{j}^{k} - z_{j}^{k}} \bigr\Vert ^{2}} - \frac{{{\tau }}}{{\mathrm{{2}}}}{ \bigl\Vert {x_{j}^{k + 1} - z_{j}^{k}} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}$$

From (3.2) and (3.5), we have

$$\begin{aligned} \begin{aligned} &g \bigl({u^{k + 1}} \bigr) - \bigl\langle {{ \lambda ^{k}},{r_{k + 1}}} \bigr\rangle + \frac{\beta }{2}{ \Vert {{r_{k + 1}}} \Vert ^{2}} \\ &\quad\le g \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k}} \bigr) - \bigl\langle {{\lambda ^{k}}, \mathbf{Ax}_{[1,n]}^{k+1}+ B{y^{k}} - b} \bigr\rangle + \frac{\beta }{2}{ \bigl\Vert \mathbf{Ax}_{[1,n]}^{k+1} + B{y^{k}} - b \bigr\Vert ^{2}} - \frac{{{\tau }}}{{\mathrm{{2}}}}{ \bigl\Vert \Delta y^{k + 1} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}$$

Summing the above inequalities for the \(x_{j}\)-subproblems over \(j=1,\ldots ,n\) and adding the inequality for the y-subproblem, we have

$$\begin{aligned} &\sum_{i=1}^{n}{f_{i}} \bigl(x_{i}^{k + 1} \bigr) + g \bigl({u^{k + 1}} \bigr) - \bigl\langle {{\lambda ^{k}},{r_{k + 1}}} \bigr\rangle + \frac{\beta }{2}{ \Vert {{r_{k + 1}}} \Vert ^{2}} \\ &\quad\le\sum_{i=1}^{n}{f_{i}} \bigl(x_{i}^{k} \bigr) + g \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k}} \bigr) - \bigl\langle {{\lambda ^{k}},{r_{k}}} \bigr\rangle + \frac{\beta }{2}{ \Vert {{r_{k}}} \Vert ^{2}} \\ &\qquad{} - \sum_{i=1}^{n}{ \bigl\langle {\Delta x_{i}^{k + 1} ,{ \nabla _{{x_{i}}}}g \bigl( \mathbf{x}_{[1,i-1]}^{k+1}, \mathbf{x}_{[i,n]}^{k}, y^{k} \bigr)} \bigr\rangle } + \frac{{{\tau }}}{{\mathrm{{2}}}}\sum _{i=1}^{n}{ \bigl\Vert {x_{i}^{k} - z_{i}^{k}} \bigr\Vert ^{2}} \\ &\qquad{} - \frac{{{\tau }}}{{\mathrm{{2}}}}\sum_{i=1}^{n}{ \bigl\Vert {x_{i}^{k + 1} - z_{i}^{k}} \bigr\Vert ^{2}} - \frac{{{\tau }}}{{\mathrm{{2}}}} \bigl\Vert {y^{k + 1} - y^{k}} \bigr\Vert ^{2}, \end{aligned}$$

hence

$$\begin{aligned} \begin{aligned} &\sum_{i=1}^{n}{f_{i}} \bigl(x_{i}^{k + 1} \bigr) + g \bigl({u^{k + 1}} \bigr) - \bigl\langle {{\lambda ^{k}},{r_{k + 1}}} \bigr\rangle + \frac{\beta }{2}{ \Vert {{r_{k + 1}}} \Vert ^{2}} \\ &\quad\le\sum_{i=1}^{n}{f_{i}} \bigl(x_{i}^{k} \bigr) +g \bigl(u^{k} \bigr) - \bigl\langle {{\lambda ^{k}},{r_{k}}} \bigr\rangle + \frac{\beta }{2}{ \Vert {{r_{k}}} \Vert ^{2}} \\ &\qquad{}+ \underbrace{g \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k}} \bigr)- g \bigl(u^{k} \bigr) - \sum_{i=1}^{n} \bigl\langle {\Delta x_{i}^{k + 1}, {\nabla _{{x_{i}}}}g \bigl( \mathbf{x}_{[1,i-1]}^{k+1}, \mathbf{x}_{[i,n]}^{k}, y^{k} \bigr)} \bigr\rangle }_{ \mathcal{A}} \\ & \qquad{}\underbrace{+\frac{{{\tau }}}{{\mathrm{{2}}}}\sum_{i=1}^{n}{ \bigl\Vert {x_{i}^{k} - z_{i}^{k}} \bigr\Vert ^{2}} - \frac{\tau}{2}\sum _{i=1}^{n}{ \bigl\Vert {x_{i}^{k + 1} - z_{i}^{k}} \bigr\Vert ^{2}}}_{ \mathcal{B}} - \frac{\tau }{2} \bigl\Vert {\Delta y^{k + 1}} \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$

On the one hand, by Lemma 2.2, part \(\mathcal{A}\) can be bounded as

$$\begin{aligned} \begin{aligned} &g \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k}} \bigr)- g \bigl(u^{k} \bigr) - \sum_{i=1}^{n} \bigl\langle {\Delta x_{i}^{k + 1}, {\nabla _{{x_{i}}}}g \bigl( \mathbf{x}_{[1,i-1]}^{k+1}, \mathbf{x}_{[i,n]}^{k}, y^{k} \bigr)} \bigr\rangle \\ &\quad=\sum_{i=1}^{n} \bigl\lbrace g \bigl( \mathbf{x}_{[1,i]}^{k+1}, \mathbf{x}_{[i+1,n]}^{k},{y^{k}} \bigr)\\ &\qquad{}- g \bigl(\mathbf{x}_{[1,i-1]}^{k+1}, \mathbf{x}_{[i,n]}^{k},y^{k} \bigr) - \bigl\langle {\Delta x_{i}^{k + 1},{\nabla _{{x_{i}}}}g \bigl(\mathbf{x}_{[1,i-1]}^{k+1}, \mathbf{x}_{[i,n]}^{k},y^{k} \bigr)} \bigr\rangle \bigr\rbrace \\ &\quad\le\frac{l_{g}}{2} \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$
(3.15)

On the other hand, by the definitions of \(z_{i}^{k}, i=1,2,\ldots ,n\), we have

$$\begin{aligned} \begin{aligned} &{ \bigl\Vert {x_{i}^{k} - z_{i}^{k}} \bigr\Vert ^{2}} - { \bigl\Vert {x_{i}^{k + 1} - z_{i}^{k}} \bigr\Vert ^{2}} \\ &\quad= \theta _{k}^{2}{ \bigl\Vert {x_{i}^{k-1} - x_{i}^{k}} \bigr\Vert ^{2}} - \bigl\Vert {x_{i}^{k+1} - x_{i}^{k} + {\theta _{k}} \bigl(x_{i}^{k} - x_{i}^{k - 1} \bigr)} \bigr\Vert { ^{2}} \\ &\quad= - { \bigl\Vert {x_{i}^{k+1} - x_{i}^{k}} \bigr\Vert ^{2}} + 2{\theta _{k}} \bigl\langle {x_{i}^{k} - x_{i}^{k + 1},x_{i}^{k} - x_{i}^{k - 1}} \bigr\rangle \\ &\quad\le - { \bigl\Vert {x_{i}^{k+1} - x_{i}^{k}} \bigr\Vert ^{2}} + {\theta _{k}} { \bigl\Vert {x_{i}^{k+1} - x_{i}^{k}} \bigr\Vert ^{2}} + {\theta _{k}} { \bigl\Vert {x_{i}^{k} - x_{i}^{k - 1}} \bigr\Vert ^{2}} \\ &\quad=- (1 - {\theta _{k}}){ \bigl\Vert {x_{i}^{k+1}- x_{i}^{k}} \bigr\Vert ^{2}} + { \theta _{k}} { \bigl\Vert {x_{i}^{k} - x_{i}^{k - 1}} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}$$

Thus, it can be inferred from part \(\mathcal{B}\) that

$$\begin{aligned} \begin{aligned} {\sum_{i=1}^{n} \bigl\Vert {x_{i}^{k} - z_{i}^{k}} \bigr\Vert ^{2}}-{ \sum_{i=1}^{n} \bigl\Vert {x_{i}^{k+1} - z_{i}^{k}} \bigr\Vert ^{2}} \le - (1 - {\theta _{k}}){ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} + { \theta _{k}} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}$$
(3.16)
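The per-block estimate behind (3.16) relies only on expanding the squares and on the bound \(2\langle a,b\rangle \le \Vert a\Vert ^{2}+\Vert b\Vert ^{2}\). A minimal numerical sanity check is sketched below. The form of the extrapolated point, \(z = x^{k} \pm \theta (x^{k}-x^{k-1})\), is an assumption made for illustration (the definition of \(z_{i}^{k}\) is given earlier in the paper), and the helper names are hypothetical. The bound is tested for both signs.

```python
import random

def sq_norm(v):
    """Squared Euclidean norm of a list of floats."""
    return sum(t * t for t in v)

def inertial_gap(x_prev, x_cur, x_next, theta, sign=1.0):
    """Return (lhs, rhs) of the per-block bound that leads to (3.16).

    z is the extrapolated point; `sign` selects the extrapolation direction,
    which is an assumption here (z_i^k is defined earlier in the paper).
    """
    z = [c + sign * theta * (c - p) for c, p in zip(x_cur, x_prev)]
    lhs = (sq_norm([c - zi for c, zi in zip(x_cur, z)])
           - sq_norm([n - zi for n, zi in zip(x_next, z)]))
    step_new = sq_norm([n - c for n, c in zip(x_next, x_cur)])
    step_old = sq_norm([c - p for c, p in zip(x_cur, x_prev)])
    rhs = -(1.0 - theta) * step_new + theta * step_old
    return lhs, rhs

def check_inertial_bound(trials=1000, dim=5, seed=0):
    """Check lhs <= rhs on random points for theta in [0, 1] and both signs."""
    rng = random.Random(seed)
    for _ in range(trials):
        pts = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(3)]
        theta = rng.uniform(0.0, 1.0)
        for sign in (1.0, -1.0):
            lhs, rhs = inertial_gap(pts[0], pts[1], pts[2], theta, sign)
            if lhs > rhs + 1e-12:
                return False
    return True
```

Summing the per-block bound over \(i=1,\ldots ,n\) gives (3.16), since the squared norms of the blocks add up to \(\Vert \Delta \mathbf{x}_{[1,n]}^{k+1}\Vert ^{2}\) and \(\Vert \Delta \mathbf{x}_{[1,n]}^{k}\Vert ^{2}\).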

From Lemma 2.2, (3.15) and (3.16), we obtain

$$\begin{aligned} \begin{aligned} {L_{\beta }} \bigl( \mathbf{x}_{[1,n]}^{k+1},y^{k+1},\lambda ^{k} \bigr) \le {}& {L_{ \beta }} \bigl({w^{k}} \bigr) + \frac{{{l_{g}}}}{2}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} - \frac{{{\tau (1-{\theta _{k}}) }}}{2}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} \\ &{}-\frac{\tau}{2} { \bigl\Vert \Delta y^{k + 1} \bigr\Vert ^{2}} + \frac{{\tau \theta _{k} }}{2}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert ^{2}} . \end{aligned} \end{aligned}$$
(3.17)

Recall that

$$\begin{aligned} \begin{aligned} {L_{\beta }} \bigl({w^{k + 1}} \bigr) &= {L_{\beta }} \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k + 1}},{ \lambda ^{k}} \bigr) + \bigl\langle {\Delta{\lambda ^{k+1}} ,{r_{k + 1}}} \bigr\rangle \\ &= {L_{\beta }} \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k + 1}},{ \lambda ^{k}} \bigr) + \frac{1}{\beta} \bigl\langle \Delta{\lambda ^{k+1}}, \Delta{\lambda ^{k+1}} \bigr\rangle \\ &\le {L_{\beta }} \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k + 1}},{ \lambda ^{k}} \bigr) +\frac{1}{\beta} \bigl\Vert {\Delta{\lambda ^{k+1}}} \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$
(3.18)
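The identity in (3.18) uses only the fact that \(L_{\beta}\) depends on the multiplier through the term \(-\langle \lambda ,r\rangle \). The sketch below checks this numerically, assuming the multiplier update \(\lambda ^{k+1}=\lambda ^{k}-\beta r_{k+1}\) used by the algorithm. The function name is hypothetical.

```python
def dual_step_gain(lam, r, beta):
    """Compare the change of L_beta caused by the dual update with ||dlam||^2 / beta.

    Only the term -<lambda, r> of the augmented Lagrangian depends on lambda,
    so all other terms cancel in the difference.
    Assumes the update lam_new = lam - beta * r.
    """
    lam_new = [l - beta * ri for l, ri in zip(lam, r)]
    before = -sum(l * ri for l, ri in zip(lam, r))
    after = -sum(l * ri for l, ri in zip(lam_new, r))
    gain = after - before
    dlam_sq = sum((a - b) ** 2 for a, b in zip(lam_new, lam))
    return gain, dlam_sq / beta
```

Both returned values equal \(\beta \Vert r\Vert ^{2}\), which is why the last relation in (3.18) in fact holds with equality.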

Substituting (3.9) and (3.17) into (3.18), we have

$$\begin{aligned} \begin{aligned} &{L_{\beta }} \bigl({w^{k + 1}} \bigr) \\ &\quad\le{L_{\beta }} \bigl({w^{k}} \bigr) + \frac{{{l_{g}}}}{2}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} - \frac{{{\tau (1-{\theta _{k}}) }}}{2}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} - \frac{\tau}{2}{ \bigl\Vert \Delta y^{k + 1} \bigr\Vert ^{2}} + \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert ^{2}} \\ &\qquad{}+ \frac{3l_{g}^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2} + \frac{3(l_{g}^{2}+\tau ^{2})}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} { \bigl\Vert \Delta y^{k + 1} \bigr\Vert ^{2}}+ \frac{3\tau ^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}} } { \bigl\Vert \Delta{y^{k}} \bigr\Vert ^{2}} \\ &\quad={L_{\beta }} \bigl({w^{k}} \bigr) - \biggl( { \frac{\tau (1-\theta _{k})}{2}} - \frac{l_{g}}{2} - \frac{3l_{g}^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \biggr){ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2} \\ &\qquad{}- \biggl( \frac{\tau}{2} - \frac{3(\tau ^{2}+l_{g}^{2})}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \biggr) \bigl\Vert \Delta{y^{k + 1}} \bigr\Vert ^{2} + \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{3\tau ^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \bigl\Vert \Delta{y^{k}} \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$

Hence,

$$\begin{aligned} \begin{aligned} &{L_{\beta }} \bigl({w^{k + 1}} \bigr) + \biggl( { \frac{\tau (1-\theta _{k})}{2}} - \frac{l_{g}}{2} - \frac{3l_{g}^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \biggr){ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2} + \biggl( \frac{\tau}{2} - \frac{3(\tau ^{2}+l_{g}^{2})}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \biggr){ \bigl\Vert \Delta y^{k + 1} \bigr\Vert }^{2} \\ &\quad\le{L_{\beta }} \bigl({w^{k}} \bigr) + \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{3\tau ^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \bigl\Vert \Delta y^{k} \bigr\Vert ^{2} \\ &\quad\le{L_{\beta }} \bigl({w^{k}} \bigr) + \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{3(l_{g}^{2}+\tau ^{2})}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \bigl\Vert \Delta{y^{k}} \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$

Since \(\beta > \max \{ \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}}, \frac{3 (\tau ^{2}+l_{g}^{2} )}{\tau \theta _{k}\rho _{\min (B^{\mathrm{T}}B)}} \} \), which further implies \(\frac{6(l_{g}^{2}+\tau ^{2})}{\beta \rho _{\min (B^{\mathrm{T}}B)}} < 1\) and \(\frac{\tau \theta _{k}}{2} > \frac{3(\tau ^{2}+l_{g}^{2})}{\beta \rho _{\min (B^{\mathrm{T}}B)}}\), we have

$$\begin{aligned} \begin{aligned} &{L_{\beta }} \bigl({w^{k + 1}} \bigr)+ \biggl( { \frac{\tau (1-\theta _{k} )}{2}} - \frac{l_{g}}{2} - 1 \biggr){ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2} + \biggl( \frac{\tau}{2} - 1 \biggr){ \bigl\Vert \Delta y^{k + 1} \bigr\Vert }^{2} \\ &\quad\le{L_{\beta }} \bigl({w^{k}} \bigr) + \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{3(\tau ^{2}+l_{g}^{2})}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \bigl\Vert \Delta{y^{k}} \bigr\Vert ^{2} \\ &\quad\le {L_{\beta }} \bigl({w^{k}} \bigr) + \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{\tau \theta _{k}}{2} \bigl\Vert \Delta{y^{k} } \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$

Let \(\delta _{2}={\frac{\tau (1-\theta _{k} )}{2}} - \frac{l_{g}}{2} - 1 , \delta _{1}=\frac{\tau}{2}\theta _{k}\). We get

$$\begin{aligned} \begin{aligned} {L_{\beta }} \bigl({w^{k + 1}} \bigr) + {\delta _{2}} \bigl({ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} + { \bigl\Vert \Delta{{y^{k + 1}}} \bigr\Vert ^{2}} \bigr) \le {L_{\beta }} \bigl({w^{k}} \bigr) + {\delta _{1}} \bigl({ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert ^{2}} + { \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}} \bigr). \end{aligned} \end{aligned}$$
(3.19)

Since the parameters are chosen such that \({ \frac{\tau (1-\theta _{k} )}{2}} - \frac{l_{g}}{2} - 1 > \frac{\tau \theta _{k}}{2} \), we obtain \(\delta _{2} >\delta _{1}>0 \). That is, (3.14) holds.
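To make the requirement \(\delta _{2}>\delta _{1}\) concrete, the difference of the two constants can be written out explicitly. The following worked step uses only the definitions of \(\delta _{1}\) and \(\delta _{2}\) given above; the resulting threshold on τ is derived here for illustration and is not quoted from Assumption A.

$$\begin{aligned} \delta _{2}-\delta _{1} = \frac{\tau (1-\theta _{k})}{2} - \frac{l_{g}}{2} - 1 - \frac{\tau \theta _{k}}{2} = \frac{\tau (1-2\theta _{k}) - l_{g} - 2}{2}. \end{aligned}$$

Hence \(\delta _{2}>\delta _{1}\) holds exactly when \(\tau (1-2\theta _{k}) > l_{g}+2\). Since \(l_{g}>0\) and \(\tau >0\), this requires \(\theta _{k}<\frac{1}{2}\) and then \(\tau > \frac{l_{g}+2}{1-2\theta _{k}}\).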

Similarly, for Algorithm 2, we obtain

$$\begin{aligned} \begin{aligned} &{L_{\beta }} \bigl({w^{k + 1}} \bigr) + \biggl( { \frac{\tau (1 - \theta _{k} )}{2}} - \frac{l_{g}}{2} - \frac{3l_{g}^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \biggr){ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2} \\ &\qquad{}+ \biggl( \frac{\tau}{2} - \frac{l_{g}}{2} - \frac{3\tau ^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \biggr){ \bigl\Vert \Delta y^{k + 1} \bigr\Vert }^{2} \\ &\quad\le{L_{\beta }} \bigl({w^{k}} \bigr)+ \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{3(l_{g}^{2}+\tau ^{2})}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \bigl\Vert {\Delta y^{k} } \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$

Under the analogous conditions on \(\beta \) and \(\tau \), it follows that

$$\begin{aligned} \begin{aligned} &{L_{\beta }} \bigl({w^{k + 1}} \bigr)+ \biggl( { \frac{\tau (1-\theta _{k} )}{2}} - \frac{l_{g}}{2} - 1 \biggr){ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2} + \biggl( { \frac{\tau (1-\theta _{k})}{2}} - \frac{l_{g}}{2} - 1 \biggr){ \bigl\Vert \Delta y^{k + 1} \bigr\Vert }^{2} \\ &\quad\le{L_{\beta }} \bigl({w^{k}} \bigr) + \frac{\tau \theta _{k}}{2}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{\tau \theta _{k}}{2} \bigl\Vert \Delta{y^{k}} \bigr\Vert ^{2}. \end{aligned} \end{aligned}$$

Let \(\delta _{2}={\frac{\tau (1-\theta _{k} )}{2}} - \frac{l_{g}}{2}-1 , \delta _{1}=\frac{\tau}{2}\theta _{k}\). We have

$$\begin{aligned} \begin{aligned} {L_{\beta }} \bigl({w^{k + 1}} \bigr) + {\delta _{2}} \bigl({ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} + { \bigl\Vert {\Delta{y^{k + 1}}} \bigr\Vert ^{2}} \bigr) \le {L_{\beta }} \bigl({w^{k}} \bigr) + {\delta _{1}} \bigl({ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert ^{2}} + { \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}} \bigr). \end{aligned} \end{aligned}$$

Since the parameters are chosen such that \({ \frac{\tau (1-\theta _{k} )}{2}} - \frac{l_{g}}{2} - 1 > \frac{\tau \theta _{k}}{2} \), we get \(\delta _{2}'>\delta _{1}'>0\). That is, (3.14) holds. The lemma is proved. □

Remark 2

Based on Lemma 3.2, we can define the following function

$$\begin{aligned} {\hat{L}_{\beta }} ( \hat{w} ) = {\hat{L}_{\beta }} ( {u,\lambda ,v} ) = {L_{\beta }} ( {u,\lambda } ) + {\delta _{\mathrm{{1}}}} { \Vert {u - v} \Vert ^{2}}, \end{aligned}$$
(3.20)

where

$$\begin{aligned} \begin{aligned} u= ( \mathbf{x}_{[1,n]},y ), v =( \tilde{ \mathbf{x}}_{[1,n]}, \tilde{y} ), \hat{w} = (u,\lambda ,v) = ( \mathbf{x}_{[1,n]},y , \lambda , \tilde{\mathbf{x}}_{[1,n]}, \tilde{y} ) \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} { \Vert {u - v} \Vert ^{2}} = { \Vert \mathbf{x}_{[1,n]} - \tilde{\mathbf{x}}_{[1,n]} \Vert ^{2}} + \Vert y - \tilde{y} \Vert ^{2} . \end{aligned} \end{aligned}$$

Set \({\hat{\omega} ^{k + 1}} = (\mathbf{x}_{[1,n]}^{k+1},y^{k+1},{ \lambda ^{{k + 1}}}, \mathbf{x}_{[1,n]}^{k},y^{k} ), u^{k+1} = (\mathbf{x}_{[1,n]}^{k+1},y^{k+1} ) \). Thus,

$$\begin{aligned} \begin{aligned} {\hat{L}_{\beta }} \bigl(\hat{ \omega}^{k+1} \bigr)={\hat{L}_{\beta }} \bigl( {{u^{{k + 1}}},{ \lambda ^{{k + 1}}},{u^{k}}} \bigr) = {L_{\beta }} \bigl( u^{k+1},{\lambda ^{{k + 1}}} \bigr) + {\delta _{\mathrm{{1}}}} \bigl({{{ \bigl\Vert \Delta u^{k + 1} \bigr\Vert }^{2}}} \bigr). \end{aligned} \end{aligned}$$
(3.21)
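For readers who want to monitor the merit function in an implementation, a minimal sketch of (3.20) and (3.21) is given below. It assumes that the objective value \(\sum_{i}f_{i}(x_{i})+g(u)\) and the residual \(r=\mathbf{Ax}_{[1,n]}+By-b\) have already been computed, and that \(u\) and the previous iterate are stored as flat lists. The function names are hypothetical.

```python
def augmented_lagrangian(obj_value, lam, residual, beta):
    """L_beta = objective - <lambda, r> + (beta / 2) * ||r||^2.

    obj_value is sum_i f_i(x_i) + g(u); residual is r = sum_i A_i x_i + B y - b.
    """
    inner = sum(l * r for l, r in zip(lam, residual))
    return obj_value - inner + 0.5 * beta * sum(r * r for r in residual)

def merit(obj_value, lam, residual, beta, u, u_prev, delta1):
    """hat L_beta(u, lambda, v) = L_beta(u, lambda) + delta1 * ||u - v||^2 with v = u_prev."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(u, u_prev))
    return augmented_lagrangian(obj_value, lam, residual, beta) + delta1 * dist_sq
```

Evaluating `merit` at consecutive iterates gives a simple runtime check of the decrease property: the recorded values should be nonincreasing whenever the parameter conditions hold.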

The following lemma implies that the sequence \({\hat{L}_{\beta }} ( {{u^{k}},{\lambda ^{k}},{u^{k-1}}} )\) is decreasing monotonically.

Lemma 3.3

Suppose \({\hat{L}_{\beta }}( {\hat{\omega}^{k+1}} )\) is defined as in (3.20). Then, under Assumption A, for Algorithm 1 and Algorithm 2, we have:

$$\begin{aligned} \hat{ L }_{\beta} \bigl(\hat{\omega}^{k+1} \bigr)+ \delta \bigl( \bigl\Vert \Delta u^{k+1} \bigr\Vert ^{2} \bigr) \le {\hat{L}_{\beta }} \bigl(\hat{w}^{k} \bigr). \end{aligned}$$
(3.22)

That is, the sequence \(\{ {{{\hat{L}}_{\beta }}(\hat{\omega}^{k+1})}\}\) is decreasing.

Proof

Set \(\delta = {\delta _{2}} - {\delta _{1}} > 0\). Then the result follows directly from Lemma 3.2. □
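The step from Lemma 3.2 to (3.22) can be spelled out as follows. This is a worked restatement that uses only (3.19), the definition (3.21), and \(\delta =\delta _{2}-\delta _{1}\), with \(\Vert \Delta u^{k+1}\Vert ^{2} = \Vert \Delta \mathbf{x}_{[1,n]}^{k+1}\Vert ^{2}+\Vert \Delta y^{k+1}\Vert ^{2}\):

$$\begin{aligned} \begin{aligned} {\hat{L}_{\beta }} \bigl(\hat{\omega}^{k+1} \bigr) + \delta \bigl\Vert \Delta u^{k+1} \bigr\Vert ^{2} &= {L_{\beta }} \bigl(w^{k+1} \bigr) + \delta _{1} \bigl\Vert \Delta u^{k+1} \bigr\Vert ^{2} + (\delta _{2}-\delta _{1}) \bigl\Vert \Delta u^{k+1} \bigr\Vert ^{2} \\ &= {L_{\beta }} \bigl(w^{k+1} \bigr) + \delta _{2} \bigl\Vert \Delta u^{k+1} \bigr\Vert ^{2} \\ &\le {L_{\beta }} \bigl(w^{k} \bigr) + \delta _{1} \bigl\Vert \Delta u^{k} \bigr\Vert ^{2} = {\hat{L}_{\beta }} \bigl(\hat{\omega}^{k} \bigr). \end{aligned} \end{aligned}$$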

3.3 The cluster points of \(\{\omega ^{k}\}\) are contained in \(critL_{\beta}\)

In this subsection, using the closedness of the limiting subdifferential mentioned above, we prove the subsequential convergence of the sequence \(\{\omega ^{k}\}\). The proof for Algorithm 2 is similar to that for Algorithm 1, so we omit it here.

Lemma 3.4

Suppose \(\lbrace{\omega ^{k}}\rbrace \) is the sequence generated by Algorithm 1. If Assumption A holds, then the following statements are true:

(I) The sequence \(\{\omega ^{k}\} \) is bounded.

(II) \(\hat{L}_{\beta}(\hat{\omega}^{k})\) is bounded from below and convergent. Additionally,

$$\begin{aligned} \sum_{k\ge 0} \bigl\Vert \omega ^{k+1}- \omega ^{k} \bigr\Vert ^{2} < +\infty . \end{aligned}$$
(3.23)

(III) The sequences \(\hat{L}_{\beta}(\hat{\omega}^{k})\) and \({L}_{\beta}({\omega}^{k})\) have the same limit \(\hat{L}_{*}\).

Proof

(I) Because of the decreasing property of \(\{\hat{L}_{\beta}(\hat{\omega}^{k})\} \), we get

$$\begin{aligned} \begin{aligned} L_{\beta} \bigl(\omega ^{k} \bigr) \le \hat{L}_{\beta} \bigl(\hat{\omega}^{k} \bigr)\le \hat{L}_{\beta} \bigl(\hat{\omega}^{0} \bigr) = L_{\beta} \bigl(\omega ^{0} \bigr) + \delta _{1} \bigl\Vert u^{0}-u^{-1} \bigr\Vert ^{2} =L_{\beta} \bigl(\omega ^{0} \bigr), \end{aligned} \end{aligned}$$

where \(\|u^{0}-u^{-1}\|^{2}=0\) is due to the initialization \(x_{i}^{0}=x_{i}^{-1}, i=1,\ldots ,n\) and \(y^{0}=y^{-1}\) in Algorithm 1. Hence, \(\{\omega ^{k}\}\subseteq \{\omega \in X:L_{\beta}(\omega )\leq L_{ \beta}({\omega}^{0})\} \). By Assumption A(V), the sequence \(\{\omega ^{k}\} \) is bounded.

(II) Since \(\lbrace{\omega ^{k}}\rbrace \) is bounded, \(\lbrace{\hat{\omega}^{k}}\rbrace \) is also bounded, and it has at least one cluster point. Let \(\hat{\omega}^{*}\) be a cluster point of \(\lbrace{\hat{\omega}^{k}}\rbrace \) with \(\lim_{j\rightarrow +\infty}\hat{\omega}^{k_{j}}={\hat{\omega}^{*}}\). Since \(f_{i} (i=1,2,\ldots ,n)\) are proper lower semicontinuous and g is continuously differentiable, \(\hat{L}_{\beta} (\cdot )\) is proper lower semicontinuous. Hence, we have

$$\begin{aligned} \begin{aligned} \liminf_{j \to +\infty} \hat{L}_{\beta} \bigl(\hat{\omega}^{k_{j}} \bigr) \ge \hat{L}_{\beta} \bigl(\hat{\omega}^{*} \bigr). \end{aligned} \end{aligned}$$

According to the boundedness of \(f_{i}\), g, \(\{\omega ^{k}\}_{k\ge 0}\) and the definition of \(\hat{L}_{\beta}(\hat{\omega}^{k})\), the sequence \(\hat{L}_{\beta}(\hat{\omega}^{k})\) is bounded from below, and so is the subsequence \(\hat{L}_{\beta}(\hat{\omega}^{k_{j}})\). From Lemma 3.3, \(\hat{L}_{\beta}(\hat{\omega}^{k})\) is monotonically decreasing, so \(\hat{L}_{\beta}(\hat{\omega}^{k_{j}})\) is convergent. By monotonicity, the whole sequence \(\hat{L}_{\beta}(\hat{\omega}^{k})\) is also convergent and \(\hat{L}_{\beta}(\hat{\omega}^{*}) \le \hat{L}_{\beta}(\hat{\omega}^{k})\) for all k. It follows from (3.22) that

$$\begin{aligned} \delta \bigl\Vert {\Delta{u^{k+1}} } \bigr\Vert ^{2} \le {\hat{L}_{\beta }} \bigl(\hat{w}^{k} \bigr)-{\hat{L}_{\beta }} \bigl(\hat{w}^{k+1} \bigr). \end{aligned}$$

Summing up the above inequality for \(k =0,\ldots ,N\) and letting \(N \to \infty \), we have

$$\begin{aligned} \begin{aligned} \delta \sum_{k=0}^{+\infty} \bigl( { {{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2}}}+{ {{ \bigl\Vert { \Delta{y^{k+1}} } \bigr\Vert }^{2}}} \bigr) \le {\hat{L}_{\beta }} \bigl( \hat{w}^{0} \bigr)-{\hat{L}_{\beta }} \bigl(\hat{w}^{*} \bigr) < +\infty . \end{aligned} \end{aligned}$$

Since \(\delta > 0\), it follows that

$$\begin{aligned} \begin{aligned} \sum_{k=1}^{+\infty} \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2} < +\infty , \qquad\sum_{k=1}^{+\infty} \bigl\Vert {\Delta y^{k + 1}} \bigr\Vert ^{2} < +\infty . \end{aligned} \end{aligned}$$
(3.24)

Consequently, due to (3.9), we have

$$\begin{aligned} \begin{aligned} \sum_{k=1}^{+\infty} \bigl\Vert {\lambda ^{k + 1}} - {\lambda ^{{k}}} \bigr\Vert ^{2} < + \infty . \end{aligned} \end{aligned}$$
(3.25)

Then, (3.23) follows from (3.24) and (3.25).

(III) From (3.24), we have \(\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \Vert ^{2} \to 0\) and \(\Vert {\Delta y^{k + 1}} \Vert ^{2} \to 0\). Combining with the definition of \({\hat{L}_{\beta }}(\hat{w}^{k})\) in (3.21) yields \(\hat{L}_{*} = \lim_{k\to +\infty}\hat{L}_{\beta}(\hat{\omega}^{k}) = \lim_{k\to +\infty}{L}_{\beta}({\omega}^{k}) \). The lemma is proved. □

The following lemma provides upper estimates for the limiting subgradients of \(\hat{L}_{\beta}(\cdot )\), which is important for the convergence analysis of the sequence generated by Algorithm 1 and Algorithm 2.

Lemma 3.5

Let \(\{ {{\omega ^{k}}} \}\) be a sequence generated by Algorithm 1. Then, there exists \(C > 0\) such that

$$\begin{aligned} \begin{aligned} d \bigl( {0,\partial {L_{\beta }} \bigl( {{\omega ^{k + 1}}} \bigr)} \bigr) \le C \Biggl( \sum_{i=1}^{n} \bigl\Vert \Delta x_{i}^{k+1} \bigr\Vert + \bigl\Vert \Delta{y^{k + 1}} \bigr\Vert + \sum_{i=1}^{n} \bigl\Vert \Delta x_{i}^{k} \bigr\Vert + \bigl\Vert \Delta{y^{k }} \bigr\Vert \Biggr). \end{aligned} \end{aligned}$$
(3.26)

Proof

By the definition of the augmented Lagrangian function \({L_{\beta }} ( \cdot )\), we have

$$\begin{aligned} \textstyle\begin{cases} {\partial _{{x_{j}}}}{L_{\beta }}(u^{k+1},{\lambda ^{k + 1}} ) = \partial {f_{j}}( {x_{j}^{{k + 1}}} ) + {\nabla _{{x_{j}}}}g ( \mathbf{x}_{[1,n]}^{k+1},y^{k+1} )- A_{j}^{T}({\lambda ^{k + 1}} - \beta{r^{k + 1}}), \\ {\partial _{y}}{L_{\beta }}( u^{k+1},{\lambda ^{k + 1}} ) = {\nabla _{y}}g (\mathbf{x}_{[1,n]}^{k+1},y^{k+1} ) - {B^{T}}{\lambda ^{k + 1}} + \beta {B^{T}}{r^{k + 1}}, \\ {\partial _{\lambda }}{L_{\beta }}( {u^{k+1},{\lambda ^{k + 1}}}) = \frac{1}{\beta }({\lambda ^{k }} - {\lambda ^{k+1}}). \end{cases}\displaystyle \end{aligned}$$
(3.27)

From the optimality conditions of (3.1)–(3.2), we have

$$\begin{aligned} \textstyle\begin{cases} - {\nabla _{{x_{j}}}}g (\mathbf{x}_{[1,j-1]}^{k+1},\mathbf{x}_{[j,n]}^{k},y^{k} ) + A_{j}^{T}{\lambda ^{k + 1}} - \beta A_{j}^{T} \Delta \mathbf{Ax}_{[j+1,n]}^{k+1} - \beta A_{j}^{T}B({y^{k}} - {y^{k + 1}}) \\ \quad{} - {\tau}(x_{j}^{k+1} - z_{j}^{k}) \in \partial {f_{j}}( {x_{j}^{k+1}} ), \\ {B^{\mathrm{T}}}{\lambda ^{k + 1}} - {\tau}({y^{k + 1}} - y^{k}) = { \nabla _{y}}g( {{u^{k + 1}}} ), \\ {\lambda ^{k + 1}} = {\lambda ^{k}} - \beta ( \mathbf{Ax}_{[1,n]}^{k+1} + B{y^{k+1}} - b), \end{cases}\displaystyle \end{aligned}$$
(3.28)

where \(\Delta \mathbf{Ax}_{[j+1,n]}^{k+1} =\mathbf{Ax}_{[j+1,n]}^{k+1} - \mathbf{Ax}_{[j+1,n]}^{k} \). Putting (3.28) into (3.27), we have

$$\begin{aligned} { \bigl( {\rho _{1}^{k + 1},\rho _{2}^{k + 1}, \ldots ,\rho _{n}^{k + 1}}, \rho _{n+1}^{k + 1}, \rho _{n+2}^{k + 1} \bigr)^{T}} \in \partial {L_{\beta }} \bigl( {x_{1}^{k + 1},x_{2}^{k + 1}, \ldots ,x_{n}^{k+1},{y^{k + 1}},{ \lambda ^{k + 1}}} \bigr), \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} \textstyle\begin{cases} \rho _{j}^{{k + 1}} = {\nabla _{{x_{j}}}}g (\mathbf{x}_{[1,n]}^{k+1},y^{k+1} ) - {\nabla _{{x_{j}}}}g (\mathbf{x}_{[1,j-1]}^{k+1}, \mathbf{x}_{[j,n]}^{k},y^{k} ) + A_{j}^{T}({\lambda ^{k}} - {\lambda ^{k + 1}}) \\ \phantom{\rho _{j}^{{k + 1}} =}{} + \beta A_{j}^{T}\Delta \mathbf{Ax}_{[j+1,n]}^{k+1} + \beta A_{j}^{T}B({y^{k}} - {y^{k + 1}}) - {\tau}(x_{j}^{k+1} - z_{j}^{k}), \quad (j=1,\ldots ,n), \\ \rho _{n+1}^{k+1} = {B^{\mathrm{T}}}({\lambda ^{k}} - { \lambda ^{k + 1}}) - {\tau}({y^{k + 1}} - y^{k}), \\ \rho _{n+2}^{k+1} = \frac{1}{\beta }({\lambda ^{k }} - { \lambda ^{k+1}}). \end{cases}\displaystyle \end{aligned} \end{aligned}$$
(3.29)

Since ∇g is Lipschitz continuous on bounded subsets and \(\{ \omega ^{k} \}\) is bounded, it follows from (III) of Assumption A and (3.14) that there exists \(C > 0\) such that

$$\begin{aligned} \begin{aligned} d \bigl(0,\partial L_{\beta} \bigl({\omega ^{k + 1}} \bigr) \bigr) \le C \Biggl( \sum_{i=1}^{n} \bigl\Vert \Delta x_{i}^{k+1} \bigr\Vert + \bigl\Vert \Delta{y^{k + 1}} \bigr\Vert + \sum_{i=1}^{n} \bigl\Vert \Delta x_{i}^{k} \bigr\Vert + \bigl\Vert \Delta{y^{k }} \bigr\Vert \Biggr). \end{aligned} \end{aligned}$$

Similarly, we can derive the same conclusion for Algorithm 2. We omit the proof here. □

Theorem 3.1

Denote the set of the cluster points of the sequence \(\{ {{\omega ^{k}}} \}\) and \(\{ {{{\hat{\omega}}^{k}}}\} \) by Ω and Ω̂, respectively. We have that:

(I) If \(\omega ^{*}\) is a cluster point of \(\{\omega ^{k}\}\) and \(\{\omega ^{k_{j}}\}_{j\ge 0}\) is a subsequence such that \(\lim_{j\to +\infty}\omega ^{k_{j}} = \omega ^{*} \), then

$$\begin{aligned} \begin{aligned} \lim_{j\to \infty} L_{\beta} \bigl( \omega ^{k_{j}} \bigr) = L_{\beta} \bigl(\omega ^{*} \bigr). \end{aligned} \end{aligned}$$

(II) \(\Omega \subseteq critL_{\beta}\).

(III) \(\lim_{k\to +\infty}d(\omega ^{k},\Omega ) = 0\).

(IV) Ω is a nonempty, compact, and connected set.

Proof

(I) Since \(x_{i}^{k_{j}+1}\) is the minimizer of \(x_{i}\)-subproblem, we have

$$\begin{aligned} &{f_{i}} \bigl(x_{i}^{k_{j} + 1} \bigr) + \bigl\langle {x_{i}^{k_{j} + 1} - x_{i}^{k_{j}},{ \nabla _{{x_{i}}}}g \bigl(\mathbf{x}_{[1,i-1]}^{k_{j}+1}, \mathbf{x}_{[i,n]}^{k_{j}},y^{k_{j}} \bigr)} \bigr\rangle - \bigl\langle {{\lambda ^{k_{j}}}, \mathbf{Ax}_{[1,i]}^{k_{j}+1} +\mathbf{Ax}_{[i+1,n]}^{k_{j}}+ B{y^{k_{j}}} - b} \bigr\rangle \\ &\qquad{}+ \frac{\beta }{2}{ \bigl\Vert {\mathbf{Ax}_{[1,i]}^{k_{j}+1} + \mathbf{Ax}_{[i+1,n]}^{k_{j}}+ B{y^{k_{j}}} - b} \bigr\Vert ^{2}} \\ &\quad\le {f_{i}} \bigl(x_{i}^{*} \bigr) + \bigl\langle {x_{i}^{*} - x_{i}^{k_{j}},{ \nabla _{{x_{i}}}}g \bigl(\mathbf{x}_{[1,i-1]}^{k_{j}+1}, \mathbf{x}_{[i,n]}^{k_{j}},y^{k_{j}} \bigr)} \bigr\rangle \\ &\qquad{} - \bigl\langle {{\lambda ^{k_{j}}}, \mathbf{Ax}_{[1,i-1]}^{k_{j}+1} +A_{i}x_{i}^{*} +\mathbf{Ax}_{[i+1,n]}^{k_{j}} + B{y^{k_{j}}} - b} \bigr\rangle \\ &\qquad{}+\frac{\beta }{2}{ \bigl\Vert \mathbf{Ax}_{[1,i-1]}^{k_{j}+1} +A_{i}x_{i}^{*} +\mathbf{Ax}_{[i+1,n]}^{k_{j}} + B{y^{k_{j}}} - b \bigr\Vert ^{2}} + \frac{{{\tau }}}{{\mathrm{{2}}}}{ \bigl\Vert {x_{i}^{*} - z_{i}^{k_{j}}} \bigr\Vert ^{2}} - \frac{{{\tau }}}{{\mathrm{{2}}}}{ \bigl\Vert {x_{i}^{{k_{j}} + 1} - z_{i}^{k_{j}}} \bigr\Vert ^{2}}. \end{aligned}$$

Combining the inequality above with \(\lim_{j\to \infty}{\omega}^{k_{j}+1}=\omega ^{*}\), we have

$$\begin{aligned} \begin{aligned} \limsup_{j\to \infty}f_{i} \bigl(x_{i}^{k_{j}+1} \bigr)\le f_{i} \bigl(x_{i}^{*} \bigr). \end{aligned} \end{aligned}$$

Since \(f_{i} ( i=1,\ldots ,n)\) is lower semicontinuous, \(f_{i}(x_{i}^{*})\le \liminf_{j\to \infty}f_{i}(x_{i}^{k_{j}+1})\). It follows that

$$\begin{aligned} \lim_{j\to \infty}f_{i} \bigl(x_{i}^{k_{j}+1} \bigr)=f_{i} \bigl(x_{i}^{*} \bigr). \end{aligned}$$

Since g is continuous, we further obtain

$$\begin{aligned} \begin{aligned} &\lim_{j\to +\infty} L_{\beta} \bigl( \omega ^{k_{j}} \bigr) \\ &\quad=\lim_{j\to +\infty} \Biggl( \sum_{i=1}^{n}{f_{i}} \bigl( {{x_{i}}}^{k_{j}} \bigr) +g \bigl( \mathbf{x}_{[1,n]}^{k_{j}},y^{k_{j}} \bigr) - \bigl\langle {\lambda ^{k_{j}} , \mathbf{Ax}_{[1,n]}^{k_{j}} + By^{k_{j}} - b} \bigr\rangle \\ &\qquad{} +\frac{\beta}{2} \bigl\Vert \mathbf{Ax}_{[1,n]}^{k_{j}} + By^{k_{j}} - b \bigr\Vert ^{2} \Biggr) \\ &\quad= \sum_{i=1}^{n}{f_{i}} \bigl( {{x_{i}}}^{*} \bigr) +g \bigl( \mathbf{x}_{[1,n]}^{*},y^{*} \bigr) - \bigl\langle { \lambda ^{*} , \mathbf{Ax}_{[1,n]}^{*} + By^{*} - b} \bigr\rangle + \frac{\beta}{2} \bigl\Vert \mathbf{Ax}_{[1,n]}^{*} + By^{*} - b \bigr\Vert ^{2} \\ &\quad=L_{\beta} \bigl(\omega ^{*} \bigr). \end{aligned} \end{aligned}$$

(II) From Lemma 3.4, we have that \(x_{i}^{k+1} - x_{i}^{k} \to 0, y^{k+1} - y^{k} \to 0 \) and \(\lambda ^{k+1} - \lambda ^{k} \to 0\). Thus, according to Lemma 3.5, it follows that \(d(0,\partial L_{\beta}(\omega ^{k_{j}})) \to 0\) as \(j\to \infty \), while \(\omega ^{k_{j}} \to \omega ^{*}\) and \(L_{\beta}(\omega ^{k_{j}}) \to L_{\beta}(\omega ^{*}) \) as \(j\to \infty \). By the closedness of \(\partial f_{i}\), the continuity of ∇g and the relations above, taking the limit along \(k=k_{j}\to \infty \) in (3.28) gives

$$\begin{aligned} \textstyle\begin{cases} - {\nabla _{{x_{j}}}}g( \mathbf{x}_{[1,n]}^{{{*}}},{y^{*}}) + A_{j}^{ \mathrm{T}}{\lambda ^{*}} \in \partial {f_{j}} ( {x_{j}^{*}} ), \quad j = 1,\ldots ,n, \\ {\nabla _{y}}g( {\mathbf{x}_{[1,n]}^{{{*}}},{y^{*}}}) = {B^{\mathrm{T}}}{ \lambda ^{*}}, \\ \mathbf{Ax}_{[1,n]}^{*} + B{y^{*}} - b = 0, \end{cases}\displaystyle \end{aligned}$$

which implies that \(\omega ^{*}\) is a critical point of \(L_{\beta} (\cdot )\). Since \(\omega ^{*}\) is an arbitrary cluster point of \(\{\omega ^{k}\}\), we obtain \(\Omega \subseteq critL_{\beta}\).

(III), (IV) The proof follows that of Theorem 5(ii) and (iii) in Bolte et al. [19], together with Remark 5 of the same reference. This remark establishes that the properties in (III) and (IV) hold for any sequence satisfying \(\omega ^{k+1}-\omega ^{k} \to 0\) as \(k\to +\infty \), which is the case here by (3.23). □

3.4 Global convergence under the Kurdyka–Łojasiewicz property

In this subsection, we prove the global convergence of \(\{(\mathbf{x}_{[1,n]}^{k} , y^{k}, \lambda ^{k})\}\) generated by Algorithm 1 and Algorithm 2 with the help of the Kurdyka–Łojasiewicz property. Since the proofs for the two algorithms are identical, we only prove the global convergence of Algorithm 1.

Theorem 3.2

(Global convergence)

Suppose that Assumption A holds, and \(\hat{L} ( {\hat{\omega}} )\) satisfies the KŁ property at each point of Ω̂, then

(I) \(\sum_{k = 1}^{\infty }{\| {{\omega ^{k}} - {\omega ^{k - 1}}}\|} < \infty \).

(II) \(\{ {{\omega ^{k}}} \}\) converges to a critical point of \(L ( \cdot )\).

Proof

From Theorem 3.1, we have \(\mathop {\lim }_{k \to + \infty } \hat{L}( {{{\hat{\omega}}^{k}}} ) = \hat{L} ( {{{\hat{\omega}}^{*}}} )\) for all \({\hat{\omega}^{*}} \in \hat{\Omega}\). We consider two cases.

(i) Suppose there exists an integer \({k_{0}}\) such that \({\hat{L}_{\beta }}( {{{\hat{\omega}}^{{k_{0}}}}}) = {\hat{L}_{\beta }} ( {{{\hat{\omega}}^{*}}} )\). From Lemma 3.3, for all \(k > {k_{0}}\), we have

$$\begin{aligned} \begin{aligned} \delta \bigl( \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2} + \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2} \bigr) \le {{\hat{L}}_{\beta }} \bigl( {{{ \hat{\omega}}^{k}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k + 1}}} \bigr) \le {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{{k_{0}}}}} \bigr) - {{\hat{L}}_{ \beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr) = 0 . \end{aligned} \end{aligned}$$
(3.30)

Thus, for any \(k > {k_{0}}\), we have \(x_{i}^{k + 1} = x_{i}^{k}, i=1,2,\ldots ,n, {y^{k + 1}} = {y^{k}}\). Hence, for any \(k > {k_{0}} + 1\), one has \({\hat{\omega}^{k + 1}} = {\hat{\omega}^{k}}\), and the assertion holds.

(ii) Otherwise, since \(\{ \hat{L}_{\beta}(\hat{\omega}^{k})\}\) is nonincreasing, it holds that \({\hat{L}_{\beta }}( {{{\hat{\omega}}^{k}}} ) > {\hat{L}_{\beta }} ( {{{\hat{\omega}}^{*}}} )\) for all \(k >1\). Since \(\mathop {\lim }_{{k} \to + \infty }d( {{{\hat{\omega}}^{k}}, \hat{\Omega}} )= 0\), for any given \(\varepsilon > 0\), there exists \({k_{1}} > 0\) such that \(d( {{{\hat{\omega}}^{k}},\hat{\Omega}}) < \varepsilon \) for any \(k > {k_{1}}\). Since \(\mathop {\lim }_{{k} \to + \infty } {\hat{L}_{\beta }}( {{{ \hat{\omega}}^{k}}} ) = {\hat{L}_{\beta }} ( {{{\hat{\omega}}^{*}}} )\), for any given \(\eta > 0\), there exists \({k_{2}} > 0\) such that \({ \hat{L}_{\beta }}( {{{\hat{\omega}}^{k}}}) < {\hat{L}_{\beta }} ( {{{ \hat{\omega}}^{*}}} ) + \eta \) for all \(k > {k_{2}}\). Consequently, when \(k > \tilde{k}: = \max \{ {{k_{1}},{k_{2}}} \}\),

$$\begin{aligned} d \bigl( {{{\hat{\omega}}^{k}},\hat{\Omega}} \bigr) < \varepsilon , { \hat{L}_{ \beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr) < {\hat{L}_{\beta }} \bigl( {{{ \hat{\omega}}^{k}}} \bigr) < {\hat{L}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr) + \eta . \end{aligned}$$
(3.31)

Since Ω̂ is a nonempty compact set, and \({\hat{L}_{\beta }} ( \cdot )\) is constant on Ω̂, applying Lemma 2.1, we have

$$\begin{aligned} \varphi ' \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr) - {{ \hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr) } \bigr) d \bigl( {0, \partial {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr)} \bigr) \ge 1, \quad\forall k > \tilde{k}. \end{aligned}$$
(3.32)

Let \({a_{k}}:= \sum_{i=1}^{n}\|\Delta x_{i}^{k} \| + \|\Delta y^{k} \|\). For all \(k > \tilde{k}\), from Lemma 3.5, one has

$$\begin{aligned} \frac{1}{{\varphi '( {{{\hat{L}}_{\beta }}( {{{\hat{\omega}}^{k}}}) - {{\hat{L}}_{\beta }}( {{{\hat{\omega}}^{*}}})} )}} \le d \bigl( {0,\partial {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr)} \bigr) \le C ( {a_{k}}+ {a_{k+1}} ). \end{aligned}$$
(3.33)

From the concavity of φ, we have

$$\begin{aligned} \begin{aligned} &\varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr) - {{\hat{L}}_{ \beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) - \varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k + 1}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) \\ &\quad\ge \varphi ' \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr) - {{ \hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{ \hat{\omega}}^{k}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{ \hat{\omega}}^{k + 1}}} \bigr)} \bigr) \\ &\quad\ge \frac{{ {{{\hat{L}}_{\beta }} ( {{{\hat{\omega}}^{k}}} ) - {{\hat{L}}_{\beta }} ( {{{\hat{\omega}}^{k + 1}}} )} }}{{d ( {0,\partial {{\hat{L}}_{\beta }} ( {{{\hat{\omega}}^{k}}} )} )}} \ge \frac{{{{\hat{L}}_{\beta }}( {{{\hat{\omega}}^{k}}} ) - {{\hat{L}}_{\beta }}( {{{\hat{\omega}}^{k + 1}}})}}{{C( {a_{k}}+ {a_{k+1}} )}}. \end{aligned} \end{aligned}$$
(3.34)

From Lemma 3.3, we have

$$\begin{aligned} & {{\delta \bigl( { {{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2}}}+{ {{ \bigl\Vert {\Delta{y^{k{+}1}} } \bigr\Vert }^{2}}} \bigr)}} \\ &\quad\le \bigl(\varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{ \hat{\omega}}^{k}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) - \varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k + 1}}} \bigr) - {{ \hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) \bigr){{C({a_{k}}+ {a_{k+1}} )}}. \end{aligned}$$

From the inequality \(\sum_{i=1}^{n}a_{i}\le \sqrt{n\sum_{i=1}^{n}a_{i}^{2}}\) and \(\sqrt{ab}\le a+\frac{1}{4}b\), we obtain

$$\begin{aligned} \begin{aligned} a_{k+1} \le{}& \bigl( { {{(n+1) \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2}}}+{ {{(n+1) \bigl\Vert {\Delta{y^{k{+}1}} } \bigr\Vert }^{2}}} \bigr)^{\frac{1}{2}} \\ \le{} &\sqrt{\frac{C(n+1)}{\delta} \bigl(\varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) - \varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k + 1}}} \bigr) - {{ \hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) \bigr){{({a_{k}}+ {a_{k+1}} )}}} \\ \le{} & \underbrace{\sqrt{\frac{C(n+1)}{\delta}} \bigl( {\varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) - \varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k+1}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr)} \bigr)}_{a} + \frac{1}{4}{\underbrace{({a_{k}}+ {a_{k+1}} )}_{b}}. \end{aligned} \end{aligned}$$

Summing up the above inequality from \(k=k'+2,\ldots ,M\) yields

$$\begin{aligned} \begin{aligned} \sum_{k=k'+2}^{M} {a_{k+1}} \le {}& \sqrt{\frac{C(n+1)}{\delta}} \bigl( { \varphi \bigl( {{{ \hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k'+2}}} \bigr) - {{\hat{L}}_{ \beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) - \varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{M}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr)} \bigr) \\ &{}+ \frac{1}{4}\sum_{k=k'+1}^{M}{{( {a_{k}}+ {a_{k+1}} )}}. \end{aligned} \end{aligned}$$

Letting \(M\to \infty \), we get

$$\begin{aligned} \sum_{k=k'+2}^{\infty} {a_{k+1}} \le 2 \sqrt{\frac{C(n+1)}{\delta}} \bigl( \varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k'+2}}} \bigr) - {{\hat{L}}_{ \beta }} \bigl( {{{ \hat{\omega}}^{*}}} \bigr)} \bigr) \bigr) - \frac{1}{2} {a_{k'+1}}. \end{aligned}$$

Since \(\delta ,C>0\) and \({a_{k'+1}}\) is a constant, \(\sum_{k=k'+2}^{\infty} {a_{k+1}} < \infty \). Therefore, \(\sum_{k=1}^{\infty} \| \omega ^{k+1}-\omega ^{k}\| < \infty \). (I) is proved.

(II) \(\{\omega ^{k}\}\) is a Cauchy sequence, and thus it is convergent. Combining (I) with Theorem 3.1, we obtain that \(\{ {{\omega ^{k}}} \}\) converges to a critical point of \(L_{\beta} ( \cdot )\). □
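The finite-length argument in part (I) rests on two elementary inequalities: \(\sum_{i=1}^{n}a_{i}\le \sqrt{n\sum_{i=1}^{n}a_{i}^{2}}\) for nonnegative \(a_{i}\), and \(\sqrt{ab}\le a+\frac{1}{4}b\) for \(a,b\ge 0\). The second follows from \((\sqrt{a}-\frac{1}{2}\sqrt{b})^{2}\ge 0\). A small randomized check is sketched below. The function name and the sampling ranges are arbitrary choices made for illustration.

```python
import math
import random

def check_elementary_inequalities(trials=1000, n=6, seed=1):
    """Randomly test the two inequalities used in the proof of Theorem 3.2 (I)."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.uniform(0.0, 10.0) for _ in range(n)]
        # sum_i a_i <= sqrt(n * sum_i a_i^2)  (Cauchy-Schwarz)
        if sum(a) > math.sqrt(n * sum(t * t for t in a)) + 1e-9:
            return False
        p, q = rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)
        # sqrt(p * q) <= p + q / 4
        if math.sqrt(p * q) > p + q / 4.0 + 1e-9:
            return False
    return True
```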

4 Numerical experiments

This section presents the numerical results of applying Algorithm 1 and Algorithm 2 to an \(l_{\frac{1}{2}}\)-regularization problem and a matrix decomposition problem. All computations were executed in Matlab 2020b on a laptop running Windows 11 with an AMD Ryzen 5 3550H CPU operating at 3.5 GHz and 16 GB of RAM.

4.1 \(l_{\frac{1}{2}}\)-regularization problem

In compressed sensing, we consider the following optimization problem

$$\begin{aligned} \begin{aligned} \min_{x} \Vert Mx-b \Vert ^{2} + \varphi \Vert x \Vert _{0}, \end{aligned} \end{aligned}$$
(4.1)

where \(M\in \mathbb{R}^{m\times n}\) is the measurement matrix, \(b\in \mathbb{R}^{m}\) is the observation vector, φ is the regularization parameter, and \(\| x\|_{0}\) denotes the number of nonzero components of x. Since problem (4.1) is NP-hard, the \(l_{0}\) norm is often relaxed to the \(l_{\frac{1}{2}}\) norm in practical applications [28], which leads to the following nonconvex problem:

$$\begin{aligned} \begin{aligned} &\min \varphi \Vert x \Vert _{(1/2)}^{(1/2)}+\frac{1}{2}{{ \Vert y \Vert }^{2}} \\ &\quad\text{s.t}\text{. }Mx-y=b, \end{aligned} \end{aligned}$$
(4.2)

where \(\|x\|_{\frac{1}{2}}=(\sum_{i=1}^{n} \vert x_{i}\vert ^{\frac{1}{2}})^{2}\).
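As a small numerical illustration of the two sparsity measures, the following sketch evaluates \(\|x\|_{0}\) and the penalty \(\|x\|_{1/2}^{1/2}=\sum_{i}\vert x_{i}\vert ^{1/2}\) on a toy vector. The function names `l0_count` and `l_half_penalty` are hypothetical and are not taken from the authors' Matlab code.

```python
import math


def l0_count(x):
    """Number of nonzero components of x, i.e. ||x||_0."""
    return sum(1 for v in x if v != 0)


def l_half_penalty(x):
    """Sum of |x_i|^(1/2), i.e. ||x||_{1/2}^{1/2} as used in (4.2)."""
    return sum(math.sqrt(abs(v)) for v in x)


x = [4.0, 0.0, -1.0]
print(l0_count(x))              # 2 nonzero entries
print(l_half_penalty(x))        # sqrt(4) + sqrt(0) + sqrt(1) = 3.0
print(l_half_penalty(x) ** 2)   # quasi-norm ||x||_{1/2} = 9.0
```

Unlike \(\|x\|_{0}\), the relaxed penalty depends continuously on the entries of x, although it remains nonconvex.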

Based on (4.2), we construct the following problem:

$$\begin{aligned} \begin{aligned} &\min_{x_{1},x_{2},y} c \Vert x_{1} \Vert _{(1/2)}^{(1/2)}+ \frac{1}{2}{{ \Vert x_{2} \Vert }^{2}}+ \frac{1}{2}{{ \Vert {{B}_{1}}x_{1}+{{B}_{2}}x_{2}+y \Vert }^{2}} \\ &\quad\text{s.t. } A_{1}x_{1}+A_{2}x_{2}+y=b. \end{aligned} \end{aligned}$$
(4.3)

To verify the validity of Algorithm 1 and Algorithm 2, we test them and compare them with LADMM (see Footnote 1).

Applying Algorithm 1 to problem (4.3) yields

$$\begin{aligned} \textstyle\begin{cases} x_{1}^{k+1}=H \biggl(\frac{1}{\mu _{1}} \biggl[ \tau z_{1}^{k}-B_{1}^{T} \bigl(B_{1}x_{1}^{k}+B_{2}x_{2}^{k}+y^{k} \bigr)\\ \phantom{x_{1}^{k+1}=}{}- \beta A_{1}^{T} \biggl(A_{2}x_{2}^{k} +y^{k} -b- \frac{\lambda ^{k}}{\beta } \biggr) \biggr] , \frac{2c}{\mu _{1}} \biggr), \\ x_{2}^{k+1}=\frac{1}{\mu _{2}} \biggl[\tau z_{2}^{k}-B_{2}^{T} \bigl(B_{1}x_{1}^{k}+B_{2}x_{2}^{k}+y^{k} \bigr) -\beta A_{2}^{T} \biggl( A_{1}x_{1}^{k+1}+y^{k}-b- \frac{\lambda ^{k}}{\beta} \biggr) \biggr] , \\ y^{k+1}=\frac{1}{\mu _{3}} \biggl[\tau y^{k}- \bigl(B_{1}x_{1}^{k+1}+B_{2}x_{2}^{k+1} \bigr)- \beta \biggl( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b- \frac{\lambda ^{k}}{\beta } \biggr) \biggr], \\ \lambda ^{k+1}=\lambda ^{k}-\beta \bigl(A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+y^{k+1}-b \bigr), \end{cases}\displaystyle \end{aligned}$$

where \(\mu _{1}=\tau +\beta \rho _{\max} ( A_{1}^{\mathrm{T}}A_{1} )\), \(\mu _{2}=1+\tau +\beta \rho _{\max} ( A_{2}^{\mathrm{T}}A_{2} )\), \(\mu _{3}=1+\tau +\beta \), and \(H(\cdot ,\cdot )\) is the half shrinkage operator [29] defined as \(H ( x,\alpha ) = \{ h_{\alpha}^{1}, h_{\alpha}^{2},\ldots , h_{ \alpha}^{n} \} \) with

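$$\begin{aligned} x_{1}(i)= \textstyle\begin{cases} \frac{2x_{i}}{3} ( 1+\cos ( \frac{2}{3} ( \pi -\phi ( \lvert h_{\alpha}^{i} \rvert ) ) ) ), & \lvert h_{\alpha}^{i} \rvert >\frac{\sqrt[3]{54}}{4}\alpha ^{2/3}; \\ 0, & \text{otherwise}; \end{cases}\displaystyle \end{aligned}$$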
(4.4)

where

$$\begin{aligned} \begin{aligned} \phi \bigl( \bigl\lvert h_{\alpha}^{i} \bigr\rvert \bigr)=\arccos \biggl( \frac{\alpha}{8} \biggl( \frac{\lvert h_{\alpha}^{i} \rvert }{3} \biggr)^{-(3/2)} \biggr) . \end{aligned} \end{aligned}$$
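To make the half shrinkage step concrete, the following is a minimal sketch of the operator applied componentwise. It assumes the standard closed form of half thresholding: threshold \(\frac{\sqrt[3]{54}}{4}\alpha ^{2/3}\), cosine argument \(\frac{2}{3}(\pi -\phi )\), and the same scalar input t used in every place where a magnitude appears. The names `half_threshold` and `H` are illustrative and are not the authors' implementation.

```python
import math


def half_threshold(t, alpha):
    """Scalar half-thresholding of t with parameter alpha >= 0 (assumed form)."""
    # Inputs at or below the threshold are set exactly to zero.
    threshold = (54.0 ** (1.0 / 3.0) / 4.0) * alpha ** (2.0 / 3.0)
    if abs(t) <= threshold:
        return 0.0
    # phi = arccos( (alpha / 8) * (|t| / 3)^(-3/2) )
    phi = math.acos((alpha / 8.0) * (abs(t) / 3.0) ** (-1.5))
    return (2.0 * t / 3.0) * (1.0 + math.cos((2.0 / 3.0) * (math.pi - phi)))


def H(x, alpha):
    """Apply the scalar operator to every component of the list x."""
    return [half_threshold(t, alpha) for t in x]


out = H([0.1, 5.0, -5.0], 1.0)
print(out)  # the small entry is zeroed, the large entries shrink toward zero
```

With \(\alpha =0\) the operator reduces to the identity, which is a quick sanity check of the formula.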

Applying Algorithm 2 to problem (4.3) yields

$$\begin{aligned} \textstyle\begin{cases} x_{1}^{k+1}=H (\frac{1}{\mu _{1}} [ \tau z_{1}^{k}-B_{1}^{T}(B_{1}x_{1}^{k}+B_{2}x_{2}^{k}+y^{k})- \beta A_{1}^{T}(A_{2}x_{2}^{k}+y^{k}-b- \frac{\lambda ^{k}}{\beta }) ] , \frac{2c}{\mu _{1}} ), \\ x_{2}^{k+1}=\frac{1}{\mu _{2}} [\tau z_{2}^{k}-B_{2}^{T} (B_{1}x_{1}^{k}+B_{2}x_{2}^{k}+y^{k} ) -\beta A_{2}^{T} ( A_{1}x_{1}^{k+1}+y^{k}-b-\frac{\lambda ^{k}}{\beta} ) ] , \\ y^{k+1}=\frac{1}{\mu _{4}}[\tau y^{k}-(B_{1}x_{1}^{k+1}+B_{2}x_{2}^{k+1}+y^{k})- \beta ( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b- \frac{\lambda ^{k}}{\beta } ) ], \\ \lambda ^{k+1}=\lambda ^{k}-\beta (A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+y^{k+1}-b), \end{cases}\displaystyle \end{aligned}$$

where \(\mu _{4}=\tau +\beta \). Applying LADMM to problem (4.3), we obtain

$$\begin{aligned} \textstyle\begin{cases} x_{1}^{k+1}=H (\frac{1}{\mu _{1}} [ \tau x_{1}^{k}-B_{1}^{T}(B_{1}x_{1}^{k}+B_{2}x_{2}^{k}+y^{k})- \beta A_{1}^{T}(A_{2}x_{2}^{k}+y^{k}-b- \frac{\lambda ^{k}}{\beta }) ] , \frac{2c}{\mu _{1}} ), \\ x_{2}^{k+1}=\frac{1}{\mu _{2}} [\tau x_{2}^{k}-B_{2}^{T} (B_{1}x_{1}^{k}+B_{2}x_{2}^{k}+y^{k} ) -\beta A_{2}^{T}( A_{1}x_{1}^{k+1}+y^{k}-b- \frac{\lambda ^{k}}{\beta}) ] , \\ y^{k+1}=\frac{1}{\mu _{3}}[\tau y^{k}-(B_{1}x_{1}^{k+1}+B_{2}x_{2}^{k+1})- \beta ( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b- \frac{\lambda ^{k}}{\beta }) ], \\ \lambda ^{k+1}=\lambda ^{k}-\beta (A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+y^{k+1}-b). \end{cases}\displaystyle \end{aligned}$$

In the experiment, the parameters are configured as follows: the dimensions are \(m=5000, n=1000\), the penalty parameter is \(\beta =1000\), \(b=0\), \(c=1\), and the inertial parameter is fixed at \(\theta =0.15\). The initial points are selected as \(x_{1}^{-1}= x_{1}^{0}=0\), \(x_{2}^{-1}= x_{2}^{0}=0\), \(y^{0}=0\), and \(\lambda ^{0}=0\). \(A_{1}, A_{2}, B_{1}, B_{2}\) are random matrices. The stopping criterion for all methods is defined as

$$\begin{aligned} \begin{aligned} \Vert r_{k} \Vert = \bigl\Vert A_{1}x_{1}^{k}+A_{2}x_{2}^{k} +y^{k}-b \bigr\Vert \le 10^{-8}. \end{aligned} \end{aligned}$$

Throughout the testing phase, we consider four cases: \(\tau =30\), \(\tau =35\), \(\tau =40\), and \(\tau =45\). The numerical results of the three algorithms are reported in Table 1, which lists the number of iterations required to satisfy the stopping criterion (“Iter”), the total computing time in seconds (“times”), and the value of the stopping criterion (“log(Crit)”). Moreover, to visually illustrate the convergence behavior, the curves of the objective value and \(\log (\|r_{k}\|)\) at \(\tau =45\) are presented in Fig. 1.

Figure 1

Convergence results for LADMM (\(\theta =0\)), SPLI-ADMM, and SCLI-ADMM (\(\theta =0.2\)) with \(m=5000,n=1000,\tau =45,\beta =1000\)

Table 1 Numerical results under different τ

Table 1 shows that the two proposed algorithms require less computing time and fewer iterations than LADMM. Figure 1(a) plots the objective value against the iteration count and indicates that SPLI-ADMM and SCLI-ADMM converge faster than LADMM. Figure 1(b) again demonstrates the time efficiency of the two algorithms, especially when “log(Crit)” is less than −4.

4.2 Matrix decomposition

Now, we consider the matrix decomposition problem, which has the following form:

$$\begin{aligned} \min \Vert L \Vert _{*}+\alpha \Vert S \Vert _{1}+ \frac{\omega}{2} \Vert T-M \Vert ^{2}\quad \text{s.t. } L+S=T, \end{aligned}$$
(4.5)

where \(M\in \mathbb{R}^{p\times n}\) is the observed matrix, and \(L,S,T \in \mathbb{R}^{p\times n}\) are the decision variables. The nuclear norm is \(\|L\|_{*}:=\sum_{i=1}^{\min(p,n)}\sigma _{i}(L)\), where \(\sigma _{i}(L)\) denotes the i-th singular value of L, the sparse term is \(\|S\|_{1}:=\sum_{i=1}^{p}\sum_{j=1}^{n}\vert S_{ij}\vert \), ω is the penalty factor, and α is the trade-off parameter between the nuclear norm \(\|L\|_{*}\) and the \(l_{1}\)-norm \(\|S\|_{1}\). The ALF of problem (4.5) is defined as

$$\begin{aligned} \begin{aligned} L_{\beta} (L,S,T,\lambda )= \Vert L \Vert _{*}+\alpha \Vert S \Vert _{1}+ \frac{\omega}{2} \Vert T-M \Vert ^{2} -\langle \lambda , L+S-T\rangle + \frac{\beta}{2} \Vert L+S-T \Vert ^{2}, \end{aligned} \end{aligned}$$

where λ is the Lagrange multiplier.

Applying SPLI-ADMM to problem (4.5), we get the closed-form iterative formulas:

$$\begin{aligned} \textstyle\begin{cases} z_{L}^{k}=L^{k}+\theta (L^{k}-{L}^{k-1} ), z_{S}^{k}=S^{k}+ \theta (S^{k}-{S}^{k-1} ), \\ L^{k+1}=V( \frac{\beta (T^{k}-S^{k} )+\lambda ^{k}+\tau z_{L}^{k}}{\beta +\tau}, \frac{1}{\beta +\tau}), \\ S^{k+1}=S( \frac{\beta (T^{k}-L^{k+1} )+\lambda ^{k}+\tau z_{S}^{k}}{\beta +\tau}, \frac{\alpha}{\beta +\tau}), \\ T^{k+1}= \frac{\tau T^{k}+\beta (L^{k+1}+S^{k+1} )+\omega M-\lambda ^{k}}{\beta +\omega +\tau}, \\ \lambda ^{k+1}=\lambda ^{k}- \beta (L^{k+1}+S^{k+1}-T^{k+1} ), \end{cases}\displaystyle \end{aligned}$$

where \(V(\cdot ,\mu )\) is the singular value thresholding operator [30], and \(S(\cdot ,\mu ) \) is the soft shrinkage operator [31]. Applying SCLI-ADMM to problem (4.5), we get

$$\begin{aligned} \textstyle\begin{cases} z_{L}^{k}=L^{k}+\theta (L^{k}-{L}^{k-1} ), z_{S}^{k}=S^{k}+ \theta (S^{k}-{S}^{k-1} ), \\ L^{k+1}=V( \frac{\beta (T^{k}-S^{k} )+\lambda ^{k}+\tau z_{L}^{k}}{\beta +\tau}, \frac{1}{\beta +\tau}), \\ S^{k+1}=S( \frac{\beta (T^{k}-L^{k+1} )+\lambda ^{k}+\tau z_{S}^{k}}{\beta +\tau}, \frac{\alpha}{\beta +\tau}), \\ T^{k+1}= \frac{\tau T^{k}+\beta (L^{k+1}+S^{k+1} )+\omega (M-T^{k})-\lambda ^{k}}{\beta +\tau}, \\ \lambda ^{k+1}=\lambda ^{k}- \beta (L^{k+1}+S^{k+1}-T^{k+1} ), \end{cases}\displaystyle \end{aligned}$$
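The two T-updates above can be checked from first-order optimality. As a sketch, assume that the SPLI-ADMM T-subproblem minimizes \(L_{\beta}(L^{k+1},S^{k+1},T,\lambda ^{k})\) plus a proximal term \(\frac{\tau}{2}\|T-T^{k}\|^{2}\); this form of the subproblem is an assumption inferred from the stated formulas. Setting the gradient with respect to T to zero gives

$$\begin{aligned} \begin{aligned} &\omega \bigl(T-M \bigr)+\lambda ^{k}-\beta \bigl(L^{k+1}+S^{k+1}-T \bigr)+\tau \bigl(T-T^{k} \bigr)=0 \\ &\quad \Longrightarrow \quad T^{k+1}= \frac{\tau T^{k}+\beta (L^{k+1}+S^{k+1} )+\omega M-\lambda ^{k}}{\beta +\omega +\tau}. \end{aligned} \end{aligned}$$

If the smooth term \(\frac{\omega}{2}\|T-M\|^{2}\) is instead linearized at \(T^{k}\), so that \(\omega (T-M)\) is replaced by \(\omega (T^{k}-M)\), the same computation yields

$$\begin{aligned} T^{k+1}= \frac{\tau T^{k}+\beta (L^{k+1}+S^{k+1} )+\omega (M-T^{k})-\lambda ^{k}}{\beta +\tau}, \end{aligned}$$

which agrees with the SCLI-ADMM update.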

Applying LADMM to problem (4.5), we have

$$\begin{aligned} \textstyle\begin{cases} L^{k+1}=V( \frac{\beta (T^{k}-S^{k} )+\lambda ^{k}+\tau{L}^{k}}{\beta +\tau}, \frac{1}{\beta +\tau}), \\ S^{k+1}=S( \frac{\beta (T^{k}-L^{k+1} )+\lambda ^{k}+\tau{S}^{k}}{\beta +\tau}, \frac{\alpha}{\beta +\tau}), \\ T^{k+1}= \frac{\beta (L^{k+1}+S^{k+1} )+\omega M-\lambda ^{k}}{\beta +\omega}, \\ \lambda ^{k+1}=\lambda ^{k}- \beta (L^{k+1}+S^{k+1}-T^{k+1} ). \end{cases}\displaystyle \end{aligned}$$
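The operators \(V(\cdot ,\mu )\) and \(S(\cdot ,\mu )\) used in the iterations above both reduce to the same scalar shrinkage rule. The sketch below shows entrywise soft shrinkage on a matrix stored as nested lists and the corresponding shrinkage of a list of singular values. It assumes the standard definitions of both operators; the singular value decomposition \(X=U\operatorname{diag}(\sigma )W^{T}\) itself would come from a linear algebra routine that is not shown, and the function names are hypothetical.

```python
def soft_shrink(X, mu):
    """Entrywise soft shrinkage: sign(x) * max(|x| - mu, 0)."""
    def shrink(v):
        if v > mu:
            return v - mu
        if v < -mu:
            return v + mu
        return 0.0
    return [[shrink(v) for v in row] for row in X]


def shrink_singular_values(sigmas, mu):
    """Shrink singular values for singular value thresholding.

    Assuming X = U diag(sigmas) W^T, the thresholded matrix is
    U diag(returned values) W^T.
    """
    return [max(s - mu, 0.0) for s in sigmas]


print(soft_shrink([[3.0, -0.5], [-2.0, 1.0]], 1.0))
# [[2.0, 0.0], [-1.0, 0.0]]
print(shrink_singular_values([3.0, 0.4], 1.0))
# [2.0, 0.0]
```

In the L-update the threshold is \(\frac{1}{\beta +\tau}\), and in the S-update it is \(\frac{\alpha}{\beta +\tau}\).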

We set \(p=n=100\) and test 8 different pairs \((r.,spr.)\). We choose \(\alpha =\frac{0.2}{\sqrt{m}},\theta =0.3,\omega =1000\), \(\beta =5\), and \(\tau =1\). The matrices \(L,S\), and T are initialized to zero, and M is generated randomly in MATLAB. The stopping criterion is defined as

$$\begin{aligned} \begin{aligned} \operatorname { RelChg }:= \frac{ \Vert (L^{k+1}, S^{k+1}, T^{k+1} )- (L^{k}, S^{k}, T^{k} ) \Vert _{F}}{ \Vert (L^{k}, S^{k}, T^{k} ) \Vert _{F}+1} \leqslant 10^{-8} \quad\text{or}\quad k>3000. \end{aligned} \end{aligned}$$

Let \((\hat{L},\hat{S},\hat{T})\) be a numerical solution of problem (4.5). We measure the quality of the recovery by the relative error, which is defined by

$$\begin{aligned} \begin{aligned} \operatorname{RelErr}:= \frac{ \Vert (\hat{L},\hat{S}, \hat{T})- (L^{*},S^{*}, T^{*} ) \Vert _{F}}{ \Vert (L^{*},S^{*}, T^{*} ) \Vert _{F}+1} . \end{aligned} \end{aligned}$$
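RelChg and RelErr share the same form: the Frobenius norm of a difference of triples divided by the norm of a reference triple plus one. A minimal sketch, with matrices stored as nested lists and with the hypothetical helper names `frob_norm` and `rel_diff`, could look as follows.

```python
import math


def frob_norm(mats):
    """Frobenius norm of a tuple of matrices viewed as one stacked block."""
    return math.sqrt(sum(v * v for M in mats for row in M for v in row))


def rel_diff(current, reference):
    """||current - reference||_F / (||reference||_F + 1).

    RelChg: current = (L^{k+1}, S^{k+1}, T^{k+1}), reference = (L^k, S^k, T^k).
    RelErr: current = (L_hat, S_hat, T_hat), reference = (L*, S*, T*).
    """
    diff = [
        [[a - b for a, b in zip(row_c, row_r)] for row_c, row_r in zip(Mc, Mr)]
        for Mc, Mr in zip(current, reference)
    ]
    return frob_norm(diff) / (frob_norm(reference) + 1.0)


old = ([[0.0, 0.0]], [[0.0, 0.0]], [[0.0, 0.0]])
new = ([[3.0, 0.0]], [[0.0, 4.0]], [[0.0, 0.0]])
print(rel_diff(new, old))  # sqrt(9 + 16) / (0 + 1) = 5.0
```

The added 1 in the denominator keeps the ratio well defined when the reference triple is zero, as it is at the zero initialization.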

Table 2 compares the three algorithms for different \((r.,spr.)\), where “r.” is the rank of the matrix L, “spr.” is the sparsity of the sparse matrix S, “Iter” is the number of iterations, and \(\|S\|_{0}\) denotes the number of nonzero elements of S. In addition, the curves of the stopping criterion and the relative error for the three algorithms are plotted in Fig. 2.

Figure 2

The performance comparison between LADMM (\(\theta =0\)) , SPLI-ADMM ( \(\theta =0.3\)), and SCLI-ADMM (\(\theta =0.3\)) with different \((r.,spr.)\)

Table 2 Summary of three algorithms for eight different (r., \(spr\))

Table 2 shows that SPLI-ADMM and SCLI-ADMM take less time and fewer iterations under the same conditions, which demonstrates that the two proposed algorithms are more efficient than LADMM across different ranks and sparsity ratios. In Fig. 2, the curves of the stopping criterion (see Fig. 2(a) and (c)) in two trials show that SPLI-ADMM and SCLI-ADMM converge faster than LADMM. Figure 2(b) and (d) indicate that the matrices L and S are better recovered by SPLI-ADMM and SCLI-ADMM, because the “RelErr” of LADMM is greater than that of SPLI-ADMM for the same “Iter”.

5 Conclusion

This paper extends existing work on nonconvex optimization by developing two linearized ADMM algorithms, SPLI-ADMM and SCLI-ADMM, and analyzing their convergence. By integrating an inertial strategy within a linearized framework, these algorithms improve the efficiency of solving linearly constrained problems with nonseparable structure. A key novelty lies in the use of sequential gradients of the mixed term, which is not typically found in conventional ADMM approaches and enables the proposed algorithms to use the latest information when updating each variable. The KŁ property is used to guarantee the convergence of the generated sequences. Finally, the numerical experiments show that the proposed algorithms are valid and require less computing time than LADMM.