Log in

A difference-of-convex approach for split feasibility with applications to matrix factorizations and outlier detection

  • Published:
Journal of Global Optimization Aims and scope Submit manuscript

Abstract

The split feasibility problem is to find an element in the intersection of a closed set C and the linear preimage of another closed set D, assuming the projections onto C and D are easy to compute. This class of problems arises naturally in many contemporary applications such as compressed sensing. While the sets C and D are typically assumed to be convex in the literature, in this paper, we allow both sets to be possibly nonconvex. We observe that, in this setting, the split feasibility problem can be formulated as an optimization problem with a difference-of-convex objective so that standard majorization-minimization type algorithms can be applied. Here we focus on the nonmonotone proximal gradient algorithm with majorization studied in Liu et al. (Math Program, 2019. https://doi.org/10.1007/s10107-018-1327-8, Appendix A). We show that, when this algorithm is applied to a split feasibility problem, the sequence generated clusters at a stationary point of the problem under mild assumptions. We also study local convergence property of the sequence under suitable assumptions on the closed sets involved. Finally, we perform numerical experiments to illustrate the efficiency of our approach on solving split feasibility problems that arise in completely positive matrix factorization, (uniformly) sparse matrix factorization, and outlier detection.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or Ebook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Germany)

Instant access to the full article PDF.

Similar content being viewed by others

Notes

  1. This is because in this case, we have \(r_C=2\) and hence \(\textsf {SpFeas}_{\mathrm{DC}}\) corresponds to (3) with \(\gamma = \frac{1}{L}\).

  2. We note that in this case D is convex and hence \(Q\mapsto \frac{1}{2}d^2(BQ,D)\) is smooth. Our algorithm \(\textsf {SpFeas}_{\mathrm{DC}_{\textsf {ls}}}\) reduces to the standard gradient projection algorithm with nonmonotone linesearch.

  3. We say that the instance is solved by the algorithm successfully if the algorithm is terminated with the desired accuracy achieved, i.e, \(\mathrm{min}\{BQ^t\}_{ij}\ge -10^{-16}\) for \(\textsf {SpFeas}_{\mathrm{DC}_{\textsf {ls}}}\), and \(\mathrm{min}\{BQ^t\}_{ij}\ge -10^{-15}\) for Algorithm 3.

  4. We do not present results with \( (n, 3n+1) = (10i, 30i+1) \) for \( i=30, 40 \) because they take too much CPU time.

  5. This choice follows the one used in [17, Table 2].

  6. As in the previous experiment, we say that the instance is solved by the algorithm successfully if the algorithm is terminated with the desired accuracy achieved, i.e, \(\mathrm{min}\{BQ^t\}_{ij}\ge -10^{-16}\) for \(\textsf {SpFeas}_{\mathrm{DC}_{\textsf {ls}}}\), and \(\mathrm{min}\{BQ^t\}_{ij}\ge -10^{-15}\) for Algorithm 3.

  7. As discussed in [21, Section 6], this termination criterion is motivated by the fact that (8) holds if and only if \(0 \in \partial \var** ({\bar{x}},{\bar{\eta }})\) for some \({\bar{\eta }}\).

References

  1. Absil, P.A., Malick, J.: Projection-like retractions on matrix manifolds. SIAM J. Optim. 22, 135–158 (2012)

    Article  MathSciNet  Google Scholar 

  2. Asplund, E.: Differentiability of the metric projection in finite-dimensional Euclidean space. Proc. Am. Math. Soc. 38, 218–219 (1973)

    MathSciNet  MATH  Google Scholar 

  3. Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116, 5–16 (2009)

    Article  MathSciNet  Google Scholar 

  4. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)

    Article  MathSciNet  Google Scholar 

  5. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137, 91–129 (2013)

    Article  MathSciNet  Google Scholar 

  6. Auslender, A., Teboulle, M.: Asymptotic Cones and Functions in Optimization and Variational Inequalities. Springer, Berlin (2003)

    MATH  Google Scholar 

  7. Berman, A., Dür, M., Shaked-Monderer, N.: Open problems in the theory of completely positive and copositive matrices. Electron. J. Linear Algebra 29, 46–58 (2015)

    Article  MathSciNet  Google Scholar 

  8. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2007)

    Article  Google Scholar 

  9. Bomze, I.M., Dickinson, P.J.C., Still, G.: The structure of completely positive matrices according to their CP-rank and CP-plus-rank. Linear Algebra Appl. 482, 191–206 (2015)

    Article  MathSciNet  Google Scholar 

  10. Byrne, C.: Iterative oblique projection onto convex sets and the split feasibility problem. Inverse Probl. 18, 441–453 (2002)

    Article  MathSciNet  Google Scholar 

  11. Censor, Y., Elfving, T.: A multiprojection algorithm using Bregman projections in a product space. Numer. Algorithms 8, 221–239 (1994)

    Article  MathSciNet  Google Scholar 

  12. Censor, Y., Elfving, T., Kopf, N., Bortfeld, T.: The multiple-sets split feasibility problem and its applications for inverse problems. Inverse Probl. 21, 2071–2084 (2005)

    Article  MathSciNet  Google Scholar 

  13. Censor, Y., Motova, A., Segal, A.: Perturbed projections and subgradient projections for the multiple-sets split feasibility problem. J. Math. Anal. Appl. 327, 1244–1256 (2007)

    Article  MathSciNet  Google Scholar 

  14. Dickinson, P.J.C., Gijben, L.: On the computational complexity of membership problems for the completely positive cone and its dual. Comput. Optim. Appl. 57, 403–415 (2014)

    Article  MathSciNet  Google Scholar 

  15. Eckart, C., Young, G.: The approximation of one matrix by another of lower rank. Psychometrika 1, 211–218 (1936)

    Article  Google Scholar 

  16. Facchinei, F., Pang, J.-S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. I. Springer, Berlin (2003)

    MATH  Google Scholar 

  17. Groetzner, P., Dür, M.: A factorization method for completely positive matrices. Linear Algebra Appl. 591, 1–24 (2020)

    Article  MathSciNet  Google Scholar 

  18. Kyrillidis, A., Becker, S., Cevher, V., Koch, C.: Sparse projections onto the simplex. In: Proceedings of the 30th International Conference on Machine Learning, pp. 280–288 (2013)

  19. Lewis, A.S., Luke, D.R., Malick, J.: Local linear convergence for alternating and averaged nonconvex projections. Found. Comput. Math. 9, 485–513 (2009)

    Article  MathSciNet  Google Scholar 

  20. Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 18, 1199–1232 (2018)

    Article  MathSciNet  Google Scholar 

  21. Liu, T., Pong, T.K., Takeda, A.: A refined convergence analysis of pDCA\(_e\) with applications to simultaneous sparse recovery and outlier detection. Comput. Optim. Appl. (2019). https://doi.org/10.1007/s10589-019-00067-z

    Article  MathSciNet  MATH  Google Scholar 

  22. Liu, T., Pong, T.K., Takeda, A.: A successive difference-of-convex approximation method for a class of nonconvex nonsmooth optimization problems. Math. Program. (2019). https://doi.org/10.1007/s10107-018-1327-8

    Article  MathSciNet  MATH  Google Scholar 

  23. López, G., Martín-Márquez, V., Wang, F., Xu, H.-K.: Solving the split feasibility problem without prior knowledge of matrix norms. Inverse Probl. 28, 085004 (2012)

    Article  MathSciNet  Google Scholar 

  24. Lu, Z., Zhang, Y.: Sparse approximation via penalty decomposition methods. SIAM J. Optim. 23, 2448–2478 (2013)

    Article  MathSciNet  Google Scholar 

  25. Mordukhovich, B.S.: Variational Analysis and Generalized Differentiation, I: Basic Theory. Springer, Berlin (2006)

    Book  Google Scholar 

  26. Neyshabur, B., Panigrahy, R.: Sparse matrix factorization (2014). ar**v:1311.3315

  27. Polania, L.F., Carrillo, R.E., Blanco-Velasco, M., Barner, K.E.: Compressive sensing for ECG signals in the presence of electromyographic noise. In: Proceedings of the 38th Annual Northeast Bioengineering Conference, pp. 295–296 (2012)

  28. Poliquin, R.A., Rockafellar, R.T., Thibault, L.: Local differentiability of distance functions. Trans. Am. Math. Soc. 352, 5231–5249 (2000)

    Article  MathSciNet  Google Scholar 

  29. Qu, B., **u, N.: A note on the CQ algorithm for the split feasibility problem. Inverse Probl. 21, 1655–1665 (2005)

    Article  MathSciNet  Google Scholar 

  30. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1998)

    Book  Google Scholar 

  31. Shehu, Y., Iyiola, O.S.: Strong convergence result for proximal split feasibility problem in Hilbert spaces. Optimization 66, 2275–2290 (2017)

    Article  MathSciNet  Google Scholar 

  32. Wang, Z., Yang, Q., Yang, Y.: The relaxed inexact projection methods for the split feasibility problem. Appl. Math. Comput. 217, 5347–5359 (2011)

    MathSciNet  MATH  Google Scholar 

  33. Xu, J., Chi, E.C., Yang, M., Lange, K.: A majorization–minimization algorithm for split feasibility problems. Comput. Optim. Appl. 71, 795–828 (2018)

    Article  MathSciNet  Google Scholar 

  34. Yang, Q.: The relaxed CQ algorithm solving the split feasibility problem. Inverse Probl. 20, 1261–1266 (2004)

    Article  MathSciNet  Google Scholar 

  35. Zhao, J., Yang, Q.: Several solution methods for the split feasibility problem. Inverse Probl. 21, 1791–1799 (2005)

    Article  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Lulin Tan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Ting Kei Pong’s research was supported partly by Hong Kong Research Grants Council PolyU153085/16p. Lulin Tan’s research was supported partly by Natural Science Foundation of China (No. 11601162) and Natural Science Foundation of Guangdong Province, China (No. 2017A030310167).

Proof of Theorem 2

Proof of Theorem 2

Proof

The boundedness of \(\{x^t\}\) follows from Theorem 1(i). We now prove convergence of the whole sequence. By assumption, \(x^*\) is an accumulation point of \(\{x^t\}\) so that the function

$$\begin{aligned} \kappa (u):= \frac{1}{2}d^2(u,D) \end{aligned}$$

is continuously differentiable at \(Ax^*\) with locally Lipschitz gradient. Then we have from [30, Example 8.53], [25, Theorem 1.110(ii)] and the chain rule that

$$\begin{aligned} \nabla (\kappa \circ A)(x^*) = A^T(Ax^* - \mathrm{Proj}_D(Ax^*)). \end{aligned}$$

Using this and the fact that \(x^*\) is a stationary point of the split feasibility problem (2) (see Theorem 1(ii)), we deduce further that

$$\begin{aligned} \begin{aligned} 0&\in A^T(Ax^* - \mathrm{Proj}_D(Ax^*)) + N_C(x^*) \\&= \nabla (\kappa \circ A)(x^*) + N_C(x^*) = \partial F(x^*), \end{aligned} \end{aligned}$$

where the last equality follows from [30, Exercise 8.8(c)]. In particular, it holds that \(x^*\in \mathrm{dom}\partial F\).

Since F is a KL function and \(x^*\in \mathrm{dom}\partial F\), there exist \(\epsilon > 0\) and a continuous concave function \(\psi \) as in Definition 1 so that

$$\begin{aligned} \psi '(F(x) - F(x^*))\cdot d(0,\partial F(x))\ge 1 \end{aligned}$$
(35)

whenever \(\Vert x - x^*\Vert \le \epsilon \) and \(F(x^*)< F(x) < F(x^*) + \epsilon \). Moreover, by shrinking \(\epsilon \) if necessary, we may assume without loss of generality that \(\nabla \kappa \) is globally Lipschitz in \(\{Ax:\; x\in B(x^*,\epsilon )\}\) with Lipschitz modulus \(\tau \).

Next, observe from (7) with \(M = 0\) that \(\{F(x^t)\}\) is nonincreasing. Since F is also nonnegative, we deduce that the limit \(\lim \limits _{t\rightarrow \infty }F(x^t)\) exists. In addition, notice that F is continuous in its closed domain and \(x^*\) is an accumulation point of \(\{x^t\}\). Thus, we conclude that \(\lim \limits _{t\rightarrow \infty }F(x^t) = F(x^*)\).

Now, if \(F(x^{t_0}) = F(x^*)\) for some \(t_0 \ge 0\), then we see from (7) with \(M = 0\) and \(\lim \limits _{t\rightarrow \infty }F(x^t) = F(x^*)\) that \(x^{t+1}=x^t\) for all \(t\ge t_0\), which implies that the sequence \(\{x^t\}\) converges (finitely). Thus, from now on, we focus on the case that \(F(x^t) > F(x^*)\) for all \(t\ge 0\).

In this case, note from Lemma 2 that there exists \(N_0 > 1\) so that \(\Vert x^{t}-x^{t-1}\Vert \le \frac{\epsilon }{2}\) whenever \(t\ge N_0\). Also, using Lemma 2, the definition of accumulation point and the fact that \(\lim \limits _{t\rightarrow \infty }F(x^t) = F(x^*)\), there exists \(N_1 \ge N_0\) so that

  1. (i)

    \(\Vert x^{N_1}-x^*\Vert \le \frac{\epsilon }{2}\) and \(F(x^*)< F(x^{N_1}) < F(x^*) + \epsilon \).

  2. (ii)

    \(\Vert x^{N_1}-x^*\Vert + \Vert x^{N_1} - x^{N_1-1}\Vert + C_1\psi (F(x^{N_1})-F(x^*))\le \frac{\epsilon }{2}\),

where \(C_1 := \frac{2(\tau \lambda _{\max }(A^TA) + \beta )}{c}\), c is as in (7), \(\tau \) is the Lipschitz continuity modulus of \(\nabla \kappa \) on \(\{Ax:\; x\in B(x^*,\epsilon )\}\), \(\beta = \sup _t {\bar{L}}_t\) with \({\bar{L}}_t\) defined in Step 2 of \(\textsf {SpFeas}_{\mathrm{DC}_{\textsf {ls}}}\), and \(\beta \) is finite according to Lemma 2.

We claim that if \(t\ge N_1\) and \(\Vert x^t - x^*\Vert \le \epsilon /2\), then

$$\begin{aligned} 2\Vert x^{t+1}-x^t\Vert \le \Vert x^t-x^{t-1}\Vert + C_1 \left[ \psi (F(x^t) - F(x^*)) - \psi (F(x^{t+1}) - F(x^*))\right] . \end{aligned}$$
(36)

To this end, note that since \(x^t\in B(x^*,\epsilon /2)\) and \(t\ge N_1\ge N_0\), we have \(\Vert x^{t}-x^{t-1}\Vert \le \frac{\epsilon }{2}\) and hence \(\Vert x^{t-1} - x^*\Vert \le \epsilon \). Thus, \(\kappa \) is continuously differentiable at \(Ax^{t-1}\) and \(Ax^t\). Moreover, we see from [30, Example 8.53] and [25, Theorem 1.110(ii)] (see also (10)) that \(\nabla (\kappa \circ A)(x^{t-1}) = A^T(Ax^{t-1} - \mathrm{Proj}_D(Ax^{t-1}))\). Using this and the definition of \(x^t\), we deduce that

$$\begin{aligned} x^{t}\in \mathrm{Proj}_C\left( x^{t-1}-\frac{\nabla (\kappa \circ A)(x^{t-1})}{\bar{L}_{t-1}}\right) . \end{aligned}$$

Thus, according to (1),

$$\begin{aligned} v^t:={{\bar{L}}}_{t-1}(x^{t-1}-x^t)-\nabla (\kappa \circ A)(x^{t-1})\in N_C(x^t). \end{aligned}$$

Moreover, using the definition of \(v^t\), we have

$$\begin{aligned} \begin{aligned} \Vert v^t+\nabla (\kappa \circ A)(x^t)\Vert&\le \Vert \nabla (\kappa \circ A)(x^t)-\nabla (\kappa \circ A)(x^{t-1})\Vert +{{\bar{L}}}_{t-1}\Vert x^{t}-x^{t-1}\Vert \\&\le (\tau \lambda _{\max }(A^TA) + \beta )\Vert x^{t}-x^{t-1}\Vert , \end{aligned} \end{aligned}$$
(37)

where the second inequality holds because \(\beta = \sup _{t}{\bar{L}}_t\) and \(\nabla \kappa \) is globally Lipschitz in \(\{Ax:\;x\in B(x^*,\epsilon )\}\) with Lipschitz modulus \(\tau \). Since \(v^t+\nabla (\kappa \circ A)(x^t)\in N_C(x^t) + \nabla (\kappa \circ A)(x^t) = \partial F(x^t)\), we obtain from (37) that

$$\begin{aligned} d(0,\partial F(x^t))\le \left( \tau \lambda _{\max }(A^TA) +\beta \right) \Vert x^t-x^{t-1}\Vert . \end{aligned}$$

Making use of this, the concavity of \(\psi \) and (7) with \(M = 0\), we see further that

$$\begin{aligned} \begin{aligned}&\left( \tau \lambda _{\max }(A^TA) + \beta \right) \Vert x^t-x^{t-1}\Vert \cdot \left[ \psi (F(x^t) - F(x^*)) - \psi (F(x^{t+1}) - F(x^*))\right] \\&\quad \ge d(0,\partial F(x^t))\cdot \left[ \psi (F(x^t) - F(x^*)) - \psi (F(x^{t+1}) - F(x^*))\right] \\&\quad \ge d(0,\partial F(x^t))\cdot \psi '(F(x^t) - F(x^*))\cdot \left[ F(x^t) - F(x^{t+1})\right] \\&\quad \ge \frac{c}{2}\Vert x^{t+1}-x^t\Vert ^2, \end{aligned} \end{aligned}$$

where the last inequality follows from (7) with \(M=0\), (35), and the facts that \(\Vert x^t-x^*\Vert \le \epsilon /2\) and that \(F(x^*)< F(x^t)\le F(x^{N_1}) < F(x^*)+\epsilon \) (since \(t\ge N_1\)). Dividing both sides of the above inequality by \(\frac{c}{2}\), taking square root, using the relation \(\sqrt{ab}\le \frac{a+b}{2}\) for any nonnegative numbers a and b and invoking the definition of \(C_1\), we obtain further that

$$\begin{aligned} \begin{aligned} \Vert x^{t+1}-x^t\Vert&\le \sqrt{\Vert x^t-x^{t-1}\Vert \cdot C_1\left[ \psi (F(x^t) - F(x^*)) - \psi (F(x^{t+1}) - F(x^*))\right] }\\&\le \frac{1}{2}\left( \Vert x^t-x^{t-1}\Vert + C_1\left[ \psi (F(x^t) - F(x^*)) - \psi (F(x^{t+1}) - F(x^*))\right] \right) , \end{aligned} \end{aligned}$$

from which (36) follows immediately.

Next, we show by induction that \(x^t\in B(x^*,\epsilon /2)\) whenever \(t\ge N_1\). The case \(t = N_1\) follows from construction. Suppose that \(x^t\in B(x^*,\epsilon /2)\) whenever \(t = N_1, \ldots , N_1+k-1\) for some \(k\ge 1\). Then

$$\begin{aligned} \Vert x^{N_1+k} - x^*\Vert\le & {} \Vert x^{N_1}-x^*\Vert + \sum _{t=N_1}^{N_1+k-1}\Vert x^{t+1}-x^t\Vert \overset{\mathrm{(a)}}{\le }\Vert x^{N_1}-x^*\Vert \\&+ \sum _{t=N_1}^{N_1+k-1}\left( \Vert x^t-x^{t-1}\Vert - \Vert x^{t+1}-x^t\Vert \right. \\&\left. + C_1 \left[ \psi (F(x^t) - F(x^*)) - \psi (F(x^{t+1}) - F(x^*))\right] \right) \\\le & {} \Vert x^{N_1}-x^*\Vert + \Vert x^{N_1}-x^{N_1-1}\Vert + C_1\psi (F(x^{N_1}) - F(x^*)) \overset{\mathrm{(b)}}{\le }\frac{\epsilon }{2}, \end{aligned}$$

where (a) follows from the induction hypothesis and (36), and (b) follows from the definition of \(N_1\). Thus, \(x^t\in B(x^*,\epsilon /2)\) whenever \(t\ge N_1\) by induction.

Since \(x^t\in B(x^*,\epsilon /2)\) whenever \(t\ge N_1\), we can sum both sides of (36) from \(N_1\) to \(\infty \) and obtain

$$\begin{aligned} \begin{aligned}&\sum _{t=N_1}^\infty \Vert x^{t+1}-x^t\Vert \\&\quad \le \sum _{t=N_1}^\infty \bigg (\Vert x^t-x^{t-1}\Vert - \Vert x^{t+1}-x^t\Vert + C_1 \left[ \psi (F(x^t) - F(x^*)) - \psi (F(x^{t+1}) - F(x^*))\right] \bigg )\\&\quad \le \Vert x^{N_1}-x^{N_1-1}\Vert +C_1\psi (F(x^{N_1}) - F(x^*)) < \infty . \end{aligned} \end{aligned}$$

Thus, the sequence \(\{x^t\}\) is Cauchy and is hence convergent. \(\square \)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Chen, C., Pong, T.K., Tan, L. et al. A difference-of-convex approach for split feasibility with applications to matrix factorizations and outlier detection. J Glob Optim 78, 107–136 (2020). https://doi.org/10.1007/s10898-020-00899-8

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10898-020-00899-8

Keywords

Navigation