Abstract
We present a new, stochastic variant of the projective splitting (PS) family of algorithms for inclusion problems involving the sum of any finite number of maximal monotone operators. This new variant uses a stochastic oracle to evaluate one of the operators, which is assumed to be Lipschitz continuous, and (deterministic) resolvents to process the remaining operators. Our proposal is the first version of PS with such stochastic capabilities. We envision the primary application being machine learning (ML) problems, with the method’s stochastic features facilitating “mini-batch” sampling of datasets. Since it uses a monotone operator formulation, the method can handle not only Lipschitz-smooth loss minimization, but also min–max and noncooperative game formulations, with better convergence properties than the gradient descent-ascent methods commonly applied in such settings. The proposed method can handle any number of constraints and nonsmooth regularizers via projection and proximal operators. We prove almost-sure convergence of the iterates to a solution and a convergence rate result for the expected residual, and close with numerical experiments on a distributionally robust sparse logistic regression problem.
Data Availability
The data analyzed during the current study are from the public LIBSVM repository available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Change history
27 October 2023
A Correction to this paper has been published: https://doi.org/10.1007/s10589-023-00539-3
Notes
Original data source http://largescale.ml.tu-berlin.de/instructions/.
Original data source https://people.cs.umass.edu/~mccallum/data.html.
References
Alacaoglu, A., Malitsky, Y., Cevher, V.: Forward-reflected-backward method with variance reduction. Comput. Optim. Appl. (2021)
Alotaibi, A., Combettes, P.L., Shahzad, N.: Solving coupled composite monotone inclusions by successive Fejér approximations of their Kuhn–Tucker set. SIAM J. Optim. 24(4), 2076–2095 (2014)
Baldi, P., Sadowski, P., Whiteson, D.: Searching for exotic particles in high-energy physics with deep learning. Nat. Commun. 5(1), 1–9 (2014)
Balduzzi, D., Racaniere, S., Martens, J., Foerster, J., Tuyls, K., Graepel, T.: The mechanics of \(n\)-player differentiable games. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 80, pp. 354–363. PMLR (2018)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, Berlin (2017)
Boţ, R.I., Mertikopoulos, P., Staudigl, M., Vuong, P.T.: Minibatch forward-backward-forward methods for solving stochastic variational inequalities. Stoch. Syst. 11(2), 112–139 (2021)
Böhm, A., Sedlmayer, M., Csetnek, E.R., Boţ, R.I.: Two steps at a time—taking GAN training in stride with Tseng’s method. arXiv preprint arXiv:2006.09033 (2020)
Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT’2010: 19th International Conference on Computational Statistics, pp. 177–186. Springer, Berlin (2010)
Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)
Briceño-Arias, L.M., Combettes, P.L.: A monotone+skew splitting model for composite monotone inclusions in duality. SIAM J. Optim. 21(4), 1230–1250 (2011)
Bubeck, S.: Convex optimization: algorithms and complexity. Found. Trends Mach. Learn. 8(3–4), 231–357 (2015)
Celis, L.E., Keswani, V.: Improved adversarial learning for fair classification. arXiv preprint arXiv:1901.10443 (2019)
Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Chavdarova, T., Pagliardini, M., Stich, S.U., Fleuret, F., Jaggi, M.: Taming GANs with lookahead-minmax. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=ZW0yXJyNmoG
Combettes, P.L., Eckstein, J.: Asynchronous block-iterative primal-dual decomposition methods for monotone inclusions. Math. Program. 168(1–2), 645–672 (2018)
Combettes, P.L., Pesquet, J.C.: Proximal splitting methods in signal processing. In: Bauschke, H., Burachik, R., Combettes, P., Elser, V., Luke, D., Wolkowicz, H. (eds.) Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 185–212. Springer, Berlin (2011)
Combettes, P.L., Pesquet, J.C.: Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators. Set-Valued Var. Anal. 20(2), 307–330 (2012)
Combettes, P.L., Pesquet, J.C.: Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping. SIAM J. Optim. 25(2), 1221–1248 (2015)
Daskalakis, C., Ilyas, A., Syrgkanis, V., Zeng, H.: Training GANs with optimism. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=SJJySbbAZ
Davis, D., Yin, W.: A three-operator splitting scheme and its optimization applications. Set-Valued Var. Anal. 25(4), 829–858 (2017)
Diakonikolas, J.: Halpern iteration for near-optimal and parameter-free monotone inclusion and strong solutions to variational inequalities. In: Conference on Learning Theory, pp. 1428–1451. PMLR (2020)
Dua, D., Graff, C.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
Eckstein, J.: A simplified form of block-iterative operator splitting and an asynchronous algorithm resembling the multi-block alternating direction method of multipliers. J. Optim. Theory Appl. 173(1), 155–182 (2017)
Eckstein, J., Svaiter, B.F.: A family of projective splitting methods for the sum of two maximal monotone operators. Math. Program. 111(1), 173–199 (2008)
Eckstein, J., Svaiter, B.F.: General projective splitting methods for sums of maximal monotone operators. SIAM J. Control. Optim. 48(2), 787–811 (2009)
Edwards, H., Storkey, A.: Censoring representations with an adversary. arXiv preprint arXiv:1511.05897 (2015)
Gabay, D.: Applications of the method of multipliers to variational inequalities. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Solution of Boundary Value Problems, chap. IX, pp. 299–340. North-Holland, Amsterdam (1983)
Gidel, G., Berard, H., Vignoud, G., Vincent, P., Lacoste-Julien, S.: A variational inequality perspective on generative adversarial networks. In: International Conference on Learning Representations (2019). https://openreview.net/forum?id=r1laEnA5Ym
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27. Curran Associates (2014)
Grnarova, P., Kilcher, Y., Levy, K.Y., Lucchi, A., Hofmann, T.: Generative minimization networks: training GANs without competition. arXiv preprint arXiv:2103.12685 (2021)
Hsieh, Y.G., Iutzeler, F., Malick, J., Mertikopoulos, P.: On the convergence of single-call stochastic extra-gradient methods. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates (2019)
Hsieh, Y.G., Iutzeler, F., Malick, J., Mertikopoulos, P.: Explore aggressively, update conservatively: Stochastic extragradient methods with variable stepsize scaling. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 16223–16234. Curran Associates (2020)
Huang, C., Kairouz, P., Chen, X., Sankar, L., Rajagopal, R.: Context-aware generative adversarial privacy. Entropy 19(12), 656 (2017)
Johnstone, P.R., Eckstein, J.: Convergence rates for projective splitting. SIAM J. Optim. 29(3), 1931–1957 (2019)
Johnstone, P.R., Eckstein, J.: Single-forward-step projective splitting: exploiting cocoercivity. arXiv preprint arXiv:1902.09025 (2019)
Johnstone, P.R., Eckstein, J.: Projective splitting with forward steps only requires continuity. Optim. Lett. 14(1), 229–247 (2020)
Johnstone, P.R., Eckstein, J.: Single-forward-step projective splitting: exploiting cocoercivity. Comput. Optim. Appl. 78(1), 125–166 (2021)
Johnstone, P.R., Eckstein, J.: Projective splitting with forward steps. Math. Program. 191(2), 631–670 (2022)
Korpelevich, G.: Extragradient method for finding saddle points and other problems. Matekon 13(4), 35–49 (1977)
Kuhn, D., Esfahani, P.M., Nguyen, V.A., Shafieezadeh-Abadeh, S.: Wasserstein distributionally robust optimization: theory and applications in machine learning. In: Netessine, S. (ed.) Operations Research & Management Science in the Age of Analytics, Tutorials in Operations Research, pp. 130–166. INFORMS (2019)
Li, C.J., Yu, Y., Loizou, N., Gidel, G., Ma, Y., Roux, N.L., Jordan, M.I.: On the convergence of stochastic extragradient for bilinear games with restarted iteration averaging. arXiv preprint arXiv:2107.00464 (2021)
Lin, T., Jin, C., Jordan, M.: On gradient descent ascent for nonconvex-concave minimax problems. In: Daumé III, H., Singh, A. (eds.) Proceedings of the 37th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 119, pp. 6083–6093. PMLR (2020)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
Malitsky, Y., Tam, M.K.: A forward-backward splitting method for monotone inclusions without cocoercivity. SIAM J. Optim. 30(2), 1451–1472 (2020)
Mescheder, L., Geiger, A., Nowozin, S.: Which training methods for GANs do actually converge? In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 80, pp. 3481–3490. PMLR (2018)
Mescheder, L., Nowozin, S., Geiger, A.: The numerics of GANs. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 30. Curran Associates (2017)
Monteiro, R.D., Svaiter, B.F.: On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean. SIAM J. Optim. 20(6), 2755–2787 (2010)
Nagarajan, V., Kolter, J.Z.: Gradient descent GAN optimization is locally stable. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 30. Curran Associates (2017)
Namkoong, H., Duchi, J.C.: Stochastic gradient methods for distributionally robust optimization with \(f\)-divergences. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates (2016)
Nemirovski, A.: Prox-method with rate of convergence O\((1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)
Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 123–231 (2013)
Pedregosa, F., Fatras, K., Casotto, M.: Proximal splitting meets variance reduction. In: Chaudhuri, K., Sugiyama, M. (eds.) Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 89, pp. 1–10. PMLR (2019)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)
Rockafellar, R.T.: Monotone operators associated with saddle-functions and minimax problems. Nonlinear Funct. Anal. 18(part 1), 397–407 (1970)
Ryu, E.K., Boyd, S.: Primer on monotone operator methods. Appl. Comput. Math 15(1), 3–43 (2016)
Shafieezadeh-Abadeh, S., Esfahani, P.M., Kuhn, D.: Distributionally robust logistic regression. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 1576–1584. Curran Associates (2015)
Sinha, A., Namkoong, H., Duchi, J.: Certifying some distributional robustness with principled adversarial training. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=Hk6kPgZA-
Tseng, P.: A modified forward–backward splitting method for maximal monotone mappings. SIAM J. Control. Optim. 38(2), 431–446 (2000)
Van Dung, N., Vu, B.C.: Convergence analysis of the stochastic reflected forward–backward splitting algorithm. arXiv preprint arXiv:2102.08906 (2021)
Wadsworth, C., Vera, F., Piech, C.: Achieving fairness through adversarial learning: an application to recidivism prediction. arXiv preprint arXiv:1807.00199 (2018)
Yu, Y., Lin, T., Mazumdar, E., Jordan, M.I.: Fast distributionally robust learning with variance reduced min-max optimization. arXiv preprint arXiv:2104.13326 (2021)
Yurtsever, A., Vu, B.C., Cevher, V.: Stochastic three-composite convex minimization. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates (2016)
Zhang, B.H., Lemoine, B., Mitchell, M.: Mitigating unwanted biases with adversarial learning. In: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pp. 335–340 (2018)
Ethics declarations
Conflict of interest
The authors have no competing interests to declare with regard to the current study.
Appendices
Appendix A: Approximation residuals
In this section we derive the approximation residual used to assess the performance of the algorithms in the numerical experiments. This residual relies on the following product-space reformulation of (1).
1.1 Appendix A.1: Product-space reformulation and residual principle
Recall (1), the monotone inclusion we are solving:
In this section we demonstrate a “product-space” reformulation of (1) which allows us to rewrite it in a standard form involving just two operators, one maximal monotone and the other monotone and Lipschitz. This approach was pioneered in [10, 17]. Along with allowing for a simple definition of an approximation residual as a measure of approximation error in solving (1), it allows one to apply operator splitting methods originally formulated for two operators to problems such as (1) for any finite n.
Observe that solving (1) is equivalent to
This formulation resembles that of the extended solution set \(\mathcal {S}\) used in projective splitting, as given in (2), except that it combines the final two conditions in the definition of \(\mathcal {S}\), and thus does not need the final dual variable \(w_{n+1}\). From the definition of the inverse of an operator, the above formulation is equivalent to
These conditions are in turn equivalent to finding \((w_1,\ldots ,w_n,z)\in \mathbb {R}^{(n+1)d}\) such that
where \({\mathscr {A}}\) is the set-valued map
and \({\mathscr {B}}\) is the single-valued operator
It is easily established that \({\mathscr {B}}\) is maximal monotone and Lipschitz continuous, while \({\mathscr {A}}\) is maximal monotone. Letting \( \mathscr {T}\doteq {\mathscr {A}}+ {\mathscr {B}}, \) it follows from [5, Prop. 20.23] that \(\mathscr {T}\) is maximal monotone. Thus, we have reformulated (1) as the monotone inclusion \(0\in \mathscr {T}(q)\) for q in the product space \(\mathbb {R}^{(n+1)d}\). A vector \(z\in \mathbb {R}^d\) solves (1) if and only if there exists \((w_1,\ldots ,w_n)\in \mathbb {R}^{nd}\) such that \(0\in \mathscr {T}(q)\), where \(q=(w_1,\ldots ,w_n,z)\).
For any pair (q, v) such that \(v\in \mathscr {T}(q)\), \(\Vert v\Vert ^2\) serves as an approximation residual for q: it can only be 0 if q solves (60), so we may take it as a measure of the error of q as an approximate solution to (60). Given two approximate solutions \(q_1\) and \(q_2\) with certificates \(v_1\in \mathscr {T}(q_1)\) and \(v_2\in \mathscr {T}(q_2)\), we will treat \(q_1\) as a “better” approximate solution than \(q_2\) if \(\Vert v_1\Vert ^2<\Vert v_2\Vert ^2\). Doing so is somewhat analogous to the common practice in optimization of using the squared gradient norm \(\Vert \nabla f(x)\Vert ^2\) to measure the quality of an approximate minimizer of a differentiable function f. However, since \(\mathscr {T}(q_1)\) is a set, it may contain elements of smaller norm than \(v_1\); thus any given certificate \(v_1\) yields only an upper bound on \({{\,\textrm{dist}\,}}^2(0,\mathscr {T}(q_1))\).
1.2 Appendix A.2: Approximation residual for projective splitting
In SPS (Algorithm 1), the pairs \((x_i^k,y_i^k)\) for \(i=1,\ldots ,n\) are chosen so that \(y_i^k\in A_i(x_i^k)\); this follows from the definition of the resolvent. Thus \(x_i^k\in A_i^{-1}(y_i^k)\). Observe that
The approximation residual for SPS is thus
which is an approximation residual for \((y_1^k,\ldots ,y_n^k,z^k)\) in the sense defined above. We may relate \(R_k\) to the approximation residual \({\mathcal {G}}_k\) for SPS from Sect. 5.2 as follows:
where in the second equality we have used the fact that \(\sum _{i=1}^{n+1}w_i^k = 0\). Thus, \(R_k\) has the same convergence rate as \({\mathcal {G}}_k\) given in Theorem 2.
Note that while the certificate given in (63) focuses on the primal iterate \(z^k\), it may be changed to focus on any \(x_i^k\) for \(i=1,\ldots ,n\), by using
The approximation residual \(\Vert v^k_i\Vert ^2\) may also be shown to have the same rate as \({\mathcal {G}}_k\) by following similar derivations to those above for \(R_k\).
1.3 Appendix A.3: Tseng’s method
Tseng’s method [58] can be applied to (60), resulting in the following recursion with iterates \(q^k,{\bar{q}}^k \in \mathbb {R}^{(n+1)d}\):

\({\bar{q}}^k = J_{\alpha {\mathscr {A}}}\left( q^k-\alpha {\mathscr {B}}(q^k)\right) \)   (65)

\(q^{k+1} = {\bar{q}}^k - \alpha \left( {\mathscr {B}}({\bar{q}}^k)-{\mathscr {B}}(q^k)\right) ,\)   (66)

where \({\mathscr {A}}\) and \({\mathscr {B}}\) are defined in (61) and (62). The resolvent of \({\mathscr {A}}\) may be readily computed from the resolvents of the \(A_i\) using Moreau’s identity [5, Prop. 23.20].

Analogous to SPS, Tseng’s method has an approximation residual, which in this case is an element of \(\mathscr {T}({\bar{q}}^k)\). In particular, using the general properties of resolvent operators as applied to \(J_{\alpha {\mathscr {A}}}\), we have

\(\alpha ^{-1}\left( q^k-{\bar{q}}^k\right) -{\mathscr {B}}(q^k)\in {\mathscr {A}}({\bar{q}}^k).\)

Also, rearranging (66) produces

\(\alpha ^{-1}\left( {\bar{q}}^k-q^{k+1}\right) = {\mathscr {B}}({\bar{q}}^k)-{\mathscr {B}}(q^k).\)

Adding these two relations produces

\(\alpha ^{-1}\left( q^k-q^{k+1}\right) \in {\mathscr {A}}({\bar{q}}^k)+{\mathscr {B}}({\bar{q}}^k) = \mathscr {T}({\bar{q}}^k).\)

Therefore,

\(R^{\text {Tseng}}_k \doteq \left\Vert \alpha ^{-1}\left( q^k-q^{k+1}\right) \right\Vert ^2\)

represents a measure of the approximation error for Tseng’s method equivalent to \(R_k\) defined in (64) for SPS.
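As an illustrative sketch, the Tseng recursion and the residual just described can be implemented in a few lines. The operator \(B\), the resolvent, and the stepsize below are self-contained stand-ins (a small monotone affine operator and the normal cone of the nonnegative orthant), not the paper's experimental setup.

```python
import numpy as np

def tseng_step(q, alpha, res_A, B):
    """One step of Tseng's forward-backward-forward method for 0 in A(q)+B(q).

    res_A: the resolvent J_{alpha A}; B: a single-valued Lipschitz operator.
    Returns the next iterate and a residual vector v in (A+B)(q_bar).
    """
    q_bar = res_A(q - alpha * B(q))             # backward (resolvent) step
    q_next = q_bar - alpha * (B(q_bar) - B(q))  # second forward (correction) step
    v = (q - q_next) / alpha                    # element of A(q_bar) + B(q_bar)
    return q_next, v

# Illustrative instance (not from the paper): A = normal cone of the
# nonnegative orthant (resolvent = projection), B a monotone affine map.
M = np.array([[2.0, 1.0], [-1.0, 2.0]])  # positive definite part => monotone
b = np.array([-1.0, 1.0])
B = lambda q: M @ q + b
res_A = lambda q: np.maximum(q, 0.0)     # projection onto the orthant

q = np.array([5.0, 5.0])
alpha = 0.9 / np.linalg.norm(M, 2)       # stepsize below 1/L
for _ in range(200):
    q, v = tseng_step(q, alpha, res_A, B)
print(np.linalg.norm(v))  # residual tends to 0 as q approaches a solution
```

For this instance the solution is \(q^*=(0.5,0)\), and the residual norm shrinks to numerical precision, matching the role of \(R^{\text {Tseng}}_k\) above.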
1.4 Appendix A.4: FRB
The forward-reflected-backward method (FRB) [44] is another method that may be applied to the splitting \(\mathscr {T}= {\mathscr {A}}+ {\mathscr {B}}\) for \({\mathscr {A}}\) and \({\mathscr {B}}\) as defined in (61) and (62). Doing so yields the recursion

\(q^{k+1} = J_{\tau {\mathscr {A}}}\left( q^k-\tau \left( 2{\mathscr {B}}(q^k)-{\mathscr {B}}(q^{k-1})\right) \right) .\)

Following similar arguments to those for Tseng’s method, it can be shown that

\(\tau ^{-1}\left( q^k-q^{k+1}\right) -2{\mathscr {B}}(q^k)+{\mathscr {B}}(q^{k-1})+{\mathscr {B}}(q^{k+1})\in \mathscr {T}(q^{k+1}).\)

Thus, FRB admits the following approximation residual equivalent to \(R_k\) for SPS:

\(R^{\text {FRB}}_k \doteq \left\Vert \tau ^{-1}\left( q^k-q^{k+1}\right) -2{\mathscr {B}}(q^k)+{\mathscr {B}}(q^{k-1})+{\mathscr {B}}(q^{k+1})\right\Vert ^2.\)
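The FRB step and its residual can be sketched in the same style. Again, the operators and stepsize below are illustrative stand-ins (the same toy monotone affine instance as for Tseng's method), not the paper's actual problem; FRB requires a stepsize below \(1/(2L)\).

```python
import numpy as np

def frb_step(q, B_prev, tau, res_A, B):
    """One forward-reflected-backward step for 0 in A(q)+B(q).
    B_prev is B evaluated at the previous iterate."""
    Bq = B(q)
    q_next = res_A(q - tau * (2.0 * Bq - B_prev))
    # By the resolvent definition, (q - q_next)/tau - 2B(q) + B_prev lies in
    # A(q_next); adding B(q_next) gives an element of (A+B)(q_next).
    v = (q - q_next) / tau - 2.0 * Bq + B_prev + B(q_next)
    return q_next, Bq, v

# Same illustrative instance as in the Tseng sketch (not from the paper).
M = np.array([[2.0, 1.0], [-1.0, 2.0]])
b = np.array([-1.0, 1.0])
B = lambda q: M @ q + b
res_A = lambda q: np.maximum(q, 0.0)

tau = 1.0 / (2.0 * np.linalg.norm(M, 2))  # FRB stepsize below 1/(2L)
q = np.array([5.0, 5.0])
B_prev = B(q)
for _ in range(300):
    q, B_prev, v = frb_step(q, B_prev, tau, res_A, B)
print(np.linalg.norm(v))  # residual tends to 0 at a solution
```

The vector `v` returned at each step is exactly the certificate whose squared norm defines the FRB residual.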
Finally, we remark that the stepsizes used in both the Tseng and FRB methods can be chosen via a linesearch procedure that we do not detail here.
1.5 Appendix A.5: Stochastic Tseng’s method
The stochastic version of Tseng’s method of [7] (S-Tseng) may be applied to the inclusion \(0\in {\mathscr {A}}(q)+{\mathscr {B}}(q)\), since the operator \({\mathscr {A}}\) may be written as a subdifferential. However, unlike the deterministic Tseng method, it does not produce a valid residual. Note also that S-Tseng outputs an ergodic sequence \(q_{\text {erg}}^k\). To construct a residual for the ergodic sequence, we compute a deterministic step of Tseng’s method according to (65)-(66), starting at \(q_{\text {erg}}^k\). That is, letting
we can then compute essentially the same residual as in Appendix A.3,
To construct the stochastic oracle for S-Tseng, we assumed \(B(z)=\frac{1}{m}\sum _{i=1}^m B_i(z)\). Then we used
for some minibatch \({\textbf{B}}\subseteq \{1,\ldots ,m\}\).
1.6 Appendix A.6: Variance-reduced FRB
The FRB-VR method of [1] can also be applied to \(0\in {\mathscr {A}}(q)+{\mathscr {B}}(q)\), using the same stochastic oracle \({\tilde{{\mathscr {B}}}}\) defined in (67). If we let the iterates of FRB-VR be \((q^k,p^k)\), then line 4 of Algorithm 1 of [1] can be written as
Once again, the method does not directly produce a residual, but one can be developed from the algorithm definition as follows: (69) yields \(\tau ^{-1}({\hat{q}}^k - q^{k+1}) \in {\mathscr {A}}(q^{k+1})\) and hence
Therefore we use the residual
Figure 1 plots \(R_k\) for SPS, \(R^{\text {Tseng}}_k\) for Tseng’s method, \(R^{\text {FRB}}_k\) for FRB, \(R^\text {S-Tseng}_k\) for S-Tseng, and \(R^\text {FRB-VR}_k\) for FRB-VR.
Appendix B: Additional information about the numerical experiments
We now show how we converted Problem (59) to the form (1) for our experiments. Let z be a shorthand for \((\lambda ,\beta ,\gamma )\), and define
The first-order necessary and sufficient conditions for the convex–concave saddlepoint problem in (59) are
where the vector field B(z) is defined as
with
and
It is readily confirmed that B defined in this manner is Lipschitz. The monotonicity of B follows from its being the generalized gradient of a convex–concave saddle function [54]. For the set-valued operators, \(A_1(z)\) corresponds to the constraints and \(A_2(z)\) to the nonsmooth \(\ell _1\) regularizer; they are defined as
where
and
Here, the notation \({\textbf{0}}_{p\times 1}\) denotes the p-dimensional vector of all zeros. \(\mathcal {C}_1\) is a scaled version of the second-order cone, well known to be a closed convex set, while \(\mathcal {C}_2\) is the unit ball of the \(\ell _\infty \) norm, also closed and convex. Since \(A_1\) is a normal cone map of a closed convex set and \(A_2\) is the subgradient map of a closed proper convex function (the scaled 1-norm), both of these operators are maximal monotone and problem (70) is a special case of (1) for \(n=2\).
Stochastic oracle implementation The operator \(B:\mathbb {R}^{m+d+1}\rightarrow \mathbb {R}^{m+d+1}\), defined in (71), can be written as
where
In our SPS experiments, the stochastic oracle for B is simply \({\tilde{B}}(z) = \frac{1}{|{\textbf{B}}|}\sum _{i\in {\textbf{B}}} B_i(z)\) for some minibatch \({\textbf{B}}\subseteq \{1,\ldots ,m\}\). We used a batch size of 100.
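A minimal sketch of such a minibatch oracle follows. The per-sample operators `B_i` here are random hypothetical stand-ins for the per-example terms of the experiment, used only to illustrate that averaging over a random minibatch gives an unbiased estimate of the full operator.

```python
import numpy as np

rng = np.random.default_rng(0)

m, dim = 1000, 5
# Hypothetical per-sample operators B_i(z) = M_i z (stand-ins for the
# actual per-example terms; not the paper's data).
Ms = rng.standard_normal((m, dim, dim))

def B_full(z):
    """The full operator B(z) = (1/m) sum_i B_i(z)."""
    return np.mean(Ms @ z, axis=0)

def B_tilde(z, batch_size=100):
    """Minibatch stochastic oracle: average B_i over a random minibatch."""
    idx = rng.choice(m, size=batch_size, replace=False)
    return np.mean(Ms[idx] @ z, axis=0)

z = rng.standard_normal(dim)
# Averaging many oracle calls approaches the full operator (unbiasedness).
est = np.mean([B_tilde(z) for _ in range(2000)], axis=0)
print(np.linalg.norm(est - B_full(z)))
```

Each oracle call touches only `batch_size` of the `m` samples, which is the source of the per-iteration savings over a deterministic evaluation of B.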
Resolvent computations The resolvent of \(A_1\) is readily constructed from the projection maps of the simple sets \(\mathcal {C}_1\) and \(\mathcal {C}_2\), while the resolvent of \(A_2\) involves the proximal operator of the \(\ell _1\) norm. Specifically,
The constraint \(\mathcal {C}_1\) is a scaled second-order cone and \(\mathcal {C}_2\) is the \(\ell _\infty \) ball, both of which have closed-form projections. The proximal operator of the \(\ell _1\) norm is the well-known soft-thresholding operator [51, Sec. 6.5.2]. Therefore all resolvents in the formulation may be computed quickly and accurately.
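These building blocks can be sketched as follows; the radii and test vectors are placeholders, not the experiment's parameters. The second-order cone projection uses the standard closed form.

```python
import numpy as np

def soft_threshold(x, t):
    """prox of t*||.||_1: the soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def project_linf_ball(x, r=1.0):
    """Projection onto the l-infinity ball of radius r."""
    return np.clip(x, -r, r)

def project_soc(s, x):
    """Projection of (s, x) onto the second-order cone {(s, x): ||x|| <= s}."""
    nx = np.linalg.norm(x)
    if nx <= s:
        return s, x                   # already in the cone
    if nx <= -s:
        return 0.0, np.zeros_like(x)  # polar-cone case: projects to the origin
    c = (s + nx) / 2.0                # otherwise project onto the boundary
    return c, (c / nx) * x

x = np.array([3.0, -0.5, 0.2])
print(soft_threshold(x, 1.0))  # -> [2. 0. 0.]
```

All three maps are elementwise or one-norm computations, which is why the resolvents in the formulation can be evaluated quickly and accurately.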
SPS stepsize choices For the stepsize in SPS, we ordinarily require \(\rho _k \le \overline{\rho }< 1/L\) for the global Lipschitz constant L of B. However, since the global Lipschitz constant may be pessimistic, better performance can often be achieved by experimenting with larger stepsizes. If divergence is observed, then the stepsize can be decreased. This type of strategy is common for SGD and similar stochastic methods. Thus, for SPS-decay we set \(\alpha _k = C_d k^{-0.51} \) and \( \rho _k = C_d k^{-0.25}, \) and performed a grid search to select the best \(C_d\) from \(\{0.1,0.5,1,5,10\}\), arriving at \(C_d=1\) for epsilon and SUSY, and \(C_d=0.5\) for real-sim. For SPS-fixed we used \(\rho = K^{-1/4}\) and \(\alpha = C_f\rho ^2\), and performed a grid search to select \(C_f\) over \(\{0.1,0.5,1,5,10\}\), arriving at \(C_f=1\) for epsilon and real-sim, and \(C_f=5\) for SUSY. The total number of iterations for SPS-fixed was chosen as follows: For the epsilon dataset, we used \(K=5000\), for SUSY we used \(K=200\), and for real-sim we used \(K=1000\).
Parameter choices for the other algorithms All methods are initialized at the same random point. For Tseng’s method, we used the backtracking linesearch variant with an initial stepsize of 1, \(\theta =0.8\), and a stepsize reduction factor of 0.7. For FRB, we used the backtracking linesearch variant with the same settings as for Tseng’s method. For deterministic PS, we used a fixed stepsize of 0.9/L. For the stochastic Tseng’s method of [7], the stepsize \(\alpha _k\) must satisfy \(\sum _{k=1}^\infty \alpha _k=\infty \) and \(\sum _{k=1}^\infty \alpha _k^2<\infty \). We therefore set \(\alpha _k=C k^{-d}\) and performed a grid search over \((C,d)\) in the range \([10^{-4},10]\times [0.51,1]\), checking \(5\times 5\) values to find the best setting for each of the three problems. The selected values are in Table 1.
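The grid searches above follow a simple pattern that can be sketched as follows. `run_method` and the toy scoring function are hypothetical stand-ins for an actual solver run; the grids mirror the \(5\times 5\) search over C and d.

```python
import itertools
import numpy as np

def grid_search(run_method, Cs, ds):
    """Pick the (C, d) pair whose stepsize schedule alpha_k = C * k**(-d)
    gives the smallest final residual. run_method maps a stepsize schedule
    to a scalar residual (a hypothetical stand-in for one solver run)."""
    best = None
    for C, d in itertools.product(Cs, ds):
        schedule = lambda k, C=C, d=d: C * k ** (-d)
        res = run_method(schedule)
        if best is None or res < best[0]:
            best = (res, C, d)
    return best

# Toy stand-in: the "residual" is smallest for a moderate C and fast decay.
def toy_run(schedule):
    return abs(schedule(1) - 0.5) + schedule(100)

Cs = np.geomspace(1e-4, 10, 5)   # 5 log-spaced values of C in [1e-4, 10]
ds = np.linspace(0.51, 1.0, 5)   # 5 values of d in [0.51, 1]
res, C, d = grid_search(toy_run, Cs, ds)
print(C, d)
```

Each grid point requires a full solver run, so the total tuning cost is the grid size times the cost of one run; this is why the searches above were kept to 5 values per parameter.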
The work of [7] also introduced FBFp, a stochastic version of Tseng’s method that reuses a previously computed gradient and therefore needs only one additional gradient calculation per iteration. In our experiments, the performance of the two methods was about the same, so we report only the stochastic Tseng’s method.
For variance-reduced FRB, the main parameter is the probability p. We hand-tuned p, arriving at \(p=0.01\) for all problems. We set the stepsize to its maximum allowed value of
Johnstone, P.R., Eckstein, J., Flynn, T. et al. Stochastic projective splitting. Comput Optim Appl 87, 397–437 (2024). https://doi.org/10.1007/s10589-023-00528-6