Abstract
Many interesting problems in statistics and machine learning can be written as \(\min_{x} F(x) = f(x) + g(x)\), where \(x\) is the model parameter, \(f\) is the loss and \(g\) is the regularizer. Examples include regularized regression in high-dimensional feature selection and low-rank matrix/tensor factorization. Sometimes the loss function and/or the regularizer is nonsmooth due to the nature of the problem; for example, \(f(x)\) could be a quantile loss, used to induce robustness or to place more focus on parts of the distribution other than the mean. In this paper we propose a general framework for handling a nonsmooth loss or regularizer, using low-rank matrix recovery as a running example to demonstrate the main idea. The framework involves two main steps: an optimal smoothing of the loss function or regularizer, followed by a gradient-based algorithm applied to the smoothed objective. The proposed smoothing pipeline is highly flexible, computationally efficient, easy to implement and well suited to problems with high-dimensional data. A strong theoretical convergence guarantee is also established. In the numerical studies, we use the \(L_1\) loss as an example to illustrate the practicability of the proposed pipeline; state-of-the-art algorithms such as Adam, NAG and YellowFin all show promising results on the smoothed loss.
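To make the two-step framework concrete, the following is a minimal sketch, not the paper's actual method: it smooths the \(L_1\) loss with a standard Huber-type surrogate (replace \(|r|\) by \(r^2/2\mu\) for \(|r| \le \mu\) and \(|r| - \mu/2\) otherwise) and then runs plain gradient descent on a low-rank factorization \(M \approx UV^\top\). The function names, step size, and initialization scale are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def huber_smooth_grad(r, mu):
    # Gradient of the Huber-smoothed absolute value |r|:
    # r/mu on [-mu, mu], sign(r) outside. clip implements both branches.
    return np.clip(r / mu, -1.0, 1.0)

def smoothed_l1_factorization(M, mask, rank, mu=0.1, lr=0.01, iters=500, seed=0):
    """Fit M ~ U @ V.T on observed entries (mask == 1) under a smoothed L1 loss.

    Illustrative sketch: plain gradient descent; the paper's pipeline also
    supports accelerated/adaptive optimizers (NAG, Adam, YellowFin).
    """
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = 0.1 * rng.standard_normal((m, rank))
    V = 0.1 * rng.standard_normal((n, rank))
    for _ in range(iters):
        R = (U @ V.T - M) * mask        # residual on observed entries only
        G = huber_smooth_grad(R, mu)    # entrywise gradient of smoothed |.|
        # Simultaneous gradient steps on both factors.
        U, V = U - lr * (G @ V), V - lr * (G.T @ U)
    return U, V
```

A smaller \(\mu\) makes the surrogate closer to the true \(L_1\) loss but less smooth (larger Lipschitz constant of the gradient), which is the trade-off the optimal-smoothing step of the framework addresses.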
References
Achlioptas, D., McSherry, F.: Fast computation of low-rank matrix approximations. J. ACM (JACM) 54(2), 9 (2007)
Alefeld, G., Chen, X.: A regularized projection method for complementarity problems with non-Lipschitzian functions. Math. Comput. 77(261), 379–395 (2008)
Aravkin, A.Y., Kambadur, A., Lozano, A.C., Luss, R.: Sparse quantile Huber regression for efficient and robust estimation. arXiv preprint arXiv:1402.4624 (2014)
Barnett, S.: Greatest common divisor of two polynomials. Linear Algebra Appl. 3(1), 7–9 (1970)
Bhojanapalli, S., Kyrillidis, A., Sanghavi, S.: Dropping convexity for faster semi-definite optimization. In: Conference on Learning Theory, pp. 530–582 (2016)
Bokde, D., Girase, S., Mukhopadhyay, D.: Matrix factorization model in collaborative filtering algorithms: a survey. Procedia Comput. Sci. 49, 136–146 (2015)
Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT 2010, pp. 177–186. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-7908-2604-3_16
Chen, X.: First order conditions for nonsmooth discretized constrained optimal control problems. SIAM J. Control Optim. 42(6), 2004–2015 (2004)
Chen, X.: Smoothing methods for nonsmooth, nonconvex minimization. Math. Program. 134(1), 71–99 (2012)
Chen, X., Womersley, R.S., Ye, J.J.: Minimizing the condition number of a Gram matrix. SIAM J. Optim. 21(1), 127–148 (2011)
Fan, J.: Local Polynomial Modelling and its Applications: Monographs on Statistics and Applied Probability, vol. 66. Routledge (2018)
Garg, R., Khandekar, R.: Gradient descent with sparsification: an iterative algorithm for sparse recovery with restricted isometry property. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 337–344. ACM (2009)
Gross, D., Liu, Y.-K., Flammia, S.T., Becker, S., Eisert, J.: Quantum state tomography via compressed sensing. Phys. Rev. Lett. 105(15), 150401 (2010)
Huber, P.J.: Robust Statistics. John Wiley and Sons (1981)
Jain, P., Meka, R., Dhillon, I.S.: Guaranteed rank minimization via singular value projection. In: Advances in Neural Information Processing Systems, pp. 937–945 (2010)
Jain, P., Netrapalli, P., Sanghavi, S.: Low-rank matrix completion using alternating minimization. In: Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, pp. 665–674. ACM (2013)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 8, 30–37 (2009)
Markovsky, I., Usevich, K.: Low Rank Approximation. Springer, London (2012). https://doi.org/10.1007/978-1-4471-2227-2
Mount, J.: Approximation by orthogonal transform (2014). https://www.winvector.github.io/xDrift/orthApprox.pdf. Accessed 05 Sept 2018
Nesterov, Yu.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Soviet Mathematics Doklady 27, 372–376 (1983)
Nesterov, Yu.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Owen, A.B.: A robust hybrid of lasso and ridge regression. Contemp. Math. 443(7), 59–72 (2007)
Park, D., Kyrillidis, A., Caramanis, C., Sanghavi, S.: Finding low-rank solutions via non-convex matrix factorization, efficiently and provably. arXiv preprint arXiv:1606.03168 (2016)
Picci, G.: Stochastic realization theory. In: Mathematical System Theory, pp. 213–229. Springer, Heidelberg (1991). https://doi.org/10.1007/978-3-662-08546-2_12
Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)
Resnick, P., Varian, H.R.: Recommender systems. Commun. ACM 40(3), 56–58 (1997)
Sun, R., Luo, Z.-Q.: Guaranteed matrix completion via non-convex factorization. IEEE Trans. Inform. Theory 62(11), 6535–6579 (2016)
Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: International Conference on Machine Learning, pp. 1139–1147 (2013)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodological) 58(1), 267–288 (1996)
Tu, S., Boczar, R., Simchowitz, M., Soltanolkotabi, M., Recht, B.: Low-rank solutions of linear matrix equations via procrustes flow. In: International Conference on Machine Learning, pp. 964–973 (2016)
Tu, W., et al.: M-estimation in low-rank matrix factorization: a general framework. In: 2019 IEEE International Conference on Data Mining (ICDM), pp. 568–577 (2019)
Watkins, C.J.C.H., Dayan, P.: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)
Witten, I.H., Frank, E., Hall, M.A., Pal, C.J.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann (2016)
Yang, Z., Zhang, Y., Xiang, Y., Yan, W., Xie, S.: A fast non-smooth nonnegative matrix factorization for learning sparse representation. IEEE Access 4, 5161–5168 (2016)
Zhang, J., Mitliagkas, I.: YellowFin and the art of momentum tuning. arXiv preprint arXiv:1706.03471 (2017)
Zhao, T., Wang, Z., Liu, H.: Nonconvex low rank matrix factorization via inexact first order oracle. In: Advances in Neural Information Processing Systems (2015)
Zhu, R., Niu, D., Li, Z.: Robust web service recommendation via quantile matrix factorization. In: INFOCOM 2017 - IEEE Conference on Computer Communications, pp. 1–9. IEEE (2017)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tu, W. et al. (2022). Nonsmooth Low-Rank Matrix Recovery: Methodology, Theory and Algorithm. In: Arai, K. (eds) Proceedings of the Future Technologies Conference (FTC) 2021, Volume 1. FTC 2021. Lecture Notes in Networks and Systems, vol 358. Springer, Cham. https://doi.org/10.1007/978-3-030-89906-6_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89905-9
Online ISBN: 978-3-030-89906-6
eBook Packages: Intelligent Technologies and Robotics