Abstract
Financial crises are typically characterized by highly positively correlated asset returns, due to simultaneous distress on almost all securities, high volatilities and the presence of extreme returns. In the aftermath of the 2008 crisis, investors were prompted even further to look for portfolios that minimize risk and can better deal with estimation error in the inputs of asset allocation models. The minimum variance portfolio à la Markowitz is considered the reference model for risk minimization in equity markets, due to its simple optimization and its need for just one input: the inverse of the covariance estimate, the so-called precision matrix. In this paper, we propose a data-driven portfolio framework based on two regularization methods, glasso and tlasso, that provide sparse estimates of the precision matrix by penalizing its \(L_1\)-norm. Glasso and tlasso rely on Gaussian and t-Student assumptions for asset returns, respectively. Simulation and real-world data results support the proposed methods compared to state-of-the-art approaches, such as random matrix and Ledoit–Wolf shrinkage.
Notes
The original specification proposed by Friedman et al. (2008) applied the penalty to the entire matrix \(\varvec{\varOmega }\). The version of the model with the penalty applied to \(\varvec{\varOmega }\) is the one studied by Rothman et al. (2008) and is currently implemented in the R package ‘glasso’ (Friedman et al. 2014).
See Theorem 2 and Technical Condition (B) in Lam and Fan (2009).
See Theorem 1 in Banerjee et al. (2008).
Notice that this representation implies a permutation of the rows and columns to have the ith asset as the last one.
\(v_i\) can be interpreted as the unhedgeable component of \(X_{i,t}\).
In the case of glasso we refer to the likelihood of a multivariate normal distribution, while with tlasso we refer to that of a multivariate t-Student distribution.
The result follows from Corollary 1 in Witten et al. (2011), according to which the ith node is fully unconnected to all other nodes if and only if \(|\varvec{\varSigma }_{ij}| \le \rho \quad \forall i \ne j\). When \(\varvec{\varSigma }\) is the correlation matrix, all its off-diagonal elements are smaller than or equal to one in absolute value, and therefore for \(\rho = 1\) all nodes are disconnected, that is, the precision matrix is diagonal.
The dimensions of \(\mathbf{G }_{\backslash i,\backslash i}\), \(g_{\backslash i,i}\) and \(g_{i,i}\) are respectively \(((n-1)\times (n-1))\), \(((n-1)\times 1)\) and \((1\times 1)\).
Interestingly, the Ledoit–Wolf shrinkage is closely related to portfolio optimization with \(L_2\) penalization of weight estimates. Indeed, the optimization problem \(\min _{\mathbf{w }\in C}(\mathbf{w }'\widehat{\varvec{\varSigma }}\mathbf{w }+ a\mathbf{w }'\mathbf{w })\), with \(C = \{{\mathbf{w }} | {{\mathbf{1 }}{'}}\mathbf{w }=1\}\) can be equivalently stated as \(\min _{\mathbf{w }\in C}(\mathbf{w }'(\widehat{\varvec{\varSigma }}+a \mathbf{I }) \mathbf{w })\), which then is equivalent to solving the problem using the Ledoit–Wolf shrinkage estimator with \(\widehat{\varvec{\varSigma }}_T=\mathbf{I }\) (Bruder et al. 2013).
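The equivalence stated in this note can be checked numerically. The sketch below uses an arbitrary toy covariance matrix and penalty strength (both assumed for illustration only): the \(L_2\)-penalized problem with penalty \(a\) and the Ledoit–Wolf-style combination \(b\mathbf{S }+(1-b)\mathbf{I }\) with \(a=(1-b)/b\) produce identical minimum variance weights, since rescaling the matrix leaves the weights unchanged.

```python
import numpy as np

S = np.array([[0.04, 0.01, 0.00],        # toy covariance estimate (assumed)
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])
a = 0.05                                 # L2 penalty strength (assumed)
ones = np.ones(3)

def min_var_weights(M):
    """Closed-form minimum variance weights under the budget constraint 1'w = 1."""
    x = np.linalg.solve(M, ones)
    return x / (ones @ x)

w_pen = min_var_weights(S + a * np.eye(3))           # L2-penalized formulation
b = 1.0 / (1.0 + a)                                  # intensity such that a = (1 - b) / b
w_lw = min_var_weights(b * S + (1 - b) * np.eye(3))  # LW form with identity target
assert np.allclose(w_pen, w_lw)                      # identical portfolios
```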
References
Baba K, Shibata R, Sibuya M (2004) Partial correlation and conditional correlation as measures of conditional independence. Aust N Z J Stat 46(4):657–664
Banerjee O, Ghaoui LE, d’Aspremont A (2008) Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J Mach Learn Res 9:485–516
Black F, Litterman R (1992) Global portfolio optimization. Finance Anal J 48(5):28–43
Bouchaud JP, Potters M (2009) Financial applications of random matrix theory: a short review. arXiv preprint arXiv:0910.1205
Brodie J, Daubechies I, De Mol C, Giannone D, Loris I (2009) Sparse and stable Markowitz portfolios. Proc Natl Acad Sci 106(30):12267–12272
Brownlees CT, Nualart E, Sun Y (2015) Realized networks. Working Paper, SSRN
Bruder B, Gaussel N, Richard JC, Roncalli T (2013) Regularization of portfolio allocation. Working Paper, SSRN
Cont R (2001) Empirical properties of asset returns: stylized facts and statistical issues. Quant Finance 1:223–236
DeMiguel V, Nogales FJ (2009) Portfolio selection with robust estimation. Oper Res 57:560–577
DeMiguel V, Garlappi L, Nogales F, Uppal R (2009a) A generalized approach to portfolio optimization: improving performance by constraining portfolio norm. Manag Sci 55:798–812
DeMiguel V, Garlappi L, Uppal R (2009b) Optimal versus naive diversification: how inefficient is the 1/N portfolio strategy? Rev Financ Stud 22(5):1915–1953
Dempster AP (1972) Covariance selection. Biometrics 28(1):157–175
Engle R (2002) Dynamic conditional correlation: a simple class of multivariate generalized autoregressive conditional heteroskedasticity models. J Bus Econ Stat 20(3):339–350
Fan J, Zhang J, Yu K (2012) Vast portfolio selection with gross-exposure constraints. J Am Stat Assoc 107(498):592–606
Finegold M, Drton M (2011) Robust graphical modeling of gene networks using classical and alternative t-distributions. Ann Appl Stat 5(2A):1057–1080
Friedman J, Hastie T, Tibshirani R (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3):432–441
Friedman J, Hastie T, Tibshirani R (2014) Glasso: graphical lasso-estimation of gaussian graphical models. R package
Goto S, Xu Y (2015) Improving mean variance optimization through sparse hedging restrictions. J Financ Quant Anal 50(6):1415–1441
Højsgaard S, Edwards D, Lauritzen S (2012) Graphical models with R. Springer, Berlin
Kan R, Zhou G (2007) Optimal portfolio choice with parameter uncertainty. J Financ Quant Anal 42(3):621–656
Kolm PN, Tütüncü R, Fabozzi F (2014) 60 years following Harry Markowitz’s contribution to portfolio theory and operations research. Eur J Oper Res 234(2):343–582
Kotz S, Nadarajah S (2004) Multivariate t-distributions and their applications. Cambridge University Press, Cambridge
Kremer PJ, Talmaciu A, Paterlini S (2018) Risk minimization in multi-factor portfolios: What is the best strategy? Ann Oper Res 266(1–2):255–291
Laloux L, Cizeau P, Bouchaud JP, Potters M (1999) Noise dressing of financial correlation matrices. Phys Rev Lett 83(7):1467–1469
Lam C, Fan J (2009) Sparsistency and rates of convergence in large covariance matrix estimation. Ann Stat 37(6B):4254
Lange KL, Little RJ, Taylor JM (1989) Robust statistical modeling using the t distribution. J Am Stat Assoc 84(408):881–896
Lauritzen SL (1996) Graphical models, vol 17. Clarendon Press, Oxford
Ledoit O, Wolf M (2004a) Honey, I shrunk the sample covariance matrix. J Portf Manag 30(4):110–119
Ledoit O, Wolf M (2004b) A well-conditioned estimator for large-dimensional covariance matrices. J Multivar Anal 88(2):365–411
Ledoit O, Wolf M (2011) Robust performance hypothesis testing with the variance. Wilmott 55:86–89
Markowitz H (1952) Portfolio selection. J Finance 7(1):77–91
McLachlan G, Krishnan T (2007) The EM algorithm and extensions, vol 382. Wiley, Hoboken
Meucci A (2009) Risk and asset allocation. Springer, Berlin
Michaud RO (1989) The Markowitz optimization enigma: is optimized optimal? ICFA Contin Educ Ser 1989(4):43–54
Murphy KP (2012) Machine learning: a probabilistic perspective. The MIT Press, London
Rothman AJ, Bickel PJ, Levina E, Zhu J et al (2008) Sparse permutation invariant covariance estimation. Electron J Stat 2:494–515
Stevens GV (1998) On the inverse of the covariance matrix in portfolio analysis. J Finance 53(5):1821–1827
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodological) 58(1):267–288
Witten DM, Friedman JH, Simon N (2011) New insights and faster computations for the graphical lasso. J Comput Graph Stat 20(4):892–900
Won JH, Lim J, Kim SJ, Rajaratnam B (2013) Condition-number-regularized covariance estimation. J R Stat Soc Ser B (Statistical Methodology) 75(3):427–450
Yuan M, Lin Y (2007) Model selection and estimation in the Gaussian graphical model. Biometrika 94(1):19–35
Acknowledgements
Sandra Paterlini acknowledges ICT COST Action IC1408 from CRoNoS. Gabriele Torri acknowledges the support of the Czech Science Foundation (GACR) under projects 17-19981S and 19-11965S, and of SP2018/34, an SGS research project of VSB-TU Ostrava. Rosella Giacometti and Gabriele Torri acknowledge the support given by University of Bergamo research funds 2016–2017.
Appendices
A The glasso algorithm
Here we briefly describe the algorithm proposed by Friedman et al. (2008) to solve (6), the glasso model. For convenience, we define \(X_i\) as the ith element of X, and \(X_\backslash i\) as the vector of all the elements of X except the ith. We also define the matrix \(\mathbf{G }\) to be the estimate of \(\varvec{\varSigma }\), and \(\mathbf{S }\) the sample covariance matrix. Furthermore, we identify the following partitions:Footnote 10
Banerjee et al. (2008) show that the solution for \(w_{\backslash i,i}\) can be computed by solving the following box-constrained quadratic program:
or in an equivalent way, by solving the dual problem
where \(c = \mathbf{G }_{\backslash i,\backslash i}^{-1/2}\mathbf{s }_{\backslash i,i}\) and \({\hat{\beta }}^{(i)} = \mathbf{G }_{\backslash i,\backslash i}^{-1}\mathbf{g }_{\backslash i, i}\). As noted by Friedman et al. (2008), (22) resembles a lasso least squares problem (see Tibshirani 1996). The algorithm then regresses the ith variable on the others, using as input \(\mathbf{G }_{\backslash i,\backslash i}\), the current estimate of the upper left block. It then updates the corresponding row and column of \(\mathbf{G }\) using \(\mathbf{g }_{\backslash i, i} = \mathbf{G }_{\backslash i,\backslash i}{\hat{\beta }}^{(i)}\) and cycles across the variables until convergence.
Glasso algorithm
1. Start with \(\mathbf{G }= \mathbf{S }+ \rho \mathbf{I }\). The diagonal of \(\mathbf{G }\) is unchanged in the next steps.
2. For each \(i = 1,2,\ldots ,n,1,2,\ldots , n,\ldots \), solve the lasso problem (22), which takes as input \(\mathbf{G }_{\backslash i,\backslash i}\) and \(\mathbf{s }_{\backslash i,i}\). This gives an \((n-1)\)-vector solution \({\hat{\beta }}\). Fill in the corresponding row and column of \(\mathbf{G }\) using \(\mathbf{g }_{\backslash i,i} = \mathbf{G }_{\backslash i,\backslash i}{\hat{\beta }}\).
3. Repeat until a convergence criterion is satisfied.
The algorithm has a computational complexity of \(O(n^3)\) for dense problems, and considerably less than that for sparse problems (Friedman et al. 2008).
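The block coordinate descent above can be sketched in plain numpy as follows. This is a minimal illustration, not the optimized ‘glasso’ R package implementation: the lasso subproblem is solved by simple coordinate descent with a fixed number of sweeps, and the precision matrix is recovered at the end by inverting \(\mathbf{G }\) rather than tracked column by column.

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding operator used in the lasso coordinate updates."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def glasso(S, rho, n_outer=100, n_inner=50, tol=1e-8):
    """Block coordinate descent for the graphical lasso (numpy sketch)."""
    n = S.shape[0]
    G = S + rho * np.eye(n)               # step 1: diagonal fixed hereafter
    for _ in range(n_outer):
        G_old = G.copy()
        for i in range(n):
            idx = np.arange(n) != i
            G11 = G[np.ix_(idx, idx)]     # current upper-left block estimate
            s12 = S[idx, i]
            beta = np.zeros(n - 1)
            for _ in range(n_inner):      # lasso subproblem, coordinate descent
                for j in range(n - 1):
                    r = s12[j] - G11[j] @ beta + G11[j, j] * beta[j]
                    beta[j] = soft(r, rho) / G11[j, j]
            G[idx, i] = G11 @ beta        # step 2: update row and column of G
            G[i, idx] = G[idx, i]
        if np.abs(G - G_old).max() < tol: # step 3: convergence check
            break
    return np.linalg.inv(G)               # precision matrix estimate
```

Consistent with the Witten et al. (2011) result quoted in the notes, running this sketch with \(\rho = 1\) on a correlation matrix returns a diagonal precision matrix, while \(\rho = 0\) recovers the inverse of the sample covariance.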
B Alternative covariance estimation methods
Here, we briefly describe the benchmark covariance estimators we use in the comparative analysis. Unlike glasso and tlasso, these approaches provide an estimate of the covariance matrix rather than the precision matrix. Hence, for these methods we obtain the precision matrix to plug into the minimum variance portfolio by inverting the covariance estimate.
In particular, we consider the sample covariance and the equally weighted methods (commonly regarded as naive approaches) and two state-of-the-art estimators: random matrix theory and Ledoit–Wolf shrinkage.
The equally weighted (EW) portfolio, a tough benchmark to beat (DeMiguel et al. 2009b), can be interpreted as an extreme shrinkage estimator of the global minimum variance portfolio, obtained using the identity matrix as the estimate of the covariance matrix. Indeed, using (3), we obtain \(\hat{\mathbf{w }}_{EW} = \dfrac{\mathbf{I }\mathbf{1 }}{\mathbf{1 }' \mathbf{I }\mathbf{1 }} = \frac{1}{n}\mathbf{1 }\). By assuming zero correlations and equal variances, this approach is very conservative in terms of estimation error and is suitable in cases of severe parameter unpredictability.
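As a quick numerical check of this identity, the sketch below plugs the identity matrix into the global minimum variance formula and recovers the \(1/n\) weights (the choice \(n = 4\) is arbitrary):

```python
import numpy as np

n = 4
Sigma_hat = np.eye(n)              # identity matrix as the covariance estimate
ones = np.ones(n)
x = np.linalg.solve(Sigma_hat, ones)
w_ew = x / (ones @ x)              # global minimum variance weights
print(w_ew)                        # each weight equals 1/n = 0.25
```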
The second naive approach is the sample covariance estimator, defined as:

\(\mathbf{S } = \frac{1}{t-1}\sum _{\tau =1}^{t}(X_\tau - {\bar{X}})(X_\tau - {\bar{X}})'\)

where t is the length of the estimation period, \(X_\tau \) is the vector of assets’ returns at time \(\tau \) and \({\bar{X}}\) is the vector of average returns for the n assets. Such an estimator, when computed on datasets with a number of assets close to the length of the window, is typically characterized by a larger eigenvalue dispersion than the true covariance matrix, causing the matrix to be ill-conditioned (Meucci 2009). Therefore, when computing the precision matrix by inverting the covariance matrix, estimates are typically unreliable and unstable across samples, as ill-conditioning amplifies the effect of estimation error in the covariance matrix.
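The ill-conditioning effect can be illustrated with simulated i.i.d. returns whose true covariance is a multiple of the identity, so the true condition number is 1; with n close to t, the sample estimate is far worse conditioned (the dimensions below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, t = 40, 50                            # number of assets close to window length
X = 0.01 * rng.standard_normal((t, n))   # i.i.d. returns, true covariance 1e-4 * I

S = np.cov(X, rowvar=False)              # sample covariance matrix
eig = np.linalg.eigvalsh(S)
cond = eig[-1] / eig[0]                  # condition number (true matrix: 1)
```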
The shrinkage methodology of Ledoit–Wolf (LW) is well known to better control for estimation error, especially for datasets with a large ratio n / t, where n is the number of assets and t the length of the estimation window. The Ledoit–Wolf shrinkage estimator is defined as a convex combination of the sample covariance matrix \(\mathbf{S }\) and \(\widehat{\varvec{\varSigma }}_{T}\), a highly structured target estimator, such that \(\widehat{\varvec{\varSigma }}_{LW} = a \mathbf{S }+ (1-a) \widehat{\varvec{\varSigma }}_{T}\) with \(a \in [0,1]\). Following Ledoit and Wolf (2004a), we consider as structured estimator \(\widehat{\varvec{\varSigma }}_{T}\) the constant correlation matrix, such that all the pairwise correlations are identical and equal to the average of all the sample pairwise correlations. As the target estimator is well conditioned, the resulting shrinkage estimator \(\widehat{\varvec{\varSigma }}_{LW}\) has a smaller eigenvalue dispersion than the sample covariance matrix. In fact, the sample covariance matrix is shrunk towards the structured estimator, with intensity depending on the value of the shrinkage constant a. The Ledoit–Wolf estimate of a is based on the minimization of the expected distance between \(\widehat{\varvec{\varSigma }}_{LW}\) and \(\varvec{\varSigma }\). For further details, the reader is referred to Ledoit and Wolf (2004a).Footnote 11
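A sketch of the shrinkage with the constant-correlation target is given below. For simplicity the intensity a is taken as a fixed input here, whereas Ledoit and Wolf (2004a) estimate it from the data by minimizing the expected distance to the true matrix:

```python
import numpy as np

def lw_constant_correlation(S, a):
    """Shrink sample covariance S toward the constant-correlation target
    (sketch with fixed intensity a; the data-driven estimate of a is omitted)."""
    sd = np.sqrt(np.diag(S))
    R = S / np.outer(sd, sd)               # sample correlation matrix
    n = S.shape[0]
    r_bar = (R.sum() - n) / (n * (n - 1))  # average pairwise correlation
    target = r_bar * np.outer(sd, sd)      # constant-correlation target
    np.fill_diagonal(target, sd**2)        # keep sample variances on the diagonal
    return a * S + (1 - a) * target
```

On simulated data with n close to t, the shrunk matrix keeps the sample variances but has a visibly smaller eigenvalue dispersion (hence a smaller condition number) than the sample covariance matrix.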
The last approach we focus on is the so-called random matrix theory (RMT) estimator \(\widehat{\varvec{\varSigma }}_{RMT}\), introduced by Laloux et al. (1999). The approach is based on the fact that, in the case of financial time series, the smallest eigenvalues of the correlation matrices are often dominated by noise. From the known distribution of the eigenvalues of a random matrix, it is then possible to filter out the part of the spectrum that is likely associated with estimation error and retain only the eigenvalues that carry useful information (Laloux et al. 1999). In particular, when assuming i.i.d. returns, the eigenvalues of the sample correlation matrix are distributed according to a Marcenko–Pastur (MP) distribution as a consequence of the estimation error. Therefore, we can compute the eigenvalues that correspond to noise based on the minimum and maximum eigenvalues of the theoretical distribution, such that:

\(\lambda _{\min } = \sigma ^2\left( 1 - \sqrt{n/t}\right) ^2, \qquad \lambda _{\max } = \sigma ^2\left( 1 + \sqrt{n/t}\right) ^2\)
where \(\lambda _{\min }\) and \(\lambda _{\max }\) are the theoretical smallest and largest eigenvalues of an \(n\times n\) random covariance matrix estimated from a sample of t observations and \(\sigma ^2\) is the variance of the i.i.d. asset returns. Only the eigenvalues outside the interval [\(\lambda _{\min }\), \(\lambda _{\max }\)] are then assumed to carry useful information, while the others correspond to noise. Here, we estimate the covariance matrix by eigenvalue clipping, a technique that consists in substituting the eigenvalues smaller than \(\lambda _{\max }\) with their average:

\(\widehat{\varvec{\varSigma }}_{RMT} = \mathbf{V }\varvec{\varLambda }_{RMT}\mathbf{V }'\)
where \(\mathbf{V }\) represents the eigenvectors of the sample covariance matrix and \(\varvec{\varLambda }_{RMT}\) is the diagonal matrix of the ordered eigenvalues, in which the eigenvalues \(\lambda \le \lambda _{\max }\) are substituted by their average (Bouchaud and Potters 2009). The RMT filtering thus has the effect of averaging the lowest eigenvalues, improving the conditioning of the matrix and therefore reducing the sensitivity of the precision matrix to estimation errors.
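A sketch of the eigenvalue clipping step is given below. It works on the correlation matrix, so \(\sigma ^2 = 1\) in the MP edge formula; eigenvalues at or below \(\lambda _{\max }\) are replaced by their average, which preserves the trace, and the result is rescaled back to covariance units:

```python
import numpy as np

def rmt_clip(S, t):
    """Eigenvalue clipping of the correlation spectrum (RMT sketch,
    following the description above; sigma^2 = 1 for a correlation matrix)."""
    n = S.shape[0]
    sd = np.sqrt(np.diag(S))
    C = S / np.outer(sd, sd)                 # sample correlation matrix
    lam_max = (1 + np.sqrt(n / t)) ** 2      # Marcenko-Pastur upper edge
    vals, vecs = np.linalg.eigh(C)
    noise = vals <= lam_max                  # eigenvalues attributed to noise
    if noise.any():
        vals[noise] = vals[noise].mean()     # clip the noisy bulk to its average
    C_rmt = vecs @ np.diag(vals) @ vecs.T    # filtered correlation matrix
    return C_rmt * np.outer(sd, sd)          # back to covariance scale
```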
For further details the reader is referred to Laloux et al. (1999), Bouchaud and Potters (2009) and Bruder et al. (2013).
Torri, G., Giacometti, R. & Paterlini, S. Sparse precision matrices for minimum variance portfolios. Comput Manag Sci 16, 375–400 (2019). https://doi.org/10.1007/s10287-019-00344-6