Sparse precision matrices for minimum variance portfolios

Abstract

Financial crises are typically characterized by highly positively correlated asset returns, due to the simultaneous distress on almost all securities, high volatilities and the presence of extreme returns. In the aftermath of the 2008 crisis, investors were prompted even further to look for portfolios that minimize risk and can better cope with estimation error in the inputs of asset allocation models. The minimum variance portfolio à la Markowitz is considered the reference model for risk minimization in equity markets, due to its simplicity in the optimization as well as its reliance on a single input estimate: the inverse of the covariance matrix, the so-called precision matrix. In this paper, we propose a data-driven portfolio framework based on two regularization methods, glasso and tlasso, that provide sparse estimates of the precision matrix by penalizing its \(L_1\)-norm. Glasso and tlasso rely on Gaussian and t-Student assumptions on asset returns, respectively. Simulation and real-world data results support the proposed methods compared to state-of-the-art approaches, such as random matrix theory and Ledoit–Wolf shrinkage.

Notes

  1. The original specification proposed by Friedman et al. (2008) applied the penalty to the entire matrix \(\varvec{\varOmega }\); this version of the model was also studied by Rothman et al. (2008) and is currently implemented in the R package 'glasso' (Friedman et al. 2014).

  2. See Theorem 2 and Technical Condition (B) in Lam and Fan (2009).

  3. See Theorem 1 in Banerjee et al. (2008).

  4. Notice that this representation implies a permutation of the rows and columns so that the ith asset is the last one.

  5. \(v_i\) can be interpreted as the unhedgeable component of \(X_{i,t}\).

  6. In the original model, the factors follow a multivariate normal distribution (Fan et al. 2012). We use a t-Student distribution instead, to capture the leptokurtic behaviour of financial time series (Cont 2001).

  7. In the case of glasso we refer to the likelihood of a multivariate normal distribution, while with tlasso we refer to that of a multivariate t-Student distribution.

  8. The result follows from Corollary 1 in Witten et al. (2011), according to which the ith node is fully unconnected from all other nodes if and only if \(|\varvec{\varSigma }_{ij}| \le \rho \quad \forall j \ne i\). When \(\varvec{\varSigma }\) is the correlation matrix, all its off-diagonal elements are smaller than or equal to one in absolute value; therefore, for \(\rho = 1\) all nodes are disconnected, that is, the precision matrix is diagonal.

  9. http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html.

  10. The dimensions of \(\mathbf{G }_{\backslash i,\backslash i}\), \(\mathbf{g }_{\backslash i,i}\) and \(g_{i,i}\) are \(((n-1)\times (n-1))\), \(((n-1)\times 1)\) and \((1\times 1)\), respectively.

  11. Interestingly, the Ledoit–Wolf shrinkage is closely related to portfolio optimization with \(L_2\) penalization of the portfolio weights. Indeed, the optimization problem \(\min _{\mathbf{w }\in C}(\mathbf{w }'\widehat{\varvec{\varSigma }}\mathbf{w }+ a\mathbf{w }'\mathbf{w })\), with \(C = \{{\mathbf{w }} | {{\mathbf{1 }}{'}}\mathbf{w }=1\}\), can be equivalently stated as \(\min _{\mathbf{w }\in C}(\mathbf{w }'(\widehat{\varvec{\varSigma }}+a \mathbf{I }) \mathbf{w })\), which in turn is equivalent to solving the problem using the Ledoit–Wolf shrinkage estimator with \(\widehat{\varvec{\varSigma }}_T=\mathbf{I }\) (Bruder et al. 2013).
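
To make the equivalence in footnote 11 concrete, the following minimal numpy sketch checks numerically that the \(L_2\)-penalized minimum variance weights coincide with those obtained from a shrinkage estimator with identity target; the covariance matrix and the penalty value are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sample covariance for n = 5 assets (any SPD matrix works).
X = rng.standard_normal((60, 5))
S = np.cov(X, rowvar=False)
a = 0.5  # illustrative L2 penalty

def gmv_weights(M):
    """Minimum variance weights w = M^{-1} 1 / (1' M^{-1} 1)."""
    ones = np.ones(M.shape[0])
    x = np.linalg.solve(M, ones)
    return x / (ones @ x)

# w'(S + aI)w is the L2-penalized objective w'Sw + a w'w.
w_pen = gmv_weights(S + a * np.eye(5))

# A shrinkage estimator (1-k)S + kI with k = a/(1+a) is proportional to
# S + aI, and minimum variance weights are scale invariant.
k = a / (1 + a)
w_shrink = gmv_weights((1 - k) * S + k * np.eye(5))

assert np.allclose(w_pen, w_shrink)
```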

References

  • Baba K, Shibata R, Sibuya M (2004) Partial correlation and conditional correlation as measures of conditional independence. Aust N Z J Stat 46(4):657–664

  • Banerjee O, Ghaoui LE, d’Aspremont A (2008) Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J Mach Learn Res 9:485–516

  • Black F, Litterman R (1992) Global portfolio optimization. Financ Anal J 48(5):28–43

  • Bouchaud JP, Potters M (2009) Financial applications of random matrix theory: a short review. arXiv preprint arXiv:0910.1205

  • Brodie J, Daubechies I, De Mol C, Giannone D, Loris I (2009) Sparse and stable Markowitz portfolios. Proc Natl Acad Sci 106(30):12267–12272

  • Brownlees CT, Nualart E, Sun Y (2015) Realized networks. Working Paper, SSRN

  • Bruder B, Gaussel N, Richard JC, Roncalli T (2013) Regularization of portfolio allocation. Working Paper, SSRN

  • Cont R (2001) Empirical properties of asset returns: stylized facts and statistical issues. Quant Finance 1:223–236

  • DeMiguel V, Nogales FJ (2009) Portfolio selection with robust estimation. Oper Res 57:560–577

  • DeMiguel V, Garlappi L, Nogales F, Uppal R (2009a) A generalized approach to portfolio optimization: improving performance by constraining portfolio norms. Manag Sci 55:798–812

  • DeMiguel V, Garlappi L, Uppal R (2009b) Optimal versus naive diversification: how inefficient is the 1/N portfolio strategy? Rev Financ Stud 22(5):1915–1953

  • Dempster AP (1972) Covariance selection. Biometrics 28(1):157–175

  • Engle R (2002) Dynamic conditional correlation: a simple class of multivariate generalized autoregressive conditional heteroskedasticity models. J Bus Econ Stat 20(3):339–350

  • Fan J, Zhang J, Yu K (2012) Vast portfolio selection with gross-exposure constraints. J Am Stat Assoc 107(498):592–606

  • Finegold M, Drton M (2011) Robust graphical modeling of gene networks using classical and alternative t-distributions. Ann Appl Stat 5(2A):1057–1080

  • Friedman J, Hastie T, Tibshirani R (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3):432–441

  • Friedman J, Hastie T, Tibshirani R (2014) glasso: graphical lasso-estimation of Gaussian graphical models. R package

  • Goto S, Xu Y (2015) Improving mean variance optimization through sparse hedging restrictions. J Financ Quant Anal 50(6):1415–1441

  • Højsgaard S, Edwards D, Lauritzen S (2012) Graphical models with R. Springer, Berlin

  • Kan R, Zhou G (2007) Optimal portfolio choice with parameter uncertainty. J Financ Quant Anal 42(3):621–656

  • Kolm PN, Tütüncü R, Fabozzi F (2014) 60 years of portfolio optimization: practical challenges and current trends. Eur J Oper Res 234(2):356–371

  • Kotz S, Nadarajah S (2004) Multivariate t-distributions and their applications. Cambridge University Press, Cambridge

  • Kremer PJ, Talmaciu A, Paterlini S (2018) Risk minimization in multi-factor portfolios: What is the best strategy? Ann Oper Res 266(1–2):255–291

  • Laloux L, Cizeau P, Bouchaud JP, Potters M (1999) Noise dressing of financial correlation matrices. Phys Rev Lett 83(7):1467–1469

  • Lam C, Fan J (2009) Sparsistency and rates of convergence in large covariance matrix estimation. Ann Stat 37(6B):4254–4278

  • Lange KL, Little RJ, Taylor JM (1989) Robust statistical modeling using the t distribution. J Am Stat Assoc 84(408):881–896

  • Lauritzen SL (1996) Graphical models, vol 17. Clarendon Press, Oxford

  • Ledoit O, Wolf M (2004a) Honey, I shrunk the sample covariance matrix. J Portf Manag 30(4):110–119

  • Ledoit O, Wolf M (2004b) A well-conditioned estimator for large-dimensional covariance matrices. J Multivar Anal 88(2):365–411

  • Ledoit O, Wolf M (2011) Robust performance hypothesis testing with the variance. Wilmott 55:86–89

  • Markowitz H (1952) Portfolio selection. J Finance 7(1):77–91

  • McLachlan G, Krishnan T (2007) The EM algorithm and extensions, vol 382. Wiley, Hoboken

  • Meucci A (2009) Risk and asset allocation. Springer, Berlin

  • Michaud RO (1989) The Markowitz optimization enigma: is optimized optimal? ICFA Contin Educ Ser 1989(4):43–54

  • Murphy KP (2012) Machine learning: a probabilistic perspective. The MIT Press, London

  • Rothman AJ, Bickel PJ, Levina E, Zhu J (2008) Sparse permutation invariant covariance estimation. Electron J Stat 2:494–515

  • Stevens GV (1998) On the inverse of the covariance matrix in portfolio analysis. J Finance 53(5):1821–1827

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodological) 58(1):267–288

  • Witten DM, Friedman JH, Simon N (2011) New insights and faster computations for the graphical lasso. J Comput Graph Stat 20(4):892–900

  • Won JH, Lim J, Kim SJ, Rajaratnam B (2013) Condition-number-regularized covariance estimation. J R Stat Soc Ser B (Statistical Methodology) 75(3):427–450

  • Yuan M, Lin Y (2007) Model selection and estimation in the Gaussian graphical model. Biometrika 94(1):19–35

Acknowledgements

Sandra Paterlini acknowledges ICT COST Action IC1408 from CRoNoS. Gabriele Torri acknowledges the support of the Czech Science Foundation (GACR) under projects 17-19981S and 19-11965S, and of SP2018/34, an SGS research project of VSB-TU Ostrava. Rosella Giacometti and Gabriele Torri acknowledge the support given by University of Bergamo research funds 2016–2017.

Corresponding author

Correspondence to Rosella Giacometti.

Appendices

A The glasso algorithm

Here we briefly describe the algorithm proposed by Friedman et al. (2008) to solve (6), the glasso model. For convenience, we define \(X_i\) as the ith element of X and \(X_{\backslash i}\) as the vector of all the elements of X except the ith. We also define \(\mathbf{G }\) as the estimate of \(\varvec{\varSigma }\) and \(\mathbf{S }\) as the sample covariance matrix. Furthermore, we identify the following partitions (see Footnote 10):

$$\begin{aligned} \mathbf{G }= \begin{pmatrix} \mathbf{G }_{\backslash i,\backslash i} &{} \mathbf{g }_{\backslash i,i}\\ \mathbf{g }_{\backslash i,i}' &{} g_{i,i} \end{pmatrix}, \qquad \mathbf{S }= \begin{pmatrix} \mathbf{S }_{\backslash i, \backslash i} &{} \mathbf{s }_{\backslash i,i}\\ \mathbf{s }_{\backslash i ,i}' &{} s_{i,i} \end{pmatrix}. \end{aligned}$$
(20)

Banerjee et al. (2008) show that the solution for \(\mathbf{g }_{\backslash i,i}\) can be computed by solving the following box-constrained quadratic program:

$$\begin{aligned} g_{\backslash i,i} = \arg \min _y \left\{ y'\mathbf{G }_{\backslash i,\backslash i}^{-1}y:||y-\mathbf{s }_{\backslash i,i}||_{\infty } \le \rho \right\} , \end{aligned}$$
(21)

or in an equivalent way, by solving the dual problem

$$\begin{aligned} \min _{\beta ^{(i)}}\left\{ \frac{1}{2}||\mathbf{G }_{\backslash i,\backslash i}^{1/2}\beta ^{(i)}-c||^2+\rho ||\beta ^{(i)}||_1\right\} , \end{aligned}$$
(22)

where \(c = \mathbf{G }_{\backslash i,\backslash i}^{-1/2}\mathbf{s }_{\backslash i,i}\) and \({\hat{\beta }}^{(i)} = \mathbf{G }_{\backslash i,\backslash i}^{-1}\mathbf{g }_{\backslash i, i}\). As noted by Friedman et al. (2008), (22) resembles a lasso least squares problem (see Tibshirani 1996). The algorithm then regresses the ith variable on the others, taking as input the current estimate of the upper left block \(\mathbf{G }_{\backslash i,\backslash i}\). It updates the corresponding row and column of \(\mathbf{G }\) via \(\mathbf{g }_{\backslash i, i} = \mathbf{G }_{\backslash i,\backslash i}{\hat{\beta }}^{(i)}\) and cycles across the variables until convergence.

Glasso algorithm

  1. Start with \(\mathbf{G }= \mathbf{S }+ \rho \mathbf{I }\). The diagonal of \(\mathbf{G }\) is unchanged in the next steps.

  2. For each \(i = 1,2,\ldots ,n,1,2,\ldots , n,\ldots \), solve the lasso problem (22), which takes as input \(\mathbf{G }_{\backslash i,\backslash i}\) and \(\mathbf{s }_{\backslash i,i}\). This gives an \((n - 1)\)-vector solution \({\hat{\beta }}\). Fill in the corresponding row and column of \(\mathbf{G }\) using \(\mathbf{g }_{\backslash i,i} = \mathbf{G }_{\backslash i,\backslash i}{\hat{\beta }}\).

  3. Repeat until a convergence criterion is satisfied.

The algorithm has a computational complexity of \(O(n^3)\) for dense problems, and considerably less than that for sparse problems (Friedman et al. 2008).
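
For illustration, the following Python sketch implements the cycle above with a plain coordinate-descent solver for the inner lasso problem (22). It is our own minimal rendering of the procedure, not the reference implementation in the R package 'glasso', and the convergence thresholds are arbitrary.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def glasso(S, rho, max_iter=100, tol=1e-5):
    """Minimal sketch of the Friedman et al. (2008) glasso cycle.

    S   : (n, n) sample covariance matrix
    rho : L1 penalty
    Returns (G, Omega): estimates of the covariance and precision matrix.
    """
    n = S.shape[0]
    G = S + rho * np.eye(n)          # step 1: diagonal fixed hereafter
    B = np.zeros((n, n))             # lasso coefficients beta^(i)
    for _ in range(max_iter):
        G_old = G.copy()
        for i in range(n):
            idx = np.arange(n) != i  # indices of "all but i"
            V = G[np.ix_(idx, idx)]  # G_{\i,\i}
            s = S[idx, i]            # s_{\i,i}
            beta = B[idx, i]
            # inner loop: coordinate descent for the lasso problem (22)
            for _ in range(max_iter):
                beta_old = beta.copy()
                for j in range(n - 1):
                    r = s[j] - V[j] @ beta + V[j, j] * beta[j]
                    beta[j] = soft_threshold(r, rho) / V[j, j]
                if np.abs(beta - beta_old).max() < tol:
                    break
            B[idx, i] = beta
            g = V @ beta             # g_{\i,i} = G_{\i,\i} beta
            G[idx, i] = g
            G[i, idx] = g
        if np.abs(G - G_old).mean() < tol:
            break
    # recover the precision matrix from G and the betas:
    # theta_ii = 1/(g_ii - g_{\i,i}' beta), theta_{\i,i} = -beta * theta_ii
    Omega = np.zeros((n, n))
    for i in range(n):
        idx = np.arange(n) != i
        theta_ii = 1.0 / (G[i, i] - G[idx, i] @ B[idx, i])
        Omega[i, i] = theta_ii
        Omega[idx, i] = -B[idx, i] * theta_ii
    return G, (Omega + Omega.T) / 2  # symmetrize numerical asymmetries
```

Increasing rho produces progressively sparser off-diagonal entries in the returned precision matrix, while for rho close to zero (and a well-conditioned S) the estimate approaches the plain inverse of S.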

B Alternative covariance estimation methods

Here, we briefly describe the benchmark covariance estimators used in the comparative analysis. Differently from glasso and tlasso, these approaches provide an estimate of the covariance matrix rather than of the precision matrix. Hence, for these methods we obtain the precision matrix to be plugged into the minimum variance portfolio by inverting the covariance estimate.

In particular, we consider the sample covariance and the equally weighted portfolio (commonly regarded as naive approaches) and two state-of-the-art estimators: random matrix theory and Ledoit–Wolf shrinkage.

The equally weighted (EW) portfolio, a tough benchmark to beat (DeMiguel et al. 2009b), can be interpreted as an extreme shrinkage estimator of the global minimum variance portfolio, obtained using the identity matrix as the estimate of the covariance matrix. Indeed, using (3), we obtain \(\hat{\mathbf{w }}_{EW} = \dfrac{\mathbf{I }\mathbf{1 }}{\mathbf{1 }' \mathbf{I }\mathbf{1 }} = \frac{1}{n}\mathbf{1 }\). By assuming zero correlations and equal variances, such an approach is very conservative in terms of estimation error and is suitable in case of severe unpredictability of the parameters.
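
As a quick check of this interpretation, the short numpy snippet below plugs the identity matrix into the minimum variance formula (3) and recovers the equally weighted portfolio; the asset dimension is arbitrary.

```python
import numpy as np

n = 4
Sigma_hat = np.eye(n)                 # identity as covariance estimate
ones = np.ones(n)
x = np.linalg.solve(Sigma_hat, ones)  # Sigma_hat^{-1} 1
w_ew = x / (ones @ x)                 # eq. (3)
print(w_ew)                           # [0.25 0.25 0.25 0.25] = (1/n) 1
```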

The second naive approach is the sample covariance estimator, defined as:

$$\begin{aligned} \mathbf{S }= \frac{1}{t-1} \sum _{\tau =1}^t (X_{\tau }-{\bar{X}}) (X_{\tau }-{\bar{X}})', \end{aligned}$$
(23)

where t is the length of the estimation period, \(X_{\tau }\) is the vector of asset returns at time \(\tau \), and \({\bar{X}}\) is the vector of average returns for the n assets. When computed on datasets with a number of assets close to the length of the estimation window, this estimator is typically characterized by a larger eigenvalue dispersion than the true covariance matrix, causing the matrix to be ill-conditioned (Meucci 2009). Therefore, when the precision matrix is computed by inverting the covariance matrix, the estimates are typically unreliable and unstable across samples, as ill-conditioning amplifies the effect of the estimation error in the covariance matrix.
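
The eigenvalue dispersion effect is easy to reproduce. In the numpy sketch below the true covariance is the identity, so all true eigenvalues equal one, yet with n close to t the sample eigenvalues spread widely and the condition number explodes; the dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t = 100, 120                  # number of assets close to window length

X = rng.standard_normal((t, n))  # i.i.d. returns, true covariance = I
S = np.cov(X, rowvar=False)      # sample covariance, eq. (23)

eig = np.linalg.eigvalsh(S)
print(eig.min(), eig.max())      # wide spread around the true value 1
print(eig.max() / eig.min())     # large condition number: inverting S
                                 # amplifies the estimation error
```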

The Ledoit–Wolf (LW) shrinkage methodology is well known to better control for estimation error, especially for datasets with a large ratio \(n/t\), where n is the number of assets and t the length of the estimation window. The Ledoit–Wolf shrinkage estimator is a convex combination of the sample covariance matrix \(\mathbf{S }\) and a highly structured target estimator \(\widehat{\varvec{\varSigma }}_{T}\), such that \(\widehat{\varvec{\varSigma }}_{LW} = a \mathbf{S }+ (1-a) \widehat{\varvec{\varSigma }}_{T}\) with \(a \in [0,1]\). Following Ledoit and Wolf (2004a), we consider as structured estimator \(\widehat{\varvec{\varSigma }}_{T}\) the constant correlation matrix, in which all pairwise correlations are identical and equal to the average of the sample pairwise correlations. As the target estimator is well-conditioned, the resulting shrinkage estimator \(\widehat{\varvec{\varSigma }}_{LW}\) has a smaller eigenvalue dispersion than the sample covariance matrix: the sample covariance matrix is shrunk towards the structured estimator, with an intensity that depends on the shrinkage constant a. The Ledoit–Wolf estimate of a minimizes the expected distance between \(\widehat{\varvec{\varSigma }}_{LW}\) and \(\varvec{\varSigma }\). For further details, the reader is referred to Ledoit and Wolf (2004a) (see Footnote 11).
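
A minimal sketch of the constant-correlation shrinkage structure is given below; for simplicity the shrinkage intensity a is taken as given, whereas Ledoit and Wolf (2004a) estimate it from the data.

```python
import numpy as np

def lw_constant_correlation(S, a):
    """Shrink sample covariance S toward the constant-correlation target.

    S : (n, n) sample covariance matrix
    a : shrinkage constant in [0, 1] (weight on S), assumed given here
    """
    sd = np.sqrt(np.diag(S))
    R = S / np.outer(sd, sd)              # sample correlation matrix
    n = S.shape[0]
    rbar = (R.sum() - n) / (n * (n - 1))  # average pairwise correlation
    target = rbar * np.outer(sd, sd)      # constant-correlation covariance
    np.fill_diagonal(target, np.diag(S))  # keep the sample variances
    return a * S + (1 - a) * target       # Sigma_hat_LW
```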

The last approach we consider is the so-called random matrix theory (RMT) estimator \(\widehat{\varvec{\varSigma }}_{RMT}\), introduced by Laloux et al. (1999). The approach exploits the fact that, for financial time series, the smallest eigenvalues of the correlation matrix are typically dominated by noise. From the known distribution of the eigenvalues of a random matrix, it is possible to filter out the part of the spectrum that is likely associated with estimation error and retain only the eigenvalues that carry useful information (Laloux et al. 1999). In particular, assuming i.i.d. returns, the eigenvalues of the sample correlation matrix are distributed according to the Marcenko–Pastur (MP) distribution as a consequence of the estimation error. We can therefore identify the eigenvalues that correspond to noise from the minimum and maximum eigenvalues of the theoretical distribution:

$$\begin{aligned} \lambda _{\min ,\max } = \sigma ^2 \big (1\pm \sqrt{n/t}\big )^2, \end{aligned}$$
(24)

where \(\lambda _{\min }\) and \(\lambda _{\max }\) are the theoretical smallest and largest eigenvalues of an \(n\times n\) random covariance matrix estimated from a sample of t observations, and \(\sigma ^2\) is the variance of the i.i.d. asset returns. Only the eigenvalues outside the interval \([\lambda _{\min }, \lambda _{\max }]\) are assumed to carry useful information, while the others correspond to noise. Here, we estimate the covariance matrix by eigenvalue clipping, a technique that consists in substituting the eigenvalues smaller than \(\lambda _{\max }\) with their average:

$$\begin{aligned} \widehat{\varvec{\varSigma }}_{RMT} = \mathbf{V }\varvec{\varLambda }_{RMT}\mathbf{V }', \end{aligned}$$
(25)

where \(\mathbf{V }\) contains the eigenvectors of the sample covariance matrix and \(\varvec{\varLambda }_{RMT}\) is the diagonal matrix of ordered eigenvalues, in which the eigenvalues \(\lambda \le \lambda _{\max }\) are substituted by their average (Bouchaud and Potters 2009). The RMT filtering thus averages the lowest eigenvalues, improving the conditioning of the matrix and reducing the sensitivity of the precision matrix to estimation errors.
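
The clipping step translates directly into code. The numpy sketch below assumes standardized returns, so that \(\sigma^2 = 1\) in (24), and follows the recipe of substituting the noisy eigenvalues with their average.

```python
import numpy as np

def rmt_clip(S, t, sigma2=1.0):
    """Eigenvalue clipping estimator, eqs. (24)-(25).

    S : (n, n) sample covariance of (assumed standardized) returns
    t : number of observations used to estimate S
    """
    n = S.shape[0]
    lam_max = sigma2 * (1 + np.sqrt(n / t)) ** 2  # MP upper edge, eq. (24)
    lam, V = np.linalg.eigh(S)                    # ascending eigenvalues
    noise = lam <= lam_max
    if noise.any():
        lam[noise] = lam[noise].mean()            # average the noisy part
    return V @ np.diag(lam) @ V.T                 # eq. (25)
```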

For further details, the reader is referred to Laloux et al. (1999), Bouchaud and Potters (2009) and Bruder et al. (2013).

Cite this article

Torri, G., Giacometti, R. & Paterlini, S. Sparse precision matrices for minimum variance portfolios. Comput Manag Sci 16, 375–400 (2019). https://doi.org/10.1007/s10287-019-00344-6
