1 Introduction

The poor out-of-sample performance of the mean-variance portfolio optimization procedure introduced by Markowitz (1952) is well known and documented in the literature (see, e.g., Michaud, 1989; Broadie, 1993; Chopra and Ziemba, 1993). Given a set of risky assets and a risk-free asset, the optimal mean-variance portfolio weights are given by the closed-form solution

$$\begin{aligned} w_{mv} = \frac{1}{\gamma }\Sigma ^{-1}\mu , \end{aligned}$$
(1)

where \( \mu \) is the vector of mean returns, \(\Sigma \) is the covariance matrix and \(\gamma \) is the risk aversion coefficient. Analogously, if the investor specifies his preferences through a desired mean portfolio return instead of a risk aversion level, the weights

$$\begin{aligned} w_{mv} = \frac{Re}{\mu '\Sigma ^{-1}\mu }\Sigma ^{-1}\mu \end{aligned}$$
(2)

minimize the portfolio variance given the target return Re. Since \(\mu \) and \(\Sigma \) are unknown in practice, they have to be estimated. The classical plug-in approach consists in computing the sample estimates \( {\hat{\mu }} \) and \( {\hat{\Sigma }} \) and treating them as if they were the true values. However, the plug-in approach leads to disappointing outcomes because of estimation errors (Kan and Zhou, 2007). Moreover, the mean-variance utility framework rests on unrealistic assumptions, and the development of alternative approaches for the optimization and evaluation of investments is an active field of research (see, e.g., Grau-Carles et al., 2019; Amédée-Manesme & Barthélémy, 2020; Geissel et al., 2022, for some recent works in this area). Nevertheless, the impact of estimation errors can frustrate attempts to obtain portfolios more in line with investors’ real preferences. In fact, the problem gets even worse if the variance is replaced by downside risk measures (e.g. the semivariance or the CVaR), which are arguably more suitable for quantifying risk but are also more affected by estimation errors (Rigamonti, 2020). While studies on real empirical data have shown the importance of targeting the semivariance (see, e.g., Santamaria & Bravo, 2013), limited research has so far addressed parameter uncertainty in this framework. Sharma et al. (2017) propose an Omega-CVaR optimization approach to avoid some of the difficulties related to the estimation of the shape or moments of the return distribution. Recently, fuzzy portfolio selection models have been developed as a novel approach to target various risk measures, including those used in a downside risk framework (see, e.g., Chen et al., 2018a, b; Yue et al., 2019).
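For concreteness, the following is a minimal Python/NumPy sketch of the two plug-in rules in Eqs. (1) and (2); the function and argument names are ours and purely illustrative, not part of any existing library.

```python
import numpy as np

def plug_in_mv_weights(returns, gamma=None, target=None):
    """Plug-in mean-variance weights: Eq. (1) if gamma is given, Eq. (2) if a target mean return is given."""
    mu_hat = returns.mean(axis=0)                        # sample mean vector
    sigma_hat = np.cov(returns, rowvar=False)            # sample covariance matrix
    sigma_inv_mu = np.linalg.solve(sigma_hat, mu_hat)    # Sigma^{-1} mu without explicit inversion
    if gamma is not None:
        return sigma_inv_mu / gamma                      # Eq. (1)
    return target * sigma_inv_mu / (mu_hat @ sigma_inv_mu)  # Eq. (2)
```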

In this paper, we address the problem by taking as a starting point one of the solutions devised to mitigate the impact of estimation errors in a mean-variance framework. We build on the strategy introduced by Chen and Yuan (2016), who propose to estimate a subspace mean-variance portfolio instead of the theoretically optimal mean-variance portfolio. This is done by considering only the first d eigenvectors of the sample covariance matrix \( {\hat{\Sigma }} \) instead of the full matrix. This idea is consistent with the theoretical background of several mainstream asset pricing models, such as the capital asset pricing model and the arbitrage pricing theory, in which asset returns are driven by systematic risks captured by factors. Methodologically, we use a principal component analysis (PCA) to extract those factors.

If the reduced dimension d is carefully chosen, the systematic error caused by restricting the investment options is outweighed by the lower estimation error achieved thanks to the reduced dimension of the problem. Inspired by their setup, we apply a similar principle to optimize mean-semivariance and minimum semivariance portfolios constructed using the approximation of the semicovariance matrix with the technique introduced by Rigamonti et al. (2021). By focusing on a risk measure that only penalizes the negative deviations of the asset returns we might capture a more realistic reflection of the investor’s preferences.

For the choice of d, Chen and Yuan (2016) rely on a criterion introduced by Bai and Ng (2002) that depends on the length of the estimation window. However, when computing the semicovariance matrix, not all information available from the estimation window is incorporated, so the Bai and Ng (2002) criterion may not be suitable. To overcome this obstacle, we transform the semicovariance matrix into a downside correlation matrix and select d using the Minimum Average Partial (MAP) rule introduced by Velicer (1976), which does not depend on the length of the estimation window. We apply our portfolio rule to three empirical datasets of different sizes. The results show that our strategy outperforms the portfolios based on the full semicovariance matrix and performs well even compared to portfolios that target the variance. Moreover, contrary to existing approaches in the literature, our strategy does not involve complicated algorithms to compute the optimal weights, which are instead obtained through simple closed-form solutions.

This paper provides three main contributions. First, we address the problem of parameter uncertainty for portfolios based on the semicovariance matrix. Second, we perform for the first time a PCA in a downside variability framework. Finally, we show the effectiveness of MAP for asset allocation decisions.

The remainder of the paper is organized as follows. Section 2 describes the dimension reduction methodology and the asset allocation procedure. Section 3 discusses details of principal component analysis when applied to the semicovariance matrix. Section 4 provides an empirical application to real data. Section 5 concludes.

2 Methodology

2.1 Subspace efficient portfolios

The idea on which our approach is based is analogous to that of Chen and Yuan (2016). For an investor with risk aversion \(\gamma \), given the sample estimates \( {\hat{\mu }} \) and \({\hat{\Sigma }}\) and the number d of eigenvalues to retain, they compute a subspace mean-variance portfolio whose weights are

$$\begin{aligned} {\hat{w}}_{mv,cov} = \frac{1}{\gamma }\sum _{k=1}^d {\hat{\theta }}^{-1}_{k}{\hat{\nu }}_{k}{\hat{\nu }}'_{k}{\hat{\mu }} , \end{aligned}$$
(3)

where \({\hat{\theta }}_{k}\) is an eigenvalue of \( {\hat{\Sigma }} \) and \({\hat{\nu }}_{k}\) the corresponding eigenvector, with \({\hat{\theta }}_{1}\ge {\hat{\theta }}_{2}\ge \cdots \ge {\hat{\theta }}_{d}\). If we consider an investor with a desired mean portfolio return Re instead, the portfolio weights are

$$\begin{aligned} {\hat{w}}_{mv,cov} = \frac{Re}{{\hat{\mu }}'\sum _{k=1}^d {\hat{\theta }}^{-1}_{k}{\hat{\nu }}_{k}{\hat{\nu }}'_{k}{\hat{\mu }}}\sum _{k=1}^d {\hat{\theta }}^{-1}_{k}{\hat{\nu }}_{k}{\hat{\nu }}'_{k}{\hat{\mu }} . \end{aligned}$$
(4)

For the choice of d, Chen and Yuan (2016) rely on a criterion introduced by Bai and Ng (2002), and estimate it as

$$\begin{aligned} {\hat{d}}={{\,\mathrm{\arg \!\min }\,}}_{1\le k\le k_{max}} \left\{ \log \left( \sum _{i=k+1}^{N} {\hat{\theta }}^{2}_{i}\right) +\frac{k(N+T)}{NT}\log \left( \frac{NT}{N+T} \right) \right\} , \end{aligned}$$
(5)

where N is the number of assets, T is the length of the estimation window and \( k_{max} \) is prespecified.
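A minimal sketch of Eqs. (3)–(5) follows, under the same illustrative naming conventions as above: the subspace weights are built from the d leading eigenpairs of \({\hat{\Sigma }}\), and d is chosen by minimizing the Bai and Ng (2002) criterion.

```python
import numpy as np

def sorted_eig(matrix):
    """Eigenvalues/eigenvectors of a symmetric matrix, sorted in decreasing order."""
    vals, vecs = np.linalg.eigh(matrix)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

def bai_ng_d(theta, N, T, k_max):
    """Eq. (5): pick the number of factors minimizing the information criterion."""
    penalty = (N + T) / (N * T) * np.log(N * T / (N + T))
    ic = [np.log(np.sum(theta[k:] ** 2)) + k * penalty for k in range(1, k_max + 1)]
    return int(np.argmin(ic)) + 1

def subspace_mv_weights(mu_hat, sigma_hat, d, gamma=None, target=None):
    """Eqs. (3)-(4): mean-variance weights restricted to the first d eigenvectors."""
    theta, nu = sorted_eig(sigma_hat)
    # truncated pseudo-inverse built from the d leading eigenpairs
    inv_d = (nu[:, :d] / theta[:d]) @ nu[:, :d].T
    if gamma is not None:
        return inv_d @ mu_hat / gamma                             # Eq. (3)
    return target * (inv_d @ mu_hat) / (mu_hat @ inv_d @ mu_hat)  # Eq. (4)
```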

An alternative decomposition, based on the eigenvalues of the correlation matrix, is proposed by Peralta and Zareei (2016). Inspired by the network eigenvector centrality, they decompose the weights \({\hat{w}}^{*}_{mv}\) in the following way:

$$\begin{aligned} {\hat{w}}^{*}_{mv} = \frac{Re}{{\hat{\mu }}'{\hat{\Sigma }}^{-1}{\hat{\mu }}}{\hat{\mu }}^{*}+\frac{Re}{{\hat{\mu }}'{\hat{\Sigma }}^{-1}{\hat{\mu }}}\sum _{k=1}^N ({\hat{\lambda }}^{-1}_{k}-1){\hat{\nu }}_{k}{\hat{\nu }}'_{k}{\hat{\mu }}^{*} , \end{aligned}$$
(6)

where \( {\hat{\lambda }}_{k} \) is an eigenvalue of the correlation matrix \(\Omega \) and \({\hat{\nu }}_{k}\) the corresponding eigenvector, and the ith element of \( {\hat{\mu }}^{*} \) is \({\hat{\mu }}_{i}/{\hat{\sigma }}_{i}\), with \({\hat{\mu }}_{i}\) and \({\hat{\sigma }}_{i}\) the estimated mean and standard deviation of asset i, respectively. The optimal mean-variance weight for each asset i is then given by \( {\hat{w}}_{mv,i}={\hat{w}}^{*}_{mv,i}/{\hat{\sigma }}_{i}\).

Peralta and Zareei (2016) consider the assets as a network and concentrate their attention on the first eigenvector of the correlation matrix \(\Omega \), setting the elements on its diagonal to zero (to discard loops). They thus compute the eigenvector centrality values of the network of assets and build an asset allocation rule on them. We take a different approach and rewrite Eq. (6) as

$$\begin{aligned} {\hat{w}}^{*}_{mv,cor} = \frac{Re}{{\hat{\mu }}^{*\prime }\sum _{k=1}^d {\hat{\lambda }}^{-1}_{k}{\hat{\nu }}_{k}{\hat{\nu }}'_{k}{\hat{\mu }}^{*}}\sum _{k=1}^d {\hat{\lambda }}^{-1}_{k}{\hat{\nu }}_{k}{\hat{\nu }}'_{k}{\hat{\mu }}^{*} , \end{aligned}$$
(7)

so that the weight for each asset i can then be obtained as \( {\hat{w}}_{mv,cor,i} = {\hat{w}}^{*}_{mv,cor,i}/{\hat{\sigma }}_{i} \). In this way we can apply the same dimension reduction technique proposed by Chen and Yuan (2016), but using the correlation matrix instead of the covariance matrix. Equations (4) and (7) give the same result (the mean-variance weights obtained with the plug-in approach) when \( d=N \). However, when \( d<N \), the weights \( {\hat{w}}_{cov} \) and \( {\hat{w}}_{cor} \) might differ, as the eigenvalues and eigenvectors of the correlation matrix are different from those of the covariance matrix, and not directly related to them (Jolliffe and Cadima, 2016).
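Continuing the sketch (and reusing sorted_eig from above), Eq. (7) only requires switching to the correlation matrix, standardizing the means, and rescaling the resulting weights by the estimated standard deviations; again, all names are illustrative.

```python
def subspace_mv_weights_cor(mu_hat, sigma_hat, d, target):
    """Eq. (7) followed by the rescaling w_i = w*_i / sigma_i."""
    sd = np.sqrt(np.diag(sigma_hat))
    omega_hat = sigma_hat / np.outer(sd, sd)    # correlation matrix
    mu_star = mu_hat / sd                       # standardized mean returns
    lam, nu = sorted_eig(omega_hat)
    inv_d = (nu[:, :d] / lam[:d]) @ nu[:, :d].T
    w_star = target * (inv_d @ mu_star) / (mu_star @ inv_d @ mu_star)
    return w_star / sd                          # back to weights on the original assets
```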

The estimation error in the sample mean is typically so large that minimizing the variance leads to portfolios with a higher Sharpe ratio than those computed via mean-variance optimization (see, e.g., Jagannathan & Ma, 2003). Therefore, we also consider the minimum-variance portfolio, whose weights are given by

$$\begin{aligned} w_{v} = \frac{1}{\mathbf{1}'\Sigma ^{-1}\mathbf{1}}\Sigma ^{-1}\mathbf{1} , \end{aligned}$$
(8)

where \( \mathbf{1} \) is a vector of ones with length equal to the number of risky assets. The true value of \(\Sigma \) is unknown and has to be replaced by the sample estimate or the shrinkage estimate proposed by Ledoit and Wolf (2004). The strategy proposed by Chen and Yuan (2016) can be modified to compute a subspace minimum-variance portfolio, with weights given by

$$\begin{aligned} {\hat{w}}_{v,cov} = \frac{1}{\mathbf{1}'\sum _{k=1}^d {\hat{\theta }}^{-1}_{k}{\hat{\nu }}_{k}{\hat{\nu }}'_{k}\mathbf{1}}\sum _{k=1}^d {\hat{\theta }}^{-1}_{k}{\hat{\nu }}_{k}{\hat{\nu }}'_{k}\mathbf{1} . \end{aligned}$$
(9)

Finally, in the framework of our strategy based on the correlation matrix, the subspace minimum-variance portfolio is computed using

$$\begin{aligned} {\hat{w}}^{*}_{v,cor} = \frac{1}{\epsilon '\sum _{k=1}^d {\hat{\lambda }}^{-1}_{k}{\hat{\nu }}_{k}{\hat{\nu }}'_{k}\epsilon }\sum _{k=1}^d {\hat{\lambda }}^{-1}_{k}{\hat{\nu }}_{k}{\hat{\nu }}'_{k}\epsilon , \end{aligned}$$
(10)

where the ith element of \( \epsilon \) is \(1/{\hat{\sigma }}_{i}\), and the weight for each asset i can then be obtained as \( {\hat{w}}_{v,cor,i} = {\hat{w}}^{*}_{v,cor,i}/{\hat{\sigma }}_{i} \).

We now need to decide how many eigenvalues should be retained, and this is where the advantage of focusing on the correlation matrix instead of the covariance matrix becomes apparent. Warne and Larsen (2014) test several rules and find that the best performing one for factor analysis appears to be the Minimum Average Partial (MAP) rule introduced by Velicer (1976). This method requires performing a PCA on the correlation matrix. Before describing how MAP works, we quickly recall the concept of partial correlation. Partial correlation measures, on a scale from \(-1\) to 1, the linear association between two variables X and Y while controlling for the effect of a set of additional variables \( \mathbf{Z} \). More formally, the partial correlation \( \rho _{X,Y\cdot Z} \) between X and Y controlling for \( \mathbf{Z} \) is given by the correlation coefficient between the residuals obtained from the linear regressions of X on \( \mathbf{Z} \) and of Y on \( \mathbf{Z} \). If one variable is partialed out of the correlation we have a first-order partial correlation; if two variables are partialed out we get a second-order partial correlation, and so on. First-order partial correlations can also be computed, as in Velicer (1976), using the formula

$$\begin{aligned} \rho _{i,j\cdot \nu _{k}}=\frac{\rho _{i,j} - \rho _{i,\nu _{k}}\rho _{j,\nu _{k}}}{((1-\rho _{i,\nu _{k}}^2)(1-\rho _{j,\nu _{k}}^2))^{1/2}} , \end{aligned}$$
(11)

where i and j are any pair of the N variables and \(\nu _{k}\) is one of the N principal components. When we partial out a component we remove both systematic and individual variability from the data. Removing the former decreases the value of the numerator, while removing the latter decreases the value of the denominator. As k goes from 1 to N, less systematic and more individual variability gets removed when the \(\nu _{k}\) component is partialed out, so the numerator decreases more slowly and the denominator more quickly. There is a value of k at which more individual than systematic variability gets removed from the data, which makes the denominator drop more than the numerator; at that point \( \rho _{i,j\cdot \nu _{k}} \) stops decreasing and starts increasing. The MAP criterion involves computing, for every pair of variables, the value of \( \rho _{i,j\cdot \nu _{k}} \), and from these values the average squared partial correlation:

$$\begin{aligned} MAP_{k}=\sum _{i=1}^N\sum _{j=1}^N\frac{\rho _{i,j\cdot \nu _{k}}^2}{N(N-1)} , \end{aligned}$$
(12)

with \( i\ne j \). As k goes from 1 to N, the value of \( MAP_{k} \) decreases until it reaches a minimum before starting to increase, and that k for which \( MAP_{k} \) is minimum is the optimal number of components/factors to retain.
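A sketch of Eq. (12) follows, written in the matrix form commonly used to implement Velicer’s rule: partialling out the first k components amounts to subtracting the outer product of their loadings from the correlation matrix and re-standardizing the remainder. The function name is ours.

```python
import numpy as np

def velicer_map(R):
    """MAP rule: return the k minimizing the average squared partial correlation, Eq. (12)."""
    N = R.shape[0]
    lam, nu = np.linalg.eigh(R)
    order = np.argsort(lam)[::-1]
    loadings = nu[:, order] * np.sqrt(lam[order])   # component loadings
    map_values = []
    for k in range(1, N):                           # partial out the first k components
        C = R - loadings[:, :k] @ loadings[:, :k].T
        d = np.sqrt(np.diag(C))                     # residual standard deviations
        partial = C / np.outer(d, d)                # matrix of partial correlations
        np.fill_diagonal(partial, 0.0)
        map_values.append((partial ** 2).sum() / (N * (N - 1)))  # Eq. (12)
    return int(np.argmin(map_values)) + 1
```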

Despite the good performance found by Warne and Larsen (2014), MAP is not commonly employed in finance, and its application is mostly confined to psychometrics. One notable exception is Claeys and Vašíček (2014), who use MAP in their study on spillover and contagion effects in the European sovereign bond markets. To the best of our knowledge, we are the first to apply MAP in the context of financial asset allocation.

MAP does not depend on the sample size and can therefore be directly applied even to mean-semivariance and minimum semivariance optimization problems, which require a semicovariance matrix as input. To understand why criteria that depend on the sample size are problematic in this setting, consider the formula for the semivariance of a single asset, provided by Markowitz (1959):

$$\begin{aligned} \sigma _{B}^2=\frac{1}{T} \sum _{t=1}^T[\min (r_{t}-B,0)]^2 , \end{aligned}$$
(13)

where T is the length of the estimation window, \( r_{t} \) is the return of the asset in period t, and the benchmark B is the value below which the investor considers variability to be downside variability. In Eq. (13) we divide by the full sample size T, but only the periods in which the return of the asset is lower than B contribute to the sum. The computation of the semicovariance matrix \(\Sigma _{B,p}\), described in Markowitz (1959), is based on an analogous principle. This makes the Bai and Ng (2002) criterion unsuitable, since we do not take advantage of all the information included in the estimation window: using T as the parameter in Eq. (5) would be inappropriate, and so would be using the number of periods that enter the sum. Relying on MAP solves this issue.
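In code, Eq. (13) makes this asymmetric use of the information explicit; a minimal sketch (function name ours):

```python
import numpy as np

def semivariance(r, B=0.0):
    """Eq. (13): mean squared shortfall below B, averaged over the full sample length T."""
    shortfall = np.minimum(r - B, 0.0)  # zero whenever the return is above the benchmark
    return np.mean(shortfall ** 2)      # divides by T, not by the number of below-B periods
```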

There is one issue that needs to be carefully considered when implementing MAP. The method tends to overestimate the number of components to retain when the factors are correlated (a frequent situation in real datasets) and when the sample size is not large enough. To correct for this effect we take advantage of the Kaiser-Guttman criterion, first proposed by Guttman (1954) and then popularized by Kaiser (1960). Applied to a correlation matrix, this rule provides an upper bound for the number of components by stating that only those with an eigenvalue greater than 1 should be retained. Like MAP, this rule does not rely on the sample size used for the estimation, and can therefore be applied in our setting. In short, we use MAP to determine the exact number of components to be retained, but restrict the choice to the components associated with eigenvalues greater than 1 in order to prevent overfactoring.
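Combining the two rules is then straightforward; a sketch reusing velicer_map from above, where the Kaiser-Guttman bound simply counts the eigenvalues of the correlation matrix above 1:

```python
def select_d(R):
    """MAP choice of d, capped by the Kaiser-Guttman upper bound."""
    kaiser_bound = int(np.sum(np.linalg.eigvalsh(R) > 1.0))
    return min(velicer_map(R), max(kaiser_bound, 1))
```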

2.2 Semicovariance matrix estimation

As optimization problems that use \(\Sigma _{B,p}\) are intractable (see Estrada, 2008), we need to use as input some matrix that approximates it. One such strategy is proposed by Estrada (2008), who devises a heuristic that computes the elements of the semicovariance matrix with respect to when the single assets, and not the portfolio as a whole, underperform the benchmark. This procedure yields an exogenous matrix, contrary to the exact semicovariance matrix \(\Sigma _{B,p}\), which is endogenous, i.e. its terms change every time the portfolio weights change. However, the approximation error in this matrix can be substantial, and this can seriously hurt the portfolio performance (see Cheremushkin, 2009; Rigamonti, 2020). We therefore resort to the smoothed semivariance (SSV) estimator introduced by Rigamonti et al. (2021), which provides an extremely precise approximation of the sample semicovariance matrix using a simple re-weighting scheme that simultaneously estimates the optimal portfolio weights and the corresponding semicovariance matrix \({\hat{\Sigma }}_{B}\). Depending on the value of a single parameter \(\theta \), the algorithm smooths the estimation with different intensity: it yields the sample semicovariance matrix when \(\theta \rightarrow 0\), the sample covariance matrix when \(\theta \rightarrow \infty \), and a mixture of the two for intermediate values. Since we are interested in the sample semicovariance matrix, we always employ the SSV estimator with \(\theta =10^{-5}\). While Rigamonti et al. (2021) define the SSV estimator for the minimum semivariance portfolio, we can also use it to estimate the semicovariance matrix of a mean-semivariance portfolio, simply by computing the mean-semivariance weights in the re-weighting scheme.
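We refer to Rigamonti et al. (2021) for the exact definition of the SSV estimator. The following is only our illustrative paraphrase of the fixed-point idea, in which we assume a logistic smoothing of the shortfall indicator governed by \(\theta \); every detail of this sketch is an assumption of ours, not the authors' algorithm.

```python
import numpy as np

def ssv_sketch(returns, B=0.0, theta=1e-5, n_iter=50):
    """Illustrative fixed-point sketch in the spirit of the SSV estimator (NOT the exact
    recursion of Rigamonti et al., 2021): smoothly down-weight periods in which the current
    portfolio beats B, recompute the implied semicovariance matrix, then the weights."""
    T, N = returns.shape
    w = np.full(N, 1.0 / N)                       # start from the equally weighted portfolio
    for _ in range(n_iter):
        rp = returns @ w                          # current portfolio returns
        z = np.clip((rp - B) / theta, -500, 500)  # clipped for numerical safety
        s = 1.0 / (1.0 + np.exp(z))               # smooth version of the indicator 1{rp < B}
        X = returns - B
        sigma_b = (X * s[:, None]).T @ X / T      # implied (smoothed) semicovariance matrix
        w = np.linalg.solve(sigma_b, np.ones(N))  # minimum-semivariance step, cf. Eq. (16)
        w /= w.sum()
    return sigma_b, w
```

Under this assumed smoothing, \(\theta \rightarrow 0\) recovers the hard below-benchmark indicator and hence the sample semicovariance matrix, consistent with the limiting behavior described above.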

2.3 Optimal portfolios computation and evaluation

As we argued, using the criterion introduced by Bai and Ng (2002) to select a value for d as done in Chen and Yuan (2016) does not seem appropriate. But if we transform \({\hat{\Sigma }}_{B}\) into a downside correlation matrix we can use MAP (constrained by the upper bound set by the Kaiser-Guttman rule) to select an appropriate d and apply Eq. (7) or (10). The downside correlation matrix \( {\hat{P}}_{B} \) can be computed in a way analogous to how the correlation matrix is obtained from the covariance matrix:

$$\begin{aligned} {\hat{P}}_{B}={\hat{D}}^{-1/2}{\hat{\Sigma }}_{B}{\hat{D}}^{-1/2} , \end{aligned}$$
(14)

where \( {\hat{D}}^{-1/2} \) is a diagonal matrix whose ith diagonal element is equal to the inverse of the estimated downside deviation \( {\hat{\sigma }}_{B,i} \) of asset i.
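Eq. (14) in code is a one-line rescaling, where sigma_b is any estimate of the semicovariance matrix (function name ours):

```python
import numpy as np

def downside_correlation(sigma_b):
    """Eq. (14): standardize the semicovariance matrix by the downside deviations on its diagonal."""
    d_inv = 1.0 / np.sqrt(np.diag(sigma_b))
    return sigma_b * np.outer(d_inv, d_inv)   # D^{-1/2} Sigma_B D^{-1/2}
```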

We now have all the necessary inputs to apply our strategy, which mitigates the impact of estimation errors in the semicovariance matrix. All the closed-form solutions available for the mean-variance and minimum-variance problems can be applied exactly as they are when using the SSV estimate of the semicovariance matrix: all we have to do is replace the input parameter, and we obtain the solution to the mean-semivariance and minimum-semivariance problems. This is a major advantage over other notable approaches, like those devised by Harlow (1991), Markowitz et al. (1993) and de Athayde (2001), later improved by de Athayde (2003) and recently revisited by Ben Salah et al. (2018). The plug-in mean-semivariance and minimum-semivariance weights are therefore

$$\begin{aligned} {\hat{w}}_{ms} = \frac{Re}{{\hat{\mu }}'{\hat{\Sigma }}^{-1}_{B}{\hat{\mu }}}{\hat{\Sigma }}^{-1}_{B}{\hat{\mu }} \end{aligned}$$
(15)

and

$$\begin{aligned} {\hat{w}}_{s} = \frac{1}{\mathbf{1}'{\hat{\Sigma }}^{-1}_{B}\mathbf{1}}{\hat{\Sigma }}^{-1}_{B}\mathbf{1} , \end{aligned}$$
(16)

respectively. The subspace mean-semivariance portfolio is computed using

$$\begin{aligned} {\hat{w}}^{*}_{ms,cor} = \frac{Re}{{\hat{\mu }}^{*\prime }_{B}\sum _{k=1}^d {\hat{\lambda }}^{-1}_{k}{\hat{\nu }}_{k}{\hat{\nu }}'_{k}{\hat{\mu }}^{*}_{B}}\sum _{k=1}^d {\hat{\lambda }}^{-1}_{k}{\hat{\nu }}_{k}{\hat{\nu }}'_{k}{\hat{\mu }}^{*}_{B} , \end{aligned}$$
(17)

where \( {\hat{\lambda }}_{k} \) and \({\hat{\nu }}_{k}\) are an eigenvalue and the corresponding eigenvector of the estimated downside correlation matrix \({\hat{P}}_{B}\), and the ith element of \( {\hat{\mu }}^{*}_{B} \) is \({\hat{\mu }}_{i}/{\hat{\sigma }}_{B,i}\), with \( {\hat{\sigma }}_{B,i} \) the downside deviation of asset i. The weight of asset i is then \( {\hat{w}}_{ms,cor,i} = {\hat{w}}^{*}_{ms,cor,i}/{\hat{\sigma }}_{B,i} \). Analogously, the subspace minimum-semivariance portfolio is given by

$$\begin{aligned} {\hat{w}}^{*}_{s,cor} = \frac{1}{\epsilon _{B}'\sum _{k=1}^d {\hat{\lambda }}^{-1}_{k}{\hat{\nu }}_{k}{\hat{\nu }}'_{k}\epsilon _{B}}\sum _{k=1}^d {\hat{\lambda }}^{-1}_{k}{\hat{\nu }}_{k}{\hat{\nu }}'_{k}\epsilon _{B} , \end{aligned}$$
(18)

where the ith element of \( \epsilon _{B} \) is \(1/{\hat{\sigma }}_{B,i}\), and the weight of asset i can then be obtained as \( {\hat{w}}_{s,cor,i} = {\hat{w}}^{*}_{s,cor,i}/{\hat{\sigma }}_{B,i} \).
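Putting the pieces together, a sketch of Eqs. (17)–(18): form the downside correlation matrix, choose d with the capped MAP rule, solve in the subspace, and rescale by the downside deviations. Here select_d and sorted_eig are the sketches given earlier, and omitting mu_hat gives the minimum-semivariance case.

```python
def subspace_semivariance_weights(sigma_b, mu_hat=None, target=None):
    """Eq. (17) when mu_hat and target are given, Eq. (18) otherwise."""
    sd_b = np.sqrt(np.diag(sigma_b))          # downside deviations
    P_b = sigma_b / np.outer(sd_b, sd_b)      # downside correlation matrix, Eq. (14)
    d = select_d(P_b)                         # MAP capped by the Kaiser-Guttman bound
    lam, nu = sorted_eig(P_b)
    inv_d = (nu[:, :d] / lam[:d]) @ nu[:, :d].T
    x = (1.0 / sd_b) if mu_hat is None else mu_hat / sd_b   # epsilon_B or mu*_B
    scale = 1.0 if mu_hat is None else target
    w_star = scale * (inv_d @ x) / (x @ inv_d @ x)
    return w_star / sd_b                      # final weights on the original assets
```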

Since the Sharpe ratio measures risk across the full distribution of returns (through the standard deviation), it is not well suited to evaluating the out-of-sample results in the framework of our research question. Instead, we use the Sortino ratio (Sortino and Prince, 1994) as our main performance indicator:

$$\begin{aligned} \mathrm{Sortino}=\frac{R-B}{\sigma _{B}} , \end{aligned}$$
(19)

where R and \( \sigma _{B} \) are respectively the mean and downside deviation of the portfolio returns.
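Eq. (19) in code, with the downside deviation computed as in Eq. (13) (function name ours):

```python
import numpy as np

def sortino_ratio(r, B=0.0):
    """Eq. (19): mean return in excess of B over the downside deviation."""
    downside_dev = np.sqrt(np.mean(np.minimum(r - B, 0.0) ** 2))
    return (np.mean(r) - B) / downside_dev
```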

3 Principal component analysis on the semicovariance matrix

In this section we discuss the interpretation and implications of performing PCA on the semicovariance matrix. If the asset returns follow a symmetric distribution, focusing on the variance or on the semivariance makes no difference, and the investor should focus on the former, which is easier to estimate. However, in the presence of skewed return distributions the semivariance is a more suitable measure of risk, at least theoretically (see Rigamonti, 2020, for a more thorough discussion). The same reasoning holds when we perform PCA. When PCA is performed on the covariance matrix, the principal components explain the variance in the data. If instead we perform PCA on the semicovariance matrix, the principal components explain the semivariance in the data, that is, the variability only below a certain threshold.

To illustrate this point, we generate 10 series of 500,000 returns using the moment-matching algorithm devised by Høyland et al. (2003), first with skewness equal to 0 for all the assets, and then equal to 1.3. In both cases the mean, covariance matrix and kurtosis are set equal to the respective sample moments of the returns of the Fama-French 10 industry portfolios from July 1969 to June 2021. Figure 1a and b show the loading plots obtained by performing PCA on the covariance and semicovariance matrix (with \( B=0 \)) estimated on the series with symmetrically distributed returns. The two graphs look very similar, with only minor differences: with symmetric returns, analyzing the variability or the downside variability in the data is essentially equivalent.

Let us now focus on Fig. 2a and b, which show the loading plots obtained when performing PCA on the covariance and semicovariance matrix of the positively skewed returns. The two plots look completely different from each other, as the loadings in the first two loading vectors change significantly. This means that the information captured by the principal components is very different, which in turn shows that, due to the skewness in the data, the information contained in the covariance and semicovariance matrix differs. Moreover, Figs. 1a and 2a are extremely similar (as the covariance matrix cannot capture the difference in skewness), while Figs. 1b and 2b look entirely different.

Fig. 1
Loading plots with PCA performed on a) the covariance matrix (left) and b) the semicovariance matrix (right). For all variables skew = 0. The estimation is based on simulated data in which the mean, covariance matrix and kurtosis are set to the sample moments of the returns of the Fama-French 10 industry portfolios from July 1969 to June 2021

Fig. 2
Loading plots with PCA performed on a) the covariance matrix (left) and b) the semicovariance matrix (right). For all variables skew = 1.3. The estimation is based on simulated data in which the mean, covariance matrix and kurtosis are set to the sample moments of the returns of the Fama-French 10 industry portfolios from July 1969 to June 2021

4 Empirical application

We now provide an out-of-sample empirical application of our strategy. First, we check whether using MAP improves over the Bai and Ng (2002) criterion when the latter is applied to the semicovariance matrix as if it were a covariance matrix, as in Chen and Yuan (2016). We then compare our strategy to various competing optimal portfolios to see whether we can actually improve the effectiveness of downside risk optimization. We always work with excess returns and set \( B = 0 \), so that the benchmark equals the risk-free rate, which has some desirable theoretical properties (see Hoechner et al., 2017). The covariance matrix is always estimated using the Ledoit and Wolf (2004) shrinkage estimator.

4.1 MAP vs Bai and Ng (2002) with downside risk

To compare the accuracy of MAP and of the Bai and Ng (2002) criterion in selecting the optimal number of factors for the downside correlation and semicovariance matrix, respectively, we compute the subspace minimum semivariance portfolio with both methods. We also consider the unrestricted minimum semivariance portfolio to check how much we improve over it. We consider two different datasets of monthly returns: the 30 and 48 industry portfolios (value weighted) from the Kenneth R. French Data Library, spanning from July 1969 (earlier dates are not considered because of missing data in the 48 industry portfolios) to June 2021. We use a 180-month rolling window, so we obtain 444 out-of-sample returns.

The downside deviation achieved by the three strategies is reported in Table 1. With the 30 industry portfolios, using the Bai and Ng (2002) criterion does not lead to any improvement over the unrestricted portfolio, while our strategy based on MAP achieves a lower downside deviation. With the 48 industry portfolios, the Bai and Ng (2002) criterion improves over the unrestricted portfolio, but much less than our strategy. Overall, as expected, these results show that MAP is better suited than the Bai and Ng (2002) criterion to selecting the optimal number of factors for the subspace minimum semivariance portfolio. Investors who dislike negative deviations and employ MAP achieve a superior performance.

Table 1 Downside deviation based on a 180-month rolling window

4.2 Comparison between portfolio optimization strategies

We now look at how our strategy performs compared to alternative portfolio optimization strategies. In addition to the downside deviation and the Sortino ratio, we also consider two other popular downside risk measures: the Conditional Value at Risk (CVaR) and the Omega-Sharpe ratio. The CVaR is based on the Value at Risk (VaR), to which it is considered preferable since, unlike the VaR, it is a coherent risk measure (see, e.g., Sarykalin et al., 2008; Yamai & Yoshiba, 2002). The Omega-Sharpe ratio is a simple modification of the classical Omega ratio which leads to the same portfolio ranking but has a more familiar form (Bacon, 2008). Moreover, we look at the mean excess return and the Sharpe ratio. Finally, we compute the turnover rate at each period t as in DeMiguel et al. (2009):

$$\begin{aligned} Turnover_{t} = \sum _{i=1}^N|w_{i,t}-w_{i,t-1}| , \end{aligned}$$
(20)

where i indicates a risky asset. We then report the average rate computed over all periods (indicated as “TO” in the tables). An average turnover of 1 means that at each period, on average, an amount of assets equal to 100% of the value of the portfolio has to be traded.
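For completeness, Eq. (20) averaged over time, where weights is a periods-by-assets array of the realized portfolio weights (function name ours):

```python
import numpy as np

def average_turnover(weights):
    """Average of Eq. (20) over all rebalancing periods."""
    return np.abs(np.diff(weights, axis=0)).sum(axis=1).mean()
```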

We aim to determine which strategy works best for a real investor. Therefore, it is important to work with individual assets in order to preserve the coskewness structure, which is even more important than the idiosyncratic skewness (Albuquerque, 2012). As long series of returns are not available for a very large number of individual stocks, we work with weekly data instead of monthly data, which allows for more out-of-sample returns within a given time frame. We obtain our data from the Alpha Vantage API, which provides historical adjusted prices of single stocks starting from November 1999. The risk-free rates were downloaded from the Kenneth R. French Data Library. We consider three different datasets: the first includes 28 of the 30 companies that make up the Dow Jones Industrial Average, the second consists of 66 stocks included in the NASDAQ 100, and the third includes 82 components of the S&P 100. All series span from November 19, 1999, to June 25, 2021 (the last period for which the risk-free rate is available on the Kenneth R. French Data Library), for a total of 1128 observations. We use a 260-period rolling window for the estimation, obtaining 868 out-of-sample returns. To check the robustness of our results, we also consider an expanding window that starts with 260 periods, obtaining once again 868 out-of-sample returns.

We first compute the subspace mean-semivariance portfolio and compare it with the mean-variance and mean-semivariance portfolios, and with the mean-CVaR portfolio with the confidence level set to 95%. To compute the mean-CVaR portfolio, we rely on the historical return distribution and use the formulation proposed by Rockafellar and Uryasev (2000). We set the target return at 0.1%. The results are shown in Table 2. While our strategy leads to a higher downside deviation than the competing strategies, it achieves a much higher mean return, and therefore it always significantly outperforms the other strategies in terms of the Sortino ratio, which is the measure that we are targeting. We also achieve a much higher Sharpe and Omega-Sharpe ratio. The results obtained with the expanding window, shown in Table 3, are completely in line with those obtained with the rolling window, confirming the superiority of our approach. Investors and hedgers who dislike negative deviations in the returns but like upside deviations (and for whom the Sortino ratio is therefore the crucial performance measure) achieve the best results with our strategy.

Table 2 Performance comparison using the 260-week rolling window
Table 3 Performance comparison using the 260-week expanding window

We now consider the subspace minimum semivariance portfolio, whose competitors are the minimum variance and minimum semivariance portfolios, both with and without short selling, and the long-only minimum CVaR portfolio. The motivation for computing the minimum semivariance portfolio when the goal is a high Sortino ratio is the same as that for computing the minimum variance portfolio when the goal is a high Sharpe ratio: the estimation error in the sample mean is so large that ignoring the mean altogether improves the results (Jagannathan and Ma, 2003). The additional constraint of excluding short selling can dramatically improve the out-of-sample performance when minimizing risk, as shown by Jagannathan and Ma (2003). Kremer et al. (2018) analyze the performance of several investment strategies and find that the long-only minimum variance portfolio is the best risk minimization strategy. On the same grounds we impose the long-only constraint on the minimum CVaR portfolio, as targeting the CVaR without imposing additional constraints is known to perform poorly (Lim et al., 2011).

The results are reported in Table 4. Our strategy leads to a modest improvement over the unrestricted minimum semivariance portfolio with the Dow Jones Industrial Average, and to a large improvement with the other two datasets. Unsurprisingly, reducing the dimension of the problem is especially beneficial with large asset menus. We also improve in terms of the other two downside risk-based measures and achieve a higher Sharpe ratio, except, for the latter measure, with the Dow Jones Industrial Average. However, compared to the other four strategies, we only achieve a good Sortino and Omega-Sharpe ratio with the NASDAQ 100 dataset, while otherwise we perform equally well or worse in terms of every measure. Table 5 shows the results obtained with the expanding window. Our strategy now leads to a portfolio with a worse downside deviation and CVaR than even the unrestricted minimum semivariance portfolio with the two smaller datasets, but we achieve the best results in terms of both measures with the S&P 100. We always achieve a higher Sharpe, Omega-Sharpe and Sortino ratio than the unrestricted minimum variance and minimum semivariance portfolios, but we are not competitive with the other three strategies.

Overall, our strategy is very competitive in a setting where an investor with a mean-semivariance utility sets a target return. However, when \(\mu \) is ignored and the investor only minimizes risk, the simple solution of excluding short positions appears to be superior, whether the variance or the semivariance is minimized.

Table 4 Performance comparison using the 260-week rolling window
Table 5 Performance comparison using the 260-week expanding window

We conclude our analysis by computing the dynamics of the value of the different portfolios. Given that our strategy has a higher average turnover rate than most of the competing strategies we considered, we compute the dynamics of wealth net of transaction costs. Following Frazzini et al. (2012), we set transaction costs equal to 10 basis points. Figure 3 shows the results for an investor who sets a target return. Our strategy greatly outperforms all the alternatives, especially when using an expanding estimation window. Figure 4 reports the value of the portfolios when risk is minimized without considering the mean return. Our strategy almost always significantly outperforms the unrestricted minimum variance and minimum semivariance portfolios. However, as expected from the results in Tables 4 and 5, we generally perform similarly to or worse than the strategies that impose a long-only constraint.

5 Conclusions

We propose a novel portfolio strategy for investors and portfolio managers who are particularly sensitive to downside deviations of asset returns. Efficient asset allocations based on downside risk are hard to achieve because it is especially difficult to estimate the necessary inputs. In this paper we propose for the first time a strategy that directly addresses this problem when estimating the semicovariance matrix of a set of assets. In doing so, we provide two additional original contributions by performing PCA on the semicovariance matrix and on the downside correlation matrix, and by using MAP to select the optimal number of components for asset allocation decisions. This idea is consistent with theoretical asset pricing models, such as the capital asset pricing model and the arbitrage pricing theory, in which asset returns are driven by systematic risks captured by factors.

The application of our strategy to a variety of datasets shows that it can considerably improve the out-of-sample performance over portfolios constructed using the unrestricted semicovariance matrix. Our suggested criterion, MAP, greatly increases the Sortino ratio of the resulting portfolios. Overall, our strategy is very competitive in a setting where the investor sets a target return and performs mean-(semi)variance optimization. Given its relatively high turnover compared to most alternatives considered in this study, one direction for future research could be the refinement of our strategy in order to reduce its transaction costs. This would further increase its attractiveness for investors, hedgers and portfolio managers, in particular those who dislike negative deviations of asset returns.