1 Introduction

Modern portfolio theory (MPT) emerged from the groundbreaking work of Markowitz (1952), which introduced the mean–variance (MV) model for asset allocation. The classic MV model focuses on two criteria, namely return and risk. However, many scholars have emphasized the intrinsically multidimensional nature of the portfolio selection problem. Empirical evidence shows that using more than two factors to choose a financial portfolio reduces reliance on any single measure, which may itself be flawed (Almeida-Filho et al., 2021; Colapinto et al., 2019; Doumpos & Zopounidis, 2014; Kim et al., 2022; Rahiminezhad Galankashi et al., 2020; Spronk et al., 2016; Steuer et al., 2007, 2008; Tamiz & Azmi, 2019; Tamiz et al., 2013; Xidonas et al., 2012). Furthermore, considering factors beyond risk and return is a sound choice in financial portfolio selection: it may reduce management distraction and improve other favorable attributes of the portfolio (Steuer et al., 2007). Consequently, diverse criteria for financial portfolio selection and management align with different investment strategies, which depend primarily on investor preferences. Portfolio optimization models that include more attributes in asset selection therefore represent investors’ aspirations more realistically. Such attributes include, among others, profitability, liquidity, systematic and non-systematic risks, financial ratios, market ratios, non-financial attributes (e.g. ethical, environmental, and social issues), skewness, and kurtosis (Aouni et al., 2018; Ballestero et al., 2012; Boubaker et al., 2023; Colapinto et al., 2019; Doumpos & Zopounidis, 2014; Peykani et al., 2020; Rahiminezhad Galankashi et al., 2020; Steuer et al., 2007; Wu et al., 2022; Yu & Lee, 2011).

In financial portfolio selection, fundamental analysis (FA) is a method for evaluating a company's investment potential through a thorough examination of its financial statements, while technical analysis (TA) relies on the historical trajectories of stock prices to predict their future values (Edirisinghe & Zhang, 2007, 2008). Prior research supports integrating FA and TA, indicating that the two techniques may be complements rather than substitutes (Bettman et al., 2009; Contreras et al., 2012; Kuo et al., 2021; Namdari & Li, 2018). Indeed, using both techniques together can enhance the predictive power for firms’ future financial performance and thus lead to the selection of superior portfolios. In financial portfolio selection, the risk and return attributes of an investment are regarded as technical factors (Kuo et al., 2021; Tamiz & Azmi, 2019). FA, on the other hand, can be implemented in various ways, one of which is Data Envelopment Analysis (DEA) (Abad et al., 2004; Edirisinghe & Zhang, 2007, 2008; Lim et al., 2014). DEA is a data-driven performance evaluation technique that measures the relative efficiency of decision-making units (DMUs) with multiple inputs and outputs (Emrouznejad & Yang, 2018; Zhu, 2022). In portfolio construction, it can be used to measure assets’ efficiencies and thereby identify the best assets for investment (Edirisinghe & Zhang, 2007, 2008; Lim et al., 2014; Peykani et al., 2020, 2022). DEA approaches can therefore serve as a means of FA in portfolio selection, assessing a firm’s investment worthiness relative to the market as a whole rather than in isolation, and thus helping to identify financially healthy firms.

Multi-criteria decision making (MCDM) approaches allow various factors to be considered in the portfolio selection problem and allow asset screening to be combined with portfolio construction within an integrated framework (Zopounidis et al., 2015). Goal programming (GP), a multi-objective programming (MOP) technique, is a prominent and powerful MCDM approach that has been applied in many areas of financial decision-making (Andriosopoulos et al., 2019). In optimization problems with multiple objectives, GP finds solutions in line with the decision maker’s stated goals and preferences. The GP model is based on distance functions: the undesirable positive and/or negative deviations between the achieved and aspiration levels of the objectives are minimized (Jones & Tamiz, 2010). GP is an instrumental tool for analyzing portfolio selection problems and yields reasonable solutions that incorporate decision makers’ preferences (Aouni et al., 2014; Colapinto et al., 2019). With GP, financial decision makers (FDMs) can simultaneously treat various financial and non-financial factors as distinct goals, each with its own aspiration level and preference, within a multi-objective optimization framework. They can also draw on the capabilities of this mathematical framework, including sensitivity analysis, to reach their desired portfolios. Portfolio selection problems focused only on risk and return can be described as a GP model with two principal goals (Tamiz & Azmi, 2019). Nonetheless, as noted above, additional objectives representing other factors and attributes can readily be incorporated into the versatile GP model.
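To make the deviation-variable mechanics concrete, the following is a minimal sketch of a weighted GP (WGP) model; it is not the model developed later in this paper. It assumes three hypothetical assets, two linear goals (a floor on expected return and a ceiling on portfolio beta as a risk proxy), a fully invested long-only portfolio, and percentage normalization of the deviations. All numbers and variable names are illustrative, and the LP is solved with SciPy's `linprog` using the HiGHS method.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical attributes of three assets (illustration only).
mean_return = np.array([0.011, 0.008, 0.015])  # expected return per period
beta = np.array([0.9, 0.7, 1.4])               # systematic-risk proxy

# Aspiration levels and preference weights chosen by the decision maker.
return_target, beta_target = 0.013, 1.0
w_return, w_beta = 1.0, 1.0

# Decision variables: [x_1, x_2, x_3, n_ret, p_ret, n_beta, p_beta].
# Goal rows:  mean_return @ x + n_ret - p_ret = return_target
#             beta @ x + n_beta - p_beta = beta_target
# Only unwanted deviations are penalised: the return shortfall n_ret and
# the beta excess p_beta, each divided by its target (percentage normalisation).
c = np.array([0, 0, 0, w_return / return_target, 0, 0, w_beta / beta_target], dtype=float)
A_eq = np.array([
    np.concatenate([mean_return, [1, -1, 0, 0]]),  # return goal
    np.concatenate([beta, [0, 0, 1, -1]]),         # beta goal
    [1, 1, 1, 0, 0, 0, 0],                         # fully invested
], dtype=float)
b_eq = np.array([return_target, beta_target, 1.0])

# Default linprog bounds are (0, None): no short selling, deviations >= 0.
res = linprog(c, A_eq=A_eq, b_eq=b_eq, method="highs")
weights = res.x[:3]
print("weights:", np.round(weights, 4))
print("weighted deviation:", round(res.fun, 4))
```

With these illustrative numbers the two goals conflict, so the model cannot meet both. The optimum holds 80% of the first asset and 20% of the third, meeting the beta ceiling exactly while falling 0.0012 short of the return target.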

Another prominent issue that any proposed portfolio selection approach should address is parameter uncertainty. Uncertainty is one of the most essential characteristics of financial markets (Kim et al., 2022; Sadjadi et al., 2012). Investment decisions should therefore be adequately hedged against uncertainty; otherwise, the resulting portfolios will be neither reliable nor practical for real-world applications. However, many studies treat the input data as certain. In such cases, perturbations in data values can lead to large variations in the results and, in the worst case, to infeasible solutions. It is therefore necessary to develop models that meaningfully account for uncertainty while also accommodating the manifold preferences of financial decision makers.

In the optimization literature, stochastic programming (SP) and robust optimization (RO) are two widely adopted techniques for coping with uncertainty. Both address the same question: how to build a solution that is immunized against uncertainty in the input data of an optimization problem. When the uncertain data are stochastic, the quality of SP-based decisions depends on the underlying probability distribution. However, accurately ascertaining that distribution is usually demanding and often requires a massive number of observations, which can render the SP model computationally intractable (Ben-Tal et al., 2009; Bertsimas & Sim, 2003; Ghahtarani et al., 2022; Hanks et al., 2017). RO, on the other hand, represents uncertain data with uncertainty sets instead of probability distributions and does not lead to such intractability, so it can be used as an alternative approach to uncertainty. RO can yield a solution that is guaranteed to be robust and feasible for almost all possible realizations of the uncertain parameters (Ben-Tal et al., 2009; Bertsimas & Sim, 2003). Furthermore, it enables decision makers to trade off system reliability against economic performance. Although SP is a dynamic approach and appears less conservative than the static, worst-case-oriented RO, it requires a strong set of assumptions to hold, as mentioned above. RO is more pragmatic and, in the context of this paper, therefore fits better with the pragmatic ethos of the goal programming framework. Combining GP and RO can thus yield pragmatic, flexible, and yet straightforward decision support tools for multi-objective optimization problems in uncertain environments.

Regarding the GP methodology applied to portfolio selection problems, this paper follows the GP framework proposed by Tamiz et al. (2013) and Tamiz and Azmi (2019), but uses different factors rooted in FA, in addition to the conventional technical factors of risk and return, and different GP variants for the stock portfolio selection problem. We also extend the GP models to account for the uncertainties prevalent in financial markets. A further objective of this study is to contribute to the robust GP (RGP) literature by introducing new RGP variants that use uncertainty sets to address the limitations of traditional and other uncertain GP paradigms. The GP problem developed in this study serves as a practical application case for the introduced RGP variants. Building on these advancements, we propose an innovative two-phase approach to portfolio selection that leverages the DEA and GP techniques. The first phase is an asset screening procedure that uses a super-efficiency DEA model to measure the efficiency of the assets listed on the stock exchange. This step identifies candidate investments for the subsequent portfolio optimization phase according to investor-specific filtering criteria. In the second phase, the portfolio is constructed by deciding the amount invested in each qualified asset. In this phase, a multi-factor extended goal programming (EGP) model is developed that allows investor-specific preferences to be considered, with each objective function representing a specific asset-related factor. Moreover, uncertainty is incorporated into the portfolio optimization phase by extending the deterministic EGP models to their robust counterparts, which ensures model stability in the face of imprecise data.
To this end, two RO approaches, one via a polyhedral uncertainty set (Li et al., 2011) and one via a combined interval and polyhedral uncertainty set (Bertsimas & Sim, 2004), are adopted and compared for the following reasons: (a) RO via the polyhedral uncertainty set guarantees maximum robustness against uncertainty at a high cost, which appeals to risk-averse decision makers; (b) RO via the combined interval and polyhedral uncertainty set strikes a compromise between robustness and its cost, which appeals to risk-seeking decision makers; (c) neither approach suffers from the over-conservatism of classic worst-case robust models such as Soyster’s (1973) approach, in which optimality is excessively sacrificed for feasibility; instead, in the resulting robust counterparts the decision maker has full control over the degree of conservatism of each constraint with uncertain parameters; and (d) both approaches retain the linearity of the nominal problem, in contrast to some RO approaches (e.g. Ben-Tal & Nemirovski, 2000) that turn a linear programming problem into a nonlinear one. With our approach, financial decision makers can select suitable portfolios using computationally tractable models that accommodate their specific preferences and desired levels of conservatism. The proposed approach is validated with real data from the Tehran Stock Exchange (TSE); the resulting portfolios are compared with one another and with portfolios obtained from traditional benchmark portfolio selection models, and conclusions are drawn. Figure 1 presents a schematic framework of all the steps of the proposed two-phase approach.
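The difference between reasons (a) and (b) shows up in the worst-case "protection" term that each uncertainty set adds to a single uncertain linear constraint \(\sum_j a_j x_j \le b\). The sketch below assumes that each coefficient varies symmetrically as \(a_j = \bar{a}_j + \hat{a}_j\xi_j\) and takes the deviations \(d_j = \hat{a}_j|x_j|\) of a fixed portfolio \(x\) as given; the function names and data are hypothetical. Under the pure polyhedral set \(\sum_j|\xi_j|\le\Gamma\) the worst case is \(\Gamma\max_j d_j\). Under the combined interval and polyhedral set \(\{|\xi_j|\le 1,\ \sum_j|\xi_j|\le\Gamma\}\) it is the sum of the \(\lfloor\Gamma\rfloor\) largest deviations plus the fraction \(\Gamma-\lfloor\Gamma\rfloor\) of the next one. Because the combined set is contained in the pure polyhedral set, its protection is never larger for the same \(\Gamma\). In an actual robust counterpart, these maxima are expressed through auxiliary variables and linear constraints, which is why linearity is preserved.

```python
import math

import numpy as np


def polyhedral_protection(deviations, gamma):
    """Worst case of sum_j d_j * xi_j over the pure polyhedral set
    sum_j |xi_j| <= gamma: the whole budget goes to the largest deviation."""
    d = np.abs(np.asarray(deviations, dtype=float))
    return float(gamma * d.max())


def budgeted_protection(deviations, gamma):
    """Worst case over the combined set |xi_j| <= 1, sum_j |xi_j| <= gamma
    (interval + polyhedral): the floor(gamma) largest deviations in full
    plus a fraction (gamma - floor(gamma)) of the next largest one."""
    d = np.sort(np.abs(np.asarray(deviations, dtype=float)))[::-1]
    gamma = min(gamma, len(d))  # beyond n, every interval bound is already active
    k = math.floor(gamma)
    protection = d[:k].sum()
    if k < len(d):
        protection += (gamma - k) * d[k]
    return float(protection)


# Hypothetical coefficient half-widths a_hat and a fixed long-only portfolio x.
a_hat = np.array([0.02, 0.01, 0.03])
x = np.array([0.5, 0.3, 0.2])
deviations = a_hat * np.abs(x)  # d_j = a_hat_j * |x_j|

for gamma in (1.0, 1.5, 3.0):
    print(gamma, polyhedral_protection(deviations, gamma),
          budgeted_protection(deviations, gamma))
```

For deviations (0.01, 0.003, 0.006) and \(\Gamma = 1.5\), the pure polyhedral protection is 0.015 and the combined one is 0.013; at \(\Gamma = 3\) the combined set reaches the full interval (Soyster-type) protection of 0.019.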

Fig. 1 The methodology of the proposed two-phase approach for multiple criteria portfolio selection with uncertain factors

The remainder of this paper is organized as follows. Section 2 reviews the literature on applications of DEA and GP models in financial portfolio selection and on robust goal programming, and concludes by stating the research gaps and contributions. Section 3 begins with the problem description and then describes the methodologies used in this paper, i.e. the super-efficiency DEA model, the extended factors, the EGP variant, the RO approach, and the general robust EGP framework. It then presents the proposed portfolio selection models: first the deterministic model, and then its robust counterpart GP models under the aforementioned uncertainty sets. Section 4 details the case study data, the scenarios designed for the experiments, the sensitivity analysis, and the results and discussion. Finally, Sect. 5 concludes the paper and discusses directions for future research.

2 Literature review

This paper relates principally to three streams of literature: applications of DEA in financial portfolio selection, GP applied to financial portfolio selection, and robust GP. In this section, we first discuss research that applies DEA to financial portfolio selection. We then describe the literature on GP applications in financial portfolio selection. Section 2.2 presents an overview of previous methodological and application-oriented studies on robust GP. Finally, we discuss the literature gaps and research contributions.

2.1 Contextual background

2.1.1 Applications of DEA in financial portfolio selection

In portfolio selection, there are typically two main decisions: which stocks to include and how much to invest in each. However, many studies focus solely on the latter, treating portfolio selection as a pure optimization problem and overlooking the importance of selecting the right assets. Traditionally, spreading investments across a wide range of assets is believed to reduce risk. In real-world investing, however, an over-diversified portfolio can incur higher transaction costs, and the investment strategy may be less effective than anticipated. To address this issue, investors are advised to concentrate their investments in a select number of investment-worthy stocks (Zhou et al., 2010), for which the result is illustrated in Fig. 2.

Figure 2 shows a density visualization of the co-occurrence of all keywords in GP-related research on financial portfolio selection. The analysis is based on 134 records from the Web of Science Core Collection with no time restriction, retrieved using the keywords “Goal Programming” and “Portfolio Selection/Management”. With a threshold of at least 3 occurrences across all works, 59 of the 656 initial keywords were retained for the bibliometric analysis. In Fig. 2, dense red areas mark the keywords that occur most frequently, whereas green areas mark keywords with lower research density in this field. For example, “uncertainty” falls into the latter group, indicating that relatively few works considered uncertainty in their modeling. Among the uncertainty modeling approaches, “fuzzy goal programming” and “stochastic goal programming” occurred most often, in that order. Interested readers can refer to Aouni et al. (2014) and Colapinto et al. (2019) for surveys of stochastic and fuzzy GP models for portfolio optimization. The map also shows a clear separation between keywords such as “Analytic Network Process” or “ANP” and “Goal Programming,” “Portfolio Selection,” “Polynomial Goal Programming,” and “Fuzzy Goal Programming,” a gap that future research could address. In what follows, we examine relevant GP applications in financial portfolio selection and management in more detail.
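The occurrence threshold described above can be sketched as follows. The bibliometric tool behind Fig. 2 is not named here, so this is only an assumed, simplified reading of the procedure: each keyword is counted at most once per record after case-folding, keywords below the hypothetical `min_occurrences` threshold are dropped, and co-occurrence is counted for every pair of retained keywords appearing in the same record. The records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_map(records, min_occurrences=3):
    """Return keyword counts, the retained keywords, and pair co-occurrence counts."""
    # Case-fold and de-duplicate so each keyword counts at most once per record.
    cleaned = [{kw.strip().lower() for kw in rec} for rec in records]
    occurrences = Counter(kw for rec in cleaned for kw in rec)
    kept = {kw for kw, count in occurrences.items() if count >= min_occurrences}
    cooccurrence = Counter()
    for rec in cleaned:
        # Sorting gives each unordered keyword pair a single canonical key.
        cooccurrence.update(combinations(sorted(rec & kept), 2))
    return occurrences, kept, cooccurrence


# Hypothetical author-keyword lists for four records.
records = [
    ["Goal programming", "Portfolio selection"],
    ["goal programming", "Uncertainty"],
    ["Goal Programming", "Portfolio Selection", "Fuzzy goal programming"],
    ["portfolio selection", "goal programming"],
]
occurrences, kept, cooccurrence = keyword_map(records)
print(sorted(kept))
print(cooccurrence.most_common(1))
```

In this toy input, "uncertainty" appears only once and is filtered out, mirroring how low-frequency keywords fall into the sparse regions of a density map.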

Fig. 2 Density visualization of all keywords with a threshold of at least 3 occurrences

Bravo et al. (2010) developed a mean value-stochastic GP model for selecting efficient buy-and-hold portfolios of funds, considering multiple return benchmarks as a more realistic approach to portfolio analysis. Stoyan and Kwon (2011) formulated a mixed-integer stochastic GP model for an integrated stock and bond portfolio problem that accounts for uncertainty in asset prices and several major trading constraints. Bilbao-Terol et al. (2012) proposed Weighted and MinMax GP models for selecting portfolios of socially responsible investment (SRI) funds, measuring the socially responsible performance of financial products with an index built using fuzzy set theory techniques. Tamiz et al. (2013) applied three GP variants (Weighted, Lexicographic, and MinMax GP) to develop portfolio selection models for international mutual funds that enable decision makers to incorporate their preferred factors and ideal aspiration levels into the GP model to obtain their intended portfolio. The authors selected seven factors from three groups, ‘mutual fund specific factors,’ ‘macroeconomic factors,’ and ‘factors for regional preferences,’ and treated each factor as an objective in their GP models. Using data on 20 equity mutual funds from 10 countries, they compared the portfolios obtained with different combinations of weights, priority levels, and target values in terms of return, risk, and the number of mutual funds selected, and demonstrated the applicability of their approach. Messaoudi et al. (2017) developed a fuzzy chance-constrained GP model for a financial portfolio selection problem with three attributes, in which stochastic uncertainty pertains to the independent chance-constrained objectives and the financial decision maker’s preferences are expressed as fuzzy values. De et al. (2018) proposed a fuzzy GP model using Werner’s ‘fuzzy and’ hybrid operator for a portfolio selection problem with three criteria: risk, return, and liquidity. They treated return and liquidity as fuzzy values described by triangular and trapezoidal membership functions. Tamiz and Azmi (2019) used five FA-based factors, in addition to risk and return, for stock portfolio selection, representing each factor as an objective in a weighted goal programming (WGP) model, and developed various WGP models with different combinations of target values and weights. They applied their models to 30 stocks from the Dow Jones Industrial Average index and compared the resulting portfolios with each other and with well-known benchmark portfolio selection models from the literature. Their results supported the use of factors beyond risk and return, referred to as ‘extended factors,’ in portfolio selection problems. Mansour et al. (2019) formulated a GP model for the financial portfolio selection problem with three criteria, risk, return, and liquidity, assuming fuzzy returns and incorporating investor preferences through satisfaction functions. Deng and Yuan (2021) introduced a GP model based on fuzzy dominance for a portfolio selection problem with fuzzy returns that simultaneously considers systematic and non-systematic risks. More recently, Bravo et al. (2022) proposed a GP approach that differs from state-of-the-art uncertain GP models: the variability of parameters arising from randomness is handled by replacing the traditional WGP achievement function with a new function that captures the decision maker’s perception of randomness through a penalty term. They applied their approach to a mean absolute deviation (MAD) portfolio selection problem and claimed that it can effectively address the challenges arising from a lack of statistical information about random events.

2.2 Robust goal programming

Although GP is applied extensively in various fields, it has traditionally relied on deterministic values for model parameters, an assumption that does not always hold (Hanks et al., 2017). Robust goal programming (RGP) is a relatively new sub-discipline of optimization that integrates uncertainty modeling via RO theory with traditionally deterministic GP to tackle optimization problems with multiple objectives and uncertain parameters.

RGP was first introduced by Kuchta (2004), who addressed parametric uncertainty within GP by combining cardinality-constrained robustness with interval-based uncertainty sets. In that approach, the uncertain parameters are limited to the cost coefficients of the original linear objective functions. Decision makers can find solutions for various degrees of uncertainty by changing one parameter per goal, and the worst optimal total deviation from the goals is reported. Closer to our application context, Ghahtarani and Najafi (2013) applied Kuchta’s (2004) methodology to build the robust counterpart of Lee and Chesser’s (1980) GP model for multi-objective portfolio selection. Their study is the only one to date that has applied the RGP methodology to portfolio selection problems modeled as a GP. Their results showed that solution conservatism rose as the uncertainty budget increased. Ghasemi Bojd and Koosha (2018) proposed a robust goal programming model for the capital budgeting problem that accounts for uncertainty in cash flows and addresses multiple goals. Broadening the RGP literature, Hanks et al. (2017) extended Kuchta’s (2004) work by introducing RGP models with cardinality-constrained robustness under norm-based uncertainty sets, such as L1- and L2-norm sets, as well as strict robustness under ellipsoidal uncertainty sets. Computational examples comparing their models with Kuchta’s (2004) showed that cardinality-constrained robustness via L2-norm uncertainty sets outperformed cardinality-constrained robustness via interval-based uncertainty sets. However, the computational tractability of L2-norm-based uncertainty sets remains challenging. Building on these findings, Hanks et al. (2020) employed RGP with cardinality-constrained robustness via L2-norm uncertainty sets to model the setting of transportation shipping rates for a United States shipping line. Contributing further to the theoretical foundation of RGP, Mensah and Rocca (2019) proposed light robust goal programming models using budget-of-uncertainty and ellipsoidal uncertainty sets and compared their results with previous RGP models. They showed that, in the light RGP framework, the decision maker’s total goal deviations depend heavily on the robust quality threshold, set as the trade-off between the optimality and feasibility of the robust solution, rather than on the specific uncertainty set used. Further applications of RGP can be found in Wang and Li (2019) and Hendalianpur et al. (2019) for supplier selection and order allocation in supply chain management, and in Repetto et al. (2019) for controlling the transfer pricing risk of a multinational firm. More recently, Cheng et al. (2021) developed an RGP model to optimize food distribution decisions in humanitarian relief logistics under supply and capacity uncertainties, using a refined uncertainty set defined by the intersection of the box and generalized budget sets.
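To see why norm-based sets can affect tractability, consider again a single uncertain linear constraint with coefficients \(a_j = \bar{a}_j + \hat{a}_j\xi_j\). Under the interval (box) set \(|\xi_j|\le 1\), the worst-case increase is \(\sum_j \hat{a}_j|x_j|\), which can be expressed with linear constraints. Under an ellipsoidal (L2-norm) set \(\|\xi\|_2\le\Omega\), by the Cauchy–Schwarz inequality it is \(\Omega\,\|(\hat{a}_j x_j)_j\|_2\), a second-order cone term that makes the robust counterpart nonlinear. The sketch below, with hypothetical names and data, evaluates both terms for a fixed portfolio. It covers only the basic strict-robustness case and does not reproduce the cardinality-constrained L2 models of Hanks et al. (2017).

```python
import numpy as np


def box_protection(a_hat, x):
    """Interval set |xi_j| <= 1: every coefficient at its worst-case bound."""
    return float(np.sum(np.abs(np.asarray(a_hat) * np.asarray(x))))


def ellipsoidal_protection(a_hat, x, omega):
    """Ellipsoidal set ||xi||_2 <= omega: the worst case is
    omega * ||a_hat * x||_2, a second-order cone (nonlinear) term."""
    return float(omega * np.linalg.norm(np.asarray(a_hat) * np.asarray(x)))


a_hat = np.array([0.02, 0.01, 0.03])  # hypothetical coefficient half-widths
x = np.array([0.5, 0.3, 0.2])         # hypothetical fixed portfolio
for omega in (1.0, np.sqrt(len(x))):
    print(round(float(omega), 3), box_protection(a_hat, x),
          ellipsoidal_protection(a_hat, x, omega))
```

Since \(\|d\|_1 \le \sqrt{n}\,\|d\|_2\), choosing \(\Omega=\sqrt{n}\) makes the ellipsoid contain the whole box, so its protection is at least the box value.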

2.3 Research gaps and contributions

DEA and GP techniques have each been applied to various aspects of portfolio selection and optimization in numerous studies. DEA methods have proven useful in portfolio management for purposes such as ranking stocks for portfolio selection, estimating portfolio efficiency, and assessing market efficiency (Kuo et al., 2021; Tarnaud & Leleu, 2018). Similarly, GP has proven to be a pragmatic and versatile modeling and solution approach for multi-criteria portfolio optimization problems. It enables FDMs to consider multiple incommensurable, often conflicting, financial and non-financial factors simultaneously as distinct goals, each with its own preferences and target values, so that FDMs can tailor their portfolios accordingly (Aouni et al., 2014; Colapinto et al., 2019). However, despite its potential for hybridization (Zopounidis et al., 2015), GP has seen relatively limited integration with other approaches in portfolio management. Notably, DEA does not appear among the keywords in Fig. 2, indicating a scarcity of studies that combine DEA and GP in portfolio selection. To address this gap, we propose an integrated approach that harnesses DEA and GP for portfolio selection and optimization. Furthermore, we introduce a comprehensive stock performance evaluation framework that employs the super-efficiency DEA methodology as a means of FA to identify investment-worthy stocks. Selecting the appropriate GP variant is also a crucial aspect of implementing the GP methodology. To the best of our knowledge, this work is the first to use the EGP variant in portfolio optimization; we adopt it because it offers greater flexibility than other commonly used variants, and we explore the effectiveness of this choice in optimizing portfolios. Moreover, this study introduces another novel aspect by incorporating a set of FA-based financial factors, in addition to the TA-based factors of return and risk, as goals within the EGP models, following the concept of ‘extended factors’ introduced by Tamiz et al. (2013). This inclusion integrates the underlying business strength of firms into the portfolio optimization phase and provides a more comprehensive analysis of its role, viewed through the lens of multi-objective optimization. By encompassing a range of FA-based factors, the study seeks to enhance the predictive capacity for firms’ future financial performance, thereby contributing to a more nuanced and effective portfolio selection process and to the identification of portfolios with greater potential for superior performance. The developed GP framework readily accommodates additional goals representing other financial or non-financial factors, providing a versatile platform for personalized portfolio optimization.

Additionally, although uncertainty is an inherent issue in portfolio management that needs to be accounted for, Fig. 2 and the relevant GP literature reveal a paucity of studies that model portfolio problems while taking the underlying uncertainties into account. Among the non-deterministic GP approaches applied to portfolio optimization, fuzzy GP and stochastic GP models have been explored most frequently. These uncertain GP approaches have mainly treated goal target values as uncertain, which does not always match practical GP applications such as portfolio optimization, where the primary sources of uncertainty are goal function coefficients, such as return rates. To address this limitation and the other challenges discussed in Sect. 1, we adopt the more versatile and pragmatic RO technique to manage uncertainty within the GP model during the portfolio optimization phase. This choice also addresses the rarity of RGP applications in portfolio optimization. As the review in Sect. 2.2 and Table 1 show, most RGP models have been developed with the WGP variant and the approach of Bertsimas and Sim (2004) (i.e. the combined interval and polyhedral uncertainty set). We therefore introduce the robust counterpart of the EGP model, termed the Robust EGP (REGP) variant. For the reasons discussed in Sect. 1, we develop the REGP models using both the pure polyhedral and the combined interval and polyhedral uncertainty sets, and we compare and validate their performance on real-world data. In doing so, we aim to extend and promote RGP in line with the work of Hanks et al. (2017). RGP is an under-developed topic that deserves more attention, as it can accommodate uncertainty not only in goal function coefficients and target values but also in system constraints, a critical feature for achieving complete robustness in problems where both constraint parameters and goals are imprecisely defined. Overall, our proposed approach aims to equip FDMs with computationally tractable, adaptable, and pragmatic decision support tools for portfolio selection and optimization. It enables FDMs to make investment decisions based on their financial preferences, including desired financial factors, preferential weights, and goal target values, while accommodating varying levels of conservatism.

Table 1 An overview of RGP studies

3 Methodology

3.1 Problem description

This paper proposes a multiple-criteria financial portfolio selection framework whose primary aim is to maximize the expected return of the investment while minimizing its risk. To incorporate the underlying business strength of firms into portfolio selection and optimization, the problem considers a set of FA-based financial factors as criteria, in addition to the technical factors of asset return and risk, while accounting for data uncertainty. This paper thereby adds to the literature showing the benefits of moving beyond traditional bi-criteria risk-return approaches, for investors who wish to incorporate the fundamental financial strength of firms into the portfolio selection and optimization process. We place special focus on the inherent uncertainties of financial markets, since any portfolio selection approach that ignores them may be unreliable. As illustrated in Fig. 1, the proposed framework consists of two phases. In the first phase, an asset screening procedure uses a super-efficiency DEA model to measure the efficiency score of every asset listed on the stock exchange. The inputs and outputs of the DEA model are a set of financial parameters covering various aspects of firms’ operations, determined from quantitative and qualitative information in the firms’ financial statements, the relevant literature, and expert advice. At the end of this phase, potential investments are selected according to the investor’s filtering criteria (e.g. efficient assets) as candidates for the portfolio optimization problem in the second phase. In the second phase, the portfolio is constructed by deciding the weights of the qualified risky assets resulting from the first phase. A multi-factor EGP model is developed in this phase that allows specific investor preferences to be considered, with each goal representing a distinct asset-related financial factor, ranging from the technical return and risk factors to the FA-based factors derived from the first-phase DEA approach. The constraints and assumptions of the portfolio optimization models are detailed in Sect. 3.6. Moreover, robust optimization is employed during the portfolio optimization phase to deal with the inherent data uncertainty of financial markets: the deterministic EGP models are extended to robust versions to ensure model stability in the face of imprecise data and to strike a balance between system cost and reliability.
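The role of the EGP variant in the second phase can be illustrated with the generic extended GP achievement function, which blends the MinMax and weighted forms through a parameter \(\lambda\in[0,1]\): it minimizes \((1-\lambda)D + \lambda\sum_i w_i u_i\) subject to \(w_i u_i \le D\), where \(u_i\) is the unwanted deviation of goal \(i\) divided by its target. Setting \(\lambda=0\) gives MinMax GP and \(\lambda=1\) gives weighted GP. This is a minimal sketch under our own assumptions (three hypothetical assets, a return floor, a beta ceiling, a floor on a hypothetical DEA-based score, and a long-only, fully invested portfolio), not the EGP model specified in Sect. 3.6. The function name and data are illustrative.

```python
import numpy as np
from scipy.optimize import linprog


def extended_gp(attributes, targets, senses, weights, lam):
    """EGP sketch for a fully invested, long-only portfolio.

    attributes: (g, k) array; row i holds factor i for each of k assets.
    senses[i]: ">=" if falling short of target i is unwanted, "<=" if exceeding it is.
    Returns asset weights and each goal's weighted, normalised unwanted deviation.
    """
    A = np.asarray(attributes, dtype=float)
    t = np.asarray(targets, dtype=float)
    g, k = A.shape
    scale = np.asarray(weights, dtype=float) / np.abs(t)        # percentage normalisation
    short = np.array([s == ">=" for s in senses], dtype=float)  # 1 where shortfall is unwanted
    pen_n, pen_p = scale * short, scale * (1.0 - short)
    # Variables [x (k), n (g), p (g), D]; minimise (1 - lam) * D + lam * weighted sum.
    c = np.concatenate([np.zeros(k), lam * pen_n, lam * pen_p, [1.0 - lam]])
    A_eq = np.vstack([
        np.hstack([A, np.eye(g), -np.eye(g), np.zeros((g, 1))]),  # A x + n - p = t
        np.concatenate([np.ones(k), np.zeros(2 * g + 1)]),         # sum(x) = 1
    ])
    b_eq = np.append(t, 1.0)
    # MinMax link: every weighted unwanted deviation is at most D.
    A_ub = np.hstack([np.zeros((g, k)), np.diag(pen_n), np.diag(pen_p), -np.ones((g, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(g), A_eq=A_eq, b_eq=b_eq, method="highs")
    x = res.x[:k]
    achieved = A @ x
    # Recompute deviations from x: with lam = 0 the solver may leave slack in n and p.
    gap = np.where(short == 1.0, t - achieved, achieved - t)
    return x, scale * np.maximum(gap, 0.0)


# Hypothetical goals: return floor, beta ceiling, floor on a DEA-based score.
attributes = [[0.011, 0.008, 0.015],
              [0.9, 0.7, 1.4],
              [1.05, 0.95, 0.90]]
targets = [0.013, 1.0, 1.0]
senses = [">=", "<=", ">="]
weights = [1.0, 1.0, 1.0]

for lam in (0.0, 0.5, 1.0):
    x, dev = extended_gp(attributes, targets, senses, weights, lam)
    print(lam, np.round(x, 3), "max:", round(dev.max(), 4), "sum:", round(dev.sum(), 4))
```

Comparing \(\lambda=0\) with \(\lambda=1\) exposes the trade-off that EGP lets the decision maker tune: the MinMax end never has a larger worst single-goal deviation, and the weighted end never has a larger total deviation.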

3.2 The super-efficiency DEA model

In conventional DEA models, such as the CCR model of Charnes et al. (1978), several decision-making units (DMUs) can obtain an efficiency score of 1, so these models cannot discriminate among the efficient DMUs. To overcome this limitation, Andersen and Petersen (1993) developed a method for ranking efficient DMUs, introducing the notion of super-efficiency. The Andersen and Petersen (1993) model is as follows:

$$\mathrm{Min }{\uptheta }_{o}^{s}$$
$$S.t.$$
$$\sum_{\begin{array}{c}j=1\\ j\ne o\end{array}}^{{\text{n}}}{\uplambda }_{{\text{j}}}{\overline{{\text{x}}} }_{{\text{ij}}}\le {\uptheta }_{o}^{s}{\overline{{\text{x}}} }_{{\text{io}}}\quad\mathrm{ i}=1,\dots ,\mathrm{ m}$$
$$\sum_{\begin{array}{c}j=1\\ j\ne o\end{array}}^{{\text{n}}}{\uplambda }_{{\text{j}}}{\overline{{\text{y}}} }_{{\text{rj}}}\ge {\overline{{\text{y}}} }_{{\text{ro}}}\quad\mathrm{ r}=1,\dots ,\mathrm{ s}$$
$${\uplambda }_{{\text{j}}}\ge 0 \quad \mathrm{ j}=\mathrm{1,2},\dots ,{\text{n}}$$
(1)

where \({\uptheta }_{o}^{s}\), \({\overline{{\text{x}}} }_{{\text{io}}}\), and \({\overline{{\text{y}}} }_{{\text{ro}}}\) are the efficiency score, input values, and output values of the DMU under evaluation, respectively. The variable \({\uplambda }_{{\text{j}}}\) is the weight corresponding to \({{\text{DMU}}}_{{\text{j}}}\), and model (1) evaluates the relative efficiencies of \(n\) DMUs, each with \(m\) inputs and \(s\) outputs denoted by \({\overline{{\text{x}}} }_{1{\text{j}}}, \dots , {\overline{{\text{x}}} }_{{\text{mj}}}\) and \({\overline{{\text{y}}} }_{1{\text{j}}}, \dots , {\overline{{\text{y}}} }_{{\text{sj}}}\), respectively. The super-efficiency model (1) computes the efficiency score of the DMU under evaluation by excluding it from the reference set (note the condition \(j\ne o\) in the constraints), so that efficient DMUs can obtain scores greater than 1. It was designed to rank the efficient DMUs identified by the original DEA models, but it can also be applied to assess and rank all DMUs. Super-efficiency DEA models have been applied in various fields of finance for performance evaluation [see e.g. Avkiran (2011), Pätäri et al. (2012), Dutta et al. (2020), Lin and Li (2020), and Tsolas (2022) for applications in performance appraisal within banking, equity portfolio, non-banking finance companies, mutual funds, and exchange-traded funds, respectively]. Owing to its superior discriminating power, the variable returns to scale (VRS) version of the super-efficiency model (1), which adds the convexity constraint \(\sum_{\begin{array}{c}j=1\\ j\ne o\end{array}}^{{\text{n}}}{\uplambda }_{{\text{j}}}=1\), is employed in this study to fundamentally evaluate and rank all the assets registered in the stock exchange in the first phase of the proposed methodology. The VRS model assesses the relative efficiency of each firm with respect to its specific scale of operations, capturing variations in efficiency due to differences in the firms' financial structures and operational characteristics.
At the end of this phase, based on criteria such as a cardinality constraint or the efficiency level determined by the FDM, the top assets are selected for potential investment in the second phase.
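The first-phase screening can be sketched with an off-the-shelf LP solver. The snippet below is a minimal sketch, not the authors' implementation: it solves the input-oriented VRS version of model (1) with SciPy's `linprog` (HiGHS backend), and the function name `super_efficiency_vrs` and the column-per-DMU data layout are illustrative assumptions. Because the evaluated DMU is dropped from the reference set, the VRS super-efficiency LP can be infeasible for some DMUs; the sketch returns `None` in that case rather than guessing a score.

```python
import numpy as np
from scipy.optimize import linprog


def super_efficiency_vrs(X, Y, o):
    """Input-oriented VRS super-efficiency score of DMU `o` (model (1) plus convexity).

    X: (m, n) input matrix, Y: (s, n) output matrix; column j holds DMU j.
    Returns theta, or None when the LP is infeasible (possible under VRS).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    m, n = X.shape
    s = Y.shape[0]
    peers = [j for j in range(n) if j != o]  # DMU o is excluded from the reference set
    k = len(peers)

    # Decision vector z = [theta, lambda_1, ..., lambda_k]; minimize theta.
    c = np.zeros(k + 1)
    c[0] = 1.0
    # Inputs:  sum_j lambda_j x_ij - theta x_io <= 0
    A_in = np.hstack([-X[:, [o]], X[:, peers]])
    # Outputs: -sum_j lambda_j y_rj <= -y_ro
    A_out = np.hstack([np.zeros((s, 1)), -Y[:, peers]])
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.concatenate([np.zeros(m), -Y[:, o]])
    # VRS convexity constraint: sum_j lambda_j = 1
    A_eq = np.concatenate([[0.0], np.ones(k)]).reshape(1, -1)
    b_eq = [1.0]
    bounds = [(0, None)] * (k + 1)  # theta >= 0, lambda_j >= 0

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return float(res.x[0]) if res.status == 0 else None
```

Looping `o` over all assets and sorting the returned scores yields a ranking for screening; how infeasible cases should be ranked is not specified in the text and is left open here.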

Financial ratios are widely used and accepted in the finance and accounting literature for comparative analysis and performance evaluation (Almeida-Filho et al., 2021; Wu et al., 2022). For the inputs and outputs of the DEA model, we employ a range of performance perspectives for a firm, including profitability, activity, liquidity, and leverage ratios. These four major categories of financial ratios represent the underlying financial strengths of a firm. Based on the literature (Edirisinghe & Zhang, 2007, 2008; Kuo et al., 2021; Lim et al., 2014; Peykani et al., 2022; Rahiminezhad Galankashi et al., 2020; Wu et al., 2022; **donas et al., 2009), expert opinions (from fund managers and financial advisors), and the Delphi method (utilizing questionnaires), we select a total of 14 critical financial parameters that encompass all of these perspectives to serve as inputs and outputs for our DEA model, as presented in Table 2. In the DEA model, parameters that follow the ‘the more, the better’ principle are used as outputs, while those following the ‘the less, the better’ principle are classified as inputs (Cook et al., 2014).

Table 2 Financial parameters used as inputs and outputs of the super-efficiency DEA model

3.3 Extended factor GP models

We term the factors used alongside the traditional TA-based risk and return factors in our proposed GP models for the portfolio optimization problem “extended factors” (Tamiz & Azmi, 2019; Tamiz et al., 2013). In practice, the FDM may wish to oversee the construction of their portfolio by simultaneously optimizing various incommensurable and conflicting criteria, such as rate of return, risk, liquidity, dividends, number of assets, amount of short selling, social responsibility, financial ratios, investment in R&D, and so forth. Such factors are often treated as constraints in the portfolio selection process. However, when each factor is treated as a distinct objective, analysts and investors gain a more comprehensive understanding of its relationship with investment risk and return, as well as of the trade-offs involved (Rahiminezhad Galankashi et al., 2020; Steuer et al., 2007, 2008). Moreover, criteria considered as objectives can be modeled linearly in a simple manner (Spronk et al., 2016). Hence, we treat each factor as a distinct, linearly modeled goal function within the GP models proposed for the portfolio optimization phase.

It is important to note that the decision of whether to employ a specific set of factors primarily depends on the investor’s preferences, strategies, and objectives. Extra goals representing other factors, financial or even non-financial, can readily be incorporated into the GP framework for portfolio optimization. In this paper, with the aim of incorporating the fundamental financial strength of firms into the portfolio optimization process and leveraging the synergy resulting from concurrently employing FA-based and TA-based factors, we define the following six factors for the multiple-criteria portfolio selection problem:

1- Return (RE); 2- Risk (RI); 3- Liquidity (LI); 4- Leverage (LE); 5- Activity (AC); 6- Profitability (PR).

Letting \(i\) denote the index of assets, the return and risk factors of asset \(i\) are defined as the mean and the standard deviation, respectively, of its rates of return over the construction period, and are calculated as follows:

$${r}_{i,t}=\frac{{p}_{i,t}-{p}_{i,t-1}}{{p}_{i,t-1}}$$
(2)
$${RE}_{i}=\frac{1}{T}\sum_{t=1}^{T}{r}_{i,t}$$
(3)
$${RI}_{i}=\sqrt{\frac{\sum_{t=1}^{T}{\left({r}_{i,t}-{RE}_{i}\right)}^{2}}{T-1}}$$
(4)

where \({p}_{i,t}\) and \({p}_{i,t-1}\) are the closing prices of asset \(i\) at the end of time periods \(t\) and \(t-1\), respectively. Also, \({r}_{i,t}\) is the rate of return of asset \(i\) in time period \(t\), \(T\) is the number of construction time periods, and \({RE}_{i}\) and \({RI}_{i}\) are the return and risk factors of asset \(i\), respectively.
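Equations (2)-(4) translate directly into array operations. In this sketch, the function name `return_and_risk` is hypothetical, and `prices` is assumed to hold the closing prices \({p}_{i,0}, \dots, {p}_{i,T}\) of one asset. The \(T\) returns are the relative changes between consecutive prices, and `ddof=1` reproduces the \(T-1\) denominator of Eq. (4).

```python
import numpy as np


def return_and_risk(prices):
    """Return series, RE_i and RI_i of one asset from its closing prices.

    prices: p_{i,0}, ..., p_{i,T} (T + 1 closing prices give T returns).
    """
    p = np.asarray(prices, dtype=float)
    r = (p[1:] - p[:-1]) / p[:-1]   # Eq. (2): r_{i,t} for t = 1..T
    re = r.mean()                   # Eq. (3): arithmetic mean return
    ri = r.std(ddof=1)              # Eq. (4): sample standard deviation (T - 1)
    return r, re, ri
```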

To calculate the remaining factors, the following sets and notation are introduced:

\(j\): Index of financial parameters according to Table 2.

\({J}_{Li}\): Members of the liquidity perspective factors set according to Table 2.

\({J}_{Le}\): Members of the leverage perspective factors set according to Table 2.

\({J}_{Ac}\): Members of the activity perspective factors set according to Table 2.

\({J}_{Pr}\): Members of the profitability perspective factors set according to Table 2.

\({R}_{ij}\): The value of the \({j}^{th}\) financial parameter for the \({i}^{th}\) asset.

\({K}_{j}\): The normalization constant for the \({j}^{th}\) financial parameter.

\(\widehat{{R}_{ij}}\): The normalized value of the \({j}^{th}\) financial parameter for the \({i}^{th}\) asset.

The normalized values for the financial parameters can be obtained through the following equation:

$$\widehat{{R}_{ij}}=\frac{{R}_{ij}}{{K}_{j}}\quad \forall i,j$$
(5)

where the normalization constant \({K}_{j}\) is chosen to be the Euclidean norm of the values of the \({j}^{th}\) financial parameter across all assets, as this is a computationally robust method among the normalization techniques (Tamiz & Azmi, 2019). Thus, \({K}_{j}\) for each financial parameter \(j\) is given by:

$${K}_{j}=\sqrt{\sum_{i}{R}_{ij}^{2}}$$
(6)

Consequently, we calculate the values of the other four factors considered in this paper beyond risk and return, called extended factors, as the arithmetic mean of the normalized values of the financial parameters belonging to each perspective as follows:

$${LI}_{i}=\frac{\sum_{j\in {J}_{Li}}\widehat{{R}_{ij}}}{\left|{J}_{Li}\right|}$$
(7)
$${LE}_{i}=\frac{\sum_{j\in {J}_{Le}}\widehat{{R}_{ij}}}{\left|{J}_{Le}\right|}$$
(8)
$${AC}_{i}=\frac{\sum_{j\in {J}_{Ac}}\widehat{{R}_{ij}}}{\left|{J}_{Ac}\right|}$$
(9)
$${PR}_{i}=\frac{\sum_{j\in {J}_{Pr}}\widehat{{R}_{ij}}}{\left|{J}_{Pr}\right|}$$
(10)

where \(\left|{J}_{Li}\right|\), \(\left|{J}_{Le}\right|\), \(\left|{J}_{Ac}\right|\) and \(\left|{J}_{Pr}\right|\) are the cardinalities of the liquidity, leverage, activity, and profitability perspective factors sets according to Table 2, respectively.
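The normalization (5)-(6) and the perspective averages (7)-(10) can be sketched as follows. The sketch assumes a matrix `R` with one row per asset and one column per financial parameter of Table 2. Because Table 2's contents are not reproduced here, the column indices passed as `members` for each perspective are hypothetical, as is the function name. A column whose values are all zero would make \({K}_{j}\) zero and is not handled by this sketch.

```python
import numpy as np


def perspective_factor(R, members):
    """Extended factor (Eqs. 7-10) for one perspective.

    R: (assets x parameters) raw values R_ij; members: column indices in the
    perspective (hypothetical mapping to Table 2).
    """
    R = np.asarray(R, dtype=float)
    K = np.sqrt((R ** 2).sum(axis=0))      # Eq. (6): Euclidean norm of each column
    R_hat = R / K                          # Eq. (5): normalized values
    return R_hat[:, members].mean(axis=1)  # arithmetic mean over the perspective's parameters
```

For example, `perspective_factor(R, [0, 1])` would give one value per asset for a perspective made of the first two parameters.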

It should be noted that the liquidity, leverage, activity, and profitability factors represent the fundamental financial strengths of assets in the GP models for portfolio optimization. We further note that our choice of these perspective-level indicators as extended factors, as described above, reflects the correlations among the individual ratios within each perspective of Table 2. This primarily keeps the set of goals in the GP models streamlined and manageable (adhering to the maximum of seven objectives in an MCDM problem suggested by Steuer et al. (2007)), while still providing a comprehensive assessment that encompasses all of the financial parameters in Table 2. Moreover, this approach avoids the loss of information and critical nuances that could result from dropping certain ratios within a perspective because of these correlations; such nuances could be pivotal for a thorough assessment. The optimization direction of each extended factor in the GP models is consistent with its classification as a DEA input or output, as described in Sect. 3.2 (in the activity perspective of Table 2, the values of the first parameter were multiplied by -1 so that maximizing the AC factor remains consistent). The extended factors used in this paper along with risk and return, the desired optimization type of each factor as an objective function, the desired target value of each factor, and the corresponding deviational variables to be penalized in the achievement function of the GP models are summarized in Table 3.

Table 3 Summary of the extended factors used in this paper for the portfolio optimization problem

3.4 Extended goal programming

The extended lexicographic goal programming (ELGP) model was proposed by Romero (2001) to permit a parametric analysis of the trade-offs between efficiency and balance among the achievement levels of different target values. The ELGP formulation allows the underlying philosophies of lexicographic ordering, optimizing, balancing, and satisficing to be included and combined for a single decision-making entity (Jones & Tamiz, 2010). The lexicographic ordering philosophy is attained through the precedence structure of the achievement function. The satisficing philosophy follows from the set of goals. The optimizing philosophy is accomplished by minimizing the weighted sum of deviations, and the balancing philosophy is achieved by including the maximum deviation term (D) in each priority level. Moreover, the equilibrium between optimization (efficiency) and balance (equity) can be controlled at each priority level through the parameter α, which can be varied between complete concentration on optimization (α = 0, the WGP achievement function) and complete focus on balance (α = 1, the MinMax GP achievement function). Lexicographic and non-lexicographic forms of the model can be formulated for the cases in which a lexicographic ordering of goals is present or absent, respectively. As this paper is concerned primarily with investigating efficiency-balance trade-offs between objectives rather than prioritizing them, the non-lexicographic form of the extended goal programming model is used. The single-priority-level (non-lexicographic) extended goal program can be represented by the following formulation (Romero, 2004):

$$\mathit{Min}A=\alpha D+\left(1-\alpha \right)\left\{\sum_{i=1}^{I}(\frac{{w}_{{n}_{i}}{n}_{i}}{{K}_{i}}+\frac{{w}_{{p}_{i}}{p}_{i}}{{K}_{i}})\right\}$$
$$S.t.$$
$$\frac{{w}_{{n}_{i}}{n}_{i}}{{K}_{i}}+\frac{{w}_{{p}_{i}}{p}_{i}}{{K}_{i}}-D\le 0\quad i=\mathrm{1,2},\dots ,I$$
$${f}_{i}(x)+{n}_{i}-{p}_{i}={b}_{i}\quad i=\mathrm{1,2},\dots ,I$$
$$x\in F$$
$${n}_{i},{p}_{i}\ge 0\quad i=\mathrm{1,2},\dots ,I$$
(11)

where \({\text{A}}\) is the achievement function of the goal program, \({n}_{i}\) is the negative deviational variable of the \({i}^{th}\) goal, \({w}_{{n}_{i}}\) is the preferential weight associated with the minimization of \({n}_{i}\), \({p}_{i}\) is the positive deviational variable of the \({i}^{th}\) goal, \({w}_{{p}_{i}}\) is the preferential weight associated with the minimization of \({p}_{i}\), \({K}_{i}\) is the normalization constant associated with the \({i}^{th}\) goal, \({b}_{i}\) represents the target level for the \({i}^{th}\) goal, \({f}_{i}(x)\) is the \({i}^{th}\) objective function, \(F\) represents the feasible region of the original multi-objective problem, \(I\) is the number of goals and finally parameter \(\alpha \) provides a trade-off between minimization of the weighted sum of unwanted deviation variables and minimization of the maximum deviation from the target values.
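To make the role of α concrete, the sketch below evaluates the achievement function of (11) for given deviation values. It assumes that D equals the largest weighted, normalized deviation, which is what the constraints of (11) enforce at an optimum whenever α > 0. The function and argument names are illustrative.

```python
def egp_achievement(n, p, w_n, w_p, K, alpha):
    """Achievement value A of model (11) for given deviations.

    n, p: negative/positive deviations per goal; w_n, w_p: preferential weights;
    K: normalization constants; alpha in [0, 1].
    """
    terms = [(wn * neg + wp * pos) / k
             for neg, pos, wn, wp, k in zip(n, p, w_n, w_p, K)]
    D = max(terms)  # largest weighted, normalized deviation
    return alpha * D + (1 - alpha) * sum(terms)
```

With two goals whose weighted, normalized deviations are 0.2 and 0.6, the function returns 0.8 for α = 0 (pure WGP), 0.6 for α = 1 (pure MinMax), and 0.7 for α = 0.5.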

Despite its flexibility and advantages over more commonly used GP variants, EGP remains relatively underused in the literature for solving multi-objective optimization problems. To date, some data-driven applications of EGP in the literature include integrated lot-sizing and cutting stock (Oliveira et al., 2021), allocation of medical robotic devices to treatment centres (Jones et al., 2022), supply chain planning in the silk industry (Jatuphatwarodom et al., 2018), and energy and asset management in an industrial microgrid (Choobineh & Mohagheghi, 2016). In this research, we explore the effectiveness of the EGP variant in the financial portfolio selection context under both deterministic and uncertain conditions.

3.5 Robust optimization framework

Unlike stochastic optimization, robust optimization uses uncertainty sets instead of probability distributions to account for uncertain data. In our study, two RO approaches, based on a polyhedral uncertainty set (Li et al., 2011) and on a combined interval and polyhedral uncertainty set (Bertsimas & Sim, 2004), are adopted to develop and compare robust counterparts of the deterministic extended goal programming model, owing to the advantages discussed in Sect. 1. To present these two RO approaches, consider the following linear programming model:

$$Max \sum_{j}{c}_{j}{x}_{j}$$
$$S.t.$$
$$\sum_{j}{\widetilde{a}}_{ij}{x}_{j}\le {b}_{i} \quad\forall i$$
$$x\in X$$
(12)

where the coefficients \({\widetilde{a}}_{ij}\) are subject to uncertainty. Let \({J}_{i}\) denote the set of uncertain coefficients in the \({i}^{th}\) constraint. Each uncertain parameter \({\widetilde{a}}_{ij}\) with a symmetric and bounded distribution on the interval \(\left[{a}_{ij}-{\widehat{a}}_{ij}, {a}_{ij}+{\widehat{a}}_{ij}\right]\) can be written as:

$${\widetilde{a}}_{ij}={a}_{ij}+{\zeta }_{ij}{\widehat{a}}_{ij}\quad \forall j\in {J}_{i}$$
(13)

where \({a}_{ij}\) is the nominal value, \({\widehat{a}}_{ij}\) represents the perturbation magnitude of the uncertain parameter \({\widetilde{a}}_{ij}\), and \({\zeta }_{ij}\) is an independent random variable taking values in the range \(\left[-1, 1\right]\). Based on this definition, Ben-Tal and Nemirovski (1998) show that the uncertainty-set-induced robust counterpart of the original \({i}^{th}\) constraint can be expressed as:

$$\sum_{j}{a}_{ij}{x}_{j}+\left[\underset{\upzeta \in {\text{U}}}{{\text{max}}}\left\{\sum_{j\in {J}_{i}}{{\zeta }_{ij}\widehat{a}}_{ij}{x}_{j}\right\}\right] \le {b}_{i} \quad \forall i$$
(14)

The robust formulation (14) clearly depends on the choice of the uncertainty set \(U\), which can be defined using various norms on the vector space of \(\zeta \). A polyhedral uncertainty set (shown in Fig. 3 for a two-dimensional uncertain parameter space) is employed first to develop the robust counterpart models. If \(U\) is chosen to be the polyhedral uncertainty set defined by the \({L}^{1}\) norm of the random variable \(\zeta \), it can be expressed as:

$${U}_{1}=\left\{\zeta |{\Vert \zeta \Vert }_{1}\le\Gamma \right\}=\left\{\zeta |\sum_{j\in {J}_{i}}\left|{\zeta }_{j}\right| \le\Gamma \right\}$$
(15)

where Γ is an adjustable parameter which controls the size of the uncertainty set. In this case, the robust counterpart formulation (14) will be equivalent to the following set of constraints (Li et al., 2011):

$$\sum_{j}{a}_{ij}{x}_{j}+{\Gamma }_{i}{\uplambda }_{i}\le {b}_{i} \quad \forall i$$
$${\uplambda }_{i}\ge {\widehat{a}}_{ij}{y}_{j} \quad \forall i,j\in {J}_{i}$$
$${-y}_{j}\le {x}_{j}\le {y}_{j} \quad \forall j$$
$${\uplambda }_{i}\ge 0 \quad \forall i$$
$${{\text{y}}}_{j}\ge 0 \quad \forall j$$
(16)
Fig. 3 Illustration of polyhedral uncertainty set

where \({\uplambda }_{i}\) is a new auxiliary variable and \({\Gamma }_{i}\), which need not be an integer, can take values in the interval \(\left[0, \left|{J}_{i}\right|\right]\), where \(\left|{J}_{i}\right|\) denotes the cardinality of set \({J}_{i}\). Parameter \({\Gamma }_{i}\) is intended to allow \(\left\lfloor {{\Gamma }_{i} } \right\rfloor\) coefficients of row \(i\) to take their worst-case values and is used to adjust the trade-off between the robustness of the model and the conservatism of the solution. Note also that the variables \({{\text{y}}}_{j}\) are not needed when \({{\text{x}}}_{j}\ge 0 ,\forall j\).
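For a fixed solution \(x\), the inner maximization in (14) over the \({L}^{1}\) set (15) has a simple closed form. The whole budget \(\Gamma\) is placed on the single coefficient with the largest \(|{\widehat{a}}_{ij}{x}_{j}|\), giving \(\Gamma \max_{j}|{\widehat{a}}_{ij}{x}_{j}|\), which is what \({\uplambda }_{i}\) and \({y}_{j}\) linearize in (16). The sketch below (function names are hypothetical) computes this closed form and checks it against a direct LP over \(\zeta\), solved with SciPy's `linprog` by splitting \(\zeta = u - v\) with \(u, v \ge 0\).

```python
import numpy as np
from scipy.optimize import linprog


def polyhedral_protection(a_hat, x, gamma):
    """Closed form of max{sum_j zeta_j a_hat_j x_j : ||zeta||_1 <= gamma}."""
    d = np.abs(np.asarray(a_hat, dtype=float) * np.asarray(x, dtype=float))
    return gamma * d.max()


def inner_max_lp(a_hat, x, gamma):
    """Same maximization solved as an LP with zeta = u - v, u, v >= 0."""
    d = np.asarray(a_hat, dtype=float) * np.asarray(x, dtype=float)
    n = len(d)
    c = np.concatenate([-d, d])   # maximize d.u - d.v  ->  minimize -d.u + d.v
    A_ub = np.ones((1, 2 * n))    # sum_j (u_j + v_j) <= gamma
    res = linprog(c, A_ub=A_ub, b_ub=[gamma], bounds=[(0, None)] * (2 * n),
                  method="highs")
    return -res.fun
```

With \(\widehat{a} = (0.1, 0.2, 0.05)\), \(x = (1, 1, 1)\), and \(\Gamma = 1.5\), both functions give 0.3.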

Subsequently, the behavior of the robust counterpart models is investigated using a combined interval and polyhedral uncertainty set (Fig. 4). This uncertainty set, defined by the intersection of the \({L}^{\infty }\) and \({L}^{1}\) norm balls of the random variable \(\zeta \), is described as follows (Li et al., 2011):

$${U}_{1\cap \infty }=\left\{\zeta |\sum_{j\in {J}_{i}}\left|{\zeta }_{j}\right| \le\Gamma , \left|{\zeta }_{j}\right|\le 1, \forall j\right\}$$
(17)
Fig. 4 Illustration of combined interval and polyhedral uncertainty set

When the uncertainty set \(U\) is defined as in Eq. (17), the equivalent formulation of the robust counterpart (14) is identical to the robust approach proposed by Bertsimas and Sim (2004):

$$\sum_{j}{a}_{ij}{x}_{j}+{\Gamma }_{i}{\uplambda }_{i}+\sum_{j\in {J}_{i}}{\upomega }_{ij}\le {b}_{i} \quad\forall i$$
$${\uplambda }_{i}+{\upomega }_{ij} \ge {\widehat{a}}_{ij}{y}_{j}\quad \forall i,j\in {J}_{i}$$
$${-y}_{j}\le {x}_{j}\le {y}_{j} \quad\forall j$$
$${\upomega }_{ij}\ge 0 \quad \forall i,j\in {J}_{i}$$
$${\uplambda }_{i}\ge 0 \quad \forall i$$
$${{\text{y}}}_{j}\ge 0 \quad \forall j$$
(18)

where \({\uplambda }_{i}\) and \({\upomega }_{ij}\) are new auxiliary variables. As in the robust counterpart formulation for the polyhedral uncertainty set, a parameter \({\Gamma }_{i}\in \left[0, \left|{J}_{i}\right|\right]\), called the uncertainty budget, is introduced for each constraint \(i\) to provide a mechanism for trading off the deterioration of the objective function value against the probability of constraint violation. Note that in the above robust counterpart formulations, when \({\Gamma }_{i}\) is set to zero, the constraints are equivalent to those of the nominal problem. Similarly, when \({\Gamma }_{i}\) is set to \(\left|{J}_{i}\right|\), the robust model is as conservative as Soyster's (1973) robust formulation.
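Under the combined set (17), each \(|{\zeta }_{j}|\) is also capped at 1. The worst case therefore takes the \(\lfloor\Gamma\rfloor\) largest values of \(|{\widehat{a}}_{ij}{x}_{j}|\) in full plus the fraction \(\Gamma - \lfloor\Gamma\rfloor\) of the next one. The sketch below computes this sorted-sum form and compares it with the protection LP in (18), that is, minimize \(\Gamma\lambda + \sum_j \omega_j\) subject to \(\lambda + \omega_j \ge |{\widehat{a}}_{ij}{x}_{j}|\), solved with SciPy. The function names are illustrative assumptions and are not part of the source.

```python
import math

import numpy as np
from scipy.optimize import linprog


def bertsimas_sim_protection(a_hat, x, gamma):
    """max{sum_j zeta_j a_hat_j x_j : sum_j |zeta_j| <= gamma, |zeta_j| <= 1}."""
    d = np.sort(np.abs(np.asarray(a_hat, dtype=float) * np.asarray(x, dtype=float)))[::-1]
    gamma = min(max(float(gamma), 0.0), len(d))   # Gamma lies in [0, |J_i|]
    k = math.floor(gamma)
    extra = (gamma - k) * d[k] if k < len(d) else 0.0
    return float(d[:k].sum() + extra)             # k largest in full + fraction of next


def dual_protection(a_hat, x, gamma):
    """Protection term of (18): min gamma*lam + sum_j omega_j s.t. lam + omega_j >= |a_hat_j x_j|."""
    d = np.abs(np.asarray(a_hat, dtype=float) * np.asarray(x, dtype=float))
    n = len(d)
    c = np.concatenate([[gamma], np.ones(n)])     # variables [lam, omega_1..omega_n]
    A_ub = -np.hstack([np.ones((n, 1)), np.eye(n)])
    res = linprog(c, A_ub=A_ub, b_ub=-d, bounds=[(0, None)] * (n + 1), method="highs")
    return res.fun
```

With \(\widehat{a} = (0.1, 0.2, 0.05)\), \(x = (1, 1, 1)\), and \(\Gamma = 1.5\), both give 0.25, compared with 0.3 under the pure \({L}^{1}\) set, because no single coefficient may deviate by more than its full range.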

In the robust counterpart approaches above, the decision maker can adjust the conservatism level of constraint \(i\) by changing the value of parameter \({\Gamma }_{i}\). Since \({\Gamma }_{i}\) is an input of the robust formulation, the decision maker can choose it according to his/her level of risk aversion: a larger \({\Gamma }_{i}\) corresponds to a more risk-averse decision maker. If no more than \({\Gamma }_{i}\) uncertain coefficients deviate from their nominal values, the robust solution remains feasible. If more coefficients deviate, the constraint may be violated, but the probability of violation is bounded above by \({\text{exp}}\left(-{\Gamma }_{i}^{2}/\left(2\left|{J}_{i}\right|\right)\right)\) (Li & Floudas, 2012). Similar steps can be followed to apply these robust approaches to uncertainty in the right-hand side (\(RHS\)) parameters.
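The violation bound above is easy to evaluate. Inverting it gives the smallest \({\Gamma }_{i}\) for a target violation probability \(\varepsilon\): \({\Gamma }_{i} = \sqrt{-2\left|{J}_{i}\right|\ln \varepsilon}\). The helper names below are hypothetical; the formula is the bound as stated in the text.

```python
import math


def violation_bound(gamma, n_uncertain):
    """Upper bound exp(-Gamma^2 / (2|J_i|)) on the probability of constraint violation."""
    return math.exp(-gamma ** 2 / (2 * n_uncertain))


def gamma_for_bound(eps, n_uncertain):
    """Gamma at which the bound equals eps (0 < eps <= 1)."""
    return math.sqrt(-2 * n_uncertain * math.log(eps))
```

For instance, with \(\left|{J}_{i}\right| = 20\) uncertain coefficients, keeping the bound at 5% requires \({\Gamma }_{i} \approx 10.95\).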

3.5.1 General framework of the robust extended goal programming (REGP)

Assuming \({f}_{i}(x)\) as linear objective functions and goals of the minimization type in the general non-lexicographic extended goal programming formulation given in (11), the general robust extended goal programming framework using the polyhedral uncertainty set is formulated as follows:

$$\mathit{Min}A=\alpha D+\left(1-\alpha \right)\left\{\sum_{i=1}^{I}(\frac{{w}_{{p}_{i}}{p}_{i}}{{K}_{i}})\right\}$$
$$S.t.$$
$$\frac{{w}_{{p}_{i}}{p}_{i}}{{K}_{i}}-D\le 0 \quad i=\mathrm{1,2},\dots ,I$$
$$\sum_{j}{c}_{ij}{x}_{j}+{\Gamma }_{i}{\lambda }_{i}+{n}_{i}-{p}_{i}={b}_{i} \quad i=\mathrm{1,2},\dots ,I$$
$${\lambda }_{i}\ge {\widehat{c}}_{ij}{x}_{j} \quad i=\mathrm{1,2},\dots ,I , j\in {J}_{i}$$
$$x\in F$$
$${x}_{j},{n}_{i},{p}_{i},{\lambda }_{i}\ge 0 \quad i=\mathrm{1,2},\dots ,I , \forall j$$
(19)

where \({c}_{ij}\) represents the nominal values of the uncertain objective function coefficients, \({\widehat{c}}_{ij}\) denotes the perturbation magnitude corresponding to the uncertain coefficient \({c}_{ij}\), \({\Gamma }_{i}\) is the conservatism level corresponding to the \({i}^{th}\) goal constraint, \({\lambda }_{i}\) is the auxiliary variable of the robust counterpart model associated with the \({i}^{th}\) goal constraint, and the significance of all other notations and constraints remains the same as defined in formulation (11).

Under the same assumptions, the general robust extended goal programming framework utilizing the combined interval and polyhedral uncertainty set is represented as follows:

$${Min}A=\alpha D+\left(1-\alpha \right)\left\{\sum_{i=1}^{I}(\frac{{w}_{{p}_{i}}{p}_{i}}{{K}_{i}})\right\}$$
$$S.t.$$
$$\frac{{w}_{{p}_{i}}{p}_{i}}{{K}_{i}}-D\le 0 \quad i=\mathrm{1,2},\dots ,I$$
$$\sum_{j}{c}_{ij}{x}_{j}+{\Gamma }_{i}{\lambda }_{i}+\sum_{j\in {J}_{i}}{\omega }_{ij}+{n}_{i}-{p}_{i}={b}_{i} \quad i=\mathrm{1,2},\dots ,I$$
$${\lambda }_{i}+{\upomega }_{ij}\ge {\widehat{c}}_{ij}{x}_{j} \quad i=\mathrm{1,2},\dots ,I , j\in {J}_{i}$$
$$x\in F$$
$${\upomega }_{ij}\ge 0 \quad i=\mathrm{1,2},\dots ,I , j\in {J}_{i}$$
$${x}_{j},{n}_{i},{p}_{i},{\lambda }_{i}\ge 0 \quad i=\mathrm{1,2},\dots ,I , \forall j$$
(20)

where \({\omega }_{ij}\) is the auxiliary variable of the robust counterpart model associated with the \({i}^{th}\) goal constraint and decision variable \({x}_{j}\), and all other notations and constraints are consistent with those in the REGP model described earlier via the polyhedral uncertainty set.

3.6 The proposed portfolio selection models

In the second phase of the proposed methodology, the amount to be invested in each qualifying stock resulting from the first phase is decided, and the desired portfolio is constructed. To this end, a multi-factor extended goal programming model containing all the factors introduced in Sect. 3.3 is developed. Then, to account for the underlying uncertainties in the different factors, robust counterparts of the EGP model are developed using the two uncertainty sets mentioned in Sect. 1. Note that, to keep the models general and comparable with classic benchmark models, no constraints other than the budget constraint are included in the portfolio selection models. Constraints such as a cardinality constraint, transaction costs, and other market-related constraints could, of course, be added straightforwardly according to the investor's preferences. Furthermore, short selling is not allowed, which makes the problem more realistic.

Prior to setting forth the various formulations we investigate, it is necessary to introduce the associated sets, parameters, and decision variables. The notations are defined as follows:

3.6.1 Indices

\(i\) Index of assets, \(i=\mathrm{1,2},\dots ,n\)

\(q\) Index of factors, \(q=RE,RI,LI,LE,AC,PR\)

3.6.2 Parameters

\({RE}_{i}\) Nominal value corresponding to the return factor of the \({i}^{th}\) asset.

\({RI}_{i}\) Nominal value corresponding to the risk factor of the \({i}^{th}\) asset.

\({LI}_{i}\) Nominal value corresponding to the liquidity factor of the \({i}^{th}\) asset.

\({LE}_{i}\) Nominal value corresponding to the leverage factor of the \({i}^{th}\) asset.

\({AC}_{i}\) Nominal value corresponding to the activity factor of the \({i}^{th}\) asset.

\({PR}_{i}\) Nominal value corresponding to the profitability factor of the \({i}^{th}\) asset.

\({v}_{q}\) Weight attributed to each negative deviational variable, \(\forall q\)

\({w}_{q}\) Weight attributed to each positive deviational variable, \(\forall q\)

\({T}_{q}\) Target values for goals, \(\forall q\)

\({k}_{q}\) Normalization constant for deviational variables, \(\forall q\)

\(\alpha \) Parameter for controlling the trade-off between efficiency and equity in the EGP achievement function

\({\Gamma }_{q}\) Level of conservatism (budget of uncertainty), \(\forall q\)

\({\Delta }_{q}\) Maximum deviation corresponding to each uncertain factor (percentage of each factor), \(\forall q\)

3.6.3 Decision variables

\({x}_{i}\) Proportion of funds invested in the \({i}^{th}\) asset.

\({n}_{q}\) Negative deviational variables, \(\forall q\)

\({p}_{q}\) Positive deviational variables, \(\forall q\)

\(D\) Maximum weighted deviation from amongst the set of unwanted deviations.

\({\omega }_{i}^{q}\) Penalty associated with each factor and decision variable \({{\text{x}}}_{{\text{i}}}\) (auxiliary variable of the robust counterpart model), \(\forall q\)

\({\lambda }_{q}\) Auxiliary scalar variable of the robust counterpart model associated with factor \(q\), used together with \({\upomega }_{{\text{i}}}^{q}\) in the combined-set model, \(\forall q\)

3.6.4 The proposed deterministic extended goal programming model for portfolio selection

Considering all the factors previously described in Table 3, the proposed multi-factor deterministic extended goal programming model for the portfolio selection problem has the following general structure:

$$\mathit{Min}Z= \alpha D+\left(1-\alpha \right)\left(\frac{{v}_{RE}}{{k}_{RE}}{n}_{RE}+\frac{{v}_{AC}}{{k}_{AC}}{n}_{AC}+\frac{{v}_{PR}}{{k}_{PR}}{n}_{PR}+\frac{{w}_{RI}}{{k}_{RI}}{p}_{RI}+\frac{{w}_{LI}}{{k}_{LI}}{p}_{LI}+\frac{{w}_{LE}}{{k}_{LE}}{p}_{LE}\right)$$
(21)
$$S.t.$$
$$\frac{{v}_{RE}}{{k}_{RE}}{n}_{RE}-D\le 0$$
(22)
$$\frac{{w}_{RI}}{{k}_{RI}}{p}_{RI}-D\le 0$$
(23)
$$\frac{{w}_{LI}}{{k}_{LI}}{p}_{LI}-D\le 0$$
(24)
$$\frac{{w}_{LE}}{{k}_{LE}}{p}_{LE}-D\le 0$$
(25)
$$\frac{{v}_{AC}}{{k}_{AC}}{n}_{AC}-D\le 0$$
(26)
$$\frac{{v}_{PR}}{{k}_{PR}}{n}_{PR}-D\le 0$$
(27)
$$\sum_{i=1}^{n}{RE}_{i}{x}_{i}+{n}_{RE}-{p}_{RE}={T}_{RE}$$
(28)
$$\sum_{i=1}^{n}{RI}_{i}{x}_{i}+{n}_{RI}-{p}_{RI}={T}_{RI}$$
(29)
$$\sum_{i=1}^{n}{LI}_{i}{x}_{i}+{n}_{LI}-{p}_{LI}={T}_{LI}$$
(30)
$$\sum_{i=1}^{n}{LE}_{i}{x}_{i}+{n}_{LE}-{p}_{LE}={T}_{LE}$$
(31)
$$\sum_{i=1}^{n}{AC}_{i}{x}_{i}+{n}_{AC}-{p}_{AC}={T}_{AC}$$
(32)
$$\sum_{i=1}^{n}{PR}_{i}{x}_{i}+{n}_{PR}-{p}_{PR}={T}_{PR}$$
(33)
$$\sum_{i=1}^{n}{x}_{i}=1$$
(34)
$${x}_{i}\ge 0 \quad \forall i$$
(35)
$${n}_{RE},{p}_{RE},{n}_{RI},{p}_{RI},{n}_{LI},{p}_{LI},{n}_{LE},{p}_{LE},{n}_{AC},{p}_{AC},{n}_{PR},{p}_{PR}\ge 0$$
(36)

where the achievement function (21) minimizes the unwanted deviations according to the EGP philosophy, constraints (22)–(27) determine the maximal weighted, normalized deviation from amongst the set of unwanted deviations, constraints (28)–(33) calculate the positive and negative deviations from the target value of each goal for a given solution \({x}_{i},\forall i\), constraint (34) is the budget constraint of the portfolio selection problem, constraint set (35) prohibits short selling, and constraint set (36) ensures the non-negativity of all the deviational variables.
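Because every goal in model (21)-(36) is linear, the whole model is a single LP. The sketch below assembles it for SciPy's `linprog` (HiGHS backend); it is an illustration, not the authors' code. The names `solve_egp`, `FACTORS`, and `MAXIMIZE` are hypothetical, and `weight[q]` stands for \({v}_{q}\) when the goal penalizes \({n}_{q}\) (RE, AC, PR) and for \({w}_{q}\) when it penalizes \({p}_{q}\) (RI, LI, LE), following the achievement function (21).

```python
import numpy as np
from scipy.optimize import linprog

# Goal order follows the factor index q in Sect. 3.6.1.
FACTORS = ["RE", "RI", "LI", "LE", "AC", "PR"]
# Goals whose negative deviation n_q is penalized in (21); the rest penalize p_q.
MAXIMIZE = {"RE", "AC", "PR"}


def solve_egp(F, T, weight, k, alpha):
    """Solve the deterministic EGP portfolio model (21)-(36) as one LP.

    F: (6, N) nominal factor values, rows ordered as FACTORS.
    T: targets T_q; weight: v_q or w_q of the penalized deviation; k: k_q.
    Variable layout: [x_1..x_N | n_1..n_6 | p_1..p_6 | D].
    """
    F = np.asarray(F, dtype=float)
    Q, N = F.shape
    ix_n, ix_p, ix_D = N, N + Q, N + 2 * Q
    nv = ix_D + 1
    pen = np.asarray(weight, dtype=float) / np.asarray(k, dtype=float)

    c = np.zeros(nv)
    c[ix_D] = alpha                                  # MinMax part of (21)
    A_ub, b_ub, A_eq, b_eq = [], [], [], []
    for q, name in enumerate(FACTORS):
        col = ix_n + q if name in MAXIMIZE else ix_p + q
        c[col] = (1 - alpha) * pen[q]                # weighted-sum part of (21)
        row = np.zeros(nv)
        row[col], row[ix_D] = pen[q], -1.0           # (22)-(27): pen_q * dev_q - D <= 0
        A_ub.append(row)
        b_ub.append(0.0)
        row = np.zeros(nv)
        row[:N] = F[q]
        row[ix_n + q], row[ix_p + q] = 1.0, -1.0     # (28)-(33): F_q x + n_q - p_q = T_q
        A_eq.append(row)
        b_eq.append(T[q])
    budget = np.zeros(nv)
    budget[:N] = 1.0                                 # (34): sum_i x_i = 1
    A_eq.append(budget)
    b_eq.append(1.0)
    # (35)-(36): x, n, p >= 0 (no short selling); D >= 0 is implied by (22)-(27)
    return linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, A_eq=np.array(A_eq),
                   b_eq=b_eq, bounds=[(0, None)] * nv, method="highs")
```

For instance, consider two assets with RE = (0.02, 0.01) and RI = (0.05, 0.03), the remaining factors identical across both assets, targets \({T}_{RE} = 0.02\) and \({T}_{RI} = 0.03\) (other targets equal to the common factor values), and unit weights and normalization constants. With α = 0 the model invests everything in the second asset. With α = 1 it returns weights (1/3, 2/3), equalizing the return shortfall and the risk excess at 1/150.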

3.6.5 The proposed robust extended goal programming models (REGP) for portfolio selection

This subsection extends the deterministic goal programming model to robust counterpart models in which the financial factors are considered uncertain, using the two uncertainty sets described previously. The uncertain parameters are modeled as random variables taking values according to symmetric distributions whose means equal the nominal values. For instance, \({RE}_{i}\) belongs to the interval \(\left[{RE}_{i}-{\widehat{RE}}_{i}, {RE}_{i}+{\widehat{RE}}_{i}\right]\), where \({RE}_{i}\) denotes the nominal value and \({\widehat{RE}}_{i}\) denotes the perturbation magnitude. The perturbation magnitude of each factor is set as a percentage (∆) of its nominal value, e.g. \({\widehat{RE}}_{i}=\Delta \times {RE}_{i}\), where ∆ can differ across factors.

According to the aforementioned assumptions and the robust formulations expressed previously, the corresponding robust counterpart of the model (21)-(36) via the polyhedral uncertainty set can be described as follows:

$$\mathit{Min}Z= \alpha D+\left(1-\alpha \right)\left(\frac{{v}_{RE}}{{k}_{RE}}{n}_{RE}+\frac{{v}_{AC}}{{k}_{AC}}{n}_{AC}+\frac{{v}_{PR}}{{k}_{PR}}{n}_{PR}+\frac{{w}_{RI}}{{k}_{RI}}{p}_{RI}+\frac{{w}_{LI}}{{k}_{LI}}{p}_{LI}+\frac{{w}_{LE}}{{k}_{LE}}{p}_{LE}\right)$$
(37)
$$S.t.$$
$$\sum_{i=1}^{n}{RE}_{i}{x}_{i}-{\Gamma }_{RE}{\lambda }_{RE}+{n}_{RE}-{p}_{RE}={T}_{RE}$$
(38)
$$\sum_{i=1}^{n}{RI}_{i}{x}_{i}+{\Gamma }_{RI}{\lambda }_{RI}+{n}_{RI}-{p}_{RI}={T}_{RI}$$
(39)
$$\sum_{i=1}^{n}{LI}_{i}{x}_{i}+{\Gamma }_{LI}{\lambda }_{LI}+{n}_{LI}-{p}_{LI}={T}_{LI}$$
(40)
$$\sum_{i=1}^{n}{LE}_{i}{x}_{i}+{\Gamma }_{LE}{\lambda }_{LE}+{n}_{LE}-{p}_{LE}={T}_{LE}$$
(41)
$$\sum_{i=1}^{n}{AC}_{i}{x}_{i}-{\Gamma }_{AC}{\lambda }_{AC}+{n}_{AC}-{p}_{AC}={T}_{AC}$$
(42)
$$\sum_{i=1}^{n}{PR}_{i}{x}_{i}-{\Gamma }_{PR}{\lambda }_{PR}+{n}_{PR}-{p}_{PR}={T}_{PR}$$
(43)
$${\lambda }_{RE}\ge {\widehat{RE}}_{i}{x}_{i} \quad \forall i$$
(44)
$${\lambda }_{RI}\ge {\widehat{RI}}_{i}{x}_{i} \quad \forall i$$
(45)
$${\lambda }_{LI}\ge {\widehat{LI}}_{i}{x}_{i} \quad \forall i$$
(46)
$${\lambda }_{LE}\ge {\widehat{LE}}_{i}{x}_{i} \quad \forall i$$
(47)
$${\lambda }_{AC}\ge {\widehat{AC}}_{i}{x}_{i} \quad \forall i$$
(48)
$${\lambda }_{PR}\ge {\widehat{PR}}_{i}{x}_{i} \quad \forall i$$
(49)
$${\lambda }_{RE},{\lambda }_{RI},{\lambda }_{LI},{\lambda }_{LE},{\lambda }_{AC},{\lambda }_{PR}\ge 0$$
(50)

Constraints (22) to (27).

Constraints (34) to (36).

where all notation and constraints are as defined in model (21)-(36), except that the goal constraints (28)-(33) now contain an additional term, \(\pm {\Gamma }_{q}{\lambda }_{q}, \forall q\). The sign of this term depends on the optimization direction of goal \(q\): it is subtracted for the goals to be maximized and added for those to be minimized. The term accounts for the maximum variability that the uncertain coefficients can induce in each goal, given the conservatism level \({\Gamma }_{q}\). The auxiliary variable \({\lambda }_{q}\) is bounded below, for each combination of decision variable and goal, by the additional constraint sets (44)-(49). Finally, constraint set (50) imposes non-negativity on the auxiliary variables \({\lambda }_{q}, \forall q\).
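For a fixed portfolio, the protection term of the polyhedral counterpart can be evaluated directly. With \({x}_{i}\ge 0\), the smallest \({\lambda }_{q}\) satisfying (44)-(50) is \({\max }_{i}{\widehat{a}}_{i}{x}_{i}\), so the tightest protection is \({\Gamma }_{q}{\max }_{i}{\widehat{a}}_{i}{x}_{i}\). The Python sketch below illustrates this for a single goal. The function names and numbers are hypothetical, and \({\widehat{a}}_{i}\) is an illustrative stand-in for the perturbation magnitude of any factor (e.g., \({\widehat{RE}}_{i}\)).

```python
from typing import Sequence


def polyhedral_protection(a_hat: Sequence[float], x: Sequence[float], gamma: float) -> float:
    """Smallest Gamma * lambda with lambda >= a_hat_i * x_i for all i and lambda >= 0."""
    lam = max([0.0] + [ah * xi for ah, xi in zip(a_hat, x)])
    return gamma * lam


def robust_goal_level(a: Sequence[float], a_hat: Sequence[float], x: Sequence[float],
                      gamma: float, maximize: bool) -> float:
    """Worst-case goal level: protection is subtracted for goals to be maximized
    (RE, AC, PR) and added for goals to be minimized (RI, LI, LE), as in (38)-(43)."""
    nominal = sum(ai * xi for ai, xi in zip(a, x))
    protection = polyhedral_protection(a_hat, x, gamma)
    return nominal - protection if maximize else nominal + protection


# Hypothetical two-asset example: nominal returns, 10% perturbation, equal weights
print(robust_goal_level([0.2, 0.1], [0.02, 0.01], [0.5, 0.5], gamma=2, maximize=True))
```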

The robust counterpart of model (21)-(36) under the combined interval and polyhedral uncertainty set (the approach of Bertsimas and Sim (2004)) is derived analogously and reads as follows:

$${Min}Z= \alpha D+\left(1-\alpha \right)\left(\frac{{v}_{RE}}{{k}_{RE}}{n}_{RE}+\frac{{v}_{AC}}{{k}_{AC}}{n}_{AC}+\frac{{v}_{PR}}{{k}_{PR}}{n}_{PR}+\frac{{w}_{RI}}{{k}_{RI}}{p}_{RI}+\frac{{w}_{LI}}{{k}_{LI}}{p}_{LI}+\frac{{w}_{LE}}{{k}_{LE}}{p}_{LE}\right)$$
(51)
$$S.t.$$
$$\sum_{i=1}^{n}{RE}_{i}{x}_{i}-{\Gamma }_{RE}{\lambda }_{RE}-\sum_{i=1}^{n}{\omega }_{i}^{RE}+{n}_{RE}-{p}_{RE}={T}_{RE}$$
(52)
$$\sum_{i=1}^{n}{RI}_{i}{x}_{i}+{\Gamma }_{RI}{\lambda }_{RI}+\sum_{i=1}^{n}{\omega }_{i}^{RI}+{n}_{RI}-{p}_{RI}={T}_{RI}$$
(53)
$$\sum_{i=1}^{n}{LI}_{i}{x}_{i}+{\Gamma }_{LI}{\lambda }_{LI}+\sum_{i=1}^{n}{\omega }_{i}^{LI}+{n}_{LI}-{p}_{LI}={T}_{LI}$$
(54)
$$\sum_{i=1}^{n}{LE}_{i}{x}_{i}+{\Gamma }_{LE}{\lambda }_{LE}+\sum_{i=1}^{n}{\omega }_{i}^{LE}+{n}_{LE}-{p}_{LE}={T}_{LE}$$
(55)
$$\sum_{i=1}^{n}{AC}_{i}{x}_{i}-{\Gamma }_{AC}{\lambda }_{AC}-\sum_{i=1}^{n}{\omega }_{i}^{AC}+{n}_{AC}-{p}_{AC}={T}_{AC}$$
(56)
$$\sum_{i=1}^{n}{PR}_{i}{x}_{i}-{\Gamma }_{PR}{\lambda }_{PR}-\sum_{i=1}^{n}{\omega }_{i}^{PR}+{n}_{PR}-{p}_{PR}={T}_{PR}$$
(57)
$${\lambda }_{RE}+{\omega }_{i}^{RE}\ge {\widehat{RE}}_{i}{x}_{i} \quad \forall i$$
(58)
$${\lambda }_{RI}+{\omega }_{i}^{RI}\ge {\widehat{RI}}_{i}{x}_{i} \quad \forall i$$
(59)
$${\lambda }_{LI}+{\omega }_{i}^{LI}\ge {\widehat{LI}}_{i}{x}_{i}\quad \forall i$$
(60)
$${\lambda }_{LE}+{\omega }_{i}^{LE}\ge {\widehat{LE}}_{i}{x}_{i} \quad \forall i$$
(61)
$${\lambda }_{AC}+{\omega }_{i}^{AC}\ge {\widehat{AC}}_{i}{x}_{i} \quad \forall i$$
(62)
$${\lambda }_{PR}+{\omega }_{i}^{PR}\ge {\widehat{PR}}_{i}{x}_{i} \quad \forall i$$
(63)
$${\omega }_{i}^{RE},{\omega }_{i}^{RI},{\omega }_{i}^{LI},{\omega }_{i}^{LE},{\omega }_{i}^{AC},{\omega }_{i}^{PR}\ge 0 \quad \forall i$$
(64)

Constraints (22) to (27).

Constraints (34) to (36), (50).

where all notation and constraints are shared with the REGP model under the polyhedral uncertainty set presented above, with one exception. The maximum-variability term added to each goal constraint (28)-(33) now comprises both \(\pm {\Gamma }_{q}{\lambda }_{q}\) and \(\pm \sum_{i=1}^{n}{\omega }_{i}^{q}, \forall q\), with the sign again depending on the optimization direction. Accordingly, constraint sets (44)-(49) are replaced by constraint sets (58)-(63), and constraint set (64) imposes non-negativity on the additional auxiliary variables \({\omega }_{i}^{q}, \forall i,q\).
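For a fixed portfolio, the auxiliary variables in (58)-(64) reproduce the Bertsimas and Sim (2004) protection function. Take each \({\omega }_{i}^{q}\) at its smallest feasible value, \(\max (0,{\widehat{a}}_{i}{x}_{i}-{\lambda }_{q})\). What remains is a convex piecewise-linear function of \({\lambda }_{q}\). Its minimum equals the sum of the \(\lfloor \Gamma \rfloor\) largest terms \({\widehat{a}}_{i}{x}_{i}\) plus the fractional part of \(\Gamma \) times the next largest term. The Python sketch below computes both sides and checks that they agree. The names and data are hypothetical, and \({\widehat{a}}_{i}\) stands for the perturbation magnitude of any factor.

```python
import math
from typing import Sequence


def protection_dual(a_hat: Sequence[float], x: Sequence[float], gamma: float) -> float:
    """min over lambda >= 0 of Gamma*lambda + sum_i max(0, a_hat_i*x_i - lambda)."""
    terms = [ah * xi for ah, xi in zip(a_hat, x)]
    # Convex piecewise-linear in lambda with breakpoints at the terms, so the
    # minimum is attained at lambda = 0 or at one of the terms.
    candidates = [0.0] + terms
    return min(gamma * lam + sum(max(0.0, t - lam) for t in terms) for lam in candidates)


def protection_primal(a_hat: Sequence[float], x: Sequence[float], gamma: float) -> float:
    """Sum of the floor(Gamma) largest terms plus frac(Gamma) times the next largest."""
    terms = sorted((ah * xi for ah, xi in zip(a_hat, x)), reverse=True)
    k = min(math.floor(gamma), len(terms))
    value = sum(terms[:k])
    if k < len(terms):
        value += (gamma - k) * terms[k]
    return value


a_hat, x = [0.03, 0.02, 0.01], [1.0, 1.0, 1.0]  # hypothetical data
for g in [0, 0.5, 1, 1.5, 2, 3, 5]:
    print(g, protection_dual(a_hat, x, g), protection_primal(a_hat, x, g))
```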

4 Experiments and results

4.1 Data description

The Tehran Stock Exchange (TSE) was chosen as the real-world case study for the problem investigated in this research. With over half a century of history, the TSE is a major stock market in Iran and the Middle East. Recent shifts in market dynamics have heightened volatility within the TSE. Such fluctuations often introduce uncertainty into the key financial metrics of firms that inform investors' portfolio decisions. Investment strategies that shield investors from these uncertainties are therefore crucial in this market, which makes the TSE a suitable testing ground for the robust investment approaches proposed in this study (see, e.g., Peykani et al. (2020, 2022) and Rahiminezhad Galankashi et al. (2020) for other relevant studies on the TSE). Moreover, the flexibility of the proposed approach suggests that it can be adapted to other market conditions, which broadens its potential applicability to other markets. In the first phase, a total of 779 firms listed on the TSE were evaluated and ranked with the DEA model. The data for the inputs and outputs of the DEA model listed in Table 2 were extracted from the firms' publicly available financial statements, all as of June 2019. The portfolios in the second phase were built over a 12-month constructing period, from November 2018 to November 2019. The out-of-sample performance of the resulting portfolios was evaluated over a 6-month testing period, from December 2019 to May 2020. All optimization models were solved in GAMS using the CPLEX solver.

4.2 Phase I results

Financial ratios associated with a firm's financial performance, especially those within the same perspective, are correlated (Chen, 2008; Wu et al., 2022). This can be problematic for DEA computations, particularly given the parameter selection outlined in Table 2. On the other hand, discarding some ratios within a given financial perspective because of such correlations may cause a loss of information and of nuances that are pivotal for a comprehensive assessment. To streamline the financial performance indicators used in DEA, we therefore adopted an aggregation approach that accounts for both the correlations between ratios and their significance in representing specific financial perspectives. This approach transforms the diverse set of financial ratios into a concise set of composite variables, each labeled to convey its intended perspective. Specifically, we created the following:

(1) Input 1, the "Liquidity Score," a weighted sum of the normalized values of "Current Ratio" and "Quick Ratio," representing the liquidity perspective.
(2) Input 2, the "Leverage Score," a weighted sum of the normalized values of "Debt Ratio" and "Debt-to-Equity Ratio," reflecting the leverage perspective.
(3) Input 3, "Days Sales Outstanding (DSO)," treated as a standalone input. For DSO, "the fewer, the better" is the preferred criterion, in contrast to "the more, the better," the principle applied to all other ratios within the activity (asset utilization) perspective.
(4) Output 1, the "Asset Utilization Score," a weighted sum of the normalized values of "Inventory Turnover," "Working Capital Turnover," "Fixed Asset Turnover," and "Asset Turnover."
(5) Output 2, the "Profitability Score," which aggregates the normalized values of "Net Profit Margin," "Operating Profit Margin," "Return on Assets," "Return on Equity," and "Return on Working Capital."

The choice of weights should reflect the preferences and objectives of the FDMs and can be determined with the help of experts. To ensure a balanced analysis, we employ an equal weighting scheme across these aggregated variables. The aggregation structure chosen for the DEA model aligns with the investor-oriented objective of identifying fundamentally efficient firms for investment, emphasizing a holistic view of financial health rather than specific areas of strength or weakness. Further, to satisfy the homogeneity condition on the DMUs, the values of the "current" and "quick" ratios used in the DEA computations are the distance to the industry average for each asset, which removes the effect of industry type. In addition to the weighted sum approach used here, future research could explore complementary feature-reduction methods, such as the clustering techniques suggested by Wu et al. (2022).
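The weighted-sum aggregation described above can be sketched as follows. The text does not state which normalization is applied, so min-max scaling to [0, 1] is used here purely as an assumption, together with the equal weights adopted in the study. The function names and ratio values are hypothetical. If the DEA solver requires strictly positive data, the resulting scores may additionally need to be shifted.

```python
from typing import List, Optional, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Scale a ratio column to [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]


def composite_score(ratio_columns: Sequence[Sequence[float]],
                    weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted sum of normalized ratios per firm; equal weights by default."""
    k = len(ratio_columns)
    w = list(weights) if weights is not None else [1.0 / k] * k
    normalized = [min_max(col) for col in ratio_columns]
    n_firms = len(ratio_columns[0])
    return [sum(w[j] * normalized[j][i] for j in range(k)) for i in range(n_firms)]


# Hypothetical Liquidity Score for three firms from Current Ratio and Quick Ratio
current_ratio = [1.0, 2.0, 3.0]
quick_ratio = [0.5, 1.5, 1.0]
print(composite_score([current_ratio, quick_ratio]))
```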

After running the super-efficiency DEA model with the specifications stated above for all 779 assets registered on the TSE, an efficiency score is obtained for each asset and the assets are ranked accordingly. The efficiency of each asset, as determined by the DEA model, serves as the screening criterion: only assets with efficiency scores strictly greater than 1 are retained. Accordingly, the 60 top-ranked assets were chosen. However, 20 of these 60 assets had to be excluded because they were permanently delisted from the stock exchange during the construction period. The historical data needed to calculate their risk and return factors were therefore unavailable. The remaining 40 assets were selected as qualified candidates for the portfolio optimization problem in the second phase. Table 6 includes the efficiency scores and ranking of these 40 qualified assets resulting from Phase I.
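The Phase I screening rule can be stated as a short Python sketch. It keeps assets whose super-efficiency score is strictly greater than 1, ranks them, and then drops those without the return history needed for Phase II. The asset codes, scores and the `has_history` set are hypothetical.

```python
from typing import Dict, List, Set


def screen_assets(scores: Dict[str, float], has_history: Set[str],
                  threshold: float = 1.0) -> List[str]:
    """Return super-efficient assets (score > threshold), ranked by score,
    excluding assets whose return history is unavailable."""
    efficient = sorted((a for a, s in scores.items() if s > threshold),
                       key=lambda a: scores[a], reverse=True)
    return [a for a in efficient if a in has_history]


scores = {"A": 1.30, "B": 0.90, "C": 1.10, "D": 1.00, "E": 2.00}  # hypothetical
print(screen_assets(scores, has_history={"A", "B", "C"}))  # ['A', 'C']
```

Asset D is dropped because a score of exactly 1 does not satisfy the strict inequality, and asset E is dropped for lack of history despite ranking first.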

4.2.1 DEA model validation and sensitivity analysis

A sensitivity analysis of the super-efficiency DEA model was conducted to assess the internal and external validity of the Phase I findings (Parkin & Hollingsworth, 1997; Habib & Kayani, 2023). The internal validity test eliminates input and output variables to examine their effects on the DEA efficiency scores. In the current study, the input and output variables of the basic DEA model were removed from the efficiency model one at a time. To examine the distribution of the super-efficiency scores from the original DEA model, the Anderson–Darling normality test was conducted (Fig. 5). The test showed that the scores are not normally distributed. The same test applied to the modified DEA models likewise showed that their super-efficiency scores do not follow a normal distribution. Hence, the Mann–Whitney U and Kruskal–Wallis tests were used to compare the efficiency scores of the modified DEA models with the original scores. The aim was to determine whether removing a variable produced a statistically significant difference in efficiency scores (at a significance level of 0.05). Additionally, Spearman rank correlation coefficients were calculated to determine whether the rankings of the firms changed across the DEA models. The results of the super-efficiency DEA models with modified specifications are presented in Table 4. As indicated in Table 4, removing either input 2 (Leverage Score) or input 3 (DSO) had a significant impact on the general distribution of the efficiency scores, as evidenced by the notable drop in average efficiency. However, it had a smaller effect on the rankings of firms, as evidenced by the high and significant Spearman rank correlation coefficients. This outcome is expected, given that these two inputs measure distinct resource categories, so removing either one results in a significant loss of information.
On the output side, removing either output (the Asset Utilization or Profitability Score) individually did not significantly alter the model results. This is evidenced by the results of the Mann–Whitney U test and by the high and significant Spearman correlation coefficients. Such findings are in line with expectations, as the model is input-oriented.
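The ranking comparison in Table 4 relies on Spearman's rank correlation between the original and modified super-efficiency scores. A dependency-free Python sketch is given below. Tied values receive average ranks, and ρ is computed as the Pearson correlation of those ranks. The scores are hypothetical, and no p-value is computed here; `scipy.stats.spearmanr` returns the same coefficient together with a p-value.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


original = [1.20, 0.80, 1.50, 1.00, 0.95]  # hypothetical original scores
modified = [1.05, 0.70, 1.40, 0.98, 0.90]  # hypothetical scores after removing an input
print(round(spearman_rho(original, modified), 3))
```

In this hypothetical case every modified score is lower than its original, yet ρ equals 1 because the ordering is unchanged. This mirrors the distinction the text draws between shifts in the score distribution and changes in ranking.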

To assess the external validity of the super-efficiency DEA model, a longitudinal analysis was conducted to examine the consistency of the results over time. The original super-efficiency model was re-applied to data from one year before and one year after the original data set; the results are presented in Table 5. The resulting efficiency scores were then compared with the original results. The Mann–Whitney U test revealed no statistically significant difference in the distribution of efficiency scores across the study years, as all P-values exceeded the significance level of 0.05. The Kruskal–Wallis test supported this result, indicating no statistically significant difference in the efficiency score distribution over the years (P-value = 0.480). Additionally, the Spearman rank correlation coefficients between each pair of years were high and significant. Thus, neither the overall distribution of efficiency scores nor the rankings of assets varied substantially from one year to the next. These results underscore the coherence and consistency of the efficiency model employed in this study.
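The longitudinal comparison uses the Kruskal–Wallis test across the three yearly score samples: the year before, the original year, and the year after. The Python sketch below computes the H statistic with average ranks for ties but without the tie-correction factor, so it can differ slightly from `scipy.stats.kruskal`, which applies that correction. The p-value comes from the chi-square approximation. With three groups there are two degrees of freedom, and for two degrees of freedom the chi-square survival function is exactly \(e^{-H/2}\). All data are hypothetical.

```python
import math
from typing import List, Sequence


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """H statistic with average ranks for ties (no tie-correction factor)."""
    pooled = sorted(((v, g) for g, grp in enumerate(groups) for v in grp), key=lambda t: t[0])
    n = len(pooled)
    rank_sums: List[float] = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += (i + j) / 2 + 1  # average rank of the tie block
        i = j + 1
    h = 12.0 / (n * (n + 1)) * sum(r * r / len(grp) for r, grp in zip(rank_sums, groups))
    return h - 3 * (n + 1)


def p_value_three_groups(h: float) -> float:
    """Chi-square survival function with k - 1 = 2 degrees of freedom."""
    return math.exp(-h / 2)


year_before = [1.10, 1.25, 1.40, 1.05]  # hypothetical super-efficiency scores
original = [1.15, 1.30, 1.35, 1.08]
year_after = [1.12, 1.22, 1.45, 1.02]
h = kruskal_wallis_h([year_before, original, year_after])
print(round(h, 3), round(p_value_three_groups(h), 3))
```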

Fig. 5

Anderson–Darling normality test of the basic super-efficiency scores

Table 4 Sensitivity analysis of the super-efficiency DEA model
Table 5 Longitudinal analysis of the super-efficiency DEA model

4.3 Phase II results

To implement the portfolio selection models in the second phase, the monthly returns of the 40 qualified assets resulting from the first phase were extracted from the TSE over the constructing period (a total of 520 observations). The out-of-sample data were obtained over the testing period (a total of 280 observations). The final data on the return and risk factors of the qualified assets, as well as the data on the other factors introduced in Sect. 3.3, are presented in Table 6.

Table 6 The data of factors utilized in the portfolio optimization models

For simplicity, it is assumed that \({v}_{q}={w}_{q}=1 ,\forall q\) in all experiments (i.e., all goals are equally important). The target values are set to the optimal objective function values of the individual single-objective models, each of which optimizes one factor alone. Moreover, the trade-off parameter of the EGP models (\(\mathrm{\alpha }\)) is set to \(0.5\) in all experiments, as a fair trade-off between the optimization and balance philosophies underlying the EGP structure (Jatuphatwarodom et al., 2018). On this basis, the results of the different robust EGP models for the multiple criteria portfolio selection problem are presented in the following sections. The weights of the assets in the optimal portfolios are provided in Tables 10, 11 and 12.
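To make the role of α concrete, the sketch below evaluates the EGP achievement function of (37) for given achieved goal levels. It assumes, as is standard in EGP, that \(D\) equals the largest weighted, normalized unwanted deviation. The constraints defining \(D\) and the normalizers \({k}_{q}\) belong to model (21)-(36) and are not reproduced in this section. The values used below, including normalization by the target, are therefore hypothetical.

```python
from typing import Dict

MAXIMIZE = {"RE", "AC", "PR"}  # unwanted deviation is n_q (under-achievement)


def unwanted_deviation(goal: str, achieved: float, target: float) -> float:
    """n_q for goals to be maximized, p_q for goals to be minimized (RI, LI, LE)."""
    if goal in MAXIMIZE:
        return max(0.0, target - achieved)
    return max(0.0, achieved - target)


def achievement(achieved: Dict[str, float], target: Dict[str, float],
                k: Dict[str, float], weight: Dict[str, float], alpha: float = 0.5) -> float:
    """alpha * D + (1 - alpha) * sum of weighted, normalized unwanted deviations."""
    scaled = {q: weight[q] / k[q] * unwanted_deviation(q, achieved[q], target[q])
              for q in target}
    d = max(scaled.values())  # MinMax (balance) component, assumed definition of D
    return alpha * d + (1 - alpha) * sum(scaled.values())


# Hypothetical two-goal case with v_q = w_q = 1 and k_q equal to the target
target = {"RE": 0.25, "RI": 0.10}
z = achievement({"RE": 0.20, "RI": 0.12}, target, k=target, weight={"RE": 1.0, "RI": 1.0})
print(z)
```

With α = 1 only the balance term \(D\) remains, and with α = 0 only the weighted-sum term remains. The value 0.5 splits this trade-off evenly.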

4.3.1 Effects of conservatism and perturbation levels in the REGP models

To better understand how the degree of conservatism and data perturbation affect the GP achievement function and the resulting portfolios, several risk levels were established by linking conservatism levels with data perturbation levels. Together they roughly cover optimistic, realistic and pessimistic decision-making approaches. To this end, the perturbation range (i.e., ∆) is set to 5%, 10% and 20%, while the uncertainty budget (\(\Gamma \)) is set to 0, 1, 2, 3, 4, 5, 10, 20, and 40. The conservatism level is the same for all uncertain factors (i.e., \({\Gamma }_{RE}={\Gamma }_{RI}={\Gamma }_{LI}={\Gamma }_{LE}={\Gamma }_{AC}={\Gamma }_{PR}=\Gamma \)), and the uncertain parameters are assumed to vary concurrently.

According to the results presented in Tables 8 and 9 for the multi-factor REGP models, the achievement (objective) function value of the robust models is higher than that of the corresponding deterministic models because of the cost of improving model stability. This is consistent with similar applications of RGP models such as Ghahtarani and Najafi (2013), Hanks et al. (2017), and Ghasemi Bojd and Koosha (2018). The effect becomes more pronounced as the level of conservatism or perturbation increases. For instance, consider a highly conservative and risk-averse version of the robust model (i.e., a conservatism level of 40 and a perturbation level of 20%). It yields decisions with a 15.8% higher achievement function value for REGP via the combined interval and polyhedral uncertainty set (i.e., \(100\times (0.578-0.499)/0.499\)) and a 52.7% higher value for REGP via the polyhedral uncertainty set. Lower uncertainty levels, on the other hand, still improve the stability of the solutions but at a more modest achievement function value, balancing robustness and cost. Figures 6 and 7 depict how the value of the achievement function is affected by different degrees of conservatism and variations in the uncertain parameters. They cover the multi-factor REGP models via the combined interval and polyhedral and the polyhedral uncertainty sets, respectively. In all experiments, the normalized deviation \(\frac{{Z}_{R}-{Z}_{D}}{{Z}_{D}}\) of the optimal achievement function value is used, where \({Z}_{R}\) and \({Z}_{D}\) are the optimal achievement function values of the robust and deterministic models, respectively. In addition to the changes in the achievement function value, Fig. 6 also shows the probabilities of constraint violation for different conservatism degrees.
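The normalized deviation and the 15.8% figure quoted above can be reproduced with a few lines of Python. Only the two achievement function values reported in the text, \({Z}_{R}=0.578\) and \({Z}_{D}=0.499\), are used.

```python
def normalized_deviation(z_robust: float, z_det: float) -> float:
    """(Z_R - Z_D) / Z_D: relative increase of the optimal achievement function value."""
    return (z_robust - z_det) / z_det


pct = 100 * normalized_deviation(0.578, 0.499)
print(f"{pct:.1f}%")  # 15.8%
```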

Figure 6 reveals that, for the REGP models via the combined interval and polyhedral uncertainty set (the approach of Bertsimas and Sim (2004)), the worst-case objective function value is reached at conservatism levels below their extreme values. This occurs at \(\Gamma =4<\left|{J}_{i}\right|=40\) for \(\Delta =5\%\) and \(\Delta =10\%\), and at \(\Gamma =10<\left|{J}_{i}\right|=40\) for \(\Delta =20\%\). Therefore, \(\Gamma \) need not exceed 4 (for \(\Delta =5\%\) and \(10\%\)) or 10 (for \(\Delta =20\%\)) to obtain the most conservative results. In contrast, Fig. 7 shows that the worst-case objective function value of the REGP models via the polyhedral uncertainty set is reached only when the conservatism levels of the uncertain factors reach their highest values, i.e., \(\Gamma =\left|{J}_{i}\right|=40\). A first implication follows. In the REGP model via the polyhedral uncertainty set, the higher the conservatism level of the uncertain parameters, the greater the impact on the achievement function value. This is not the case for the REGP model via the combined interval and polyhedral uncertainty set, where increasing the level of conservatism has a smaller impact on the achievement function value.

A second insight can be drawn from the sensitivity analysis in Figs. 6 and 7 by comparing the extent to which variations in the uncertain parameters deteriorate the achievement function value. When \(\Delta =20\%\), parameter variations have a large deterioration impact of 52.7% on the achievement function value of the multi-factor REGP model via the polyhedral uncertainty set. For the multi-factor REGP model via the combined interval and polyhedral uncertainty set, the corresponding impact at the same \(\Delta \) is approximately 15.8%. Therefore, when using the robust models via the polyhedral uncertainty set, the FDM should focus primarily on estimating the input parameters more accurately, as parameter variations have the largest influence on the overall achievement function value.

It should be noted that the achievement functions of the 2-factor REGP models (i.e., considering only return and risk), taken as an example of REGP models with two goals, vary relative to their deterministic equivalents in the same way as those of the multi-factor REGP models under both uncertainty sets. Hence, they are not discussed here.

The results also demonstrate that despite increasing the problem size by introducing new variables and constraints, the robust approaches retained the computational tractability of the original problem as all of the robust models were solved in less than a second.

Fig. 6

Sensitivity of the achievement function to variations in factors for the multi-factor REGP model via combined interval and polyhedral uncertainty set

Fig. 7

Sensitivity of the achievement function to variations in factors for the multi-factor REGP model via polyhedral uncertainty set

4.3.2 Comparison between the achievement functions of the REGP models under the two uncertainty sets

As graphically depicted in Fig. 8, at the same conservatism level, the achievement function values of the REGP model via the polyhedral uncertainty set are greater than or equal to those of the REGP model via the combined interval and polyhedral uncertainty set. The gap between the two models is relatively small for low \(\Gamma \) values but widens as \(\Gamma \) increases. In other words, the robust solutions obtained with the polyhedral uncertainty set show a greater deterioration in the individual objective function values than those of the REGP model with the combined interval and polyhedral uncertainty set. They also lie further from the target levels. This suggests that the REGP model via the combined interval and polyhedral uncertainty set can offer robust solutions at a lower cost. The polyhedral-set solution is equal to or worse than the "interval + polyhedral" solution because intersecting the polyhedral set with the interval set yields a smaller uncertainty set and hence a less conservative outcome. The gap between the two models also widens slightly as \(\Delta \) increases.
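The ordering in Fig. 8 mirrors a property of the protection terms themselves. For a fixed portfolio, the polyhedral protection \(\Gamma {\max }_{i}{\widehat{a}}_{i}{x}_{i}\) grows linearly in \(\Gamma \). The combined-set protection (the sum of the \(\lfloor \Gamma \rfloor\) largest terms plus a fractional term) never exceeds it, coincides with it for \(\Gamma \le 1\), and stops growing once \(\Gamma \) reaches the number of assets held. The Python sketch below illustrates this with a hypothetical portfolio of four held assets and the \(\Gamma \) values used in the experiments. It holds \({x}\) fixed, whereas the optimization also adapts \({x}\), so it is not a reproduction of Figs. 6, 7 and 8.

```python
import math
from typing import List

GAMMAS = [0, 1, 2, 3, 4, 5, 10, 20, 40]  # conservatism levels used in the experiments


def polyhedral(terms: List[float], gamma: float) -> float:
    """Gamma times the largest term a_hat_i * x_i."""
    return gamma * max(terms, default=0.0)


def combined(terms: List[float], gamma: float) -> float:
    """Sum of the floor(Gamma) largest terms plus frac(Gamma) times the next one."""
    t = sorted(terms, reverse=True)
    k = min(math.floor(gamma), len(t))
    frac = (gamma - k) * t[k] if k < len(t) else 0.0
    return sum(t[:k]) + frac


# Hypothetical a_hat_i * x_i for six assets, four of which are held
terms = [0.012, 0.008, 0.004, 0.002, 0.0, 0.0]
for g in GAMMAS:
    print(f"Gamma={g:>2}  polyhedral={polyhedral(terms, g):.3f}  combined={combined(terms, g):.3f}")
```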

Fig. 8

Achievement function values of multi-factor REGP models via polyhedral and combined interval and polyhedral uncertainty sets for different conservatism & data perturbation levels

4.3.3 Out-of-sample portfolio performance

To gain insight into the impact of considering uncertainty on out-of-sample portfolio performance, Table 7 first presents the results of the deterministic EGP models with two factors (i.e., only return and risk) and with multiple factors (all the extended factors). These models use the preferential weights and target values described previously, and Table 7 also reports the results of the Markowitz model. For consistency with the goal target values selected for the EGP models, the lower bound on the expected portfolio rate of return in the Markowitz model experiment was set to the maximum return among the 40 candidate assets, which is 0.267 according to Table 6.

The performance metrics used for assessing the results include the following: out-of-sample portfolio return, out-of-sample portfolio standard deviation, Sharpe ratio, number of stocks selected for investment, number of objectives achieved, and the objective function value of the corresponding mathematical model. Among these performance metrics, the Sharpe ratio (Sharpe, 1966), one of the most commonly used measures for evaluating out-of-sample performance, is defined as follows:

$$SR=\frac{E\left({R}_{p}\right)-{R}_{f}}{\sigma ({R}_{p})}$$
(65)

where \(E\left({R}_{p}\right)\), \({R}_{f}\), and \(\sigma ({R}_{p})\) denote the average out-of-sample portfolio return, risk-free return, and out-of-sample standard deviation of portfolio return, respectively.
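Equation (65) can be evaluated from a vector of out-of-sample portfolio returns, as in the Python sketch below. The portfolio return in each month is the weight-averaged asset return. The text does not specify the remaining details here, so the following are assumptions for illustration: the sample standard deviation, the monthly frequency, a risk-free rate of zero, and all numbers.

```python
import statistics
from typing import List, Sequence


def portfolio_returns(weights: Sequence[float],
                      asset_returns: Sequence[Sequence[float]]) -> List[float]:
    """asset_returns[t][i] is the return of asset i in month t of the testing period."""
    return [sum(w * r for w, r in zip(weights, row)) for row in asset_returns]


def sharpe_ratio(returns: Sequence[float], risk_free: float) -> float:
    """(mean portfolio return - risk-free return) / standard deviation, as in (65)."""
    excess = statistics.mean(returns) - risk_free
    return excess / statistics.stdev(returns)  # sample standard deviation (assumption)


# Hypothetical two-asset portfolio over a six-month testing window
weights = [0.6, 0.4]
monthly = [[0.05, 0.02], [0.10, -0.01], [-0.02, 0.03],
           [0.08, 0.04], [0.01, 0.00], [0.06, 0.05]]
rp = portfolio_returns(weights, monthly)
print(round(sharpe_ratio(rp, risk_free=0.0), 3))
```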

Tables 8 and 9 present the extensive out-of-sample results of the multi-factor REGP models via combined interval and polyhedral and polyhedral uncertainty sets, respectively. These tables provide the results for each performance metric across various combinations of conservatism levels and data perturbation scenarios.

Table 7 The results of the deterministic models
Table 8 The results of the multi-factor REGP models via combined interval and polyhedral uncertainty set
Table 9 The results of the multi-factor REGP models via polyhedral uncertainty set

4.3.4 Discussion of the phase II results

In what follows, we discuss the results in light of the performance metrics and compare the nominal (deterministic counterpart) and robust models to elucidate their differences. Additionally, we compare the multi-factor and two-factor models, both nominal and robust, to derive insights into the significance of considering multiple financial factors. According to the deterministic results in Table 7, all three models select the same number of stocks (i.e., 2), and the EGP model with multiple factors clearly outperformed the two-factor models. The performance gap, most notable against the Markowitz model, is most apparent in the out-of-sample portfolio rate of return and Sharpe ratio. In particular, the Sharpe ratio of the multi-factor EGP model (0.980) is twice that of the Markowitz model (0.490).

Tables 8 and 9 report the results of the multi-factor robust models, which are the main focus of this research. According to these tables, with \(\Gamma =0\) the results of the two REGP models are, as expected, identical to each other and to the nominal model. This serves as a consistency check on the robust models. Notably, \(\Gamma =0\) means that uncertainty is not considered, which leaves the model susceptible to variations in the input parameters. For \(\Gamma =1\), the "polyhedral" and the "combined interval and polyhedral" set-based solutions are also identical in all performance metrics, as observed earlier in Sect. 4.3.2 for the achievement function values. This is because the corresponding uncertainty sets are identical when \(\Gamma \le 1\), as shown in Fig. 4. For \(\Gamma >1\), the solutions based on the combined uncertainty set outperform those based on the pure polyhedral set in terms of the achievement function. This superiority arises because the bounded box constraint makes the combined uncertainty set smaller, whereas under the pure polyhedral set the achievement function values deteriorate rapidly. In terms of portfolio returns, both models generally lead to portfolios with relatively high returns across all uncertainty settings, surpassing the portfolio return of the Markowitz model (\({R}_{P}=0.109\)). As for out-of-sample risk, both REGP models consistently yield portfolios with lower risk than the Markowitz model (\({\sigma }_{P}=0.223\)). The REGP models via both uncertainty sets can also be compared with their nominal counterpart, whose results are presented in the last column of Table 7 (or in Tables 8 and 9 with \(\Gamma =0\)). This comparison of out-of-sample risk and return reveals the following patterns.
The robust models realize both higher and lower returns than their nominal counterpart (\({R}_{P}=0.208\)), but their risk values are consistently lower than those of the nominal model (\({\sigma }_{P}=0.212\)) across all uncertainty levels. Moreover, under varying degrees of conservatism, returns decrease for both REGP models relative to their nominal counterpart. This decrease is most pronounced in the highly conservative versions of the models, and more notable for the REGP model via the polyhedral uncertainty set. Likewise, the REGP model using the polyhedral uncertainty set consistently produces portfolios with significantly lower risk than the nominal model, particularly in its highly conservative and risk-averse versions (marked by higher linked levels of \(\Gamma \) and \(\Delta \)). A plausible explanation is that robust optimization models become more risk-averse as the level of conservatism increases, in order to protect against worst-case scenarios. This stronger risk aversion may favor assets with lower expected returns but lower risk, reducing overall portfolio returns. Consequently, the observed changes in out-of-sample risk and return for the REGP models via both uncertainty sets yield portfolio Sharpe ratios that consistently surpass those of the Markowitz model (\(SR=0.490\)) and the nominal EGP model (\(SR=0.980\)) across all uncertainty levels.

Regarding the influence of the perturbation level (\(\Delta \)) of the uncertain parameters on these performance metrics, changes in return, risk, and Sharpe ratio values remain relatively modest for the REGP model using the combined interval and polyhedral uncertainty set at a constant conservatism degree. The influence of \(\Delta \) is more pronounced for the REGP model employing the polyhedral uncertainty set. It can therefore be concluded that the performance metrics of the REGP model via the pure polyhedral uncertainty set are more sensitive to changes in uncertainty levels. This heightened sensitivity was shown in Sects. 4.3.1 and 4.3.2 for the achievement function values, and the results show that it also extends to other metrics such as \({N}_{x}\).

Concerning the number of selected stocks (\({N}_{x}\)), two general viewpoints prevail. A smaller number of selected stocks results in lower transaction costs and easier portfolio management, whereas a larger number of stocks brings the benefits of diversification (Georgantas et al., 2021). The results show that the deterministic models, i.e., the Markowitz and the nominal EGP models, lead to the sparsest portfolios, each consisting of 2 assets. So do some conservative versions of the REGP model employing the combined interval and polyhedral uncertainty set. In stark contrast, the REGP model employing the pure polyhedral uncertainty set yields the largest and most diversified portfolios, comprising 33 assets in the highly conservative and risk-averse version of the model. Moreover, in terms of portfolio diversification, the difference between the nominal and robust models is smaller for the REGP model via the combined interval and polyhedral uncertainty set, with a few exceptions at certain conservatism levels. This is not unusual, since robust optimization does not necessarily increase diversification relative to the nominal model. Instead, it seeks to immunize the solution against the worst-case realizations of the uncertain input parameters. In this respect, if the nominal model makes a trade-off between high return potential and low risk potential, the robust counterpart makes the same trade-off using the worst-case return and risk potentials (** and analyzing the REGP framework, highlighting its potential as a versatile tool that offers a pragmatic, flexible, and straightforward approach to tackling complex multi-objective optimization problems in uncertain environments. EGP underpins REGP, enabling a parametric analysis of trade-offs between efficiency and balance among target values, which makes REGP highly flexible.
The REGP framework proposed in this study features two uncertainty sets, the polyhedral set and the combined interval and polyhedral set. It yields robust optimal decisions while preserving linearity and computational tractability, which distinguishes it from previous works. It adapts solutions across a spectrum from worst-case to deterministic outcomes through adjustable parameters of the uncertainty sets. The findings showed that the total deviations from the decision makers' targets depend on the type of uncertainty set used, unlike the findings of Mensah and Rocca (2019). Furthermore, REGP accommodates uncertainties not only in goal function coefficients and target values but also in system constraints. This is a critical feature for achieving complete robustness in problems where both constraint parameters and goals are imprecisely defined. Additionally, REGP can be extended and combined with other uncertainty modeling techniques, depending on the nature of the uncertainty source, within a single coherent framework. In this study, REGP was applied to portfolio selection, a context well suited to it because of its inherent multi-criteria nature and the presence of uncertainties. REGP can address both dimensions effectively, offering advantages over the classic MV framework, which handles neither parameter uncertainty nor diverse investor preferences. Beyond portfolio selection, the REGP framework has potential applications in other contexts as a versatile tool for robust multi-objective decision making, and exploring them is strongly recommended.

5 Conclusions

This paper has proposed an innovative framework for constructing asset portfolios under data uncertainty. The framework comprises two distinct phases and combines two prominent decision-analytic methods, DEA and GP, to achieve its objective. The first phase employs a super-efficiency DEA model to streamline asset selection by identifying suitable investment candidates based on their efficiency. An extensive set of financial parameters was used as inputs and outputs of the DEA model for efficiency evaluation. This phase offers a dual advantage: first, it reduces the computational complexity of the portfolio selection problem by eliminating the need to deal with an extensive pool of assets; second, it provides a thorough assessment of the assets, so that only investments with promising potential are carried forward. The second phase then uses a multi-objective mathematical model to determine the optimal asset weights within the portfolio, with each objective reflecting specific financial factors spanning both technical and fundamental aspects. The multi-objective problem was handled using the EGP methodology, which gives investors the flexibility to accommodate their preferences regarding both the financial criteria and the underlying philosophies of the weighted and MinMax GP variants. Furthermore, acknowledging the intrinsic volatility of financial market parameters, the study pioneers the development of robust counterparts for the EGP models. These robust models, developed using a polyhedral uncertainty set and a combined interval and polyhedral uncertainty set, offer optimal decisions that are robust against parameter inaccuracies and slight fluctuations. Notably, this robustness is achieved while preserving linearity and tractability. 
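To make the role of the EGP trade-off parameter more concrete, the following Python sketch evaluates an extended GP achievement function for two candidate portfolios. It assumes the common extended GP form (1 - λ)D + λ Σ w_i d_i, in which d_i is the unwanted deviation of goal i and D is the largest weighted unwanted deviation, so that λ = 1 recovers weighted GP and λ = 0 recovers MinMax GP. The function name `egp_achievement`, the equal weights, and the deviation values are hypothetical and are not taken from the case study. In the actual model, D is a decision variable bounded below by every weighted deviation and the whole expression is minimized as a linear program; here it is only evaluated for fixed solutions.

```python
def egp_achievement(deviations, weights, lam):
    """Extended GP achievement value of one candidate solution (illustrative).

    deviations : unwanted (normalised) deviation of each goal, e.g. n_i or p_i
    weights    : preference weight of each goal
    lam        : trade-off parameter in [0, 1];
                 lam = 1 -> weighted GP (total weighted deviation),
                 lam = 0 -> MinMax GP (largest weighted deviation D).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    weighted = [w * d for w, d in zip(weights, deviations)]
    # In the LP formulation D is a variable with w_i * d_i <= D for every goal;
    # for a fixed solution its optimal value is simply the maximum.
    D = max(weighted)
    return (1.0 - lam) * D + lam * sum(weighted)


# Hypothetical portfolios: A misses one goal badly, B misses both moderately.
portfolio_a = [0.30, 0.00]
portfolio_b = [0.18, 0.18]
equal_weights = [1.0, 1.0]

for lam in (0.0, 0.5, 1.0):
    a = egp_achievement(portfolio_a, equal_weights, lam)
    b = egp_achievement(portfolio_b, equal_weights, lam)
    preferred = "A" if a < b else "B"
    print(f"lam={lam:.1f}  A={a:.3f}  B={b:.3f}  preferred={preferred}")
```

With these toy numbers, the balanced portfolio B is preferred under MinMax GP (λ = 0) and at λ = 0.5, while portfolio A, which meets one goal fully, is preferred under weighted GP (λ = 1); the preference switches at λ = 2/3. This is the efficiency-versus-balance trade-off that the EGP parameter allows investors to tune.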
The proposed framework enables investors to encompass their unique preferences and conservatism levels, whilst optimizing portfolios amid data uncertainty.

5.1 Implications

The applicability and efficacy of the proposed approach have been illustrated through a comprehensive case study of the Tehran Stock Exchange. The empirical investigation yielded several implications. First, the super-efficiency DEA approach in the initial phase efficiently identified promising investment candidates. This capability not only helps investors and portfolio managers reduce computational complexity and distraction in stock markets where a large number of assets are listed, but also offers insight into the underlying business strength of the firms, thus facilitating deeper analysis. Second, integrating multiple factors (called extended factors), encompassing both fundamental analysis-based metrics and traditional technical risk-return factors, into the portfolio optimization framework produced portfolios with superior out-of-sample performance compared with conventional and two-factor strategies. This superiority held in both deterministic settings and settings marked by uncertainty, underscoring the strength and resilience of the proposed approach. Third, the flexibility of the framework is reflected in its ability to incorporate a diverse array of factors, from financial to non-financial, alongside customizable preference weights in the achievement function of the corresponding GP model, goal target values, and conservatism levels. This adaptability allows the approach to be aligned with the specific preferences and utilities of investors in curating their optimal asset portfolios. Finally, the EGP variant and its robust counterparts proved to be promising tools for portfolio selection decisions under both deterministic and uncertain conditions. 
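The screening role of the super-efficiency DEA phase can be illustrated in its simplest possible form. The Python sketch below assumes constant returns to scale and a single input and a single output per asset. In that special case, the input-oriented super-efficiency score, which excludes the evaluated unit from its own reference set, reduces to comparing each asset's output/input ratio with the best ratio among the remaining assets. The data, the helper names `super_efficiency` and `select_top`, and the top-k selection rule are hypothetical. The study's model uses an extensive set of financial inputs and outputs, which requires solving one linear program per asset rather than this closed form.

```python
def super_efficiency(inputs, outputs):
    """CRS super-efficiency score of each asset (single input, single output).

    Assumes at least two assets and strictly positive inputs. Each asset's
    output/input ratio is divided by the best ratio among the *other*
    assets, so the most efficient asset can score above 1.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    scores = []
    for o, r_o in enumerate(ratios):
        best_other = max(r for j, r in enumerate(ratios) if j != o)
        scores.append(r_o / best_other)
    return scores


def select_top(names, scores, k):
    """Hypothetical screening rule: keep the k highest-scoring assets."""
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical data: one cost-type input and one benefit-type output per asset.
names = ["A", "B", "C", "D"]
inputs = [2.0, 4.0, 5.0, 1.0]
outputs = [6.0, 8.0, 20.0, 2.0]

scores = super_efficiency(inputs, outputs)
for name, score in zip(names, scores):
    print(f"{name}: {score:.2f}")
print("top-2 candidates:", select_top(names, scores, 2))
```

Because the evaluated asset is excluded from its own reference set, the most efficient asset (C in this toy example, with a score of about 1.33) exceeds 1. This allows efficient assets to be ranked against each other rather than tied at a score of 1.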
The robust EGP portfolio optimization models generally outperformed their nominal counterpart and the deterministic Markowitz models, highlighting their robustness and dependability under varying market conditions. Moreover, a detailed comparison between the REGP models with the polyhedral and the combined interval and polyhedral uncertainty sets revealed instructive differences. While the former displayed higher total deviations from the set targets, it also offered diversification benefits and more favorable out-of-sample risk values. This finding underscores the importance of well-informed model selection according to investors’ preferences. In the context of REGP, total deviations from the decision makers’ targets were found to depend on the type of uncertainty set used. This disparity between the two uncertainty sets could offer a generalizable guideline for other application domains in which linear robust counterparts involving these uncertainty sets are constructed for linear EGP problems. The analysis also suggests that, for bounded uncertainty, it is advisable to combine the polyhedral uncertainty set with the interval set to prevent overly conservative solutions. At the same time, all the models can fine-tune the solution along a spectrum from the worst-case scenario to the deterministic solution through the choice of the adjustable parameters associated with their respective uncertainty sets. Overall, the findings collectively highlight the approach's utility in addressing the intricacies of portfolio selection. The integration of DEA and GP, fortified by extended financial factors and robust optimization techniques, outperformed the conventional methods examined in this study. 
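The following Python sketch illustrates, for a single uncertain linear expression and a fixed decision vector, why combining the polyhedral set with an interval set can curb over-conservatism. It assumes the two sets take the standard forms used in the robust optimization literature, with uncertain coefficients a_j = ā_j + ξ_j â_j: the polyhedral set {ξ : Σ|ξ_j| ≤ Γ} and the combined set {ξ : |ξ_j| ≤ Ψ, Σ|ξ_j| ≤ Γ}. These forms, the deviation magnitudes, the portfolio weights, and the function names are assumptions for illustration, not the study's exact formulation or data. The code evaluates the worst-case perturbation directly, whereas a linear robust counterpart replaces this inner maximization by its LP dual, which is what preserves linearity.

```python
def protection_polyhedral(a_hat, x, gamma):
    """Worst case of sum_j xi_j * a_hat_j * x_j over {xi : sum_j |xi_j| <= gamma}.

    The dual of the l1 ball is the max norm: the entire budget gamma is
    placed on the single most damaging coefficient.
    """
    return gamma * max(abs(h * xj) for h, xj in zip(a_hat, x))


def protection_interval_polyhedral(a_hat, x, gamma, psi=1.0):
    """Worst case over {xi : |xi_j| <= psi, sum_j |xi_j| <= gamma}.

    Solved greedily as a fractional knapsack: each coefficient may move by
    at most psi, largest contributions first, until the budget gamma is spent.
    """
    terms = sorted((abs(h * xj) for h, xj in zip(a_hat, x)), reverse=True)
    budget, total = gamma, 0.0
    for t in terms:
        step = min(psi, budget)
        if step <= 0.0:
            break
        total += step * t
        budget -= step
    return total


a_hat = [4.0, 3.0, 2.0, 1.0]   # hypothetical maximal coefficient deviations
x = [0.25, 0.25, 0.25, 0.25]   # hypothetical equally weighted portfolio

for gamma in range(5):
    p = protection_polyhedral(a_hat, x, gamma)
    ip = protection_interval_polyhedral(a_hat, x, gamma)
    print(f"gamma={gamma}  polyhedral={p:.2f}  interval+polyhedral={ip:.2f}")
```

At Γ = 0 both sets add no protection, which reproduces the nominal (deterministic) model, and the protection grows with Γ towards the worst case. Under the pure polyhedral set it keeps growing linearly, reaching 4.00 at Γ = 4. This exceeds 2.50, the sum of all interval bounds and hence the largest perturbation possible when every coefficient stays within its own interval, whereas the combined set never exceeds that bound. This toy pattern is consistent with the higher total deviations observed for the pure polyhedral REGP model, although it does not reproduce the study's results.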
It provides a comprehensive toolkit for informed investment decision-making, tailored to address the multifaceted considerations and uncertainties inherent in contemporary financial markets.

5.2 Future research directions

For future studies, there are several directions that researchers and practitioners can consider. The EGP framework offers opportunities to investigate various priorities for the goals and to explore the inclusion of other meta-objectives, such as the number of goals achieved (Jones & Jimenez, 2013). Additionally, other GP variants, such as meta-GP (Uría et al., 2002) and multi-choice GP (Chang, 2007), could be examined, along with the development of their robust counterpart models via appropriate uncertainty sets. The versatility of REGP as a robust multi-objective decision-making methodology makes it highly promising for addressing data-driven problems in diverse contexts. Moreover, the combination of DEA and GP in the portfolio selection context can extend beyond the scope of the current study, with potential for innovative integration methods. The scope can also be extended to robust DEA models, allowing the formulation of robust combined methodologies (Peykani et al., 2020). Another avenue for investigation is the expansion of GP models to include further factors such as market ratios (e.g., EPS, P/E, dividends) for effective portfolio optimization. Furthermore, the applicability of the proposed approach could be tested using data from different financial markets to assess its broader utility. The interface between MCDM and artificial intelligence/machine learning techniques, such as clustering, can also be scrutinized to explore how it can enhance portfolio diversity while accounting for market uncertainties. Lastly, the performance of portfolio optimization models could be evaluated under cardinality constraints, transaction costs, and multi-period time horizons.