1 Introduction

Global energy consumption is projected to increase significantly over the coming decades. Figure 1 depicts the predicted energy consumption from 2020 to 2050, indicating a surge in demand [1, 2].

Fig. 1 World energy demand prediction from 2020 to 2050

The main energy sources, namely renewable, nuclear, and fossil energy, all play an important part in meeting energy demand across the globe. Petroleum, coal, and natural gas (NG) are the three most significant fossil fuels. According to various investigations, natural gas will be the fastest-growing primary fuel through 2040 [3,4,5]. Figure 2 depicts overall energy usage by fuel type. Natural gas is widely recognized as the safest, cleanest, and most operationally efficient fossil fuel [5, 6], since burning natural gas emits about 41% less carbon dioxide (CO2) than burning other fossil fuels [7].

Fig. 2 World energy demand by sources

CO2 has been acknowledged as one of the key contributors to global warming [8, 9]. Consequently, its removal and reuse from industrial gas streams and pollutants, together with the search for reliable and cost-effective absorbents, have attracted considerable attention [10]. Absorption in aqueous amine solutions is the method most widely used in industry for removing CO2 from gases; aqueous amine solutions absorb CO2 through both physical and chemical absorption [11,12,13]. Figure 3 represents the process schematic of amine-based CO2 capture.

Fig. 3 The process of amine-based CO2 capture

Triethanolamine (TEA), a tertiary amine, was one of the first amines used for this purpose in industrial gas treatment [14]. Although it has largely been replaced by other amine solutions such as methyldiethanolamine (MDEA) and monoethanolamine (MEA) [15], it is still prescribed for acid gas removal. Recent developments have expanded the repertoire of solvents beyond amine-based solutions, opening new possibilities for carbon capture technologies. Notably, amine blends [16] have gained attention for their potential to enhance CO2 absorption efficiency and reduce energy requirements. Ionic liquids, explored comprehensively in the literature [17], offer intriguing prospects because of their low volatility and tunable properties, which can be tailored to specific capture scenarios. The utilization of seawater as a solvent [18] presents an eco-friendly and abundant alternative with unique challenges and advantages. These emerging solvent systems, along with others not mentioned here, constitute a dynamic frontier in CO2 capture research. A wide body of laboratory solubility data (H2S and CO2 in aqueous ethanolamine solutions) covering a wide range of temperatures, pressures, solvent compositions, and acid gas loadings is now available; these data can be used to better understand the behavior of H2S and CO2 [15, 19,20,21,22]. At higher power dissipation rates, aqueous TEA solutions can absorb CO2 under equilibrium conditions in high-shear jet absorbers. Introducing the solution into a high-shear jet absorber reduces both the absorption height and the required solution flow rate. Such absorbers are especially effective for removing acid gases at low partial pressures, as well as in remote fields and offshore operations [23, 24].

The solubility of CO2 in aqueous alkanolamine solutions and the equilibrium CO2 loading have both been calculated using a number of different models [25, 26]. The CO2 solubility in TEA was investigated by Chung [27] using a modified version of the Kent–Eisenberg model, one of the most precise methods currently available. In that work, the modified Kent–Eisenberg model successfully matched the experimental equilibrium loading (solubility)/partial pressure pairs at different temperatures and amine concentrations. The average absolute relative deviation (AARD) of this model was 18.9% over a maximum of 163 data sets. Fouad et al. [26] also compared experimental data for TEA at 50, 75, and 100 °C; their findings indicated average absolute fitting errors between 46.1% and 47.8%. To offer an alternative solution method for modeling engineering processes and forecasting the variable of interest, a number of intelligent approaches have been used [13, 28,29,30]. Yarveicy et al. [31] designed an extra trees (ET) algorithm to predict the CO2 loading capacity; the developed model predicted all the TEA data (29 data points) with an R2 of 0.993. AdaBoost classification and regression trees (AdaBoost-CART) were used by Ghiasi et al. [32] to simulate the CO2 loading for MEA, diethanolamine (DEA), and TEA; their investigation of CO2 solubility in TEA included 63 data points with an AARD of 1.41%. In addition, the effects of reaction temperature, CO2 partial pressure, and amine concentration on the CO2 absorption performance of MEA, DEA, and TEA were investigated using an adaptive neuro-fuzzy inference system (ANFIS) by Ghiasi et al. [28]. Unexpectedly, the dominant experimental condition differed among the amines: the relative effect of the inputs on the CO2 loading was temperature > CO2 partial pressure > amine concentration for MEA and TEA, whereas for DEA it was CO2 partial pressure > reaction temperature > amine concentration, because DEA reacts more slowly than MEA and TEA. That study provided useful insights into amine design, but it may produce errors owing to the limited amount of experimental data.
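For reference, the AARD quoted in these comparisons is commonly defined as follows, where \(\alpha_{i}^{exp}\) and \(\alpha_{i}^{pred}\) denote the experimental and predicted CO2 loadings and N the number of data points (the exact form used in each cited work may differ slightly):

$$AARD\left( \% \right) = \frac{100}{N}\mathop \sum \limits_{i = 1}^{N} \left| {\frac{{\alpha_{i}^{pred} - \alpha_{i}^{exp} }}{{\alpha_{i}^{exp} }}} \right|$$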

To the best of the authors' knowledge, there are no published white-box correlations for the CO2 loading capacity of amine-based solutions. While the present study focuses on TEA-based systems and the development of interpretable models using advanced white-box approaches, it is paramount to recognize the growing diversity of solvent options, each contributing to the overarching goal of mitigating CO2 emissions. Hence, the purpose of this work is to assess the capacity of robust correlations to predict the equilibrium absorption of CO2 in TEA aqueous solutions. To this aim, experimental data on the equilibrium absorption of CO2 in TEA aqueous solutions were gathered from the published literature [19, 27, 33]. Temperature, CO2 partial pressure, and amine concentration were regarded as input variables, and CO2 loading was the output. Three well-known robust correlative algorithms, namely genetic programming (GP), gene expression programming (GEP), and the group method of data handling (GMDH), are used to estimate CO2 loading in an aqueous system containing TEA. The following section summarizes the collected databank and the pre-processing of the dataset. Sect. 3 provides a detailed explanation of the development of the intelligent white-box algorithms. In Sect. 4, we present various error analyses to evaluate the models' performance statistically. Sect. 5 provides the equations for predicting CO2 loading using the GP, GEP, and GMDH techniques, together with a comprehensive graphical and statistical assessment of these models.

2 Data gathering and preparation

In order to construct comprehensive correlations, a large database was assembled from literature sources. Experimental values for CO2 absorption in TEA aqueous solutions were gathered from [19, 27, 40, 41].

3 Model development

3.1 Genetic programming

The notable benefit of the GP method, in comparison to other soft computing approaches, is that the GP paradigm readily provides white-box models that are interpretable by scientists and engineers [42].

In the GP structure, an initial population of random functions (chromosomes) is created to operate on the dataset [43]. Next, the network's framework is generated while its parameters are tuned during the computation process. These chromosomes form the next population, which takes over for the following generation. The iterations are repeated until a stopping criterion is satisfied [44]. A schematic flowchart of the GP algorithm is shown in Fig. 7.

Fig. 7 Flowchart of the GP technique
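To make this workflow concrete, the following minimal sketch shows how a GP-based symbolic regressor could be fitted to the three inputs used in this study. It assumes the third-party gplearn library and uses randomly generated placeholder data; the authors' actual implementation, hyperparameters, and databank are not reproduced here.

```python
import numpy as np
from gplearn.genetic import SymbolicRegressor  # third-party GP library (an assumption)

# Hypothetical placeholder data: temperature (K), CO2 partial pressure (kPa),
# and TEA concentration; real values would come from the collected databank.
rng = np.random.default_rng(42)
X = rng.uniform(low=[298.0, 1.0, 0.5], high=[398.0, 3000.0, 5.0], size=(100, 3))
y = rng.uniform(0.0, 1.0, size=100)  # placeholder CO2 loading (mol CO2/mol amine)

gp = SymbolicRegressor(
    population_size=500,          # initial population of random programs
    generations=30,               # evolutionary iterations
    function_set=('add', 'sub', 'mul', 'div', 'log', 'sqrt'),
    parsimony_coefficient=0.001,  # discourages overly long expressions
    random_state=42,
)
gp.fit(X, y)         # evolve until the generation budget is exhausted
print(gp._program)   # the evolved closed-form (white-box) expression
```

Because the result is an explicit expression tree rather than a network of opaque weights, the fitted model can be written out as a correlation, which is the interpretability advantage noted above.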

3.2 Gene expression programming

Gene expression programming (GEP) was proposed by Ferreira in 2001 [

Fig. 8 Flowchart of the GEP algorithm

3.3 Group method of data handling

The first version of the group method of data handling (GMDH) algorithm was introduced by Ivakhnenko in the 1960s [52]. GMDH solves problems mathematically using a spectrum of polynomial procedures. This data-driven algorithm can overcome the complexity and non-linearity of networks because it permits producing precise and explicit correlations between input and output variables [53]. GMDH, also known as the polynomial neural network (PNN), consists of a group of inductive paradigms and can be used in various fields such as optimization, data mining, pattern recognition, modeling, and prediction [54]. In this heuristic technique, a system is represented as a group of neurons in which pairs of neurons in each layer are linked through a quadratic polynomial, thereby generating new neurons in the next layer [55]. These layers and their neurons link the input variables to the desired output. Figure 9 depicts a schematic flowchart of the GMDH paradigm, and Fig. 10 shows the scheme of the GMDH applied in this paper. A self-organizing nature and easy accessibility for users are two remarkable benefits of the GMDH method [56]. The output value of the primary GMDH method is calculated as [57]:

$$y = a_{0} + \mathop \sum \limits_{i = 1}^{N} a_{i} x_{i} + \mathop \sum \limits_{i = 1}^{N} \mathop \sum \limits_{j = 1}^{N} a_{i,j} x_{i} x_{j} + \mathop \sum \limits_{i = 1}^{N} \mathop \sum \limits_{j = 1}^{N} \mathop \sum \limits_{k = 1}^{N} a_{i,j,k} x_{i} x_{j} x_{k} + \cdots$$
(1)

where \({x}_{i,j,k,\dots }\) are the input vectors, \({a}_{0,i,j,k,\dots }\) are the polynomial coefficients, and N denotes the number of input variables. Quadratic polynomial functions are then applied to combine the neurons of the previous layer and generate new variables according to the following equation:

$$P_{i}^{GMDH} = a_{0} + a_{1} x_{i} + a_{2} x_{j} + a_{3} x_{i} x_{j} + a_{4} x_{i}^{2} + a_{5} x_{j}^{2}$$
(2)
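As an illustration, the six coefficients \(a_0,\dots,a_5\) of Eq. 2 can be estimated for a given pair of inputs by ordinary least squares. The sketch below is a minimal, self-contained example with hypothetical toy data; it is not the authors' implementation.

```python
import numpy as np

def fit_gmdh_neuron(xi, xj, y):
    """Least-squares fit of the quadratic polynomial of Eq. 2
    for one pair of inputs, returning the coefficients a0..a5."""
    A = np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

# Hypothetical data for two candidate inputs and a toy target
rng = np.random.default_rng(0)
xi, xj = rng.uniform(0, 1, 50), rng.uniform(0, 1, 50)
y = 0.3 + 0.5 * xi - 0.2 * xj + 0.1 * xi * xj
print(fit_gmdh_neuron(xi, xj, y))  # ~[0.3, 0.5, -0.2, 0.1, 0.0, 0.0]
```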
Fig. 9 Flowchart of the GMDH technique

Fig. 10 Flowchart of the GMDH developed in this study

Eventually, the best combination of two independent variables is selected according to Eq. 3.

$$\delta_{j}^{2} = \mathop \sum \limits_{{i = N_{t} + 1}}^{N} \left( {y_{i} - P_{i}^{GMDH} } \right)^{2} < \varepsilon ,\quad j = 1 , 2 , \ldots , \left( {\begin{array}{*{20}c} N \\ 2 \\ \end{array} } \right)$$
(3)

In the above formula, \({N}_{t}\) stands for the number of training data points. A candidate pair of independent variables is therefore retained only if the aforementioned stopping condition is satisfied [58].
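Putting Eqs. 2 and 3 together, one GMDH layer evaluates every pair of inputs, fits the quadratic neuron on the training rows, and keeps only the pairs whose squared error on the remaining rows stays below the threshold \(\varepsilon\). The following self-contained sketch illustrates this selection step with hypothetical data and an assumed threshold; the authors' actual implementation may differ.

```python
import numpy as np
from itertools import combinations

def fit_neuron(xi, xj, y):
    # Least-squares fit of Eq. 2: a0 + a1*xi + a2*xj + a3*xi*xj + a4*xi^2 + a5*xj^2
    A = np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def predict_neuron(a, xi, xj):
    return a[0] + a[1]*xi + a[2]*xj + a[3]*xi*xj + a[4]*xi**2 + a[5]*xj**2

# Hypothetical dataset: 3 inputs; the first Nt rows fit Eq. 2,
# the remaining rows evaluate the selection criterion of Eq. 3.
rng = np.random.default_rng(1)
X, y = rng.uniform(0, 1, (60, 3)), rng.uniform(0, 1, 60)
Nt, eps = 40, 5.0  # training size and threshold epsilon (assumed values)

survivors = []
for i, j in combinations(range(X.shape[1]), 2):   # all C(N, 2) input pairs
    a = fit_neuron(X[:Nt, i], X[:Nt, j], y[:Nt])  # Eq. 2 on training rows
    err = np.sum((y[Nt:] - predict_neuron(a, X[Nt:, i], X[Nt:, j]))**2)
    if err < eps:                                 # Eq. 3 selection test
        survivors.append((i, j, a))               # neuron passes to the next layer
print([(i, j) for i, j, _ in survivors])
```

Surviving neurons become the inputs of the next layer, and the procedure repeats until no new neuron improves on the best one of the previous layer, yielding the explicit polynomial correlation between inputs and output.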