1 Introduction

The goal in regression is to predict one or more variables \(\varvec{\textbf{y}}(k)=(y_{1}(k),\ldots ,y_{m}(k))^T\) from the information provided by measurements \(\varvec{\textbf{x}}(k)=(x_{1}(k),\ldots ,x_{p}(k))^T\) for a given sample k. Customarily, \(\varvec{\textbf{y}}(k)\) are referred to as targets, outputs, or dependent variables, while \(\varvec{\textbf{x}}(k)\) are commonly referred to as predictors, inputs, covariates, regressors, or independent variables. Regression models cover several application areas, such as economic growth problems [1, 2], air quality prediction [3, 4], medicine [5, 6], chemical industries [7, 8], and industrial processes [9, 10]. Recent studies show that regression models have become predominant in increasingly complex real-world systems due to the large availability of data, inclusion of nonlinear parameters, and other aspects intrinsic to the application area. For complex systems, traditional machine learning techniques (i.e., non-deep/shallow techniques) may become limited due to two main challenges: the big-data explosion (high dimensionality and a high number of observations) and the increase in complexity caused by the dynamics of present-day applications. Deep learning (DL) methods have gained prominence in recent years due to their ability to represent systems in complex structures with multiple levels of abstraction and high-level features. However, DL may have limitations such as dependence on a high number of samples, hyperparameter sensitivity, and interpretability issues [11, 12], which limit its application in critical systems or in systems that require accountability for their results. In this sense, deep fuzzy systems (DFS) have emerged as a viable alternative to DL for balancing accuracy and interpretability in complex real-world systems.

Regression models can be categorized as [13]: (i) “white-box” when the input–output mapping is built upon first-principle equations, (ii) “black-box” when the mapping is derived from the data (also referred to as data-driven modeling), or (iii) “grey-box” when knowledge about the input–output mapping is available beforehand and is integrated with data-driven modeling. White-box models are advantageous for promoting interpretations of the internal mechanisms associated with the input–output mapping. On the other hand, black-box models can address complex systems with predictive analysis without prior knowledge of the system. Grey-box models can combine the interpretability of “white-box” models with the ability to learn from data given by “black-box” models (e.g., fuzzy systems). Regarding data-driven modeling, the dependency between input and output can be built by linear or nonlinear models. A regression model is linear when the rate of change between input and output is constant, because the output is a linear combination of the inputs. Examples include the multivariate linear regression models. Data-driven nonlinear regression is adopted when the input–output dependence is nonlinear and cannot be covered by linear modeling. There is a plethora of methods for nonlinear regression, and their applicability is problem-dependent. Examples include fuzzy systems, support vector regression, artificial neural networks (e.g., non-deep/shallow and deep networks), and rule-based regression (e.g., decision trees and random forest).
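As a small illustration of the linear/nonlinear distinction above, the following sketch compares the rate of change of a linear model with that of a nonlinear one at two operating points. The weights, the bias, and the quadratic function `nonlinear_model` are hypothetical values chosen only for this example and do not come from any of the surveyed works.

```python
def linear_model(x, weights=(2.0, -1.0), bias=0.5):
    """Linear regression model: a weighted sum of the inputs plus a bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def nonlinear_model(x):
    """Hypothetical nonlinear model, used only for contrast."""
    return x[0] ** 2 + x[1]


def slope_wrt_first_input(model, x, step=1.0):
    """Finite-difference rate of change of the output w.r.t. the first input."""
    shifted = [x[0] + step] + list(x[1:])
    return (model(shifted) - model(x)) / step


# The linear model has the same slope everywhere; the nonlinear one does not.
linear_slopes = [slope_wrt_first_input(linear_model, p) for p in ([0.0, 0.0], [5.0, 3.0])]
nonlinear_slopes = [slope_wrt_first_input(nonlinear_model, p) for p in ([0.0, 0.0], [5.0, 3.0])]
```

For the linear model, the slope equals the first weight at both points, whereas the slope of the nonlinear model depends on where it is evaluated.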

The following briefly discusses recent surveys and reviews that have showcased the applicability of DL to various regression application domains. The work of Han et al. [14] reviews DL models for time-series forecasting, where the DL models are categorized as: (i) discriminative, where the learning stage is based on the conditional probability of the output/target given an observation, (ii) generative, which learn the joint probability of both output and observation, with the generation of random instances, or (iii) hybrid, a combination of generative and discriminative DL methods. The authors demonstrated that DL models are effective at discriminating complex patterns in time series with high-dimensional data by implementing them in benchmark systems and using a real-world use case from the steel industry. Sun et al. [11] discuss the use of DL for soft sensor applications, showing the trends, the scope of applications in industrial processes, and the best practices for model development. The authors established some directions for future research, such as DL solutions that overcome the difficulty of learning when labeled samples are scarce (e.g., semi-supervised methods), hyperparameter optimization, solutions to improve model reliability (e.g., model visualization), and the development of DL methods with distributed and parallel modeling. Torres et al. [15] survey DL architectures for time-series forecasting. Furthermore, the authors discuss the practical aspects that must be considered when using DL methods to solve complex real-world problems in big-data settings, which include the existing libraries/toolboxes, the techniques for automatic optimization of model structures, and the hardware infrastructure. Pang et al. [16] and Chalapathy et al. [29] and [30] investigated some recent trends in DFS models and real-world applications (e.g., time series forecasting, natural language processing, traffic control, and automatic control).
However, despite the benefit of adopting FLS principles in DL systems, no comprehensive survey or review has been conducted focusing exclusively on deep fuzzy systems for regression problems.

This paper surveys and discusses the state-of-the-art on deep fuzzy techniques developed to deal with a diverse range of regression applications. Initially, Sect. 2 presents fundamental concepts about FLS and XAI. Then, Sect. 3 presents an overview of the DL techniques commonly used for regression, namely Convolutional Neural Networks, Deep Belief Networks, Multilayer Autoencoders, and Recurrent Neural Networks. Next, Sect. 4 reviews the literature on deep fuzzy systems in two groups: (i) standard deep fuzzy systems, based on fundamental FLS; and (ii) hybrid deep fuzzy systems, which combine FLS with the conventional deep models discussed in Sect. 3. Finally, Sect. 5 presents general discussions based on the state-of-the-art surveyed.

2 Background

2.1 Fuzzy Logic Systems

Developed initially by Lotfi A. Zadeh [31], FLS are rule-based systems composed mainly of an antecedent part, characterized by an “IF” statement, and a consequent part, characterized by a “THEN” statement, allowing the transformation of a human knowledge base into mathematical formulations, thus introducing the concept of linguistic terms related to the membership degree.

The basic configuration of a fuzzy logic system, shown in Fig. 1, depends on an interface that transforms the real input variables into fuzzy sets (fuzzifier), which are interpreted by a fuzzy inference model to perform an input–output mapping based on fuzzy rules. The mapped fuzzy outputs then go through an interface that transforms them into real output variables (defuzzifier) [32, 33].

Fig. 1 Basic configuration of fuzzy logic systems

Some well-known fuzzy systems are Mamdani fuzzy systems [34], Takagi-Sugeno (T-S) fuzzy systems [35], and Angelov-Yager’s (AnYa) fuzzy rule-based systems using data clouds [36]. Among these, the T-S fuzzy models stand out for their ability to decompose nonlinear systems into a set of linear local models smoothly connected by fuzzy membership functions [37]. T-S fuzzy models are universal approximators, capable of approximating any continuous nonlinear system, and can be described by the following fuzzy rules [38]:

$$\begin{aligned} R^{i}:{\textbf {IF}} ~ x_1(k)~\text {is}~F_{1}^{i}~\text {and}~\ldots ~\text {and}~x_p(k)~\text {is}~F_{p}^{i} \nonumber \\ {\textbf {THEN}} ~ \varvec{\textbf{y}}^i(k)=\varvec{\textbf{f}}^{\,i}(x_{1}(k),\ldots ,x_{p}(k)), \end{aligned}$$
(1)

where \(R^{i}\) (\(i=1,\ldots ,N\)) represents the i-th fuzzy rule, N is the number of rules, \(x_1(k), \ldots ,x_p(k)\) are the input variables of the T-S fuzzy system, \(F^{i}_{j}\) are the linguistic terms characterized by fuzzy membership functions \(\mu ^{i}_{F^{i}_{j}}\), and \(\varvec{\textbf{f}}^{\,i}(x_{1}(k),\ldots ,x_{p}(k))\) represents the function model of the system for the i-th fuzzy rule [39].
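A minimal sketch of how a rule base of the form in Eq. (1) can be evaluated is given below. It rests on several assumptions that are not stated in the text: Gaussian membership functions, the product as the "and" operator, linear consequent functions, a single output (m = 1), and the usual weighted-average aggregation of the rule outputs. The function names and the two example rules are hypothetical.

```python
import math


def gaussian_mf(x, center, sigma):
    """Membership degree of x in a fuzzy set (Gaussian shape is an assumption)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def ts_predict(x, rules):
    """Evaluate a T-S fuzzy system with linear consequents for one sample x.

    Each rule is a dict with
      'centers', 'sigmas': parameters of the p antecedent sets F_j^i
      'coeffs', 'bias'   : linear consequent y^i = bias + sum_j coeffs[j] * x[j]
    """
    strengths, outputs = [], []
    for rule in rules:
        # Firing strength of the rule: product of the antecedent memberships.
        w = 1.0
        for xj, c, s in zip(x, rule["centers"], rule["sigmas"]):
            w *= gaussian_mf(xj, c, s)
        strengths.append(w)
        # Local linear model in the THEN part of the rule.
        outputs.append(rule["bias"] + sum(a * xj for a, xj in zip(rule["coeffs"], x)))
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for this input; the rule base does not cover it")
    # Weighted average of the local models with normalized firing strengths.
    return sum(w * y for w, y in zip(strengths, outputs)) / total


# Two hypothetical rules over p = 2 inputs.
rules = [
    {"centers": [0.0, 0.0], "sigmas": [1.0, 1.0], "coeffs": [1.0, 0.0], "bias": 0.0},
    {"centers": [2.0, 2.0], "sigmas": [1.0, 1.0], "coeffs": [0.0, 1.0], "bias": 1.0},
]
y_hat = ts_predict([1.0, 1.0], rules)
```

At the midpoint between the two rule centers both rules fire equally, so the prediction is the plain average of the two local linear models. Closer to one center, the corresponding local model dominates, which is the smooth blending of local models described above.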

2.2 Explainable Artificial Intelligence

Although the explainable artificial intelligence (XAI) concept is often associated with a homonymous program formulated by a group of researchers from the Defense Advanced Research Projects Agency (DARPA) [40], the principles related to explainability gained strength from the 1970s onwards. The earliest works presented rule-based structures and decision trees with human-oriented explanations, such as the MYCIN system proposed in [41], developed for infectious disease therapy consultation, and the tutoring program GUIDON proposed in [42], based on natural language studies, among numerous other systems [43,44,45,46]. Although many authors commonly use the terms “explainability” and “interpretability” as synonyms, Rudin [47] discusses the problem of using purely explainable methods that only provide explanations of the results obtained from black-box models (post hoc analysis), and stresses the importance of developing inherently interpretable methodologies with causality relationships that are understandable to human users.

3 Deep Regression Overview

Deep neural networks (DNN) have emerged due to their architecture with multiple levels of representation and their remarkable performance in a variety of tasks [48]. This section presents the DL techniques commonly employed in regression problems to provide a better understanding and context for the literature review of deep fuzzy regression performed in Sect. 4.

3.1 Convolutional Neural Networks

Convolutional neural networks (CNNs or ConvNets) are feedforward neural networks with a grid-like topology that are used for applications such as time-series data processing in 1-D grids and image data processing in 2-D pixel grids [49]. Figure 2 presents a CNN architecture for processing time-series data in 1-D grids.

Fig. 2 Convolutional neural network with 1-D architecture

The “neocognitron” model proposed in [50] is frequently referred to as the inspiration model for what is currently known about CNNs. First proposed in [51], the neocognitron aimed to represent simple and complex cells from the visual cortex of animals which present a mechanism capable of detecting light in receptive fields [52].

CNNs use a mathematical operation on two functions called “convolution”, with the first function referred to as the input, the second function as the kernel, and the convolution’s output as the feature map [49]. Outputs from the convolution layer go through a pooling layer (downsampling), which summarizes adjacent outputs with an overall statistic and thereby reduces the size of the data; together with weight sharing, this reduces the number of associated parameters [11, 53]. After the data have been processed by the layers that alternate between convolution and pooling, the final feature maps go through a fully connected (dense) layer to extract high-level features. For regression problems, the extracted features can be combined in a prediction mechanism with an activation function or a supervised learning model (e.g., support vector regression) to estimate the final output [54, 55].
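The pipeline described above (convolution, pooling, dense layer, regression output) can be sketched for a 1-D time series as follows. This is only an illustrative forward pass: the kernel, the dense weights, and the input series are hand-picked hypothetical values rather than learned parameters, the ReLU activation and max pooling are assumed choices, and the convolution is implemented as cross-correlation, which is what deep learning libraries typically compute under that name.

```python
def conv1d(signal, kernel, bias=0.0):
    """'Valid' 1-D cross-correlation of a signal with a kernel (one feature map)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def dense(features, weights, bias):
    """Single linear output unit, used here as the regression head."""
    return sum(w * f for w, f in zip(weights, features)) + bias


series = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0]
# The kernel [-1, 1] extracts first differences of the series.
feature_map = relu(conv1d(series, kernel=[-1.0, 1.0]))
# Pooling keeps the largest difference in each pair of adjacent outputs.
pooled = max_pool1d(feature_map, size=2)
prediction = dense(pooled, weights=[0.5, 0.5, 0.5], bias=0.0)
```

In a trained CNN the kernel and dense weights would be fitted by a supervised method, and several kernels would usually be applied in parallel to produce multiple feature maps.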

CNNs in the context of regression have been explored in several domains, for example, traffic flow forecasting of real road networks [56, 57], prediction of natural environmental factors using data from meteorological institutes [58,59,60,61], industrial process optimization [62,63,64], electric load forecasting [65,66,67,68,69], and chemical process analysis [70, \(\varvec{\textbf{W}} = \{\varvec{\textbf{w}}_{1},\varvec{\textbf{w}}_{2},\ldots ,\varvec{\textbf{w}}_{L}\}\) between layers, are fine-tuned by a supervised method (e.g., backpropagation algorithm) [96].

Recent studies using AEs within a deep architecture for regression have considered soft sensing applications for quality variable prediction [97,98,99,100,101]. Other cases of applications in industrial processes include CNC turning machines [102, 103], end-point quality prediction [104, 105], and prognostics [106,107,108,109]. Other works were developed for time-series forecasting applications, such as natural environmental factors [110,179], which performs the construction of fuzzy rules from a small-scale observation set (data pairs) with input–output mapping. Finally, a random local loop optimization strategy is performed to remove feature combinations and corresponding subsystems with low correlation to achieve fast convergence [178]. The performance of RLODFS on the prediction of 12 real-world datasets from the UCI repository (Footnote 1) was compared with other methods such as DBN, LSTM, and generalized regression neural network (GRNN). From the authors’ perspective, RLODFS has good interpretability due to its structure, the clear physical meaning of its parameters, and the ease of locating fuzzy rules that may fail, which supports future optimization. However, there is a lack of transparency in the use of input sharing strategies, which increases the complexity of the method. Furthermore, the number of features per fuzzy subsystem needs to be selected carefully, since a manual or arbitrary choice may not reflect the real needs of the case study under analysis.

Fig. 9 The structure of RLODFS based on grouping and input sharing, adapted from [178]

The study in [180] proposes a stacked structure composed of double-input rule modules and interval type-2 fuzzy models, abbreviated as IT2DIRM-DFM. The proposed model is illustrated in Fig. 10, where four layers are presented: the input layer, which deals directly with the original input data, grouped two-by-two in each rule module; the stacked layer, where the signals coming from the input layer become the inputs of the first layer, whose output becomes the input of the second layer, and so on; the dimension reduction layer, where the width of the IT2DIRM-DFM is hierarchically reduced until it becomes two; and the output layer, where the last IT2DIRM-DFM model produces the final forecasting results. The authors also discuss the interpretability of the model, showing layered learning and fuzzy rules composed of only two variables in the antecedent, partitioned into interval type-2 second-order rule partitions. The resulting model was evaluated in two real-world applications, using subway passenger data from Buenos Aires, Argentina, and traffic flow data from the California Highway System. The results made it possible to verify interpretability through the consistent readability of which partitions and bounds of the inputs are used in each fired rule of each rule module. However, the proposed method is only interpretable locally in each rule module and not globally, and its structure does not reflect the depth or number of layers needed to address the experiments. Furthermore, the motivation for considering a function, denoted as \(\text {f}\), in each layer related to the worst-performing module is not clear (this function is shown at the bottom of Fig. 10).

Fig. 10 The structure of the double-input rule modules stacked deep interval type-2 fuzzy model (IT2DIRM-DFM), adapted from [180]

Another work that explores double-input rule modules within a stacked deep fuzzy model (in a hierarchical way) was proposed in [181] using datasets related to photovoltaic power plants from Belgium and China. The authors investigate the interpretability of the resulting model, called DIRM-DFM, with conclusions similar to [180], mainly regarding the composition of fuzzy rules. However, in DIRM-DFM, they promote more transparency and simplicity. Some other studies deal with interval type-2 fuzzy models for deep learning in regression problems. In [182], a novel dynamic fractional-order deep learned type-2 FLS was proposed and constructed using singular value decomposition and uncertainty bounds type-reduction. The resulting model was implemented with two chaotic benchmark system simulations, a simulation for the prediction of the glucose level of type-1 diabetes patients, and a dataset of a heat transfer system with an experimental setup. In addition to determining the limit values of the input data (upper and lower singular values), the authors used stability criteria of fractional-order systems, which made it possible to reduce both the required number of fuzzy rules and the complexity of nonlinear systems. An evolving recurrent interval type-2 intuitionistic fuzzy neural network (FNN) was proposed in [183], and it was evaluated using regression datasets from the KEEL repository (Footnote 2), Mackey–Glass time series, and a simulated second-order time-varying system. Intuitionistic evaluation, firing strength of membership degree, and strategies for adding and removing fuzzy rules were considered to improve uncertainty modeling. Neither [182] nor [183] presented an analysis of the interpretability of their models.

Methods that use multiple neuro-fuzzy systems in a deep hierarchical structure were proposed in [184] and [185]. The work in [184] proposed a deep model that cascades multiple neuro-fuzzy systems modified as multivariable generalized additive models, with application in real ecological time series, the Darwin sea level pressure. The resulting model manages to locally detail the mechanisms of each neuro-fuzzy system in each layer. However, it presents incomplete discussions and results regarding the increase in the network depth and the influence of the inputs and partial outputs of the layers on the final prediction. In [185], a hybrid cascade neuro-fuzzy network was proposed, which is composed of multiple extended neo-fuzzy neurons with adaptive training designated for online non-stationary data stream handling. Each layer has a generalization node that performs a weighted linear combination to obtain an optimal output signal. The experimental results in electrical loads prediction, using Southern Ukraine’s data from 2012, show the authors’ search for better accuracy, although at the cost of increasing membership functions to cover the input space and increasing adjusted parameters (weights).

The work in [186] proposed a deep learning recurrent type-3 fuzzy system applied for modeling renewable energies (i.e., power generation of a 660 kW wind turbine and solar radiation generated by a sunlight simulator). The proposed methodology showed good performance compared to other methods, such as multilayer perceptron, type-1 FLS, type-2 FLS, and interval type-3 FLS, despite the lack of a more elaborate discussion of the presented results. Furthermore, transparency is not guaranteed regarding the modeling steps and the influence of various parameters optimized during learning. The authors in [187] proposed a self-organizing FNN with incremental deep pre-training, abbreviated as IDPT-SOFNN, to promote efficient feature extraction and dynamic adaptation of the structure according to the error-reduction rate. IDPT-SOFNN was implemented for the prediction of Mackey–Glass time series, total phosphorus concentration in a wastewater treatment plant, and air pollutant concentration. In [188], a deep fuzzy cognitive map was proposed for multivariable time-series forecasting applications, such as air quality indexes, traffic speed of six road segments in China, and two benchmark datasets from the UCI repository, the electric power consumption and the temperature of a monitor system. The analysis of the model’s interpretability was based on nonlinear and nonmonotonic influences of unknown exogenous factors. The work in [189] proposed a deep FNN composed of an input layer, four hidden layers (membership functions, T-norm operation, linear regression, and aggregation), and an output layer, designed exclusively for intra- and inter-fractional variational prediction for multiple patients’ breathing motion.

Table 1 summarizes the literature on Standard DFS. The works are categorized according to their application domain and their interpretability categories. The survey shows that the application domain is vast, covering industrial systems, power systems, traffic systems, and multivariable benchmark systems.

Table 1 State-of-the-art on methods with standard deep fuzzy systems for regression problems. XAI: explainable artificial intelligence; Disc.: discussion by the authors (Yes/No); Und.: how understandable is the model, whether it is transparent (T) or opaque (O); Scope: in post hoc scope, if the model promotes local explanations (L), global explanations (G) or visual explanations (V)

Table 1 shows that only four out of eleven works discuss interpretability, despite most of their proposed methods being transparent. Also, not all methods provide post hoc explanations, and of these, the local scope is more frequent. The Standard DFS architecture is easy to implement, with flexibility in the construction of fuzzy rules, having a similar structure to feedforward neural networks. However, these methods may suffer from loss of interpretability when using a non-intuitive hierarchical structure, the lack of investigation of interconnections between the variables involved, and the limitation of fuzzy rules that can disturb the estimation by not covering all operating regions (coverage of input data space) [190].

4.1.2 Hybrid Deep Fuzzy Systems

The Hybrid DFSs are discussed in this section. The selection and order of works follow the same principles used for Standard DFS. The common DL architectures used in combination with fuzzy systems are the Deep Belief Networks, Autoencoders, Long Short-Term Memory networks, and Echo State Networks.

In [191], a sparse Deep Belief Network (SDBN) with FNN is proposed for nonlinear system modeling (benchmark) and total phosphorus prediction in a wastewater treatment plant. The SDBN is considered for unsupervised learning and pre-training to perform fast weight-initialization and improve modeling robustness. The FNN is used as supervised learning to reduce layer-by-layer complexity. As shown in Fig. 11, the structure of the DBN resembles the structure shown in Fig. 3, except for considering additional constraints (sparsity) used to penalize fluctuations of values along the hidden neurons. The proposed method performed better compared to other similar methods, such as transfer learning-based growing DBN [192], DBN-based echo-state network [164], and self-organizing cascade neural network [193]. However, the authors noticed various fluctuations in the assignment of hyperparameters, which can compromise the stability of the proposed model, so its structure needs to be made dynamically robust to these fluctuations. In terms of interpretability, the proposed model guarantees a moderate number of membership functions and rules, allowing good consistency and readability of what happens within the FNN structure. The same cannot be said for the DBN structure, whose sparse representation does not make it intuitive to decide which features are more valuable than others.

Fig. 11 Examples of FLS application with conventional deep models: (a) with deep belief networks, adapted from [191]; and (b) with denoising autoencoders, adapted from [194]

The authors in [194] proposed a novel robust deep neural network (RDNN) for regression problems involving nonlinear systems, with the implementation of three strategies: a fuzzy denoising autoencoder (FDA) as a base-building unit for RDNN, improving the ability to represent uncertainties; a compact parameter strategy (CPS), designed to reconstruct the parameters of the FDA, reducing unnecessary learning parameters; and an adaptive backpropagation (ABP) algorithm to update RDNN parameters with fast convergence. As shown in Fig. 11, FDA has an input layer, a fuzzy layer (built by Gaussian membership functions), and an output layer, which can be divided into an encoder (“input” to “fuzzy”) and a decoder (“fuzzy” to “output”). Furthermore, at each FDA, due to the nature of this autoencoder, the data are partially corrupted by noise (represented with a “tilde” symbol) and reconstructed using CPS (represented with a “hat” symbol), while the associated parameters (e.g., encoder/decoder weights and biases) are adjusted via ABP. The resulting model was evaluated through four prediction examples: air quality from the UCI repository, wind speed from the National Renewable Energy Laboratory, housing price from the KEEL repository, and water quality from a wastewater treatment plant in Beijing, China. Regarding the interpretability of the proposed model, there was no appropriate discussion by the authors, as they focused mainly on model performance/accuracy in the presence of uncertainties. In addition to a fuzzy rule base not being defined, the number of fuzzy neurons in each FDA is manually initialized and adapts through redundancies during the reconstruction of parameters via CPS. Finally, the architecture of the proposed model is not intuitive with respect to the mechanisms behind the representation ability of each FDA, which would be crucial to determine the influence of input variables and learning parameters on the outputs.
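The encoder/decoder split of a fuzzy denoising autoencoder described above can be sketched as a single forward pass. This is not the implementation of [194]: the additive Gaussian corruption, the form of the Gaussian fuzzy neurons, the linear decoder, and all names and parameter values are assumptions made for illustration, and neither the compact parameter strategy (CPS) nor the adaptive backpropagation (ABP) update is modeled.

```python
import math
import random


def corrupt(x, noise_std, rng):
    """Partially corrupt the input with additive Gaussian noise (x -> x_tilde)."""
    return [xi + rng.gauss(0.0, noise_std) for xi in x]


def fuzzy_encode(x_tilde, centers, sigmas):
    """Encoder ('input' to 'fuzzy'): one Gaussian membership value per fuzzy neuron."""
    hidden = []
    for center, sigma in zip(centers, sigmas):
        dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x_tilde, center))
        hidden.append(math.exp(-dist2 / (2.0 * sigma ** 2)))
    return hidden


def decode(hidden, weights, biases):
    """Decoder ('fuzzy' to 'output'): linear map to the reconstruction x_hat."""
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(weights, biases)]


def reconstruction_error(x, x_hat):
    """Mean squared error between the clean input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


rng = random.Random(0)              # fixed seed so the run is repeatable
x = [0.2, 0.8]                      # clean input sample (hypothetical)
centers = [[0.0, 1.0], [1.0, 0.0]]  # one center per fuzzy neuron (hypothetical)
sigmas = [0.5, 0.5]
x_tilde = corrupt(x, noise_std=0.05, rng=rng)
hidden = fuzzy_encode(x_tilde, centers, sigmas)
x_hat = decode(hidden, weights=[[0.0, 1.0], [1.0, 0.0]], biases=[0.0, 0.0])
error = reconstruction_error(x, x_hat)
```

Training such a unit would adjust the centers, widths, weights, and biases so that the reconstruction error against the clean input is small despite the corruption; the values used here are untrained placeholders.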

Some hybrid methods using FLS and conventional deep models have been applied in the literature for traffic flow prediction. In [195], the authors developed an algorithm based on Dolphin Echolocation optimization [196], where the input features are fuzzified into membership functions to obtain chronological data, whose integration goes into the weight update process of the proposed algorithm converging to a globally optimal solution with a Deep Belief Network. The method in [195] was evaluated using datasets of traffic-major roads in Great Britain and PEMS-SF (San Francisco bay area freeways). The study in [197] combined fuzzy information granulation and a deep neural network to represent the temporal-spatial correlation of mass traffic data and to adapt to noisy data. A Stacked Autoencoder is used to obtain the prediction results based on processed granules that have a good capacity for interpretation, although this capacity was not discussed by the authors. The method in [197] was evaluated using traffic flow data archived for the Portland–Vancouver Metropolitan region.

Other methods were implemented for energy forecasting, with the usual application of LSTM networks. A novel fuzzy seasonal LSTM was proposed in [198], where a fuzzy seasonality index [199] and a decomposition method were employed to solve the seasonal time-series problem in a monthly wind power output dataset from the National Development Council in Taiwan. In [200], the authors used an LSTM network with rough set theory [201] and interval type-2 fuzzy sets for short-term wind speed forecasting (dataset from Bandar-Abbas City, Iran), with the aid of a mutual information approach for efficient input variable selection. A novel ultra-short-term photovoltaic power forecasting method was proposed in [202], where a T-S fuzzy model comprises a fuzzy c-means clustering algorithm and DBNs, with evaluation tests using a 433 kW photovoltaic matrix database. The studies in [198, 200] and [202] did not discuss the interpretability of their models, which have a structure with an ensemble characteristic whose fuzzy part has an affinity for good coverage of the input space and good capture of data uncertainty.

The work in [203] proposed a deep type-2 FLS (D2FLS) architecture with greedy layer-wise training for high-dimensional input data. The D2FLS model was applied to two regression datasets (prediction of the performance of British Telecom’s work area and health insurance premium) and two binary classification datasets (Santander Customer Transaction prediction and British Telecom’s customer service). The authors showed how to extract interpretable explanations related to the contribution of fuzzy rules to the final prediction developed for a two-layer D2FLS. However, the authors opted for a large number of fuzzy rules (100, in this case) that impair interpretability by increasing the complexity of the model. The authors in [204] proposed an ensemble model composed of an Echo State Network (ESN), a T-S fuzzy model, and differential evolution for time-series forecasting problems: Mackey-Glass time series, nonlinear auto-regressive moving average (NARMA) time series, and Lorenz attractor. The differential evolution method, used to optimize the weight coefficients of the model, managed to reduce the number of fuzzy rules. The interpretability of the model can be impaired due to the structure with an ensemble characteristic, which does not provide enough transparency for learning. A method based on the T-S fuzzy model and ESN was developed in [205] and tested with three benchmark examples: approximation of a nonlinear function, prediction of Henon chaotic system, and identification of a dynamic system with and without noise signal. The authors chose to balance model complexity and performance based on the parameters involved for learning (e.g., number of fuzzy rules and reservoir size), which resulted in better results in comparison with other methods (e.g., traditional ESN and hybrid fuzzy ESN), despite the lack of discussion on interpretability.

In [206], a framework based on a sparse autoencoder (SpAE) and a high-order fuzzy cognitive map (HFCM) is proposed for time-series forecasting problems (e.g., sunspots, Mackey-Glass, S&P 500 stock index and Dow-Jones industrial index). SpAE is used to extract features from the original data, and these features are then passed to the HFCM. Another study that uses SpAE with FLS was proposed in [207] and implemented in Mackey-Glass time series and Iris dataset (classification). The proposed method reduces the number of fuzzy rules by reducing the data dimensionality with SpAE. Neither [206] nor [207] addresses the interpretability of the proposed models, which present good directions in data partitioning and construction of fuzzy rules but are not intuitive in feature extraction with sparse representation.

Table 2 summarizes the works presented in this section, in addition to evaluating them within the context of explainable artificial intelligence (XAI) systems.

Table 2 State-of-the-art on hybrid methods using fuzzy logic systems and conventional deep models for regression problems. XAI: explainable artificial intelligence; Disc.: discussion by the authors (Yes/No); Und.: how understandable is the model, whether it is transparent (T) or opaque (O); Scope: in post hoc scope, if the model promotes local explanations (L), global explanations (G) or visual explanations (V)

Only two works discuss the interpretability in Hybrid DFS, half of the discussed works are opaque, and only one presents post hoc explanations. This occurs because the combination with DNN brings a new black-box layer to the system. However, the Hybrid DFS methods categorized as transparent show that the fuzzy component can promote good interpretability with efficient input–output mappings and the construction of rules to cover the universe of discourse for a given system. Inherently interpretable methodologies are difficult to achieve only with an ensemble of multiple methods, as seen in recent studies, in addition to the challenges of reducing the time complexity of systems, which is slightly mitigated by the reduction of fuzzy rules [208].

5 Conclusion

This study surveyed the literature on deep fuzzy systems (DFSs) for regression applications with an emphasis on interpretability. For the survey, the DFSs were categorized as (i) Standard DFS and (ii) Hybrid DFS. Regarding the interpretability, each method was categorized as to whether it is transparent or not and whether it has post hoc explanations (under the definition of [22]). Standard DFSs have been shown to promote more interpretability of their models when compared to Hybrid DFSs, according to the survey. Indeed, Standard DFSs are based on fundamental fuzzy logic systems, which are inherently interpretable, whereas Hybrid DFSs include conventional deep learning (DL) methods, which lack flexibility in promoting interpretability. In terms of applications, DFS can be flexible in its implementation, either by simulation or in real-time systems, and the most recurrent applications involve time-series forecasting, such as traffic flow and energy modeling (e.g., photovoltaic and wind). Furthermore, the DFS is frequently referred to as interpretable by default, but only 5 of the 23 works surveyed here actually addressed this issue. The remaining works had a common goal: to improve prediction accuracy using their proposed methods. However, this survey presented the potential of using Standard DFS as a base for developing accurate models while promoting interpretability since hybrid models are not straightforward.