
1 Introduction

Let \(\varvec{\mathrm {X}} _1^N=\{X_{1},\dots , X_{N}\}\), \(\varvec{\mathrm {Y}} _1^N=\{Y_{1},\dots , Y_{N}\}\) and \(\varvec{\mathrm {R}} _1^N=\{R_{1},\dots , R_{N}\}\) be three random sequences taking values in \(\mathbb {R}^{m}\), \(\mathbb {R}^{q}\) and \(\varOmega =\{1,\dots ,K\}\) respectively. Let \(\varvec{\mathrm {X}} _1^N\) be a hidden process and \(\varvec{\mathrm {Y}} _1^N\) be an observed process. We consider a switching regime model represented by the sequence of switches \(\varvec{\mathrm {R}} _1^N\). We address the smoothing problem, which consists in a recursive search for the unobserved process \(\varvec{\mathrm {X}} _1^N\) and the switch sequence \(\varvec{\mathrm {R}} _1^N\), knowing only the observed sequence \(\varvec{\mathrm {Y}} _1^N\). Fast Bayesian processing can be carried out by assuming that the distribution of \((\varvec{\mathrm {X}} _1^N, \varvec{\mathrm {Y}} _1^N)\) falls within the framework of hidden Gaussian Markov models. Non-linearity can be modeled by a switching regime system: the idea is to approximate a non-linear non-Gaussian system by a regime switching Gaussian system. Some recent switching models have been proposed with efficient, fast and exact filtering schemes. These switching models, called “conditionally Markov switching hidden linear models” (CMSHLM) [16], include conditionally Gaussian observed Markov switching models (CGOMSM), defined as follows:

  • \(T_1^N=(X_1^N, R_1^N,Y_1^N)\) is a Markov chain;

  • \(p(r_{n+1}|x_n,r_n,y_n)=p(r_{n+1}|r_n)\) and

    $$\begin{aligned}&\left[ \begin{array}{c} X_{n+1}\\ Y_{n+1} \end{array} \right] =\left[ \begin{array}{cc} A_{n+1}^{xx}(R_{n}^{n+1})&{}A_{n+1}^{xy}(R_{n}^{n+1})\\ 0&{}A_{n+1}^{yy}(R_{n}^{n+1}) \end{array}\right] \left[ \begin{array}{c} X_{n}\\ Y_{n} \end{array} \right] \\&\quad \quad \quad \quad + \left[ \begin{array}{cc} B_{n+1}^{xx}(R_{n}^{n+1})&{}B_{n+1}^{xy}(R_{n}^{n+1})\\ B_{n+1}^{yx}(R_{n}^{n+1})&{}B_{n+1}^{yy}(R_{n}^{n+1}) \end{array}\right] \left[ \begin{array}{c} U_{n+1}\\ V_{n+1} \end{array} \right] +\left[ \begin{array}{c} N^X(R_{n}^{n+1})\\ N^Y(R_{n}^{n+1}) \end{array} \right] , \end{aligned}$$

with

$$\begin{aligned} \left[ \begin{array}{c} N^X(R_{n}^{n+1})\\ N^Y(R_{n}^{n+1}) \end{array} \right] =\left[ \begin{array}{c} M^X(R_{n+1})-A_{n+1}^{xx}(R_{n}^{n+1})M^X(R_{n})-A_{n+1}^{xy}(R_{n}^{n+1})M^Y(R_n)\\ M^Y(R_{n+1})-A_{n+1}^{yy}(R_{n}^{n+1})M^Y(R_{n}) \end{array} \right] , \end{aligned}$$

where \(U_1^N\) and \(V_1^N\) are two Gaussian unit-variance white noise vectors, and \(M^X(R_{n})\) and \(M^Y(R_{n})\) are the respective means of \(\varvec{\mathrm {X}} _1^N\) and \(\varvec{\mathrm {Y}} _1^N\) in each state (independently of n). The CGOMSM is then defined by the matrices \(A(\varvec{\mathrm {R}} _n^{n+1})\) and \(B(\varvec{\mathrm {R}} _n^{n+1})\), the transition matrix t such that \(t(i,j)=p(r_{n+1}=j|r_n=i)\), and the mean vectors \(M(R_{n})=[M^X(R_{n}); M^Y(R_{n})]\). Figure 1 depicts the dependence graph of the CGOMSM model.

Fig. 1. Directed graph representing dependencies between the random sequences \(\varvec{\mathrm {X}} _1^N\), \(\varvec{\mathrm {Y}} _1^N\) and \(\varvec{\mathrm {R}} _1^N\). Circles represent continuous processes and the diamond represents the discrete process.
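As an illustration of the CGOMSM transition in the scalar case (\(m=q=1\)), the following minimal sketch applies one step of the state equation together with the offset vector \(N^X, N^Y\). The function names and all numerical values are hypothetical and chosen only for the example; they are not taken from the paper.

```python
def offsets(A, M_prev, M_next):
    """Offsets (N^X, N^Y) built from the state means.

    A = (Axx, Axy, Ayy) for the pair (r_n, r_{n+1}); Ayx is zero in a CGOMSM.
    M_prev = (M^X(r_n), M^Y(r_n)), M_next = (M^X(r_{n+1}), M^Y(r_{n+1})).
    """
    axx, axy, ayy = A
    n_x = M_next[0] - axx * M_prev[0] - axy * M_prev[1]
    n_y = M_next[1] - ayy * M_prev[1]
    return n_x, n_y


def cgomsm_step(x, y, A, B, M_prev, M_next, u, v):
    """One scalar transition (x_n, y_n) -> (x_{n+1}, y_{n+1}).

    B = ((Bxx, Bxy), (Byx, Byy)); u and v are the noise draws U_{n+1}, V_{n+1}.
    """
    axx, axy, ayy = A
    (bxx, bxy), (byx, byy) = B
    n_x, n_y = offsets(A, M_prev, M_next)
    x_next = axx * x + axy * y + bxx * u + bxy * v + n_x
    y_next = ayy * y + byx * u + byy * v + n_y
    return x_next, y_next


# Hypothetical parameters for one pair of switches (illustration only).
A = (0.5, 0.2, 0.4)
B = ((1.0, 0.0), (0.3, 0.8))
M_PREV = (1.0, -1.0)
M_NEXT = (2.0, 0.5)
```

With zero noise and \((x_n, y_n)\) equal to the means of state \(r_n\), the step returns the means of state \(r_{n+1}\), which is the role the offsets play in the model.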

We assume that \(\varvec{\mathrm {R}} _1^N\) takes its values in a discrete finite set of K switches \(\varOmega =\{1,...,K\}\). This hard jumps model has been widely used in several contexts dealing with switching regime Markov systems. Its success comes from its ability to represent non-linear dynamic patterns which is an inherent property in several applications (analysis of economic and finance time series [11], sustainable energy [6], robotics [7, 8, 12], etc.).

However, this model does not take into account the intrinsic imprecision of the switches in real-world applications. In fact, hard jumps induce discontinuities in the dynamic behavior of the studied system. This transitory imprecision can be handled with fuzzy modeling, which allows each switch to take its value as a mixture of several components simultaneously. Fuzzy modeling has been widely incorporated in several applications dealing with Markov models [13,14,15]. In this paper, we present a new method to approximate non-linear Markov systems, based on a variant of CGOMSM with fuzzy switches (hereafter called CGOMFSM).

The remainder of this paper is organized as follows. In the second section, we detail the formulation of the fuzzy switching model. The third section describes the adaptation of the CGOMSM algorithms for parameter estimation and for the computation of posterior marginal probabilities to the fuzzy counterpart. The fourth section presents experimental results, and the last one draws conclusions and outlines future work.

2 Fuzzy Switching Model with K Hard Classes

2.1 Probability Distribution of Fuzzy Vectors

In the fuzzy switches system, which extends the hard case with K classes \(\varOmega =\{1,...,K\}\), we assume that each jump \(r_n^K\) is a vector in \([0,1]^K\). So \(r_n^K=(\varepsilon _1,...,\varepsilon _K)\), and each component \(\varepsilon _k\) can be seen as the “fuzzy part” of the hard class k. We require \(\sum \limits _{k=1}^{K}{\varepsilon _k}=1\). This is an extension of the hard case, as each \(\varepsilon =(\varepsilon _1,...,\varepsilon _K)\) of the form \(\varepsilon =(0,...,0,\varepsilon _k=1,0,...,0)\) is assimilated to the hard class k. The distribution of \(R_n^K\) is then a probability distribution on \(F^K\subset [0,1]^K\), with \(F^K\) the set of \(\varepsilon =(\varepsilon _1,...,\varepsilon _K)\) verifying \(\sum \limits _{k=1}^{K}{\varepsilon _k}=1\). Such a probability distribution can be defined in different manners; we propose to adopt the following one. Let \(\delta _0\) be the Dirac mass on 0, \(\delta _1\) the Dirac mass on 1, and \(\mu \) the Lebesgue measure on ]0, 1[. Let us denote \(\nu =\delta _0+\delta _1+\mu \). Since \(\varepsilon _K=1-(\varepsilon _1+...+\varepsilon _{K-1})\), it is sufficient to define the distribution of the random vector \(R_n^{K-1}=(R_n^1,...,R_n^{K-1})\) on \(F_{K-1}\subset [0,1]^{K-1}\), whose elements verify \(\varepsilon _1+...+\varepsilon _{K-1}\le 1\). Let us define this distribution by a density f with respect to the measure \(\nu ^{\otimes (K-1)}\). Thus, we have:

$$\begin{aligned} P_{R^{K-1}}=f\nu ^{\otimes (K-1)}. \end{aligned}$$
(1)

The case \(K=2\) will be dealt with in the next sections; in the example below, we specify the case \(K=3\).

Example

Let \(K=3\). Here the distribution on \(F_2\subset [0,1]^2\) (whose elements verify \(\varepsilon _1+\varepsilon _2\le 1\)) is \(P_{R^2}=f\nu ^{\otimes 2}=f(\delta _0+\delta _1+\mu )^{\otimes 2}=f(\delta _{00}+\delta _{01}+\delta _{10}+\delta _{11}+\delta _0\otimes \mu + \mu \otimes \delta _0+\delta _1\otimes \mu + \mu \otimes \delta _1+\mu \otimes \mu )\).

However, we recall that \(f(\varepsilon _1, \varepsilon _2)=0\), for \((\varepsilon _1, \varepsilon _2)\in [0,1]^2\) such that \(\varepsilon _1+\varepsilon _2>1\). The sets \(F^3\subset [0,1]^3\) and \(F_2\subset [0,1]^2\) are illustrated in Fig. 2.

Fig. 2. For three hard classes, domain \(F^3\) is the triangle DB, while domain \(F_2\) is the triangle ABC.

2.2 Joint Densities

In the Markov context considered in this paper, we have to define the joint densities of the distributions of \((R_n, R_{n+1})\), defined on \(F_{K-1}\times F_{K-1}\). We will consider f of the form:

$$\begin{aligned} f(r_n, r_{n+1}) = {\left\{ \begin{array}{ll} \alpha _{ij} &{}\text{ if } \text{ both } \text{ switches } \text{ are } \text{ hard, } \\ \beta \phi (r_n,r_{n+1})+\theta &{} \text{ otherwise. }\end{array}\right. } \end{aligned}$$
(2)

We choose \(\phi (r_n,r_{n+1})= \left( 1-\frac{\Vert { r_{n+1} - r_n }\Vert }{\sqrt{2}}\right) ^r,~r\in \mathbb {R}\), where \( \Vert {r_{n+1}-r_n}\Vert \) is the distance between two consecutive switches given by the quadratic norm. Then:

$$\begin{aligned} \phi (r_n,r_{n+1})= \left( 1- \left( \frac{\sum \limits _{i=1}^K{(\varepsilon _n^i-\varepsilon _{n+1}^i})^2}{2}\right) ^{\frac{1}{2}}\right) ^r,~r\in \mathbb {R}. \end{aligned}$$
(3)
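A minimal sketch of the similarity function in Eq. (3), assuming switches are passed as length-K tuples of fuzzy parts and that \(r>0\) (a non-positive exponent would need separate handling when the base is zero). The function name is hypothetical.

```python
import math


def phi(r_n, r_next, r):
    """Similarity of two consecutive fuzzy switches, Eq. (3).

    r_n, r_next: sequences (eps_1, ..., eps_K) summing to 1.
    r: homogeneity exponent, assumed strictly positive here.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(r_n, r_next)))
    # Clamp at zero to guard against tiny negative values from round-off.
    base = max(1.0 - dist / math.sqrt(2.0), 0.0)
    return base ** r
```

Two identical switches give the maximal value 1, while two distinct hard classes are at distance \(\sqrt{2}\) and give 0. For \(K=2\) with \(r_n=(1-u,u)\) and \(r_{n+1}=(1-v,v)\), the distance is \(\sqrt{2}\,|u-v|\), so the expression reduces to \((1-|u-v|)^r\).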

Example

When the number of hard switches equals 2, each switch reduces to a scalar in [0, 1] and the normalization condition gives:

$$\begin{aligned}&\int \limits _0^1\int \limits _0^1f(u,v)(\delta _0+\delta _1+\mu )^{\otimes 2}(u,v)=\nonumber \\&\alpha _{00}+\alpha _{01}+\alpha _{10}+\alpha _{11}+\beta \left[ \int \limits _0^1{\phi (0,v)dv}+\int \limits _0^1{\phi (u,0)}du+\int \limits _0^1\int \limits _0^1{\phi (u,v)dudv}\right] +\theta =1, \end{aligned}$$
(4)

with \(\phi (u,v)=\left( 1-|u-v|\right) ^r\).

2.3 Parameters Interpolation

The model matrices of the fuzzy switching model can be calculated by linear interpolation using the following formula:

$$\begin{aligned} A({\varepsilon _1, \varepsilon _2})= & {} \sum \limits _{1\le i,j\le K}{\varepsilon _1^i\varepsilon _2^jA({i,j}}),\\ B({\varepsilon _1, \varepsilon _2})= & {} \sum \limits _{1\le i,j\le K}{\varepsilon _1^i\varepsilon _2^jB({i,j}}),\\ M({\varepsilon _1, \varepsilon _2})= & {} \sum \limits _{1\le i,j\le K}{\varepsilon _1^i\varepsilon _2^jM({i,j}}), \end{aligned}$$

where \(A(i,j)\) and \(B(i,j)\) are the model matrices corresponding to the hard components i and j, and \(M(i,j)\) is the mean vector for hard switches i and j.

The fuzzy switching model can be implemented by an adequate quantization of the interval [0, 1] into F discrete fuzzy levels. The larger F is, the more accurate the representation of the data. However, a large number of fuzzy levels leads to a high computation time. For example, when the number of crisp components equals three, setting \(F=3\) yields 15 switches and setting \(F=4\) gives 21 switches.
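The switch counts quoted above can be reproduced with a short enumeration. The sketch below assumes a regular grid of step \(1/(F+1)\) on each coordinate, that is, the two hard values plus F intermediate levels. This grid is an assumption chosen because it is consistent with the counts 15 and 21; the text does not specify the exact discretization used for \(K=3\).

```python
def simplex_grid(K, F):
    """Enumerate fuzzy switches (eps_1, ..., eps_K) on a regular grid.

    Each eps_k is a multiple of 1 / (F + 1) and the components sum to 1.
    The grid step is an assumption made for this illustration.
    """
    steps = F + 1

    def compositions(total, parts):
        # All ways to write `total` as an ordered sum of `parts` integers >= 0.
        if parts == 1:
            yield (total,)
            return
        for first in range(total + 1):
            for rest in compositions(total - first, parts - 1):
                yield (first,) + rest

    return [tuple(c / steps for c in comp) for comp in compositions(steps, K)]
```

Under this assumption the number of switches for \(K=3\) is \((F+2)(F+3)/2\), and for \(K=2\) it is \(F+2\).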

3 Fuzzy Switching Model with Two Hard Components

In the remainder of the paper, we consider the case of two hard switches \(\varOmega =\{0,1\}\). To model fuzzy switches, we consider that each random variable \(R_{n}\) in \(\varvec{\mathrm {R}} _1^N\) takes its values in the continuous interval [0, 1], instead of the set \(\{0,1\}\).

Let us consider the pair \((\varepsilon _n^0, \varepsilon _n^1)\in [0,1]^2\), in which \(\varepsilon _n^i\) represents the contribution of the hard component i to the switch \(r_n\). Without loss of generality, let \(\varepsilon _n=\varepsilon _n^1 = 1- \varepsilon _n^0\). Then we have \(R_{n}= \varepsilon _n\):

  • \(\varepsilon _n = 0 \) if the switch is the hard component 0.

  • \(\varepsilon _n \in \, ]0,1[ \) if the switch is fuzzy.

  • \(\varepsilon _n = 1 \) if the switch is the hard component 1.

So this model is able to represent signals with both discrete (hard) and continuous (fuzzy) components. Let \(\nu =\delta _0+\delta _1+\mu \).

We will assume that \(\varvec{\mathrm {R}} _1^N\) is stationary. Then \(p(r_n^{n+1})\) does not depend on n. Let us now precisely define the joint a priori density \(p(r_1^2)\), where the notation \(r_1^2\) represents the pair \((r_1, r_2)\). The density \(p(r_1^2)\) is defined with respect to the product measure \(\nu \otimes \nu \), under the normalization condition (Fig. 3).

Fig. 3. Density of \(p\left( r_1^2 \right) \) with respect to measure \(\nu \otimes \nu \).

The joint prior densities can be computed by quantizing the interval [0, 1] into F equal-length sub-intervals \(\left[ \frac{i}{F},\frac{i+1}{F} \right] \), as described in Fig. 4. Using this scheme, the normalization condition in Eq. (4) yields:

$$\begin{aligned}&\alpha _{00}+\alpha _{01}+\alpha _{10}+\alpha _{11}\nonumber \\ +&\beta \left[ \frac{1}{2F}\sum _{i=0}^{F-1}{\left( 1-\varepsilon _i\right) ^r }+ \frac{1}{2F}\sum _{i=0}^{F-1}{\varepsilon _i^r}+ \frac{1}{2F^2}\sum _{i=0}^{F-1}\sum _{j=0}^{F-1}{ \left( 1-|\varepsilon _i-\varepsilon _j|\right) ^r }\right] +\theta =1. \end{aligned}$$
(5)

Each sub-interval can be represented by its midpoint \(\frac{2i+1}{2F}\). So, in this discrete approximation scheme, the joint a priori density can be defined by a \((2+F) \times (2+F)\) matrix.

Fig. 4. Subdivision of the interval [0, 1] into \(F=5\) equal-length fuzzy sub-intervals.
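The discretization described above can be sketched as follows: the grid holds the two hard values and the F midpoints, and the \((2+F) \times (2+F)\) matrix stores the density of Eq. (2) with \(\phi (u,v)=(1-|u-v|)^r\). The quadrature weights (1 for each Dirac mass, \(1/F\) for each midpoint) and the helper `total_mass` are assumptions of this sketch. It is a plain midpoint rule under \(\nu \otimes \nu \), and its constants are not claimed to match Eq. (5) term by term.

```python
def fuzzy_grid(F):
    """Grid values and quadrature weights for nu = delta_0 + delta_1 + mu."""
    values = [0.0] + [(2 * i + 1) / (2 * F) for i in range(F)] + [1.0]
    weights = [1.0] + [1.0 / F] * F + [1.0]
    return values, weights


def prior_matrix(F, alpha, beta, theta, r):
    """(2+F) x (2+F) matrix of the joint prior density f(r_n, r_{n+1}).

    alpha[i][j]: density value when both switches are hard (i, j in {0, 1}).
    Other entries: beta * (1 - |u - v|)**r + theta.
    """
    values, _ = fuzzy_grid(F)
    last = len(values) - 1
    hard = {0: 0, last: 1}  # grid index -> hard class label
    f = [[0.0] * len(values) for _ in values]
    for i, u in enumerate(values):
        for j, v in enumerate(values):
            if i in hard and j in hard:
                f[i][j] = alpha[hard[i]][hard[j]]
            else:
                f[i][j] = beta * (1.0 - abs(u - v)) ** r + theta
    return f


def total_mass(f, weights):
    """Midpoint-rule approximation of the integral of f under nu x nu."""
    n = len(weights)
    return sum(f[i][j] * weights[i] * weights[j]
               for i in range(n) for j in range(n))
```

Calling `total_mass` on a candidate parameter set gives a quick numerical check of how far it is from satisfying the normalization constraint.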

Under the assumptions of fuzzy switches, we can define the matrices of the incorporated model using a bi-linear function as follows:

$$\begin{aligned} \nonumber&A(\varepsilon _1,\varepsilon _2 )=[(1-\varepsilon _1 )A(0,0)+\varepsilon _1 A(1,0)](1-\varepsilon _2 ) \\&\quad \quad \quad \quad +\,[(1-\varepsilon _1 )A(0,1)+\varepsilon _1 A(1,1) ] \varepsilon _2 \end{aligned}$$
(6)
$$\begin{aligned} \nonumber&B(\varepsilon _1 ,\varepsilon _2 )=[(1-\varepsilon _1 )B(0,0)+\varepsilon _1 B(1,0)](1-\varepsilon _2 ) \\&\quad \quad \quad \quad +\,[(1-\varepsilon _1 )B(0,1)+\varepsilon _1 B(1,1)] \varepsilon _2 \end{aligned}$$
(7)

The mean vectors of the fuzzy model are calculated using the following equation:

$$\begin{aligned} M(\varepsilon _i )=[(1-\varepsilon _i )M(0)+\varepsilon _i M(1)] \end{aligned}$$
(8)
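Equations (6) to (8) can be sketched with plain nested lists. The function names are hypothetical, and `P[i][j]` is assumed to hold the matrix attached to the hard pair \((i,j)\), for instance \(A(i,j)\) or \(B(i,j)\).

```python
def interpolate_matrix(e1, e2, P):
    """Bilinear interpolation of Eqs. (6)-(7).

    e1, e2: fuzzy values of r_n and r_{n+1} in [0, 1].
    P[i][j]: matrix (list of rows) for the hard pair (i, j), i, j in {0, 1}.
    """
    w = {
        (0, 0): (1 - e1) * (1 - e2),
        (1, 0): e1 * (1 - e2),
        (0, 1): (1 - e1) * e2,
        (1, 1): e1 * e2,
    }
    rows, cols = len(P[0][0]), len(P[0][0][0])
    return [[sum(w[i, j] * P[i][j][a][b] for i in (0, 1) for j in (0, 1))
             for b in range(cols)] for a in range(rows)]


def interpolate_mean(e, M0, M1):
    """Linear interpolation of the mean vector, Eq. (8)."""
    return [(1 - e) * m0 + e * m1 for m0, m1 in zip(M0, M1)]
```

At the four corners \((\varepsilon _1,\varepsilon _2)\in \{0,1\}^2\) the interpolation returns the hard matrices unchanged, so the fuzzy model coincides with the hard one whenever both switches are hard.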

Hence, the CGOMFSM is entirely defined by

  • the parameters of the corresponding deterministic hard switching model,

  • the number of fuzzy levels F, and

  • parameters \(\alpha _{ij}, i,j\in \{0,1\}\), \(\beta \), \(\theta \) and r.

The parameter r specifies the homogeneity of the switching model. The larger r is, the larger the probability of having two similar consecutive switches is.

Fig. 5. Trajectories of simulated CGOMFSM \((\varvec{\mathrm {X}} _1^N,\varvec{\mathrm {Y}} _1^N,\varvec{\mathrm {R}} _1^N)\).

Figure 5 represents an example of simulation of \((\varvec{\mathrm {X}} _1^N,\varvec{\mathrm {Y}} _1^N,\varvec{\mathrm {R}} _1^N)\) using the set of parameters of a fuzzy switching model defined in Table 1. Simulations were performed using the following transition matrix:

$$t= \left( \begin{array}{ccccccc} 0.99 &{}0.01 &{}0 &{}0 &{}0 &{}0 &{}0\\ 0 &{}0.99 &{}0.01 &{}0 &{}0 &{}0 &{}0\\ 0 &{}0 &{}0.99 &{}0.01 &{}0 &{}0 &{}0\\ 0 &{}0 &{}0 &{}0.99 &{}0.01 &{}0 &{}0\\ 0 &{}0 &{}0 &{}0 &{}0.99 &{}0.01 &{}0\\ 0 &{}0 &{}0 &{}0 &{}0 &{}0.99 &{}0.01\\ 0 &{}0 &{}0 &{}0 &{}0 &{}0 &{}1.00 \end{array} \right) .$$

In the transition matrix, rows correspond to \(R_n\) and columns correspond to \(R_{n+1}\). The first and last rows and columns represent hard switches, while the other rows and columns correspond to discrete fuzzy switches. This simulation shows the imprecision between hard switches 0 and 1, as illustrated in the trajectory of \(\varvec{\mathrm {Y}} _1^N\). The chosen transition matrix yields a progressive regime switch from the parameter set of hard switch 0 to that of hard switch 1.
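A minimal sketch of how a switch trajectory could be drawn from this transition matrix. The mapping of the five intermediate states to the midpoints \((2i+1)/(2F)\) with \(F=5\), the starting state and the seed handling are assumptions made for the illustration.

```python
import random

F = 5
# Fuzzy value attached to each of the 7 discrete states (assumed midpoints).
LEVELS = [0.0] + [(2 * i + 1) / (2 * F) for i in range(F)] + [1.0]

# Transition matrix from the text: stay with 0.99, move one level up with 0.01.
T = [[0.99 if j == i else 0.01 if j == i + 1 else 0.0 for j in range(7)]
     for i in range(6)]
T.append([0.0] * 6 + [1.0])  # last state (hard switch 1) is absorbing


def simulate_switches(n_steps, seed=0, start=0):
    """Draw a path of state indices of length n_steps from the chain T."""
    rng = random.Random(seed)
    state, path = start, [start]
    for _ in range(n_steps - 1):
        state = rng.choices(range(len(T)), weights=T[state])[0]
        path.append(state)
    return path
```

The fuzzy trajectory is then `[LEVELS[s] for s in simulate_switches(n)]`. With this matrix, a path can only stay in place or move one level toward hard switch 1, which produces the progressive regime change described above.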

Table 1. Example of fuzzy switching model with 5 fuzzy switches.

4 Fast Smoothing in the CGOMFSM

Let us denote by \(\varvec{\mathrm {T}} _1^N\) the triplet \((\varvec{\mathrm {X}} _1^N, \varvec{\mathrm {R}} _1^N, \varvec{\mathrm {Y}} _1^N)\). The smoothing problem consists in computing:

$$\begin{aligned} \mathbb {E}\left[ \varvec{\mathrm {X}} _{n+1} \left| \varvec{\mathrm {y}}_1^{N} \right. \right] =\quad \int _{[0,1]}{p\left( r_{n+1}=\nu \left| \varvec{\mathrm {y}}_1^{N} \right. \right) \mathbb {E}\left[ \varvec{\mathrm {X}} _{n+1} \left| r_{n+1}=\nu , \varvec{\mathrm {y}}_1^{n+1} \right. \right] d\nu }, \end{aligned}$$
(9)

from \(p\left( r_{n+1} \left| \varvec{\mathrm {y}}_1^{N} \right. \right) \) and \(\mathbb {E}\left[ \varvec{\mathrm {X}} _{n+1} \left| r_{n+1},\varvec{\mathrm {y}}_1^{n+1} \right. \right] \).

The optimal smoother recursively computes \(p\left( r_{n+1} \left| \varvec{\mathrm {y}}_1^{N} \right. \right) \) and \(\mathbb {E}\left[ \varvec{\mathrm {X}} _{n+1} \left| r_{n+1}, \varvec{\mathrm {y}}_1^{n+1} \right. \right] \) from \(p\left( r_n \left| \varvec{\mathrm {y}}_1^{N} \right. \right) \), \(\mathbb {E}\left[ \varvec{\mathrm {X}} _{n} \left| r_n, \varvec{\mathrm {y}}_1^{n} \right. \right] \) and the model parameters, using the procedure detailed in [9]. The main difference between CGOMSM and CGOMFSM is that fuzzy switches involve integrals over the continuous interval [0, 1], which must be discretized according to the number of discrete fuzzy levels F.

Since \((\varvec{\mathrm {R}} _1^N, \varvec{\mathrm {Y}} _1^N)\) is a pairwise Markov chain in the model, we get

$$\begin{aligned} p\left( r_{n+1} \left| \varvec{\mathrm {y}}_1^{n+1} \right. \right) =\frac{\int _{[0,1]}{p\left( r_{n+1},\varvec{\mathrm {y}}_{n+1} \left| r_n=\nu ,\varvec{\mathrm {y}}_n \right. \right) p\left( r_n=\nu \left| \varvec{\mathrm {y}}_1^{n} \right. \right) } d\nu }{\int _{[0,1]}\int _{[0,1]}{p(r_{n+1}^*=\upsilon ,\varvec{\mathrm {y}}_{n+1}|r_n=\nu ,\varvec{\mathrm {y}}_n) p\left( r_n=\nu \left| \varvec{\mathrm {y}}_1^{n} \right. \right) }d\nu d\upsilon }, \end{aligned}$$
(10)

and

$$\begin{aligned} p\left( r_n \left| r_{n+1},\varvec{\mathrm {y}}_1^{n+1} \right. \right) = \frac{p\left( r_{n+1},\varvec{\mathrm {y}}_{n+1} \left| r_n,\varvec{\mathrm {y}}_n \right. \right) p\left( r_n \left| \varvec{\mathrm {y}}_1^{n} \right. \right) }{\int _{[0,1]}{p\left( r_{n+1},\varvec{\mathrm {y}}_{n+1} \left| r_n^*=\upsilon ,\varvec{\mathrm {y}}_n \right. \right) p\left( r_n^*=\upsilon \left| \varvec{\mathrm {y}}_1^{n} \right. \right) } d\upsilon }. \end{aligned}$$
(11)

Since

$$\begin{aligned} \mathbb {E}\left[ \varvec{\mathrm {X}} _{n} \left| \varvec{\mathrm {r}}_n^{n+1}, \varvec{\mathrm {y}}_1^{n+1} \right. \right] = \mathbb {E}\left[ \varvec{\mathrm {X}} _{n} \left| r_n, \varvec{\mathrm {y}}_1^{n} \right. \right] , \end{aligned}$$
(12)

and from (9), we can derive the following recursive equation:

$$\begin{aligned}&\mathbb {E}\left[ \varvec{\mathrm {X}} _{n+1} \left| r_{n+1}, \varvec{\mathrm {y}}_1^{n+1} \right. \right] =\int _{[0,1]} p\left( r_n=\nu \left| r_{n+1}, \varvec{\mathrm {y}}_1^{n+1} \right. \right) \times \nonumber \\&\quad \quad \quad \quad \left[ F_{n+1}(\varvec{\mathrm {r}}_n^{n+1}, \varvec{\mathrm {y}}_n^{n+1}) \mathbb {E}\left[ \varvec{\mathrm {X}} _{n} \left| r_n=\nu , \varvec{\mathrm {y}}_1^{n} \right. \right] + H_{n+1}(\varvec{\mathrm {r}}_n^{n+1}, \varvec{\mathrm {y}}_n^{n+1}) \right] d\nu , \end{aligned}$$
(13)

where \(F_{n+1}(\varvec{\mathrm {r}}_n^{n+1}, \varvec{\mathrm {y}}_n^{n+1})\) and \(H_{n+1}(\varvec{\mathrm {r}}_n^{n+1}, \varvec{\mathrm {y}}_n^{n+1})\) are appropriate matrices. The probabilities \(p(r_n|y_1^n)\) and \(p(r_n, y_1^N)\) are computed recursively in linear time using the forward and backward probabilities of the Markov chain \((Y_1^N,R_1^N)\), defined as \(\alpha _n(r_n)=p(r_n,y_1^n)\) and \(\beta _n(r_n)=p(y_{n+1}^N|r_n,y_n)\):

$$\begin{aligned} \nonumber \alpha _1(r_1)= & {} p(r_1,y_1)\\ \alpha _{n+1}(r_{n+1})= & {} \int _{[0,1]}\alpha _n(\upsilon )p(r_{n+1},\varvec{\mathrm {y}}_{n+1}|r_n=\upsilon ,\varvec{\mathrm {y}}_n)d\upsilon \end{aligned}$$
(14)

and

$$\begin{aligned} \nonumber \beta _N(r_N)= & {} 1\\ \beta _n(r_n)= & {} \int _{[0,1]}{\beta _{n+1}(\upsilon )p(r_{n+1}=\upsilon ,\varvec{\mathrm {y}}_{n+1}|r_n,\varvec{\mathrm {y}}_n)d\upsilon } \end{aligned}$$
(15)

Using forward-backward probabilities, we can compute the smoothed and the filtered probabilities as follows:

$$\begin{aligned} p(r_n|\varvec{\mathrm {y}}_1^{N})= & {} \frac{\alpha _n(r_n)\beta _n(r_n)}{\int _{[0,1]}{\alpha _n(\upsilon )\beta _n(\upsilon )d\upsilon }}\end{aligned}$$
(16)
$$\begin{aligned} p(r_n|\varvec{\mathrm {y}}_1^{n})= & {} \frac{\alpha _n(r_n)}{\int _{[0,1]}{\alpha _n(\upsilon )d \upsilon }}. \end{aligned}$$
(17)
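Once [0, 1] is discretized, the integrals in Eqs. (14) to (17) become weighted sums over the grid. The sketch below is a generic normalized forward-backward pass under that assumption. The argument layout, with `kernels[n][i][j]` standing for \(p(r_{n+1}=g_j, y_{n+1}|r_n=g_i, y_n)\) evaluated at grid points \(g_i, g_j\), and the quadrature `weights` are choices made for this illustration, not the authors' implementation.

```python
def forward_backward(alpha1, kernels, weights):
    """Filtered and smoothed switch densities on a discretized grid.

    alpha1[i]      : p(r_1 = g_i, y_1) at each grid point g_i.
    kernels[n][i][j]: transition density from g_i at step n to g_j at step n+1,
                      including the density of the next observation.
    weights[i]     : quadrature weight of g_i (1 for hard values, 1/F for
                     midpoints under the measure delta_0 + delta_1 + mu).
    Returns (filtered, smoothed): lists of densities, one per time step.
    """
    size = len(weights)

    def normalise(vec):
        total = sum(w * v for w, v in zip(weights, vec))
        return [v / total for v in vec]

    # Forward pass, Eq. (14); rescaling each step gives Eq. (17) directly.
    alphas = [normalise(alpha1)]
    for kern in kernels:
        prev = alphas[-1]
        nxt = [sum(weights[i] * prev[i] * kern[i][j] for i in range(size))
               for j in range(size)]
        alphas.append(normalise(nxt))

    # Backward pass, Eq. (15); rescaling only changes a constant factor.
    betas = [[1.0] * size]
    for kern in reversed(kernels):
        nxt = betas[0]
        cur = [sum(weights[j] * kern[i][j] * nxt[j] for j in range(size))
               for i in range(size)]
        betas.insert(0, normalise(cur))

    # Smoothed densities, Eq. (16).
    smoothed = [normalise([a * b for a, b in zip(al, be)])
                for al, be in zip(alphas, betas)]
    return alphas, smoothed
```

Rescaling at every step avoids numerical underflow on long sequences and does not change the normalized outputs. The cost is proportional to \(N(2+F)^2\), which is the trade-off between accuracy and computation time governed by F.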

Posterior marginal probabilities are calculated using the normalized Baum-Welch algorithm. The algorithm computes recursively the forward and backward probabilities. In the case of fuzzy switches, these probabilities are defined as follows:

$$\begin{aligned} \alpha _{n+1}(\delta )= & {} \int _{[0,1]}\alpha _n (\theta ) p\left( \varvec{\mathrm {t}}_{n+1}(\delta ) \left| \varvec{\mathrm {t}}_n(\theta ) \right. \right) d\theta ,\end{aligned}$$
(18)
$$\begin{aligned} \beta _{n}(\delta )= & {} \int _{[0,1]}\beta _{n+1}(\theta ) p\left( \varvec{\mathrm {t}}_{n+1}(\theta ) \left| \varvec{\mathrm {t}}_n(\delta ) \right. \right) d\theta , \end{aligned}$$
(19)

with \(\varvec{\mathrm {t}}_n(\theta ) = (\varvec{\mathrm {x}}_n, \varvec{\mathrm {y}}_n, r_n=\theta )\).

Then:

$$\begin{aligned} p(r_n,r_{n+1}|\varvec{\mathrm {x}}_1^{N},\varvec{\mathrm {y}}_1^{N})=\frac{\alpha _n(r_n)p(\varvec{\mathrm {t}}_{n+1}|\varvec{\mathrm {t}}_n)\beta _{n+1}(r_{n+1})}{\int _{[0,1]}\int _{[0,1]}\alpha _n(\delta )p(\varvec{\mathrm {t}}_{n+1}(\theta )|\varvec{\mathrm {t}}_n(\delta ))\beta _{n+1}(\theta )d\delta d\theta }. \end{aligned}$$
(20)

5 Experiments

In this section, we present two series of experiments to assess the performance of the exact smoother in the case of scalar data (\(m=q=1\)). In the first series, we evaluate the performance of the fuzzy model on synthetic fuzzy signals; in the second series, we apply our algorithm to smooth simulated Stochastic Volatility (SV) data. In both experiments, parameter estimation is carried out with the EM algorithm on training samples of size T, denoted by \((\varvec{x}_1^T, \varvec{y}_1^T)\). We then repeatedly generate, according to the considered model, synthetic sequences of size S denoted by \((\varvec{x}_1^S, \varvec{y}_1^S)\). The smoothing algorithm is then run with the estimated parameters to produce \(\hat{\varvec{x}}_1^S\) from the observed sequence \(\varvec{y}_1^S\). The criterion used to assess the efficiency of the smoothing algorithms is the mean squared error (MSE), defined as follows:

$$\begin{aligned} MSE=\frac{1}{S}\sum _{n=1}^{S}\left( x_n-\hat{x}_n\right) ^2 \end{aligned}$$
(21)
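For reference, Eq. (21) amounts to the following computation on scalar sequences. The function name is hypothetical.

```python
def mse(x_true, x_hat):
    """Mean squared error between a hidden sequence and its estimate, Eq. (21)."""
    if len(x_true) != len(x_hat) or not x_true:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(x_true, x_hat)) / len(x_true)
```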

5.1 Smoothing Synthetic Fuzzy Signals

We define the distribution of the random process \((\varvec{\mathrm {X}} _1^N,\varvec{\mathrm {Y}} _1^N,\varvec{\mathrm {R}} _1^N)\) by the Gaussian distributions \(p(z_1,z_2|r_1,r_2)\), where \(Z_n=(X_n^T,Y_n^T)^T\). Let \(\varGamma _{i,j}\) be the covariance matrix \(\varGamma _{i,j}=\mathbb {E}\left[ Z_1Z_2^T \left| r_1=i,r_2=j \right. \right] =\left[ \begin{array}{cccc} 1 &{} b_i &{} a_{ij}&{}d_{ij}\\ b_i&{}1&{}e_{ij}&{}c_{ij}\\ a_{ij}&{}e_{ij} &{} 1 &{}b_j\\ d_{ij}&{} c_{ij} &{}b_j &{}1 \end{array}\right] \).

To ensure that the model follows the definition of CGOMSM (\(A^{yx}(R_n^{n+1})=0\)), we take \(d_{ij}=b_{i}c_{ij}\); we also take \(a_{ij}=c_{ij}\) and \(e_{ij}=d_{ij}\). Moreover, we consider that \(a_{ij}=a_{i}\). Hence, the simulation model is defined by the set of parameters \(\varTheta =\{a_0, a_1, b_0, b_1\}\). We consider five cases defined by the parameter set \(\varTheta \) detailed in Table 2. For each case, we generate a fuzzy signal with 5 discrete fuzzy switches. We then restore the hidden process with the CGOMFSM model for values of F ranging from 1 to 5. For each set of data, we consider 3 values of \(r\in \{2,8,20\}\). For each case, we perform 10 independent experiments with \(S=1000\). Each experiment consists in generating a training sample \((\varvec{X}_1^T, \varvec{Y}_1^T)\) of size \(T=20000\), on which 100 iterations of the EM algorithm are run. Figure 6 shows examples of trajectories of the simulated process \(\varvec{\mathrm {R}} _1^N\) with three different values of r. Figure 7 illustrates an example of simulated data using case 2 parameters and the optimal (but approximated) smoothing output. Table 3 reports the MSE results for the 5 different cases of fuzzy models and different values of r.
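The role of the constraint \(d_{ij}=b_{i}c_{ij}\) can be checked numerically. The sketch below reads \(\varGamma _{i,j}\) as the covariance of the stacked, centered vector \((X_1,Y_1,X_2,Y_2)\) and uses the standard Gaussian conditioning formula \(A=\varSigma _{21}\varSigma _{11}^{-1}\). Both the reading and the numerical values are assumptions made for this illustration; the values of Table 2 are not used.

```python
def gamma(a_i, b_i, b_j):
    """4x4 matrix Gamma_{i,j} under the constraints stated in the text."""
    c = a_i          # a_ij = c_ij, with a_ij = a_i
    d = b_i * c      # d_ij = b_i * c_ij
    e = d            # e_ij = d_ij
    return [[1.0, b_i, a_i, d],
            [b_i, 1.0, e, c],
            [a_i, e, 1.0, b_j],
            [d, c, b_j, 1.0]]


def regression_matrix(g):
    """A = Sigma_21 * inverse(Sigma_11) for the 2x2 blocks of g.

    Row 0 gives (A^xx, A^xy); row 1 gives (A^yx, A^yy).
    """
    b = g[0][1]
    det = 1.0 - b * b
    inv = [[1.0 / det, -b / det], [-b / det, 1.0 / det]]
    s21 = [[g[2][0], g[2][1]], [g[3][0], g[3][1]]]
    return [[sum(s21[r][k] * inv[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]
```

With these constraints, \(A^{yx}=(d_{ij}-b_ic_{ij})/(1-b_i^2)=0\) and \(A^{yy}=c_{ij}\), so \(Y_{n+1}\) does not depend on \(X_n\) given \(Y_n\) and the switches, as the CGOMSM definition requires.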

Table 2. Parameters of 5 simulation cases.
Fig. 6. Examples of trajectories of simulated process \(\varvec{\mathrm {R}} _1^N\) using different values of switches homogeneity \(r=2\) (a), \(r=8\) (b), \(r=20\) (c) and \(r=40\) (d).

Fig. 7. A \((\varvec{\mathrm {X}} _1^N, \varvec{\mathrm {R}} _1^N, \varvec{\mathrm {Y}} _1^N)\) CGOMFSM trajectory together with the restored signal in green (solid). (Color figure online)

The experimental results show that when the number of fuzzy levels increases, the smoothed signal is closer to the “ground-truth” hidden signal. Moreover, we noticed that the higher the homogeneity parameter r, the lower the estimation error.

Table 3. MSE results for different fuzzy signals with 5 discrete fuzzy levels and different values of r.
Table 4. MSE for SV models with \(\mu =0.5, \beta =0.5\). To ensure stationarity of models, we set \(\phi ^2= 1- \sigma ^2\). PS column is the result obtained from the particle smoother (1500 particles).

5.2 Experiments on Stochastic Volatility Models

Stochastic volatility (SV) models are widely used to model the time-varying variance of stochastic processes [10]. Several variants of SV models have been studied (Heston, CEV, GARCH, Chen, etc.). In this paper, we consider the standard SV model defined as follows:

$$\begin{aligned} X_1= & {} \mu +U_1 \end{aligned}$$
(22)
$$\begin{aligned} X_{n+1}= & {} \mu +\phi (X_n-\mu )+\sigma U_{n+1}\end{aligned}$$
(23)
$$\begin{aligned} Y_{n}= & {} \beta \exp \left( \frac{X_n}{2}\right) V_n, \end{aligned}$$
(24)

where \(U_i\), \(V_i\) are independent standard Gaussian vectors. The SV model is defined by the set of parameters \(\mu \), \(\phi \), \(\sigma \) and \(\beta \). The main conclusion is that when the number of discrete fuzzy states increases, the results of the model approach those of the optimal (but time-consuming) particle smoother (Table 4).
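A minimal simulation sketch of Eqs. (22) to (24) in the scalar case, using the settings quoted in the caption of Table 4 (\(\mu =0.5\), \(\beta =0.5\), \(\phi ^2=1-\sigma ^2\)). Taking the positive root for \(\phi \), the function name and the seed handling are assumptions of this sketch.

```python
import math
import random


def simulate_sv(n, sigma, mu=0.5, beta=0.5, seed=0):
    """Simulate (x_1..x_n, y_1..y_n) from the standard SV model.

    phi is set to sqrt(1 - sigma**2), which keeps the variance of X_n at 1
    (positive root assumed).
    """
    if not 0.0 <= sigma <= 1.0:
        raise ValueError("sigma must lie in [0, 1] so that phi is real")
    rng = random.Random(seed)
    phi = math.sqrt(1.0 - sigma ** 2)
    xs = [mu + rng.gauss(0.0, 1.0)]                      # Eq. (22)
    for _ in range(n - 1):
        xs.append(mu + phi * (xs[-1] - mu) + sigma * rng.gauss(0.0, 1.0))  # Eq. (23)
    ys = [beta * math.exp(x / 2.0) * rng.gauss(0.0, 1.0) for x in xs]      # Eq. (24)
    return xs, ys
```

Sequences generated this way can serve both as training samples and as test sequences for the MSE comparison against a particle smoother.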

6 Conclusion

In this paper, we presented a novel approach to approximate non-linear Markov systems using the Conditionally Gaussian Observed Markov Fuzzy Switching Model (CGOMFSM). The chief novelty of this work is the introduction of fuzzy jumps instead of classical crisp states. This model still allows exact (up to the required quantization) and fast smoothing equations. The fuzzy jumps allow transient modification of parameters, which is more appropriate for real-world applications. Future work includes the evaluation of the model on real data.