This chapter proposes another nonlinear PLS method, named locality-preserving partial least squares (LPPLS), which embeds the nonlinear dimensionality-reduction and structure-preserving properties of LPP into the PLS model. The core idea of LPPLS is to replace the role of PCA in PLS with LPP. When the principal components \(\boldsymbol{t}_i\) and \(\boldsymbol{u}_i\) are extracted, two conditions must be satisfied: (1) \(\boldsymbol{t}_i\) and \(\boldsymbol{u}_i\) retain as much information as possible about the local nonlinear structure of their respective data sets; (2) the correlation between \(\boldsymbol{t}_i\) and \(\boldsymbol{u}_i\) is maximized. Finally, a quality-related monitoring strategy is established based on LPPLS.

First, the geometric interpretation of PCA in PLS and of LPP is introduced. The LPPLS model and the LPPLS-based quality-related process monitoring method are then proposed. Three different types of LPPLS models are given within the same framework, addressing three nonlinear cases: nonlinearity within the input space \(\boldsymbol{X}\), within the output space \(\boldsymbol{Y}\), and between the two. A typical algorithm for extracting the principal components is derived. Finally, the feasibility and effectiveness of the LPPLS method are verified on artificial 3-D data and on Tennessee Eastman Process simulations.

10.1 The Relationship Among PCA, PLS, and LPP

For the normalized data sets of process variables \(\boldsymbol{X} = \left[ \boldsymbol{x}^\mathrm {T}(1), \boldsymbol{x}^\mathrm {T}(2), \right. \) \(\left. \ldots , \boldsymbol{x}^\mathrm {T}(n) \right] ^\mathrm {T}\in R^{n \times m}\;(\boldsymbol{x} \in R^{1 \times m})\) and quality variables \(\boldsymbol{Y}=\left[ \boldsymbol{y}^\mathrm {T}(1),\boldsymbol{y}^\mathrm {T}(2),\right. \) \(\left. \ldots ,\boldsymbol{y}^\mathrm {T}(n)\right] ^\mathrm {T}\in R^{n\times l}\;(\boldsymbol{y} \in R^{1\times l}) \), where m and l are the dimensions of the process and quality variable spaces and n is the number of samples, the principal component extraction of PCA, LPP, and PLS is equivalent to the following constrained optimization problems.

$$\begin{aligned} J_{\mathrm {PCA}}(\boldsymbol{w})&= \max \; {\boldsymbol{w}^{\mathrm {T}}}{{\boldsymbol{X}}^{\mathrm {T}}}\boldsymbol{X}\boldsymbol{w} \qquad \mathrm {s.t.}\;\, {\boldsymbol{w}^{\mathrm {T}}}\boldsymbol{w} = 1 \end{aligned}$$
(10.1)
$$\begin{aligned} J_{\mathrm {LPP}}(\boldsymbol{w})&= \max \; {\boldsymbol{w}^{\mathrm {T}}}{{\boldsymbol{X}}^{\mathrm {T}}}\boldsymbol{S}_x\boldsymbol{X}\boldsymbol{w} \qquad \mathrm {s.t.}\;\, {\boldsymbol{w}^{\mathrm {T}}}{{\boldsymbol{X}}^{\mathrm {T}}}\boldsymbol{D}_x\boldsymbol{X}\boldsymbol{w} = 1 \end{aligned}$$
(10.2)
$$\begin{aligned} J_{\mathrm {PLS}}(\boldsymbol{w},\boldsymbol{c})&= \max \; \boldsymbol{w}^{\mathrm {T}}\boldsymbol{X}^{\mathrm {T}}\boldsymbol{Y}\boldsymbol{c} \qquad \mathrm {s.t.}\;\, {\boldsymbol{w}^{\mathrm {T}}}\boldsymbol{w} = 1,\; {\boldsymbol{c}^{\mathrm {T}}}\boldsymbol{c} = 1 \end{aligned}$$
(10.3)
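For concreteness, each of the three criteria (10.1)–(10.3) can be solved as an (ordinary or generalized) eigenvalue problem. The following minimal sketch (in Python, with illustrative function names) assumes the LPP weight matrix \(\boldsymbol{S}_x\) and degree matrix \(\boldsymbol{D}_x\) have already been built as in Chap. 9; it is only a numerical reading of the three objectives, not the extraction algorithm used later in this chapter.

```python
# A minimal sketch: first projection direction of PCA, LPP, and PLS,
# read directly from the criteria (10.1)-(10.3).
import numpy as np
from scipy.linalg import eigh

def pca_direction(X):
    # (10.1): maximize w' X'X w subject to w'w = 1
    _, vecs = np.linalg.eigh(X.T @ X)
    return vecs[:, -1]                       # eigenvector of the largest eigenvalue

def lpp_direction(X, S_x, D_x):
    # (10.2): generalized eigenproblem X'S_x X w = mu X'D_x X w
    # (X'D_x X is assumed positive definite here)
    _, vecs = eigh(X.T @ S_x @ X, X.T @ D_x @ X)
    return vecs[:, -1]

def pls_directions(X, Y):
    # (10.3): w and c are the dominant left/right singular vectors of X'Y
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U[:, 0], Vt[0, :]
```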

The meanings of the related variables such as \(\boldsymbol{w}\) and \(\boldsymbol{c}\) have been given in Chap. 9. Also in Chap. 9, to mitigate PLS's lack of local feature extraction capability, the input space \(\boldsymbol{X}\) and the output space \(\boldsymbol{Y}\) are mapped into new feature spaces \(\boldsymbol{X}_F\) and \(\boldsymbol{Y}_F\) that include a global linear subspace and a number of local linear subspaces. Consequently, the following optimization objective function of the global plus local projection to latent structures (GPLPLS) method is obtained immediately by using the feature space \(\boldsymbol{X}_F\) or \(\boldsymbol{Y}_F\) to replace the original space \(\boldsymbol{X}\) or \(\boldsymbol{Y}\),

$$\begin{aligned} \begin{aligned} J_{\mathrm {GPLPLS}}(\boldsymbol{w}, \boldsymbol{c})&= \arg \max \{ {\boldsymbol{w}^\mathrm {T}}{\boldsymbol{X}^\mathrm {T}_F}\boldsymbol{Y}_F \boldsymbol{c}\} \\ \mathrm {s.t.} \;\;&{\boldsymbol{w}^\mathrm {T}}\boldsymbol{w} = 1,{\boldsymbol{c}^\mathrm {T}}\boldsymbol{c} = 1, \end{aligned} \end{aligned}$$
(10.4)

where \(\boldsymbol{X}_F= \boldsymbol{X}+\lambda _x\boldsymbol{\theta }_x^\frac{1}{2}\), \(\boldsymbol{Y}_F= \boldsymbol{Y}+\lambda _y\boldsymbol{\theta }_y^\frac{1}{2}\).

Although adding local features to the global features allows the GPLPLS model to show excellent fault detection performance, the GPLPLS model does not fully implement local feature extraction; its local features are only extracted approximately. The main reason is that the constraint conditions of the GPLPLS model are still those of PCA or PLS. In general, this way of combining cannot guarantee the constraints of PCA and LPP at the same time.

In Chap. 9, only the nonlinear part is described by the local features, while the linear part is still characterized by the traditional covariance matrix. In fact, the linear part can also be described by local characteristics. In this way, the linear and nonlinear parts can be treated as a whole, thereby avoiding unnecessary parameter trade-offs. In the following, the differences and similarities between PCA and LPP are analyzed.

The local characteristics of \({\boldsymbol{X}}\) in LPP are contained in the matrices \({\boldsymbol{X}}^{\mathrm {T}} \boldsymbol{S}_x\boldsymbol{X}\) and \({\boldsymbol{X}}^{\mathrm {T}} \boldsymbol{D}_x\boldsymbol{X}\). To study the similarity between LPP and PCA, the matrices \(\boldsymbol{S}_x\) and \(\boldsymbol{D}_x\) are decomposed into \(\boldsymbol{S}_x^{\frac{1}{2}{\mathrm {T}}} \boldsymbol{S}_x^{\frac{1}{2}}\) and \(\boldsymbol{D}_x^{\frac{1}{2}{\mathrm {T}}} \boldsymbol{D}_x^{\frac{1}{2}}\), respectively. Then the LPP criterion (10.2) is further transformed as

$$\begin{aligned} \begin{aligned} J_{\mathrm {LPP}}(\boldsymbol{w})&= \max \;\boldsymbol{w}^{\mathrm {T}} \boldsymbol{X}_M^{\mathrm {T}} {\boldsymbol{X}_M} \boldsymbol{w}\\ \mathrm {s.t.}&{\boldsymbol{w}^{\mathrm {T}}}{\boldsymbol{M}_x^{\mathrm {T}}} \boldsymbol{M}_x\boldsymbol{w} = 1, \end{aligned} \end{aligned}$$
(10.5)

where \(\boldsymbol{M}_x=\boldsymbol{D}_x^{\frac{1}{2}}{\boldsymbol{X}}\), \({\boldsymbol{X}_M} =\boldsymbol{S}_x^{\frac{1}{2}} {\boldsymbol{X}}\).

Comparing (10.5) with (10.1), it can be seen that the mathematical structures of the LPP and PCA optimization problems are similar. “PCA selects a subspace consisting of the eigenvectors corresponding to the largest eigenvalues of the global covariance matrix, while LPP selects a subspace consisting of the eigenvectors corresponding to the smallest eigenvalues of the local covariance matrix (He et al. 2005)”. Therefore, LPP can replace PCA in the PLS decomposition process, thus preserving strong local nonlinearity.

In the PLS criterion (10.3) for forming latent variables, PCA is used to extract a set of components that transforms the original data \(\boldsymbol{X}\) into a set of t-scores \(\boldsymbol{T}\). PCA and PLS only extract global linear features and therefore do not reflect the local information of the samples or their nonlinear features. However, PCA is not the only method for extracting principal components: LPP, which converts the global nonlinearity into a combination of multiple local linearities, can also be used. Therefore, LPP is suitable for systems with strong local nonlinear features.

10.2 LPPLS Models and LPPLS-Based Fault Detection

10.2.1 The LPPLS Models

Based on (10.3), the two criteria for selecting latent vectors \(\boldsymbol{u}_i\) and \(\boldsymbol{t}_i\) for PLS are as follows:

  (1) The linear variation of the latent vectors is manifested as much as possible;

  (2) The correlation between the latent vectors is as strong as possible.

The optimization objective for extracting the first component pair \((\boldsymbol{t}_1,\boldsymbol{u}_1)\) is

$$\begin{aligned} \begin{aligned} J_{\mathrm {PLS}}(\boldsymbol{w}_1,\boldsymbol{c}_1)=&\max {\boldsymbol{w}^{\mathrm {T}}_1}{{\boldsymbol{X}}^{\mathrm {T}}}\boldsymbol{Yc}_1\\ \mathrm {s.t.}&\quad {\boldsymbol{w}^{\mathrm {T}}_1}\boldsymbol{w}_1 = 1, \boldsymbol{c}_1^{\mathrm {T}} \boldsymbol{c}_1 =1. \end{aligned} \end{aligned}$$
(10.6)

The optimization objective (10.6) is used for fast extraction of the principal components in PLS. Define \(\boldsymbol{E}_{0}= {\boldsymbol{X}}\) and \(\boldsymbol{F}_{0}={\boldsymbol{Y}}\); then the latent variables \(\boldsymbol{t}_1\) and \(\boldsymbol{u}_1\) are calculated by \(\boldsymbol{t}_1=\boldsymbol{E}_0\boldsymbol{w}_1\) and \(\boldsymbol{u}_1=\boldsymbol{F}_0\boldsymbol{c}_1\), where \(\boldsymbol{w}_1\) and \(\boldsymbol{c}_1\) are the eigenvectors corresponding to the maximum eigenvalues of the following matrices.

$$\begin{aligned} \boldsymbol{E}_{0}^{\mathrm {T}}{\boldsymbol{F}_{0}}\boldsymbol{F}_{0}^{\mathrm {T}}{\boldsymbol{E}_{0}}{\boldsymbol{w}_1}&= \theta _1^2{\boldsymbol{w}_1} \end{aligned}$$
(10.7)
$$\begin{aligned} \boldsymbol{F}_{0}^{\mathrm {T}}{\boldsymbol{E}_{0}}\boldsymbol{E}_{0}^{\mathrm {T}}{\boldsymbol{F}_{0}}{\boldsymbol{c}_1}&= \theta _1^2{\boldsymbol{c}_1}. \end{aligned}$$
(10.8)
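As a minimal sketch (assuming \(\boldsymbol{X}\) and \(\boldsymbol{Y}\) are already normalized), the first PLS component pair follows directly from (10.7) and (10.8):

```python
# Sketch: first PLS component pair from (10.7)-(10.8).
import numpy as np

def pls_first_component(X, Y):
    E0, F0 = X, Y
    M = E0.T @ F0
    _, vecs_w = np.linalg.eigh(M @ M.T)      # E0'F0 F0'E0 w1 = theta^2 w1
    w1 = vecs_w[:, -1]
    _, vecs_c = np.linalg.eigh(M.T @ M)      # F0'E0 E0'F0 c1 = theta^2 c1
    c1 = vecs_c[:, -1]
    t1, u1 = E0 @ w1, F0 @ c1                # score vectors
    return w1, c1, t1, u1
```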

Considering the similarity between LPP and PCA discussed in the previous section, LPP is used instead of PCA to extract the principal components in the PLS decomposition (10.3), yielding the LPPLS model. Three LPPLS models (types I, II, and III) are developed to address different nonlinear relationships.

The type I LPPLS model deals with the case where the input space \({\boldsymbol{X}}\) has a nonlinear relationship and the correlation between the input \({\boldsymbol{X}}\) and the output \({\boldsymbol{Y}}\) is linear. The principal components of the input space \({\boldsymbol{X}}\) are extracted by LPP and the principal components of the output space \({\boldsymbol{Y}}\) are extracted by PCA. The optimization objective is as follows:

$$\begin{aligned} \begin{aligned} J_{\mathrm {LPPLS}_{\mathrm {I}}}(\boldsymbol{w}, \boldsymbol{c})&=\max \boldsymbol{w}^{\mathrm {T}} \boldsymbol{X}_M^{\mathrm {T}} {\boldsymbol{Y}} \boldsymbol{c} \\ \mathrm { s.t. }\;\; \boldsymbol{c}^{\mathrm {T}} \boldsymbol{c}&=1, \boldsymbol{w}^{\mathrm {T}} \boldsymbol{M}_{x}^{\mathrm {T}} \boldsymbol{M}_{x} \boldsymbol{w}=1. \end{aligned} \end{aligned}$$
(10.9)

The type II LPPLS model deals with a nonlinear correlation between the input space \({\boldsymbol{X}}\) and the output space \({\boldsymbol{Y}}\), but a linear correlation within the input space \({\boldsymbol{X}}\). The principal components of the input space \({\boldsymbol{X}}\) are extracted by PCA and the principal components of the output space \({\boldsymbol{Y}}\) are extracted by LPP. The optimization function is

$$\begin{aligned} \begin{aligned} J_{\mathrm {LPPLS}_{\mathrm {II}}}(\boldsymbol{w}, \boldsymbol{c})&=\max \boldsymbol{w}^{\mathrm {T}} {\boldsymbol{X}}^{\mathrm {T}} \boldsymbol{Y}_M \boldsymbol{c} \\ \mathrm { s.t. }\;\; \boldsymbol{w}^{\mathrm {T}} \boldsymbol{w}&=1, \boldsymbol{c}^{\mathrm {T}} \boldsymbol{M}_{y}^{\mathrm {T}} \boldsymbol{M}_{y} \boldsymbol{c}=1 \end{aligned} \end{aligned}$$
(10.10)

in which \(\boldsymbol{M}_{y}=\boldsymbol{D}_{y}^{\frac{1}{2}}{\boldsymbol{Y}}\) and \(\boldsymbol{Y}_M=\boldsymbol{S}_{y}^{\frac{1}{2}}{\boldsymbol{Y}}\), where \(\boldsymbol{S}_{y}\) and \(\boldsymbol{D}_{y}\) are defined similarly to \(\boldsymbol{S}_x\) and \(\boldsymbol{D}_x\) in (9.8), with their own neighborhood parameter \(\delta _{y}\).
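Since the weight and degree matrices are defined by (9.8) in Chap. 9 and not repeated here, the sketch below uses one common LPP construction, a k-nearest-neighbour graph with heat-kernel weights, as a stand-in; the neighbourhood size \(K\) and kernel width \(\delta \) play the roles of \(K_y\) and \(\delta _y\). Both the construction and the symmetric square root are assumptions made for illustration only.

```python
# Sketch (assumed construction): LPP weight matrix S and degree matrix D,
# and the mapped matrices M = D^{1/2} Z and Z_M = S^{1/2} Z used in (10.9)-(10.11).
import numpy as np
from scipy.spatial.distance import cdist

def lpp_weight_matrices(Z, K=15, delta=0.8):
    n = Z.shape[0]
    dist = cdist(Z, Z)                            # pairwise Euclidean distances
    S = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(dist[i])[1:K + 1]        # K nearest neighbours (skip self)
        S[i, idx] = np.exp(-dist[i, idx] ** 2 / delta)
    S = np.maximum(S, S.T)                        # symmetrize the neighbourhood graph
    D = np.diag(S.sum(axis=1))                    # diagonal degree matrix
    return S, D

def sym_sqrt(A, eps=1e-12):
    # symmetric matrix square root; tiny/negative eigenvalues are clipped
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(np.sqrt(np.clip(vals, eps, None))) @ vecs.T

# Example: M_y = D_y^{1/2} Y and Y_M = S_y^{1/2} Y for the type II/III models
# S_y, D_y = lpp_weight_matrices(Y, K=15, delta=0.8)
# M_y, Y_M = sym_sqrt(D_y) @ Y, sym_sqrt(S_y) @ Y
```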

The type III LPPLS model is given for a nonlinear correlation between the input space \({\boldsymbol{X}}\) and the output space \({\boldsymbol{Y}}\) as well as within the input space \({\boldsymbol{X}}\). In this case, the principal components of the input space \({\boldsymbol{X}}\) and the output space \({\boldsymbol{Y}}\) are both extracted by LPP. Its corresponding optimization objective function is

$$\begin{aligned} \begin{aligned} \quad J_{\mathrm {LPPLS}_{\mathrm {III}}}(\boldsymbol{w}, \boldsymbol{c})&=\max \boldsymbol{w}^{\mathrm {T}} \boldsymbol{X}_M^{\mathrm {T}} \boldsymbol{Y}_M {\boldsymbol{c}} \\ \mathrm { s.t. }\;\; \boldsymbol{w}^{\mathrm {T}} \boldsymbol{M}_{x}^{\mathrm {T}} \boldsymbol{M}_{x} \boldsymbol{w}&=1, \boldsymbol{c}^{\mathrm {T}} \boldsymbol{M}_{y}^{\mathrm {T}} \boldsymbol{M}_{y} \boldsymbol{c}=1. \end{aligned} \end{aligned}$$
(10.11)

The criteria for the selection of latent vectors \(\boldsymbol{u}_i\) and \(\boldsymbol{t}_i\) for type III LPPLS are as follows:

  (1) The nonlinear variation of the latent vectors is manifested as much as possible;

  (2) The correlation between the latent vectors is as strong as possible.

Discussion: One of the aims of these criteria is to choose factors \(\boldsymbol{u}_i\) and \(\boldsymbol{t}_i\) that better represent the nonlinear variation of the data. For comparison, the optimization objective of GLPLS is given in (10.12) (Zhong et al. 2016).

$$\begin{aligned} \begin{aligned} J_{\mathrm {GLPLS}}(\boldsymbol{w},\boldsymbol{c}) =&\max \left\{ {\boldsymbol{w}^{\mathrm {T}}}{ {\boldsymbol{X}}^{\mathrm {T}}} \boldsymbol{Y}\boldsymbol{c}+\beta _1 \boldsymbol{w}^{\mathrm {T}} \boldsymbol{X}_M^{\mathrm {T}} \boldsymbol{X}_M \boldsymbol{w} + \beta _2 \boldsymbol{c}^{\mathrm {T}} \boldsymbol{Y}_M^{\mathrm {T}} \boldsymbol{Y}_M\boldsymbol{c}\right\} \\ \mathrm {s.t.}&\quad \boldsymbol{w}^{\mathrm {T}} \boldsymbol{w} = 1,\; \boldsymbol{c}^{\mathrm {T}}\boldsymbol{c} = 1, \end{aligned} \end{aligned}$$
(10.12)

where the parameters \(\beta _1\) and \(\beta _2\) trade off global against local feature extraction. Here the embedding property and data screening of LPP are lost, because the LPP constraints \(\boldsymbol{w}^{\mathrm {T}} {\boldsymbol{X}}^{\mathrm {T}} \boldsymbol{D}_x {\boldsymbol{X}} \boldsymbol{w} = 1\) and \(\boldsymbol{c}^{\mathrm {T}} {\boldsymbol{Y}}^{\mathrm {T}} \boldsymbol{D}_{y} {\boldsymbol{Y}} \boldsymbol{c} = 1\) are dropped in (10.12). The GLPLS model is thus a fusion of the PLS model with a partial LPP model. “The best vectors \(\boldsymbol{w}\) and \(\boldsymbol{c}\) from (10.12) ensure maximum correlation (PLS) and relative or local optimal data filtering and embedding capabilities for \({\boldsymbol{X}}\) and \({\boldsymbol{Y}}\) (Zhong et al. 2016)”. On the other hand, \(\boldsymbol{w}^{\mathrm {T}} {\boldsymbol{X}}^{\mathrm {T}} \boldsymbol{S}_x {\boldsymbol{X}} \boldsymbol{w}\) and \(\boldsymbol{c}^{\mathrm {T}} {\boldsymbol{Y}}^{\mathrm {T}}\boldsymbol{S}_{y} \boldsymbol{Y}\boldsymbol{c}\) only introduce the local features in the input and output spaces, not the correlation features between them. In contrast, the LPP model is fully embedded in the LPPLS model: it is embedded in the outer layer, the inner layer, or both layers of the PLS model, giving the three types of LPPLS models, while the correlation information between the input and output spaces is retained.

Type III LPPLS is used as an example to show the extraction of principal components. Suppose the first component pair is \((\boldsymbol{t}_1,\boldsymbol{u}_1)\). Define \(\boldsymbol{E}_{0L}=\boldsymbol{X}_M\) and \(\boldsymbol{F}_{0L}=\boldsymbol{Y}_M\) to facilitate comparison with the traditional linear PLS.

First, the optimization (10.11) for the first component pair \((\boldsymbol{t}_1,\boldsymbol{u}_1)\) is converted into an unconstrained problem by the Lagrangian multiplier method,

$$\begin{aligned} \varPsi ({\boldsymbol{w}_1},{\boldsymbol{c}_1})&= \boldsymbol{w}_1^{\mathrm {T}} \boldsymbol{E}_{0L}^{\mathrm {T}} {\boldsymbol{F}_{0L}}{\boldsymbol{c}_1} - {\lambda _1}(\boldsymbol{w}_1^{\mathrm {T}} {\boldsymbol{M}}_x^{\mathrm {T}} {{\boldsymbol{M}}_x}{\boldsymbol{w}_1} - 1) - {\lambda _2}(\boldsymbol{c}_1^{\mathrm {T}} {\boldsymbol{M}}_{y}^{\mathrm {T}} {{\boldsymbol{M}}_{y}}{\boldsymbol{c}_1} - 1). \end{aligned}$$
(10.13)

Let \(\frac{{\partial \varPsi }}{{\partial {\boldsymbol{w}_1}}}=0\) and \(\frac{{\partial \varPsi }}{{\partial {\boldsymbol{c}_1}}}=0\), then the optimal pair of \(\boldsymbol{w}_1\) and \(\boldsymbol{c}_1\) is obtained

$$\begin{aligned} \boldsymbol{E}_{0L}^{\mathrm {T}} {\boldsymbol{F}_{0L}}{\boldsymbol{c}_1}&= 2{\lambda _1}{\boldsymbol{M}}_x^{\mathrm {T}}{{\boldsymbol{M}}_x}{\boldsymbol{w}_1} \end{aligned}$$
(10.14)
$$\begin{aligned} \boldsymbol{F}_{0L}^{\mathrm {T}} {\boldsymbol{E}_{0L}}{\boldsymbol{w}_1}&= 2{\lambda _2}{\boldsymbol{M}}_{y}^{\mathrm {T}}{{\boldsymbol{M}}_{y}}{\boldsymbol{c}_1}. \end{aligned}$$
(10.15)

Equations (10.14) and (10.15) are respectively multiplied by \(\boldsymbol{w}_1^{\mathrm {T}} \) and \(\boldsymbol{c}_1^{\mathrm {T}} \) on the left, then,

$$\begin{aligned} {\theta _1} := 2{\lambda _1} = 2{\lambda _2} = {\boldsymbol{w}_1^{\mathrm {T}} } \boldsymbol{E}_{0L}^{\mathrm {T}} {\boldsymbol{F}_{0L}}{\boldsymbol{c}_1} = \boldsymbol{c}_1^{\mathrm {T}} \boldsymbol{F}_{0L}^{\mathrm {T}} {\boldsymbol{E}_{0L}}{\boldsymbol{w}_1}. \end{aligned}$$
(10.16)

Comparing (10.11) and (10.16), it is found that \(\theta _1\) is the objective function value. Substitute (10.16) into (10.14) and (10.15), and the relationship between \(\boldsymbol{w}_1\) and \(\boldsymbol{c}_1\) is obtained,

$$\begin{aligned} {\boldsymbol{w}_1}&= \frac{1}{{{\theta _1}}}{({\boldsymbol{M}}_x^{\mathrm {T}}{{\boldsymbol{M}}_x})^{ - 1}}\boldsymbol{E}_{0L}^{\mathrm {T}}{\boldsymbol{F}_{0L}}{\boldsymbol{c}_1} \end{aligned}$$
(10.17)
$$\begin{aligned} {\boldsymbol{c}_1}&= \frac{1}{{{\theta _1}}}{({\boldsymbol{M}}_{y}^{\mathrm {T}}{{\boldsymbol{M}}_{y}})^{ - 1}}\boldsymbol{F}_{0L}^{\mathrm {T}}{\boldsymbol{E}_{0L}}{\boldsymbol{w}_1}. \end{aligned}$$
(10.18)

Substitute (10.18) into (10.14) and substitute (10.17) into (10.15), the following equations about the first vector pair are obtained,

$$\begin{aligned} {({\boldsymbol{M}}_x^{\mathrm {T}}{{\boldsymbol{M}}_x})^{ - 1}}\boldsymbol{E}_{0L}^{\mathrm {T}}{\boldsymbol{F}_{0L}}{({\boldsymbol{M}}_{y}^{\mathrm {T}}{{\boldsymbol{M}}_{y}})^{ - 1}}\boldsymbol{F}_{0L}^{\mathrm {T}}{\boldsymbol{E}_{0L}}{\boldsymbol{w}_1}&= \theta _1^2{\boldsymbol{w}_1} \end{aligned}$$
(10.19)
$$\begin{aligned} {({\boldsymbol{M}}_{y}^{\mathrm {T}}{{\boldsymbol{M}}_{y}})^{ - 1}}\boldsymbol{F}_{0L}^{\mathrm {T}}{\boldsymbol{E}_{0L}}{({\boldsymbol{M}}_x^{\mathrm {T}}{{\boldsymbol{M}}_x})^{ - 1}}\boldsymbol{E}_{0L}^{\mathrm {T}}{\boldsymbol{F}_{0L}}{\boldsymbol{c}_1}&= \theta _1^2{\boldsymbol{c}_1}. \end{aligned}$$
(10.20)

The optimal weight vectors \(\boldsymbol{w}_1\) and \(\boldsymbol{c}_1\) are obtained as the eigenvectors corresponding to the maximum eigenvalues of (10.19) and (10.20), respectively. Now the latent variables \(\boldsymbol{t}_1\) and \(\boldsymbol{u}_1\) are calculated as follows:

$$\begin{aligned} \boldsymbol{t}_1=\boldsymbol{E}_{0L}\boldsymbol{w}_1, \;\boldsymbol{u}_1=\boldsymbol{F}_{0L} \boldsymbol{c}_1. \end{aligned}$$
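The eigenproblems (10.19) and (10.20) can be solved with a general-purpose eigensolver. A minimal sketch, with \(\boldsymbol{M}_x=\boldsymbol{D}_x^{\frac{1}{2}}\boldsymbol{X}\), \(\boldsymbol{M}_y=\boldsymbol{D}_y^{\frac{1}{2}}\boldsymbol{Y}\), \(\boldsymbol{E}_{0L}=\boldsymbol{S}_x^{\frac{1}{2}}\boldsymbol{X}\), and \(\boldsymbol{F}_{0L}=\boldsymbol{S}_y^{\frac{1}{2}}\boldsymbol{Y}\); pseudo-inverses are used here only to guard against ill-conditioned constraint matrices:

```python
# Sketch: first type III LPPLS weight/score pair from (10.19)-(10.20).
import numpy as np

def lppls3_first_pair(E0L, F0L, Mx, My):
    Gx = np.linalg.pinv(Mx.T @ Mx)               # (M_x' M_x)^{-1}
    Gy = np.linalg.pinv(My.T @ My)               # (M_y' M_y)^{-1}
    Aw = Gx @ E0L.T @ F0L @ Gy @ F0L.T @ E0L     # matrix in (10.19)
    Ac = Gy @ F0L.T @ E0L @ Gx @ E0L.T @ F0L     # matrix in (10.20)
    vals_w, vecs_w = np.linalg.eig(Aw)           # non-symmetric: general eig, keep real part
    w1 = np.real(vecs_w[:, np.argmax(np.real(vals_w))])
    vals_c, vecs_c = np.linalg.eig(Ac)
    c1 = np.real(vecs_c[:, np.argmax(np.real(vals_c))])
    t1, u1 = E0L @ w1, F0L @ c1                  # first latent score pair
    return w1, c1, t1, u1
```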

The loading vectors are calculated as

$$\begin{aligned} \boldsymbol{p}_1=\frac{\boldsymbol{E}_{0L}^{\mathrm {T}} \boldsymbol{t}_1}{\Vert \boldsymbol{t}_1\Vert ^2},\; \bar{\boldsymbol{q}}_1=\frac{\boldsymbol{F}_{0L}^{\mathrm {T}} \boldsymbol{t}_1}{\Vert \boldsymbol{t}_1\Vert ^2}. \end{aligned}$$

Residual matrices \(\boldsymbol{E}_{1L}\) and \(\boldsymbol{F}_{1L}\) are

$$\begin{aligned} \boldsymbol{E}_{1L}=\boldsymbol{E}_{0L}-\boldsymbol{t}_1\boldsymbol{p}_1^{\mathrm {T}} ,\;\boldsymbol{F}_{1L}=\boldsymbol{F}_{0L}-\boldsymbol{u}_1\bar{\boldsymbol{q}}_1^{\mathrm {T}}. \end{aligned}$$

The first optimal weight vector \(\boldsymbol{w}_1\) of PLS (10.7) is the eigenvector of the matrix \( \boldsymbol{E}_0^{\mathrm {T}} \boldsymbol{F}_0 \boldsymbol{F}_0^{\mathrm {T}} \boldsymbol{E}_0 \), while in LPPLS (10.19) it corresponds to the eigenvector of the matrix \(\left( \boldsymbol{M}_x^{\mathrm {T}} \boldsymbol{M}_x\right) ^{- 1}\boldsymbol{E}_{0L}^{\mathrm {T}}{\boldsymbol{F}_{0L}}\) \( \left( \boldsymbol{M}_{y}^{\mathrm {T}} \boldsymbol{M}_{y}\right) ^{-1} \boldsymbol{F}_{0L}^{\mathrm {T}} \boldsymbol{E}_{0L}\). The maximum-eigenvalue problem in (10.19) is very similar to that of traditional linear PLS, so the traditional NIPALS technique can conveniently be used to extract the remaining principal components.

The other latent variables are calculated based on the residual matrices \(\boldsymbol{E}_{iL}\) and \(\boldsymbol{F}_{iL}, i=1,2,\ldots , d-1\).

$$\begin{aligned} \boldsymbol{t}_{i+1}=\boldsymbol{E}_{iL}\boldsymbol{w}_{i+1} ,\;\boldsymbol{u}_{i+1}=\boldsymbol{F}_{iL}\boldsymbol{c}_{i+1}, \end{aligned}$$

where \(\boldsymbol{w}_{i+1}\) is the eigenvector corresponding to the maximum eigenvalue \(\theta _{i+1}^2\) of the matrix \(({\boldsymbol{M}}_x^{\mathrm {T}} {\boldsymbol{M}}_x)^{- 1}\boldsymbol{E}_{iL}^{\mathrm {T}} \boldsymbol{F}_{iL}\) \(({\boldsymbol{M}}_{y}^{\mathrm {T}}{\boldsymbol{M}}_{y})^{- 1}\boldsymbol{F}_{iL}^{\mathrm {T}} \boldsymbol{E}_{iL}\).

Similarly, \(\boldsymbol{c}_{i+1}\) is the eigenvector corresponding to the maximum eigenvalue of \(({\boldsymbol{M}}_{y}^{\mathrm {T}} {\boldsymbol{M}}_{y})^{- 1}\boldsymbol{F}_{iL}^{\mathrm {T}} \boldsymbol{E}_{iL}\) \(({\boldsymbol{M}}_x^{\mathrm {T}}{\boldsymbol{M}}_x)^{- 1}\boldsymbol{E}_{iL}^{\mathrm {T}} \boldsymbol{F}_{iL}\). Then,

$$\begin{aligned} \boldsymbol{p}_{i+1}=\frac{\boldsymbol{E}_{iL}^{\mathrm {T}} \boldsymbol{t}_{i+1}}{\Vert \boldsymbol{t}_{i+1}\Vert ^2},\;\bar{\boldsymbol{q}}_{i+1}=\frac{\boldsymbol{F}_{iL}^{\mathrm {T}} \boldsymbol{t}_{i+1}}{\Vert \boldsymbol{t}_{i+1}\Vert ^2}. \end{aligned}$$

Finally, the number d of latent variables of LPPLS is determined using the cross-validation method. The complete extraction procedure is sketched below.
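Putting the steps of this subsection together, the following sketch extracts \(d\) type III LPPLS components with the deflation above; it reuses the lppls3_first_pair routine sketched earlier, and the cross-validated choice of \(d\) is omitted:

```python
# Sketch: NIPALS-style extraction of d LPPLS (type III) components.
import numpy as np

def lppls3_fit(E0L, F0L, Mx, My, d):
    E, F = E0L.copy(), F0L.copy()
    W, C, T, P, Qbar = [], [], [], [], []
    for _ in range(d):
        w, c, t, u = lppls3_first_pair(E, F, Mx, My)
        p = E.T @ t / (t @ t)                    # loading vectors p_i, qbar_i
        qbar = F.T @ t / (t @ t)
        E = E - np.outer(t, p)                   # residual matrices E_{iL}, F_{iL}
        F = F - np.outer(u, qbar)
        W.append(w); C.append(c); T.append(t); P.append(p); Qbar.append(qbar)
    return (np.column_stack(W), np.column_stack(C), np.column_stack(T),
            np.column_stack(P), np.column_stack(Qbar))
```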

10.2.2 LPPLS for Process and Quality Monitoring

\({\boldsymbol{X}}\) and \({\boldsymbol{Y}}\) are projected to a low-dimensional space by the latent variables \((\boldsymbol{t}_1,\ldots ,\boldsymbol{t}_d)\). The neighborhood mappings of the original data, \(\boldsymbol{E}_{0L}\) and \(\boldsymbol{F}_{0L}\), are decomposed as follows:

$$\begin{aligned} \begin{aligned} \boldsymbol{E}_{0L}&=\sum ^d_{i=1}\boldsymbol{t}_i\boldsymbol{p}_i^{\mathrm {T}}+\bar{\boldsymbol{E}} = \boldsymbol{T}\boldsymbol{P}^{\mathrm {T}} +\bar{\boldsymbol{E}}\\ \boldsymbol{F}_{0L}&=\sum ^d_{i=1}\boldsymbol{t}_i\bar{\boldsymbol{q}}_i^{\mathrm {T}}+\bar{\boldsymbol{F}} = \boldsymbol{T}\bar{\boldsymbol{Q}}^{\mathrm {T}} +\bar{\boldsymbol{F}}, \end{aligned} \end{aligned}$$
(10.21)

where \(\boldsymbol{T}=[\boldsymbol{t}_1,\boldsymbol{t}_2,\ldots ,\boldsymbol{t}_d]\) are the latent score vectors, and \(\boldsymbol{P} = [\boldsymbol{p}_1,\ldots ,\boldsymbol{p}_d]\) and \(\bar{\boldsymbol{Q}}= [\bar{\boldsymbol{q}}_1, \ldots , \bar{\boldsymbol{q}}_d]\) are the loading matrices for \(\boldsymbol{E}_{0L}\) and \(\boldsymbol{F}_{0L}\), respectively. \(\boldsymbol{T}\) can be represented in terms of the neighborhood-mapped data \(\boldsymbol{E}_{0L}\),

$$\begin{aligned} \boldsymbol{T}=\boldsymbol{E}_{0L} \boldsymbol{R}=\boldsymbol{S}_x^{\frac{1}{2}} \boldsymbol{E}_0 \boldsymbol{R}, \end{aligned}$$
(10.22)

where \(\boldsymbol{R} = [\boldsymbol{r}_1,\ldots ,\boldsymbol{r}_d] \) and

$$\boldsymbol{r}_i = \prod \limits _{j = 1}^{i - 1} \left( {\boldsymbol{I}_m} - \boldsymbol{w}_j \boldsymbol{p}_j^\mathrm {T}\right) {\boldsymbol{w}_i}.$$

As with the GPLPLS method, (10.21) and (10.22) are difficult to apply in practice, since the locality transformation matrix \(\boldsymbol{S}\) cannot be obtained during online measurements. They are therefore replaced by the direct decomposition of \(\boldsymbol{E}_{0}\) and \(\boldsymbol{F}_{0}\),

$$\begin{aligned} \boldsymbol{E}_{0}&=\boldsymbol{S}_x^{-\frac{1}{2}} (\boldsymbol{T}\boldsymbol{P}^{\mathrm {T}} +\bar{\boldsymbol{E}}) = \boldsymbol{T}_0\boldsymbol{P}^{\mathrm {T}} + \boldsymbol{E}' \end{aligned}$$
(10.23)
$$\begin{aligned} \boldsymbol{F}_{0}&=\boldsymbol{S}_{y}^{-\frac{1}{2}} (\boldsymbol{S}_x^{\frac{1}{2}}\boldsymbol{T}_0\bar{\boldsymbol{Q}}^{\mathrm {T}} +\bar{\boldsymbol{F}}), \end{aligned}$$
(10.24)

where \(\boldsymbol{T}_0= \boldsymbol{E}_0 \boldsymbol{R}, \boldsymbol{E}' = \boldsymbol{S}_x^{-\frac{1}{2}} \bar{\boldsymbol{E}}\).
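A short sketch of the weight matrix \(\boldsymbol{R}\) in (10.22) and the scores \(\boldsymbol{T}_0=\boldsymbol{E}_0\boldsymbol{R}\) in (10.23), built from the matrices \(\boldsymbol{W}=[\boldsymbol{w}_1,\ldots ,\boldsymbol{w}_d]\) and \(\boldsymbol{P}\) returned by the fitting loop sketched above:

```python
# Sketch: R = [r_1, ..., r_d] with r_i = prod_{j<i}(I - w_j p_j') w_i, and T0 = E0 R.
import numpy as np

def lppls_R(W, P):
    m, d = W.shape
    R = np.zeros((m, d))
    G = np.eye(m)                                # running product of (I - w_j p_j')
    for i in range(d):
        R[:, i] = G @ W[:, i]
        G = G @ (np.eye(m) - np.outer(W[:, i], P[:, i]))
    return R

# T0 = E0 @ lppls_R(W, P)   # scores of the original (unweighted) data E0 = X
```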

Process and quality monitoring for new scaled and mean-centered data samples \({\boldsymbol{x}}\) and \({\boldsymbol{y}}\) is performed through the oblique projection of the input data \(\boldsymbol{x}\):

$$\begin{aligned} \begin{aligned} \boldsymbol{x}&=\hat{\boldsymbol{x}} +\boldsymbol{x}_e \\ \hat{\boldsymbol{x}}&= \boldsymbol{P}\boldsymbol{R}^{\mathrm {T}} \boldsymbol{x} \\ \boldsymbol{x}_e&=\left( \boldsymbol{I}-\boldsymbol{P}\boldsymbol{R}^{\mathrm {T}}\right) \boldsymbol{x}. \end{aligned} \end{aligned}$$
(10.25)

The residual space still contains much variation information (Qin and Zheng 2012), but it is not the main focus of LPPLS. To facilitate comparison with traditional monitoring methods, this chapter directly adopts the traditional fault monitoring indices without any modification. The \(\mathrm {T}^2\) and \(\mathrm {Q}\) statistics are defined as

$$\begin{aligned} \begin{aligned} \boldsymbol{t}&=\boldsymbol{R}^{\mathrm {T}} \boldsymbol{x} \\ \mathrm{{T}^2}&=\boldsymbol{t}^{\mathrm {T}} \boldsymbol{\varLambda }^{-1} \boldsymbol{t}=\boldsymbol{t}^{\mathrm {T}}\left( \frac{1}{n-1}\boldsymbol{T}^{\mathrm {T}}_0 \boldsymbol{T}_0\right) ^{-1}\boldsymbol{t} \\ \mathrm{{Q}}&=\Vert \boldsymbol{x}_e\Vert ^2 = \boldsymbol{x}^{\mathrm {T}} (\boldsymbol{I}-\boldsymbol{P}\boldsymbol{R}^{\mathrm {T}})\boldsymbol{x}, \end{aligned} \end{aligned}$$
(10.26)

where \(\boldsymbol{\varLambda }\) is the sample covariance matrix. The matrix \(\tilde{\boldsymbol{X}}\) or \( \boldsymbol{E}_{0L}\) of type III LPPLS is not a scaled and mean-centered one. Moreover, in nonlinear systems the output variables may not obey a Gaussian distribution even if the input variables do. Therefore, the control limits of the \(\mathrm {T}^2\) and \(\mathrm {Q}\) statistics are not computed from the F and \(\chi ^2\) distributions; they are instead calculated from the probability density functions obtained by the non-parametric kernel density estimation method (Lee et al. 2004).
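A minimal sketch of the online statistics (10.26) and of KDE-based control limits; SciPy's gaussian_kde is used here as one possible kernel density estimator, and \(\boldsymbol{\varLambda }\) is estimated from the training scores \(\boldsymbol{T}_0\):

```python
# Sketch: T^2 and Q statistics of (10.26) plus KDE-based control limits.
import numpy as np
from scipy.stats import gaussian_kde

def monitoring_stats(x, R, P, Lambda_inv):
    # Lambda_inv = inv(T0' T0 / (n - 1)) from the training scores
    t = R.T @ x
    T2 = t @ Lambda_inv @ t                      # quality-related statistic
    x_e = x - P @ (R.T @ x)                      # residual part of the oblique projection
    Q = x_e @ x_e                                # process-residual statistic
    return T2, Q

def kde_limit(train_stats, alpha=0.9975):
    # control limit = alpha-quantile of a Gaussian KDE fitted to the training statistic
    kde = gaussian_kde(train_stats)
    grid = np.linspace(0.0, train_stats.max() * 3, 5000)
    cdf = np.cumsum(kde(grid)); cdf /= cdf[-1]
    return grid[np.searchsorted(cdf, alpha)]
```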

Remark 10.1

The LPPLS decomposition (10.23) is similar to linear PLS, but its residual space \(\boldsymbol{E}'\) is related to the locality-preserving projection matrix \(\boldsymbol{S}_x^{\frac{1}{2}}\). It is difficult to obtain \(\boldsymbol{S}_x^{\frac{1}{2}}\) for new data during online fault detection. However, the sample covariance matrix \(\boldsymbol{\varLambda }\) and the \(\mathrm{{T}^2}\) and \(\mathrm Q\) statistics (10.26) are not directly related to \(\boldsymbol{S}_x^{\frac{1}{2}}\), which is a useful feature for online monitoring.

Although the matrix \(\boldsymbol{S}_{L}:=\boldsymbol{S}_{y}^{-\frac{1}{2}}\boldsymbol{S}_x^{\frac{1}{2}}\in R^{n \times n}\) is constant, the regression equation (10.24) cannot be used for output prediction. As mentioned above, the first reason is that the locality-preserving projection matrices \(\boldsymbol{S}_x^{\frac{1}{2}}\) and \(\boldsymbol{S}_{y}^{\frac{1}{2}}\) for new data are difficult to obtain. Another is that direct application of the least squares solution \(\boldsymbol{S}_R=\boldsymbol{E}_0^+\boldsymbol{S}_{L} \boldsymbol{E}_0\) may lead to poor prediction performance, and the prediction performance directly determines whether a model needs to be updated in practice. Instead, the regression equation can be constructed from \(\boldsymbol{F}_0\) and \(\boldsymbol{T}_0\) based on (10.23),

$$\begin{aligned} \boldsymbol{F}_{0}=\boldsymbol{T}_0\boldsymbol{Q}^{\mathrm {T}} +\tilde{\boldsymbol{F}}. \end{aligned}$$
(10.27)

Remark 10.2

In the special case of \(\boldsymbol{S}_{L}=\boldsymbol{I}\), (10.24) and (10.27) are identical. In most cases, the regression coefficients (\(\bar{\boldsymbol{Q}}\) and \(\boldsymbol{Q}\)) are significantly different. However, since both \(\bar{\boldsymbol{Q}}\) and \(\boldsymbol{Q}\) are least squares solutions of their respective regression equations, the regression errors \(\bar{\boldsymbol{F}}\) and \(\tilde{\boldsymbol{F}}\) are equivalent in theory. Therefore, the latter regression equation (10.27) can be used to predict the corresponding output of new input data.
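A minimal sketch of quality prediction through (10.27): \(\boldsymbol{Q}\) is obtained as the least squares coefficient of \(\boldsymbol{F}_0\) on the training scores \(\boldsymbol{T}_0\), so new outputs are predicted from new inputs without forming \(\boldsymbol{S}_x^{\frac{1}{2}}\) or \(\boldsymbol{S}_{y}^{\frac{1}{2}}\):

```python
# Sketch: fit the regression (10.27) on training data and predict new outputs.
import numpy as np

def fit_quality_regression(E0, F0, R):
    T0 = E0 @ R
    Qt = np.linalg.lstsq(T0, F0, rcond=None)[0]  # Q' such that F0 ~ T0 Q'
    return Qt

def predict_quality(x_new, R, Qt):
    return (x_new @ R) @ Qt                      # y_hat = t0' Q'
```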

10.2.3 Locality-Preserving Capacity Analysis

Here two three-dimensional artificial data sets, the S-curve and the Swiss roll, are used to illustrate the locality-preserving capacity of LPPLS. They are commonly used to validate the performance of manifold learning algorithms:

$$\begin{aligned} \boldsymbol{X}_1&= [{x_1};{x_2};{x_3}] = \left[ [\cos (\alpha ),\, - \cos (\alpha )];\; 5v_1;\; [\sin (\alpha ),\, 2 - \sin (\alpha )]\right] \\ \boldsymbol{X}_2&= [{x_1};{x_2};{x_3}] = \left[ t\cos (t);\; 2v_3;\; t\sin (t)\right] , \end{aligned}$$

where \(\alpha = (1.5 v_2-1)/\pi \) and \(t=3\pi /2(1+2v_4)\); \(v_1,v_2,v_3\), and \(v_4\) are uniformly distributed on (0, 1). Two kinds of output functions are defined: \({y}=2x_1-x_3\) (linear) and \({y}=x_1x_3\) (nonlinear).

1000 sample points are randomly generated in the 3-D space \([x_1,x_2,x_3]\), and dimensionality reduction is performed with the PLS and LPPLS models. The two-dimensional projection results for the two data sets are shown in Figs. 10.1 and 10.2, respectively.
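A minimal data-generation sketch for the two data sets, following the parameterization above; reading the bracket notation \([a,\,b]\) as concatenating two halves of the curve is an assumption made here for illustration:

```python
# Sketch: S-curve and Swiss roll data with linear/nonlinear outputs.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
v1, v2, v3, v4 = (rng.uniform(size=n) for _ in range(4))

# S-curve: x1 = [cos(a), -cos(a)], x2 = 5 v1, x3 = [sin(a), 2 - sin(a)]
a = (1.5 * v2 - 1) / np.pi
x1 = np.concatenate([np.cos(a[: n // 2]), -np.cos(a[n // 2:])])
x3 = np.concatenate([np.sin(a[: n // 2]), 2 - np.sin(a[n // 2:])])
X1 = np.column_stack([x1, 5 * v1, x3])
y_linear = 2 * X1[:, 0] - X1[:, 2]               # y = 2 x1 - x3 (Fig. 10.1)

# Swiss roll: x1 = t cos(t), x2 = 2 v3, x3 = t sin(t)
t = 3 * np.pi / 2 * (1 + 2 * v4)
X2 = np.column_stack([t * np.cos(t), 2 * v3, t * np.sin(t)])
y_nonlinear = X2[:, 0] * X2[:, 2]                # y = x1 x3 (Fig. 10.2)
```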

Fig. 10.1

Projection results of the PLS and LPPLS models for the S-curve data set with \({\boldsymbol{Y}}=2x_1-x_3\). The type I LPPLS model is used

Fig. 10.2

Projection results of the PLS and LPPLS models for the Swiss roll data set with \({\boldsymbol{Y}}=x_1x_3\). The type III LPPLS model is used

The projection results show that PLS does not preserve the local structural information of the S-curve and Swiss roll; in other words, the data are not correctly separated by color. LPPLS, however, preserves the local structural features and gives good classification results. The LPPLS model improves the locality-preserving capability of the PLS model; moreover, LPPLS discriminates the boundary features better. Thus, the LPPLS method can be used to detect faults related to output variables in systems with strong nonlinearity.

10.3 Case Study

The proposed LPPLS-based fault detection method is validated on the Tennessee Eastman Process (TEP) simulation platform (Lyman and Georgakis 1995). TEP is described in detail in (Lee et al. 2006), and the related data sets are downloaded from “http://web.mit.edu/braatzgroup/links.html”. PCA (Dunia and Qin 1998; Good et al. 2010) and other global-local preserving projection methods (Luo 2014; Bao et al. 2016; Luo et al. 2016) do not merge any information from the output space, so only the LPPLS method and two quality-related monitoring methods (the PLS and GLPLS methods) are compared.

10.3.1 PLS, GLPLS and LPPLS Models

The input variable matrix \(\boldsymbol{X}=[x_1,x_2,\ldots ,x_{33}]^{\mathrm {T}}\) consists of 22 process variables (XMEAS(1:22) := \(x_1:x_{22}\)) and 11 manipulated variables (\(x_{23}:x_{33}\)), excluding XMV(12). The quality variable matrix \({\boldsymbol{Y}}=[{ y}_1;{y}_2]\) is composed of component G of stream 9 and component E of stream 11, i.e., XMEAS(35) (\({y}_1\)) and XMEAS(38) (\({y}_2\)). The training set is the normal data IDV(0) containing 960 samples. The test sets are the fault data IDV(1:21); each fault data set has 960 samples (the first 160 samples are normal and the last 800 samples are faulty). The model parameters are \(\delta _x=1.5\), \(\delta _{y}=0.8\), \(K_x=20\), and \(K_{y}=15\), where \(K_x\) and \(K_{y}\) are the neighborhood parameters in the input and output spaces, respectively. The regression coefficients obtained by the PLS, GLPLS, and LPPLS models are shown in Table 10.1, and the relative training errors are shown in Fig. 10.3. Here the relative error is calculated as \(\text {error} = ({y}_i - {{y}_{i,tr}})/{y}_i,\; i=1,2\), where \({y}_{i,tr}\) is the corresponding output of the training model.

Table 10.1 Regression coefficients of PLS, GLPLS, and LPPLS models

The training errors in Fig. 10.3 show that the PLS, GLPLS, and LPPLS models all satisfy the modeling requirements. Output prediction experiments with these models were carried out under all fault conditions (i.e., the test data sets), and similar prediction abilities were obtained in most cases. Taking fault IDV(21) as an example, the output predictions of the three models are shown in Fig. 10.4, with \({y}_1\) and \({y}_2\) at the top and bottom of the figure, respectively. Fault IDV(21) causes the output variables to drift slowly (Lee et al. 2006), but the prediction performance of the three methods remains good even in this fault case, which verifies the generalization capability of the three models.

Fig. 10.3

Relative errors of PLS, GLPLS, and LPPLS models

Fig. 10.4

Prediction results for IDV(21) of PLS, GLPLS, and LPPLS models

10.3.2 Quality Monitoring Analysis

The \(\mathrm {T}^2\) statistic represents the mapping between process variables and quality variables for PLS and its related methods; an alarm in the \(\mathrm {T}^2\) statistic indicates a quality-related fault. In contrast, the \(\mathrm Q\) statistic represents only the residuals in the input space, so its alarm indicates that the fault is not quality related. Table 10.2 gives the monitoring FDRs, whose control limits are calculated with a confidence level of \(99.75\%\).

Table 10.2 FDR of PLS, GLPLS, and LPPLS models

The product quality consists of component G (XMEAS(35)) and component E (XMEAS(38)). Faults IDV(3, 4, 9, 11, 14, 15, 19) have almost no effect on product quality, but the remaining faults cause significant changes in the quality variables. The FDR results of the LPPLS method match this actual TEP behavior, and LPPLS detects quality-related faults with much higher accuracy than the PLS and GLPLS models (e.g., IDV(5) and IDV(12) in Table 10.2). In this section, the fault detection performance is further examined for three fault scenarios: a disturbance of the reactor cooling water, a disturbance of the condenser cooling water, and a constant position of the stream 4 valve.

Experiment 1: Disturbance in Reactor Cooling Water (Quality-Independent Fault)

The faults related to the reactor cooling water are IDV(4), IDV(11), and IDV(14). As mentioned above, they have little effect on product quality but are process related. The monitoring results for variations of the reactor cooling water are shown in Fig. 10.5. Here IDV(14) is taken as an example in order to compare with other quality-related methods, such as the GPLPLS method given in Chap. 9.

Fig. 10.5

PLS, GLPLS, and LPPLS monitoring for IDV(14)

The faults related to the reactor cooling water cause variations of the reactor temperature, but the reactor temperature is controlled by a cascade controller. So these disturbances, including the step fault IDV(4), the random fault IDV(11), and the valve sticking fault IDV(14), do not affect product quality. Table 10.2 shows the fault detection rates of the PLS, GLPLS, and LPPLS methods. The \(\mathrm Q\) statistics of all three methods detect these process-related faults in the input space with high FDRs. The FDR values of LPPLS for the \(\mathrm {T}^2\) statistic are much smaller than those of the other methods, which indicates that these faults are quality independent. Fault IDV(14) is a special case: when traditional analysis methods, such as filtering or PLS, are applied to this fault, most information about the fault features is lost, which makes the fault difficult to detect in the input space. Now let us check the detection result for fault IDV(14). The FDRs in the \(\mathrm {T}^2\) statistic for the PLS and GLPLS models are 33.5% and 96.88%, far higher than that of LPPLS; that is, PLS and GLPLS classify fault IDV(14) as quality related. The FDR of LPPLS in the \(\mathrm {T}^2\) statistic is 2.5%, close to that of GPLPLS (Tables 9.2 and 9.3). So LPPLS can effectively filter out quality-irrelevant faults, similar to the GPLPLS method.

Experiment 2: Disturbance in Condenser Cooling Water (Quality-Related Fault)

These faults include the quality-related faults IDV(5) and IDV(12). Fault IDV(5) is caused by a step change in the condenser cooling water flow rate. Since the series controller compensates for this step change, the separator temperature returns to its setpoint. PLS and GLPLS give similar results, returning to the setpoint about 10 h after the fault, whereas LPPLS-based monitoring provides a persistent alarm in the \(\mathrm {T}^2\) statistic (Fig. 10.6). “The persistence of the fault detection statistic is demonstrated by the fact that it continues to alert the operator to process anomalies even though all process variables appear to have returned to their normal values, especially important in quality-related process fault detection (Lee et al. 2006)”. In fact, a disturbance in the condenser cooling water, such as its flow rate, always affects the output quality. It should be pointed out that the condenser cooling water flow rate plays an important role both in the output quality and in the safety of the chemical plant. This fault cannot be eliminated by the series controller and should raise an alarm. Although the controller can compensate for the variations caused by this fault, the process-related monitoring in the \(\mathrm Q\) statistic (Fig. 10.6) provides a consistent alarm. The experimental results show that the PLS and GLPLS models do not actually capture the source of the fault, while LPPLS does.

Fig. 10.6

PLS, GLPLS, and LPPLS monitoring for IDV(5)

Experiment 3: Constant Position of the Stream 4 Valve

Fault IDV(21), caused by a slow output drift, has been little studied. The sensitivity of fault detection is related to the magnitude of the drift, so fast detection of fault IDV(21) is beneficial for quality control. The process monitoring results are shown in Fig. 10.7. For GLPLS, LPPLS, and PLS, this fault is fully detected as quality related after about 650, 720, and 780 samples, respectively. LPPLS and GLPLS thus detect fault IDV(21) faster than the PLS method.

Fig. 10.7

PLS, GLPLS, and LPPLS monitoring for IDV(21)

The following conclusions are drawn from the above experiments.

  • PLS is a linear model, so it cannot accurately identify some faults in strongly nonlinear systems.

  • GLPLS and LPPLS show better extraction of nonlinear correlations by introducing the locality-preserving ability of the LPP strategy.

  • GLPLS aims at preserving the local features in the input space and the output space, but ignores the correlation between them. GLPLS is actually a linear PLS plus partial locality preserving, in which the role of LPP is not fully exploited. This may lead to false or missed detections.

  • LPPLS makes full use of the LPP algorithm to achieve local nonlinear structure preservation. It decomposes the global nonlinear problem into a combination of multiple local linear problems by introducing local structure information. Therefore, compared with GLPLS, LPPLS establishes a more effective model for the nonlinear correlation between the input space and the output space.

10.4 Conclusions

In this chapter, the LPPLS statistical model is proposed and LPPLS-based quality-related fault detection and prediction are given. LPPLS not only retains the local information of the original data, but also maintains the correlation between \({\boldsymbol{X}}\) and \({\boldsymbol{Y}}\) to the maximum extent, thus achieving accurate prediction of the quality variables. LPPLS achieves excellent detection performance for locally nonlinear systems, owing to the local feature extraction ability controlled by the two parameters \(\delta _x\) and \(\delta _{y}\). Experimental results on the artificial three-dimensional data sets, the S-curve and the Swiss roll, show that LPPLS maintains local structural features well. The experimental results on the TEP simulator show that LPPLS extracts the local nonlinear features more effectively and has better fault detection performance than the PLS and GLPLS models.