1 Introduction

Rolling mills are key processing equipment in the steel industry. However, as production demands thinner strip and faster rolling speeds, mill chatter increasingly restricts production efficiency, equipment safety and product quality. It is therefore necessary to investigate the chatter problem through both mechanism modeling and data-driven methods.

Because of the complexity of the equipment and the production process, the interpretation and suppression of chatter generally depend on constructing correct and effective mechanistic models of the rolling mill. Thus, in recent years, scholars have further explored the mechanistic modeling of rolling chatter and the improvement of rolling stability. Mehrabi et al. [1] established a finite element model of a rolling mill and explored the influence of rolling parameters on cold rolling chatter. Kapil et al. [2] developed the governing equation of work roll motion based on the finite element method to predict the shape of the outgoing strip profile and the exit stress variation. Mosayeb et al. [3] established a vibration model with two degrees of freedom and investigated the movement of the roll during vibration. Heidari et al. [4] proposed a chatter model of cold strip rolling that considers unsteady lubrication and investigated the influence of lubrication parameters on the critical rolling speed. Fujita et al. [5] investigated the effect of lubrication properties on the oil film and realized dynamic control of the friction coefficient in a tandem rolling mill based on a self-excited vibration model. Cao et al. [6] proposed an unsteady lubrication model considering entraining and squeezing effects and investigated how the parameters change during unstable rolling. Liu et al. [7] investigated the effect of different factors on the stability of tandem rolling mills and revealed the mechanism of regenerative chatter. Zheng et al. [8, 9] established spatial rolling mill vibration models based on the finite element method and the modified Riccati transfer matrix method, respectively. Zeng et al. [10] suggested a vertical-torsional-horizontal coupling chatter model and analyzed the system stability based on the Routh criterion. Lu et al. [11] proposed a time-varying stability criterion to evaluate mill stability and optimized the design of rolling reduction and friction. Gao et al. [12, 13] established a dynamic model of cold rolling with structure-process-control coupling and put forward a definition of critical rolling speed to predict and suppress chatter.

These studies have contributed greatly to the explanation and suppression of chatter. However, because of the complexity of the equipment and the nonlinear coupling characteristics of the production process, mechanistic models have to make certain assumptions to simplify the calculation, so they reflect actual production incompletely. Moreover, many models focus on the optimal design of the process before production and are therefore not well suited to online monitoring of the rolling process. With the development of sensing technology, researchers have begun to apply advanced data analysis methods to chatter problems and fault diagnosis in various fields of the iron and steel industry, which provides inspiration for overcoming the limitations of mechanistic models. Lamraoui et al. [14] applied two different neural networks to identify milling chatter. Chen et al. [15] used information entropy and a support vector machine to predict chatter in milling. Wang et al. [16] drew conclusions about chatter identification and diagnosis based on the signal Q-factor and a support vector machine. Chen et al. [17] combined extracted image features with a support vector machine to detect chatter. Zheng et al. [18] detected milling chatter using the wavelet packet transform and a support vector machine. Lu et al. [19] used support vector regression, neural network, and extreme gradient boosting methods to realize real-time prediction of vibration. Liu et al. [20] realized one-step-ahead prediction of the vibration energy with an LSTM recurrent neural network. Serdio et al. [21] proposed a new fault detection method for rolling mills based on a residual approach and compared it with other fault diagnosis methods. Peng et al. [22] developed a new fault diagnosis method for the hot rolling process based on principal component regression. Hdz-Jasso et al. [23] proposed a new method to monitor absorption heat transformer components based on statistical process control. Pan et al. [24] developed a data-driven feature identification method to detect bearing faults in a hot rolling mill. Ma et al. [25] proposed a novel fault diagnosis method for hot strip mills based on a combination of multiple algorithms.

These methods consider more influential factors and reflect the production process more comprehensively. However, compared with chatter identification in milling and fault diagnosis in the rolling process, there are relatively few studies on cold rolling chatter analysis based on data-driven methods. In addition, current studies focus on real-time prediction [19] or one-step-ahead prediction [20] of the chatter state in the cold rolling process, which does little to improve the ability to respond rapidly when chatter occurs. Therefore, in this paper, a method for multi-step-ahead prediction of the cold rolling chatter state is proposed, based on the combination of Functional Data Analysis and the General Autoregression Model. The structure of this paper is as follows:

In Sect. 2, the data collection system and sampling frequency of multi-source cold rolling data are described, and the criteria for the occurrence of cold rolling chatter are explained. In Sect. 3, Functional Data Analysis is used to realize the smooth fitting of the unequally sampled rolling process data and is compared with the interpolation method. In Sect. 4, the sample space with 15 process parameters as input features and vibration energy as the output feature is constructed based on the chatter mechanism. In Sect. 5, the multi-step-ahead prediction of cold rolling chatter is realized through different machine learning algorithms in the General Autoregression Model, and the prediction effects of the algorithms are compared. The optimal algorithm is selected according to the maximum prediction step. Section 6 concludes the investigation.

2 Collection of cold rolling data

2.1 Vibration data

The vibration data of the 1420 tandem cold rolling mill are collected by the vibration online monitoring system shown in Fig. 1, and the vibration data mainly include the vibration acceleration signal and vibration energy of different stands.

Fig. 1 Vibration online monitoring system

The vibration acceleration signal is obtained by a piezoelectric vibration acceleration sensor installed on the upper part of the rolling mill and is sampled at 5120 Hz. To reflect the intensity and trend of vibration more clearly, the vibration energy defined in Eq. (1) is introduced.

$$E\, = \,\sum\limits_{i = 1}^{n} {[y(i)]^{2} }$$
(1)

where \(n\) denotes the number of sampling points within the sliding window and \(y(i)\) is the amplitude of vibration acceleration. In the monitoring system \(n\) is set to 1024, so the sampling interval of the vibration energy is 1024/5120 Hz = 0.2 s. The vibration energy within the selected time window is calculated by Eq. (1), which accurately reflects the vibration trend. In addition, the sliding window smooths the vibration signal and effectively reduces the actual signal noise.
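The calculation in Eq. (1) can be illustrated with a minimal sketch. It assumes consecutive, non-overlapping windows of \(n\) = 1024 points, which is one reading consistent with the 0.2 s energy interval stated above; the function name `vibration_energy` and the synthetic sine signal are illustrative only and are not part of the monitoring system.

```python
import numpy as np

FS_HZ = 5120   # sampling frequency of the acceleration signal (from the text)
WINDOW = 1024  # n in Eq. (1) (from the text)

def vibration_energy(acc, n=WINDOW):
    """Sum of squared acceleration amplitudes over consecutive windows of n points.

    Assumption: windows do not overlap, so one energy value is produced
    every n / FS_HZ seconds.
    """
    acc = np.asarray(acc, dtype=float)
    n_windows = acc.size // n                      # drop an incomplete trailing window
    frames = acc[: n_windows * n].reshape(n_windows, n)
    return np.sum(frames ** 2, axis=1)

# Synthetic 1 s acceleration signal, used only for illustration
t = np.arange(FS_HZ) / FS_HZ
acc = 0.5 * np.sin(2 * np.pi * 150.0 * t)

energy = vibration_energy(acc)
interval_s = WINDOW / FS_HZ                        # time between energy values
```

Under these assumptions, one second of acceleration data yields five energy values, one every 0.2 s.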

The warning line for the vibration energy is calculated by the monitoring system to distinguish between the vibration state and the steady state. When the vibration energy exceeds the warning line, the system immediately issues an alarm and measures such as reducing the rolling speed are taken to avoid more severe vibration. The rolling speed and the vibration data of the fifth stand are shown in Fig. 2.

Fig. 2 Rolling speed and vibration data of the fifth stand

2.2 Rolling process data

The process parameters of each stand are sampled by the process data acquisition system at intervals that vary from 5 to 7 s. The types of process parameters are shown in Fig. 3.

Fig. 3 Rolling process data

3 Data preprocessing based on Functional Data Analysis

The cold rolling data are collected from different systems, so the sampling frequencies are inconsistent between vibration data and rolling process data, affecting the subsequent calculation and analysis. In order to solve this problem, the Functional Data Analysis is used to expand the rolling process data.

3.1 Functional Data Analysis

The key point of Functional Data Analysis [26, 27] is to fit the original discrete data into functional form as a linear combination of basis functions, as shown in Eq. (2).

$$x_{i} (t)\, = \,\sum\limits_{n = 1}^{{N_{i} }} {c_{i,n} \varphi_{i,n} (t)} \, = \,{\mathbf{C}}_{i}^{{\text{T}}} {{\varvec{\Phi}}}_{i} (t)$$
(2)

where \(x_{i} (t)\) is the functional expression of the \(i\)th variable; \(\varphi_{i,n} (t)\) is the \(n\)th basis function used for fitting \(x_{i} (t)\); \({\mathbf{C}}_{i} \, = \,[c_{i,1} ,...,c_{{i,N_{i} }} ]^{{\text{T}}}\) is the coefficient vector; \({{\varvec{\Phi}}}_{i} \, = \,[\varphi_{i,1} ,...,\varphi_{{i,N_{i} }} ]^{{\text{T}}}\) is the vector of basis functions; \(N_{i}\) is the number of basis functions.

Then, a penalty term is introduced to avoid over-fitting, as shown in Eq. (3).

$$PEN\left( t \right)\, = \,\int {\left[ {{\text{D}}^{2} (x_{i} (t))} \right]^{2} {\text{d}}t}$$
(3)

where \({\text{D}}^{2} ( \cdot )\) is the second derivative of a function.

The final expression of the penalty term is obtained by substituting Eq. (2) into Eq. (3), as shown in Eq. (4).

$$\begin{gathered} PEN\left( t \right)\, = \,\int {\left[ {{\text{D}}^{2} (x_{i} (t))} \right]^{2} {\text{d}}t} \hfill \\ \;\;\;\;\;\;\;\;\;\;\; = \int {\left[ {{\text{D}}^{2} ({\mathbf{C}}_{i}^{{\text{T}}} {{\varvec{\Phi}}}_{i} (t))} \right]^{2} {\text{d}}t} \hfill \\ \;\;\;\;\;\;\;\;\;\;\; = \int {{\mathbf{C}}_{i}^{{\text{T}}} {\text{D}}^{2} ({{\varvec{\Phi}}}_{i} (t)){\text{D}}^{2} ({{\varvec{\Phi}}}_{{_{i} }}^{{\text{T}}} (t)){\mathbf{C}}_{i}^{{}} {\text{d}}t} \hfill \\ \;\;\;\;\;\;\;\;\;\;\; = {\mathbf{C}}_{i}^{{\text{T}}} (\int {{\text{D}}^{2} ({{\varvec{\Phi}}}_{i} (t)){\text{D}}^{2} ({{\varvec{\Phi}}}_{{_{i} }}^{{\text{T}}} (t)){\text{d}}t} ){\mathbf{C}}_{i}^{{}} \hfill \\ \;\;\;\;\;\;\;\;\;\;\; = {\mathbf{C}}_{i}^{{\text{T}}} {\mathbf{R}}_{i} {\mathbf{C}}_{i}^{{}} \hfill \\ \end{gathered}$$
(4)

where \({\mathbf{R}}_{i} \, = \,\int {{\text{D}}^{2} ({{\varvec{\Phi}}}_{i} (t)){\text{D}}^{2} ({{\varvec{\Phi}}}_{{_{i} }}^{{\text{T}}} (t)){\text{d}}t}\).

Finally, the coefficient vector \({\mathbf{C}}_{i}\) is estimated by the least squares method, which minimizes the penalized objective shown in Eq. (5).

$$\begin{aligned} L\left( {\mathbf{C}}_{i} \right) &= \sum\limits_{t = t_{0}}^{T} \left[ z_{i}(t) - x_{i}(t) \right]^{2} + \lambda \, PEN(t) \\ &= \left( {\mathbf{Z}}_{i} - {\mathbf{H}}{\mathbf{C}}_{i} \right)^{\text{T}} \left( {\mathbf{Z}}_{i} - {\mathbf{H}}{\mathbf{C}}_{i} \right) + \lambda {\mathbf{C}}_{i}^{\text{T}} {\mathbf{R}}_{i} {\mathbf{C}}_{i} \end{aligned}$$
(5)

where \(z_{i} (t)\) is the original discrete sample point of the \(i\)th variable at time \(t\); \({\mathbf{Z}}_{i} \, = \,[z_{i} (t_{0} ),...,z_{i} (T)]^{{\text{T}}}\) is the vector of original discrete sample points; \(\lambda\) is the penalty coefficient; \(T\) is the number of original discrete sample points; \({\mathbf{H}}\) is shown in Eq. (6).

$${\mathbf{H}}\, = \,\left[ \begin{gathered} \varphi_{i,1} \left( {t_{0} } \right)\;\;\;\;\;\; \cdots \;\;\;\;\varphi_{{i,N_{i} }} \left( {t_{0} } \right) \hfill \\ \;\;\;\;\; \vdots \;\;\;\;\;\;\;\;\;\; \ddots \;\;\;\;\;\; \vdots \hfill \\ \;\;\varphi_{i,1} \left( T \right)\;\;\;\; \cdots \;\;\;\;\varphi_{{i,N_{i} }} \left( T \right)\; \hfill \\ \end{gathered} \right]$$
(6)

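Setting the gradient of Eq. (5) with respect to \({\mathbf{C}}_{i}\) to zero, that is \(- 2{\mathbf{H}}^{{\text{T}}} ({\mathbf{Z}}_{i} - {\mathbf{H}}{\mathbf{C}}_{i} ) + 2\lambda {\mathbf{R}}_{i} {\mathbf{C}}_{i} = {\mathbf{0}}\), gives the coefficient vector shown in Eq. (7).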

$${\mathbf{C}}_{i} \, = \,\left( {{\mathbf{H}}^{{\text{T}}} {\mathbf{H}} + \lambda {\mathbf{R}}_{i} } \right)^{ - 1} {\mathbf{H}}^{{\text{T}}} {\mathbf{Z}}_{i}$$
(7)
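Equations (2) to (7) can be sketched numerically as follows. The source does not state which basis functions are used, so this example assumes a Gaussian basis purely for illustration; the function names, the number and width of the basis functions, the value of \(\lambda\), the trapezoidal approximation of \({\mathbf{R}}_{i}\) and the synthetic data are all hypothetical choices.

```python
import numpy as np

def gaussian_basis(t, centers, width):
    """Basis matrix with one Gaussian bump per column (assumed basis choice)."""
    d = (np.asarray(t, dtype=float)[:, None] - centers[None, :]) / width
    return np.exp(-0.5 * d ** 2)

def gaussian_basis_d2(t, centers, width):
    """Second time derivative of each Gaussian basis function."""
    d = (np.asarray(t, dtype=float)[:, None] - centers[None, :]) / width
    return (d ** 2 - 1.0) / width ** 2 * np.exp(-0.5 * d ** 2)

def roughness_matrix(centers, width, t_start, t_end, n_grid=2001):
    """R = integral of D2(Phi) D2(Phi)^T dt, approximated by the trapezoidal rule."""
    grid = np.linspace(t_start, t_end, n_grid)
    d2 = gaussian_basis_d2(grid, centers, width)
    w = np.full(n_grid, grid[1] - grid[0])   # trapezoidal weights
    w[0] *= 0.5
    w[-1] *= 0.5
    return d2.T @ (d2 * w[:, None])

def fit_coefficients(t_obs, z_obs, centers, width, lam):
    """Eq. (7): C = (H^T H + lam R)^-1 H^T Z, solved without forming the inverse."""
    H = gaussian_basis(t_obs, centers, width)
    R = roughness_matrix(centers, width, t_obs[0], t_obs[-1])
    C = np.linalg.solve(H.T @ H + lam * R, H.T @ z_obs)
    return C, H, R

# Synthetic, unequally sampled signal (intervals between 5 and 7 s, as in Sect. 2.2)
rng = np.random.default_rng(0)
t_obs = np.cumsum(rng.uniform(5.0, 7.0, size=40))
z_obs = 10.0 + np.sin(t_obs / 30.0) + 0.05 * rng.standard_normal(t_obs.size)

centers = np.linspace(t_obs[0], t_obs[-1], 15)
width = centers[1] - centers[0]
C, H, R = fit_coefficients(t_obs, z_obs, centers, width, lam=1.0)

# Evaluate the fitted function x_i(t) on a 0.2 s grid
t_new = np.arange(t_obs[0], t_obs[-1], 0.2)
x_new = gaussian_basis(t_new, centers, width) @ C
```

Solving the linear system directly, rather than inverting \({\mathbf{H}}^{{\text{T}}} {\mathbf{H}} + \lambda {\mathbf{R}}_{i}\), is a numerical convenience and gives the same coefficient vector as Eq. (7). A larger \(\lambda\) yields a smoother fitted function at the cost of a larger residual.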

3.2 Preprocessing of heterogeneous data

In order to make the sampling frequencies of the vibration energy and rolling process data consistent, the different process parameters are regarded as variables and Functional Data Analysis is used to fit these variables into functional forms. Then, the process parameters are resampled at intervals of 0.2 s to expand the amount of data. Taking the strip speed of the fifth stand as an example, the Functional Data Analysis method and the interpolation method are each used to expand the data, and the comparison results are shown in Fig. 4. Compared with the interpolation method, the Functional Data Analysis method achieves a smooth fit and expansion of the discrete data, and even of its derivatives, especially where the value changes suddenly.

Fig. 4 Data expansion based on Functional Data Analysis method and interpolation method

4 Construction of sample space

The fifth stand with the most frequent vibration and the highest vibration intensity is taken as the research object. The vibration energy of the fifth stand is taken as the output feature \(y(t)\) to predict, and different process parameters are taken as the input features \(x_{i}^{{}} (t)\) used for prediction.

However, because there are many kinds of process parameters, it is still necessary to screen the important process parameters that may lead to vibration of the fifth stand. According to the vibration mechanism of cold rolling, the process parameters of the fifth stand shown in Fig. 5 are closely related to the occurrence of chatter [13]. In addition, as shown in Fig. 3, the stands are connected by the strip in the tandem cold rolling process, so the stability of the fifth stand is also affected by the process parameters of the adjacent fourth stand [7]. Therefore, according to the above analysis of the chatter mechanism, the important process parameters related to chatter occurrence are screened as the input features, and the sample space with the vibration energy as the output feature is constructed, as shown in Fig. 6.

Fig. 5 Process parameters of the fifth stand

Fig. 6 Sample space

There is a large difference in the order of magnitude between different process parameters, affecting the calculation accuracy of the model. Thus, the normalization method shown in Eq. (8) is used to eliminate this difference.

$$x_{i}^{\prime } (t)\, = \,\frac{{x_{i}^{{}} (t)\, - \,(x_{i}^{{}} (t))_{\min } }}{{(x_{i}^{{}} (t))_{\max } \, - \,(x_{i}^{{}} (t))_{\min } }}$$
(8)

where \(x_{i}^{\prime } (t)\) is the normalized value of the \(i\)th process parameter; \(x_{i}^{{}} (t)\) is the original value of the \(i\)th process parameter; \((x_{i}^{{}} (t))_{\min }\) and \((x_{i}^{{}} (t))_{\max }\) are its minimum and maximum values.
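A minimal sketch of Eq. (8), applied column by column to a matrix whose columns are process parameters, is given below. The function name, the guard for constant columns and the example values are assumptions added for illustration and are not taken from the mill data.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column (process parameter) of X to [0, 1] as in Eq. (8)."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    # Assumption: a constant column is mapped to 0 instead of dividing by zero
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (X - x_min) / span, x_min, x_max

# Illustrative values with very different orders of magnitude
X = np.array([[1200.0, 0.02],
              [1500.0, 0.05],
              [900.0, 0.03]])
X_norm, x_min, x_max = min_max_normalize(X)
```

Returning the minimum and maximum allows the same scaling to be reapplied to new samples, which is one possible design choice rather than a step described in the text.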

The sample space contains 45273 sample points, with 90% of the sample points as the training set and 10% of the sample points as the test set. The output feature of the test set is shown in Fig. 7.

Fig. 7 Output feature of the test set

5 Chatter prediction based on General Autoregression Model

The traditional autoregressive model [28] assumes a linear relationship between the historical data and the data to be predicted. The General Autoregression Model used in this paper extends the autoregressive model: the fitting relationship between historical data and the data to be predicted is obtained through different machine learning algorithms and may be either linear, as in autoregressive models, or nonlinear. This makes it convenient to select the optimal algorithm by comparing the prediction effects of different algorithms.

5.1 General Autoregression Model

The General Autoregression Model relies on the histories of both the output feature values and the input feature values to make a k-step-ahead prediction. The model structure is represented as follows.

$$\begin{gathered} y(t\, + \,k * m)\, = \,f(y(t),...,y(t\, - \,(p\, - \,1) * m),x_{1} (t),...,x_{1} (t\, - \,(p\, - \,1) * m), \hfill \\ \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;......,x_{i} (t),...,x_{i} (t\, - \,(p\, - \,1) * m),......) \hfill \\ \end{gathered}$$
(9)

where \(x_{i} (t)\) is the \(i\)th input feature used for prediction; \(y(t)\) is the output feature to predict; \(k\) is the prediction step (k-step-ahead prediction); \(p\) is the regression order (the number of historical values of each input feature and of the output feature); \(m\) is the sampling interval, \(m = 0.2\) s; \(f( \cdot )\) is the actual model, which varies among different machine learning algorithms.

An example of the General Autoregression Model is shown in Fig. 8. When the regression order \(p = 4\), the prediction step \(k = 3\) and there are 8 sample points in the time series, the input and output features from the 1st to the 4th sample point are used to predict the output feature of the 7th sample point, those from the 2nd to the 5th sample point are used to predict the 8th sample point, and so on.

Fig. 8 An example of General Autoregression Model
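The construction of training samples implied by Eq. (9) can be sketched as follows, with time expressed in sample indices rather than seconds. The function name `build_samples` and the toy series are illustrative assumptions; the example reproduces the case \(p = 4\), \(k = 3\) with 8 sample points described above.

```python
import numpy as np

def build_samples(y, X, p, k):
    """Build (inputs, targets) for Eq. (9), with time in sample indices.

    y: (T,) output feature; X: (T, d) input features.
    Each input row stacks y(t), ..., y(t-p+1) followed, for every input
    feature, by x_i(t), ..., x_i(t-p+1). The target is y(t+k).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    T = y.size
    rows, targets, anchors = [], [], []
    for t in range(p - 1, T - k):
        lags = np.arange(t, t - p, -1)            # t, t-1, ..., t-p+1
        rows.append(np.concatenate([y[lags], X[lags].T.ravel()]))
        targets.append(y[t + k])
        anchors.append(t)
    return np.array(rows), np.array(targets), np.array(anchors)

# Toy series: values equal the 1-based sample number, one input feature
y = np.arange(1, 9, dtype=float)                  # sample points 1..8
X = (10.0 * y).reshape(-1, 1)
inputs, targets, anchors = build_samples(y, X, p=4, k=3)
```

With 8 sample points, only two samples can be formed: points 1 to 4 predict point 7, and points 2 to 5 predict point 8. Any regression algorithm can then be fitted to `inputs` and `targets` to play the role of \(f( \cdot )\).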

Compared with real-time prediction [19] or one-step-ahead prediction [20], the General Autoregression Model predicts future information earlier, that is, it realizes multi-step-ahead prediction. However, the prediction effects of the General Autoregression Model largely depend on the choice of machine learning algorithm. In order to find the algorithm that is more suitable for chatter prediction, five classical regression algorithms are used to carry out experiments: K-Neighbors Regression (KNR), Support Vector Regression (SVR), Random Forest Regression (RFR), Extra Tree Regression (ETR) and Extreme Gradient Boosting Tree Regression (XGBR).

The fitting relationships \(f( \cdot )\), which vary among these regression algorithms, are obtained from the training set, and then the multi-step-ahead prediction of vibration energy in the test set is realized. In order to compare different algorithms, the Mean Absolute Percentage Error (MAPE) and Mean Square Error (MSE) are used to evaluate the prediction effects, where \(N\) is the number of sample points evaluated, \(y_{i}\) is the actual value and \(\widehat{{y_{i} }}\) is the predicted value.

$$MAPE\, = \,\frac{1}{N}\sum\nolimits_{i = 1}^{N} {\left| {\frac{{y_{i} \, - \,\widehat{{y_{i} }}}}{{y_{i} }}} \right|}$$
(10)
$$MSE\, = \,\frac{1}{N}\sum\nolimits_{i = 1}^{N} {(y_{i} - \widehat{{y_{i} }})}^{2}$$
(11)
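Equations (10) and (11) translate directly into code. The sketch below follows the equations as written, so the MAPE is returned as a fraction rather than a percentage; the function names and the numerical values are illustrative only.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, Eq. (10), returned as a fraction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true))

def mse(y_true, y_pred):
    """Mean Square Error, Eq. (11)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Illustrative values, not measured data
y_true = np.array([2.0, 4.0, 5.0])
y_pred = np.array([1.0, 5.0, 5.0])
mape_value = mape(y_true, y_pred)
mse_value = mse(y_true, y_pred)
```

Note that Eq. (10) is undefined when an actual value is zero; vibration energy computed by Eq. (1) is a sum of squares and is zero only for an all-zero window.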

5.2 Prediction effects of different algorithms

Fixing the regression order \(p\) to 3 and the prediction step \(k\) to 2, the predictions of vibration energy in test set by different algorithms are shown in Fig. 9 and the corresponding evaluation criteria are shown in Fig. 10. The predicted values of RFR, ETR and XGBR accurately reflect the trend of the actual value and exceed the warning line when chatter occurs. In addition, the low MAPE values and MSE values show that RFR, ETR and XGBR possess excellent prediction effects. However, the predicted values of KNR and SVR are distorted. Thus, the prediction effects change with different algorithms.

Fig. 9 Prediction of vibration energy by different algorithms

Fig. 10 Evaluation criteria of different algorithms

5.3 Prediction effects under different prediction steps

Fixing the regression order \(p\) to 8 and the algorithm to ETR, the prediction step \(k\) changes from 2 to 11. The predictions of vibration energy in test set are shown in Fig. 11 and the corresponding evaluation criteria are shown in Fig. 12.

Fig. 11 The influence of prediction step

Fig. 12 Evaluation criteria under different prediction steps

When the regression order \(p\) and the machine learning algorithm are fixed, the difficulty of prediction gradually increases and the prediction effects get worse as the prediction step \(k\) increases, so false alarms of chatter and distortion of the prediction eventually appear. However, the transition from correct prediction to prediction distortion is gradual. Therefore, the concept of maximum prediction step is proposed, which has the following implications. (1) When the designed prediction step is less than or equal to the maximum prediction step, the vibration energy is predicted correctly, that is, the predicted values correctly reflect the trend of the actual values and exceed the warning line when chatter occurs. (2) When the designed prediction step is greater than the maximum prediction step, false alarms and prediction distortion occur. For example, in Fig. 11, the maximum prediction step is 4.

The larger the maximum prediction step, the earlier the algorithm can raise the alarm for cold rolling chatter, which improves the ability to respond rapidly when chatter occurs and leaves more time for applying chatter suppression measures. Thus, the algorithm with the larger maximum prediction step under different working conditions is more suitable for chatter prediction.
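The definition of the maximum prediction step can be sketched as a small routine. The text judges correctness from the trend of the prediction and the warning line; the pointwise alarm comparison in `alarms_agree` below is a simplified stand-in for that judgment, and both function names and the flag values are hypothetical.

```python
import numpy as np

def alarms_agree(y_true, y_pred, warning_line):
    """Simplified correctness check (assumption): predicted and actual
    alarm states coincide at every sample, so there is no false or missed alarm."""
    actual = np.asarray(y_true, dtype=float) > warning_line
    predicted = np.asarray(y_pred, dtype=float) > warning_line
    return bool(np.array_equal(actual, predicted))

def max_prediction_step(correct_by_step):
    """Largest tested k such that every tested step up to and including k
    was predicted correctly; returns 0 if the smallest tested step fails."""
    best = 0
    for k in sorted(correct_by_step):
        if not correct_by_step[k]:
            break
        best = k
    return best

# Hypothetical correctness flags for prediction steps 2..7 (illustration only)
flags = {2: True, 3: True, 4: True, 5: False, 6: True, 7: False}
k_max = max_prediction_step(flags)
```

The search stops at the first failing step, so an isolated correct result at a larger step does not raise the maximum prediction step, in line with implication (2) above.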

5.4 Optimal selection of algorithms

By changing the regression order \(p\) and the prediction step \(k\) to form a variety of working conditions, the maximum prediction steps of the algorithms under different working conditions are summarized, as shown in Fig. 13. KNR and SVR can make one-step-ahead prediction at most; beyond that, the prediction is distorted, as shown in Fig. 9. Thus, the comparison mainly concerns the maximum prediction steps of ETR, RFR and XGBR.

Fig. 13 The maximum prediction step of different algorithms under different working conditions

The maximum prediction step is taken as the criterion for selecting the optimal algorithm. Under most working conditions, the maximum prediction step of the ETR algorithm is higher than that of the other algorithms, so the ETR algorithm is more suitable for multi-step-ahead prediction of vibration energy and early warning of the cold rolling chatter state.

6 Conclusion

In order to achieve early prediction and control of cold rolling chatter, a new chatter monitoring method is proposed based on the combination of Functional Data Analysis (FDA) and the General Autoregression Model (GAM). FDA is used to make the sampling frequencies of multi-source cold rolling data consistent. Then, the multi-step-ahead prediction of cold rolling chatter is realized by the GAM. Different machine learning algorithms are applied to predict the chatter state, and the maximum prediction step is defined as an index for evaluating their prediction effects. The larger the maximum prediction step, the earlier the chatter state can be predicted and suppressed. Under most working conditions, the maximum prediction step of the Extra Tree Regression algorithm is higher than that of the other algorithms, so it is more suitable for multi-step-ahead prediction of cold rolling chatter.

Compared with real-time prediction or one-step-ahead prediction, the multi-step-ahead prediction of cold rolling chatter helps to improve the sensitivity with which the chatter state is identified and provides more time to execute suppression measures. In future research, more machine learning algorithms can be used to fit the relationship between the historical data and the data to be predicted, and this prediction method can be applied to other industrial fields that require earlier prediction of events that may interrupt normal production.