Background

Hand, foot, and mouth disease (HFMD) is a common infectious disease caused by a group of enteroviruses, occurring particularly in children under the age of 5 [

Results

The development of ARIMA-EEMD-LSTM

In this study, the original time series was divided chronologically into a training set covering 1 January 2015 to 7 January 2022 (80% of the data) and a testing set covering 8 January 2022 to 27 July 2023 (20% of the data). A rolling forecast approach was used, in which the preceding 60 days of data were used to predict the following day.
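
A minimal sketch of this data preparation, assuming the daily case counts are held in a 1-D NumPy array. The text splits by calendar date; the fraction-based cut used here (`train_frac=0.8`) only approximates that, and the helper names `chronological_split` and `make_windows` are hypothetical. Prepending the last 60 training days to the test set gives every test day a full 60-day input window, matching the rolling one-day-ahead scheme.

```python
import numpy as np

WINDOW = 60   # days of history per input (from the text)
HORIZON = 1   # days ahead to predict (from the text)

def chronological_split(series, train_frac=0.8):
    """Split a 1-D series in time order (no shuffling) into train/test parts."""
    series = np.asarray(series, dtype=float)
    cut = int(round(len(series) * train_frac))
    return series[:cut], series[cut:]

def make_windows(series, window=WINDOW, horizon=HORIZON):
    """Build (X, y) pairs: each row of X holds `window` consecutive values and
    the matching y is the value `horizon` steps after the window ends."""
    series = np.asarray(series, dtype=float)
    n = len(series) - window - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the chosen window")
    X = np.stack([series[i:i + window] for i in range(n)])
    y = series[window + horizon - 1: window + horizon - 1 + n]
    return X, y

# Usage (hypothetical data): the test inputs reuse the last 60 training days
# so that each test day is predicted from the 60 days immediately before it.
daily_cases = np.arange(100, dtype=float)
train, test = chronological_split(daily_cases)
X_test, y_test = make_windows(np.concatenate([train[-WINDOW:], test]))
```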

First, the 'auto.arima' function from the 'forecast' package in R was used to identify the optimal model parameters on the training data, which yielded an ARIMA(5,1,2) model. This model was fitted to the training set and used to make predictions on the testing set.
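
For reference, an ARIMA(5,1,2) model can be written as below, where $y_t$ is the daily case count, $\nabla$ is the first-difference operator ($d = 1$), $\phi_1,\dots,\phi_5$ are the autoregressive coefficients, $\theta_1,\theta_2$ are the moving-average coefficients (using the "+" sign convention of R's `forecast` package), and $\varepsilon_t$ is white noise. The constant $c$ is shown as optional because the text does not report whether a drift term was included, and the fitted coefficient values are not reported either. The last line defines the ARIMA residual series $e_t$, assumed here to be the difference between observed and ARIMA-predicted values.

```latex
\begin{aligned}
\nabla y_t &= y_t - y_{t-1},\\
\nabla y_t &= c + \sum_{i=1}^{5} \phi_i\, \nabla y_{t-i} + \varepsilon_t + \sum_{j=1}^{2} \theta_j\, \varepsilon_{t-j},\\
e_t &= y_t - \hat{y}^{\mathrm{ARIMA}}_t .
\end{aligned}
```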

The EEMD method was applied to decompose the residual series of the ARIMA model, and the results are shown in Fig. 3. The residual series was decomposed into 11 IMF series and 1 trend series. Lower-index IMFs represent the high-frequency components of the series, and higher-index IMFs represent the low-frequency components. The decomposition shows that the residual series contains substantial high-frequency components. When these components remain embedded in the undecomposed series, they are difficult for the LSTM model to learn; separating them into individual IMFs makes the learning task easier.
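
The decomposition step can be illustrated with a minimal EEMD sketch in Python (NumPy and SciPy). It is a simplified teaching version, not the authors' implementation: envelopes are cubic splines through the local extrema, pinned to the series end points; sifting runs a fixed 10 passes instead of using a formal stopping criterion; and the ensemble settings (50 trials, noise standard deviation 0.2 times that of the input) are common illustrative choices, since the text does not report them. Each EMD call returns IMFs that, together with the residue, sum exactly to its input, and EEMD then averages the k-th IMF across the noise-perturbed trials.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def _envelope_mean(h):
    """Mean of the upper and lower cubic-spline envelopes of h, or None if h
    has too few local extrema to be sifted further."""
    mid = h[1:-1]
    maxima = np.where((mid > h[:-2]) & (mid >= h[2:]))[0] + 1
    minima = np.where((mid < h[:-2]) & (mid <= h[2:]))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(h))
    last = len(h) - 1
    # Simplification: pin both envelopes to the series' end points.
    up = np.concatenate(([0], maxima, [last]))
    lo = np.concatenate(([0], minima, [last]))
    upper = CubicSpline(up, h[up])(t)
    lower = CubicSpline(lo, h[lo])(t)
    return (upper + lower) / 2.0

def emd(x, max_imfs=12, n_sift=10):
    """Empirical mode decomposition with a fixed number of sifting passes.
    Returns (imfs, residue) with imfs.sum(axis=0) + residue == x."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if _envelope_mean(residue) is None:
            break  # residue is (close to) monotonic: keep it as the trend
        h = residue.copy()
        for _ in range(n_sift):
            m = _envelope_mean(h)
            if m is None:
                break
            h = h - m
        imfs.append(h)
        residue = residue - h
    return np.array(imfs).reshape(-1, len(residue)), residue

def eemd(x, n_trials=50, noise_scale=0.2, max_imfs=12, seed=0):
    """Ensemble EMD: decompose many noise-perturbed copies of x and average
    the k-th IMF (and the residue) across trials."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    sigma = noise_scale * np.std(x)
    imf_sum = np.zeros((max_imfs, len(x)))
    res_sum = np.zeros(len(x))
    for _ in range(n_trials):
        imfs, res = emd(x + rng.normal(0.0, sigma, len(x)), max_imfs)
        imf_sum[:len(imfs)] += imfs  # trials with fewer IMFs add zeros
        res_sum += res
    return imf_sum / n_trials, res_sum / n_trials
```

Because the added noise only averages out approximately, the averaged components reconstruct the input up to a small error that shrinks as the number of trials grows.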

Fig. 3

The results of the original data decomposed by EEMD

These decomposed series were used to train the LSTM models, and the performance of these models on the testing set is shown in Fig. 4. The predicted values of each component series closely match the true values in both magnitude and trend, without noticeable lag.
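
The text does not report the LSTM architecture or training settings. To show what each per-component model computes, the sketch below implements the forward pass of a standard single-layer LSTM in NumPy: it reads one 60-day window of a component series and emits a one-day forecast. The class name `MinimalLSTM`, the hidden size (16), the initialization, and the linear output head are assumptions, and the weights are random, so the outputs are meaningless until the model is trained (training, typically done in a deep learning framework, is omitted here).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MinimalLSTM:
    """Single-layer LSTM plus a linear head mapping a univariate window to one
    next-step value. Weights are random here, i.e. UNTRAINED."""

    def __init__(self, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden = hidden
        # Stacked weights for the input, forget, output and candidate gates,
        # acting on [x_t, h_{t-1}] (1 input feature + `hidden` state units).
        self.W = rng.normal(0.0, 0.1, (4 * hidden, 1 + hidden))
        self.b = np.zeros(4 * hidden)
        self.w_out = rng.normal(0.0, 0.1, hidden)
        self.b_out = 0.0

    def predict(self, window):
        h = np.zeros(self.hidden)
        c = np.zeros(self.hidden)
        for x_t in np.asarray(window, dtype=float):
            z = self.W @ np.concatenate(([x_t], h)) + self.b
            i, f, o, g = np.split(z, 4)
            i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
            c = f * c + i * g   # update the cell state
            h = o * np.tanh(c)  # new hidden state
        return float(self.w_out @ h + self.b_out)

# One model per EEMD component (11 IMFs + 1 trend series in this study).
models = [MinimalLSTM(seed=k) for k in range(12)]
```

Training a separate model per component lets each network specialize in one frequency band rather than learning all of them at once.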

Fig. 4

Comparison of predicted values and real values for each IMF

The predicted values of the IMF series and the trend series were summed to obtain the predicted residual series, shown in Fig. 5. The predicted series closely matches the actual residual series in the frequency and amplitude of its fluctuations, indicating that the residual series is well predicted.
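
Written out, the reconstruction sums the 12 component forecasts, with $e_t$ denoting the ARIMA residual series, $\mathrm{IMF}_{k,t}$ the k-th IMF, and $R_t$ the trend series. It relies on the EEMD components summing back to the series they were derived from; because EEMD averages decompositions of noise-perturbed copies, that identity holds only approximately, with an error that shrinks as the number of ensemble trials grows.

```latex
\begin{aligned}
e_t &\approx \sum_{k=1}^{11} \mathrm{IMF}_{k,t} + R_t,\\
\hat{e}_t &= \sum_{k=1}^{11} \widehat{\mathrm{IMF}}_{k,t} + \hat{R}_t .
\end{aligned}
```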

Fig. 5

Comparison of predicted values and real values for the residual series

Finally, the predicted values of the ARIMA model and of the residual series were summed to obtain the final predicted values, which are compared with the true values in Fig. 6. The model accurately predicts the trend of the original time series and captures its major fluctuations.
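
The final combination step can be sketched as follows. The function name `hybrid_forecast` and the array layout (one row per EEMD component, one column per test day) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def hybrid_forecast(arima_forecast, component_forecasts):
    """Final ARIMA-EEMD-LSTM forecast: the ARIMA forecast plus the sum of the
    per-component LSTM forecasts of the ARIMA residual."""
    arima_forecast = np.asarray(arima_forecast, dtype=float)
    comps = np.atleast_2d(np.asarray(component_forecasts, dtype=float))
    if comps.shape[1] != arima_forecast.shape[0]:
        raise ValueError("each component needs one forecast per test day")
    residual_forecast = comps.sum(axis=0)  # IMFs + trend -> residual forecast
    return arima_forecast + residual_forecast

# Usage with made-up numbers: 2 test days, 3 components.
print(hybrid_forecast([10, 12], [[1, -1], [0.5, 0.5], [0, 2]]))
```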

Fig. 6

Comparison of predicted values and real values for confirmed HFMD cases

The development of other models

In this study, four additional models were developed for comparison: the ARIMA model, the LSTM model, the ARIMA-LSTM model, and the EEMD-LSTM model. Their results are shown in Supplemental Figures 1–4.

Model evaluation and comparison

The evaluation results of the hybrid ARIMA-EEMD-LSTM model and of the ARIMA, LSTM, ARIMA-LSTM, and EEMD-LSTM models on the training and testing sets are shown in Table 1.

Table 1 Comparison of the prediction performances between ARIMA-EEMD-LSTM and other models

The proposed ARIMA-EEMD-LSTM model achieved an RMSE of 4.37, an MAE of 2.94, and an R² of 0.996 on the testing set, indicating accurate prediction of HFMD incidence. By comparison, the ARIMA model had an RMSE of 6.95, an MAE of 3.68, and an R² of 0.990, and the LSTM model had an RMSE of 13.93, an MAE of 8.07, and an R² of 0.961. The hybrid model therefore outperformed both single models in accuracy and goodness of fit.
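
The text reports RMSE, MAE, and R² without giving formulas. The sketch below uses the standard definitions; treating R² as the coefficient of determination (1 - SS_res/SS_tot) rather than as a squared correlation coefficient is an assumption.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

RMSE squares the errors before averaging, so it penalizes occasional large misses (such as outbreak peaks) more heavily than MAE does.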

Two further hybrid models, ARIMA-LSTM and EEMD-LSTM, were also developed. On the testing set, the ARIMA-LSTM model had an RMSE of 9.85, an MAE of 8.11, and an R² of 0.980, while the EEMD-LSTM model had an RMSE of 6.20, an MAE of 3.98, and an R² of 0.992. Relative to the LSTM model, EEMD-LSTM reduced RMSE from 13.93 to 6.20 and MAE from 8.07 to 3.98, and raised R² from 0.961 to 0.992. Relative to the ARIMA-LSTM model, ARIMA-EEMD-LSTM reduced RMSE from 9.85 to 4.37 and MAE from 8.11 to 2.94, and raised R² from 0.980 to 0.996. These results indicate that adding the EEMD method substantially improves the predictive performance of the models.
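
The size of these improvements can be expressed as relative reductions. The short calculation below uses only the testing-set values reported in the text; the dictionary layout and the helper name `pct_reduction` are illustrative.

```python
# RMSE and MAE on the testing set, as reported in the text.
reported = {
    "ARIMA":           {"RMSE": 6.95,  "MAE": 3.68},
    "LSTM":            {"RMSE": 13.93, "MAE": 8.07},
    "ARIMA-LSTM":      {"RMSE": 9.85,  "MAE": 8.11},
    "EEMD-LSTM":       {"RMSE": 6.20,  "MAE": 3.98},
    "ARIMA-EEMD-LSTM": {"RMSE": 4.37,  "MAE": 2.94},
}

def pct_reduction(before, after):
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (before - after) / before

for base, model in [("LSTM", "EEMD-LSTM"), ("ARIMA-LSTM", "ARIMA-EEMD-LSTM")]:
    for metric in ("RMSE", "MAE"):
        r = pct_reduction(reported[base][metric], reported[model][metric])
        print(f"{base} -> {model}: {metric} down {r:.1f}%")
```

On these values, adding EEMD cuts RMSE and MAE by roughly 55.5% and 50.7% (LSTM to EEMD-LSTM) and by roughly 55.6% and 63.7% (ARIMA-LSTM to ARIMA-EEMD-LSTM).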

Overall, the hybrid ARIMA-EEMD-LSTM model achieved better predictive accuracy and goodness of fit than the ARIMA, LSTM, ARIMA-LSTM, and EEMD-LSTM models, and the addition of the EEMD method contributed to this improvement.

Discussion

In this study, we proposed a novel hybrid prediction model that combines the strengths of a linear statistical model (ARIMA), a deep learning model (LSTM), and the EEMD decomposition technique to predict HFMD incidence accurately. According to the evaluation results, the proposed hybrid ARIMA-EEMD-LSTM model outperformed the four other models developed in this study (ARIMA, LSTM, ARIMA-LSTM, and EEMD-LSTM), indicating that it provides more accurate predictions.

ARIMA, a classical time series prediction model, has been widely applied to disease prediction [18,19,20]. However, because it is a linear model, ARIMA can capture only the linear characteristics of a series. Many real-world time series contain a mixture of linear and nonlinear features, which limits the predictive ability of the ARIMA model. Deep learning algorithms can compensate for this limitation. Combining ARIMA with LSTM, a widely used deep learning model for time series, retains ARIMA’s strength in capturing linear trends and dependencies while exploiting LSTM’s ability to capture complex nonlinear patterns and long-term dependencies.

EEMD is a technique for processing nonlinear and nonstationary data and has been applied successfully in various fields [21,22,23]. However, few studies have used EEMD for epidemic prediction. EEMD decomposes complex data into relatively simple components that are better suited to model training, which compensates for the LSTM model's limitations in handling nonstationary time series.

In this study, we compared the hybrid ARIMA-EEMD-LSTM model with two single models (ARIMA and LSTM) and two hybrid models (ARIMA-LSTM and EEMD-LSTM). The ARIMA-EEMD-LSTM model showed the best predictive performance, with an RMSE, MAE, and R² of 4.37, 2.94, and 0.996, respectively. This prediction performance suggests its potential utility in epidemic prevention and control. In addition, the two models that incorporated EEMD achieved the lowest RMSE and the highest R² of the five models. Including EEMD can therefore have a considerable impact on model performance, offering new insights for the modeling of disease time series.

This study has several limitations. First, the data were obtained from the National Children's Regional Medical Center (Southwest Region), and cross-center studies are needed to verify the validity and generalizability of the results. Second, the models developed in this study used only daily HFMD case counts; incorporating related factors such as temperature and humidity may further improve prediction performance.

Conclusion

In conclusion, this study proposed an innovative hybrid ARIMA-EEMD-LSTM model for predicting HFMD incidence. By integrating the strengths of the ARIMA model, the LSTM model, and the EEMD method, the hybrid model achieved improved prediction accuracy and goodness of fit, and it may serve as a valuable tool for healthcare professionals and policymakers in understanding and managing the spread of HFMD and other epidemics.