ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels

Data Mining and Knowledge Discovery

Abstract

Most methods for time series classification that attain state-of-the-art accuracy have high computational complexity, requiring significant training time even for smaller datasets, and are intractable for larger datasets. Additionally, many existing methods focus on a single type of feature such as shape or frequency. Building on the recent success of convolutional neural networks for time series classification, we show that simple linear classifiers using random convolutional kernels achieve state-of-the-art accuracy with a fraction of the computational expense of existing methods. Using this method, it is possible to train and test a classifier on all 85 ‘bake off’ datasets in the UCR archive in less than 2 h, and it is possible to train a classifier on a large dataset of more than one million time series in approximately 1 h.
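
The transform itself is simple. The following is a minimal illustrative sketch of a ROCKET-style transform in NumPy, based on the kernel parameters described in the paper and in the sensitivity analysis below (lengths sampled from {7, 9, 11}, mean-centred normally-distributed weights, uniform bias, exponential dilation, random padding, and ppv/max pooling). Function names are ours, and this is not the authors' optimised Numba implementation:

```python
import numpy as np

def apply_kernel(x, weights, bias, dilation, padding):
    # Convolve one time series with one dilated, zero-padded kernel and
    # return ROCKET's two pooled features: ppv (proportion of positive
    # values) and the global maximum.
    n, m = len(x), len(weights)
    out_len = (n + 2 * padding) - ((m - 1) * dilation)
    ppv, maximum = 0, -np.inf
    for i in range(out_len):
        s = bias
        for j in range(m):
            idx = i - padding + j * dilation
            if 0 <= idx < n:
                s += weights[j] * x[idx]
        maximum = max(maximum, s)
        if s > 0:
            ppv += 1
    return ppv / out_len, maximum

def rocket_transform(X, num_kernels=10_000, seed=0):
    # X: array of shape (num_series, series_length).
    # Returns an array of shape (num_series, 2 * num_kernels).
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    features = np.zeros((X.shape[0], 2 * num_kernels))
    for k in range(num_kernels):
        length = rng.choice([7, 9, 11])           # random kernel length
        weights = rng.normal(0.0, 1.0, length)    # N(0, 1) weights ...
        weights -= weights.mean()                 # ... mean-centred
        bias = rng.uniform(-1.0, 1.0)             # U(-1, 1) bias
        # Dilation sampled on an exponential scale, capped so the
        # dilated kernel still fits inside the series.
        max_exponent = max(0.0, np.log2((n - 1) / (length - 1)))
        dilation = int(2 ** rng.uniform(0.0, max_exponent))
        # Zero padding applied for a random half of the kernels.
        padding = ((length - 1) * dilation) // 2 if rng.random() < 0.5 else 0
        for i, x in enumerate(X):
            features[i, 2 * k], features[i, 2 * k + 1] = apply_kernel(
                x, weights, bias, dilation, padding)
    return features
```

The paper pairs these features with simple linear classifiers; under the sketch above, something like scikit-learn's RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)).fit(rocket_transform(X_train), y_train) would be the corresponding usage.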

References

  • Bagnall A, Lines J, Bostrom A, Large J, Keogh E (2017) The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min Knowl Disc 31(3):606–660

  • Bagnall A, Lines J, Vickers W, Keogh E (2019) The UEA & UCR time series classification repository. http://www.timeseriesclassification.com

  • Bai S, Kolter JZ, Koltun V (2018) An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv:1803.01271

  • Benavoli A, Corani G, Mangili F (2016) Should we really use post-hoc tests based on mean-ranks? J Mach Learn Res 17(5):1–10

  • Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828

  • Bostrom A, Bagnall A (2015) Binary shapelet transform for multiclass time series classification. In: Madria S, Hara T (eds) Big data analytics and knowledge discovery. Springer, Cham, pp 257–269

  • Bottou L, Curtis FE, Nocedal J (2018) Optimization methods for large-scale machine learning. SIAM Rev 60(2):223–311

  • Boureau YL, Ponce J, LeCun Y (2010) A theoretical analysis of feature pooling in visual recognition. In: Fürnkranz J, Joachims T (eds) Proceedings of the 27th international conference on machine learning, Omnipress, USA, pp 111–118

  • Cox D, Pinto N (2011) Beyond simple features: a large-scale feature search approach to unconstrained face recognition. Face Gesture 2011:8–15

  • Cui Z, Chen W, Chen Y (2016) Multi-scale convolutional neural networks for time series classification. arXiv:1603.06995

  • Dau HA, Bagnall A, Kamgar K, Yeh CCM, Zhu Y, Gharghabi S, Ratanamahatana CA, Keogh E (2019) The UCR time series archive. J Autom Sinica 6(6):1293–1305

  • Dau HA, Keogh E, Kamgar K et al (2018) UCR time series classification archive (briefing document). https://www.cs.ucr.edu/~eamonn/time_series_data_2018/

  • Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  • Dongarra J, Gates M, Haidar A, Kurzak J, Luszczek P, Tomov S, Yamazaki I (2018) The singular value decomposition: anatomy of optimizing an algorithm for extreme scale. SIAM Rev 60(4):808–865

  • Farahmand A, Pourazarm S, Nikovski D (2017) Random projection filter bank for time series data. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems, vol 30. MIT Press, Cambridge, pp 6562–6572

  • Franceschi J, Dieuleveut A, Jaggi M (2019) Unsupervised scalable representation learning for multivariate time series. In: Seventh international conference on learning representations, learning from limited labeled data workshop

  • García S, Herrera F (2008) An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons. J Mach Learn Res 9:2677–2694

  • Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge

  • Hills J, Lines J, Baranauskas E, Mapp J, Bagnall A (2014) Classification of time series by shapelet transformation. Data Min Knowl Disc 28(4):851–881

  • Ismail Fawaz H, Forestier G, Weber J, Idoumghar L, Muller P (2019a) Deep learning for time series classification: a review. Data Min Knowl Disc 33(4):917–963

  • Ismail Fawaz H, Forestier G, Weber J, Idoumghar L, Muller P (2019b) Deep neural network ensembles for time series classification. In: International joint conference on neural networks, pp 1–6

  • Ismail Fawaz H, Lucas B, Forestier G, Pelletier C, Schmidt DF, Weber J, Webb GI, Idoumghar L, Muller P, Petitjean F (2019c) InceptionTime: finding AlexNet for time series classification. arXiv:1909.04939

  • Jarrett K, Kavukcuoglu K, Ranzato M, LeCun Y (2009) What is the best multi-stage architecture for object recognition? In: 2009 IEEE 12th international conference on computer vision, pp 2146–2153

  • Jimenez A, Raj B (2019) Time signal classification using random convolutional features. In: 2019 IEEE international conference on acoustics, speech and signal processing

  • Karlsson I, Papapetrou P, Boström H (2016) Generalized random shapelet forests. Data Min Knowl Disc 30(5):1053–1085

  • Kingma DP, Ba JL (2015) Adam: a method for stochastic optimization. In: Third international conference on learning representations. arXiv:1412.6980

  • Krizhevsky A, Sutskever I, Hinton G (2012) Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems, vol 25. Curran Associates Inc, Red Hook, pp 1097–1105

  • Lam SK, Pitrou A, Seibert S (2015) Numba: an LLVM-based Python JIT compiler. In: Proceedings of the second workshop on the LLVM compiler infrastructure in HPC, pp 1–6

  • Le Nguyen T, Gsponer S, Ilie I, O’Reilly M, Ifrim G (2019) Interpretable time series classification using linear models and multi-resolution multi-domain symbolic representations. Data Min Knowl Disc 33(4):1183–1222

  • Lin M, Chen Q, Yan S (2014) Network in network. In: Second international conference on learning representations. arXiv:1312.4400

  • Lines J, Taylor S, Bagnall A (2018) Time series classification with HIVE-COTE: the hierarchical vote collective of transformation-based ensembles. ACM Trans Knowl Discov Data 12(5):52:1–52:35

  • Lubba CH, Sethi SS, Knaute P, Schultz SR, Fulcher BD, Jones NS (2019) catch22: CAnonical Time-series CHaracteristics. Data Min Knowl Disc 33(6):1821–1852

  • Lucas B, Shifaz A, Pelletier C, O’Neill L, Zaidi N, Goethals B, Petitjean F, Webb GI (2019) Proximity forest: an effective and scalable distance-based classifier for time series. Data Min Knowl Disc 33(3):607–635

  • Middlehurst M, Vickers W, Bagnall A (2019) Scalable dictionary classifiers for time series classification. In: Yin H, Camacho D, Tino P, Tallón-Ballesteros AJ, Menezes R, Allmendinger R (eds) Intelligent data engineering and automated learning. Springer, Cham, pp 11–19

  • Morrow A, Shankar V, Petersohn D, Joseph A, Recht B, Yosef N (2016) Convolutional kitchen sinks for transcription factor binding site prediction. In: NIPS workshop on machine learning in computational biology

  • Oquab M, Bottou L, Laptev I, Sivic J (2015) Is object localization for free? Weakly-supervised learning with convolutional neural networks. In: 2015 IEEE conference on computer vision and pattern recognition, pp 685–694

  • Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A (2017) Automatic differentiation in PyTorch. In: NIPS autodiff workshop

  • Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830

  • Petitjean F, Inglada J, Gancarski P (2012) Satellite image time series analysis under time warping. IEEE Trans Geosci Remote Sens 50(8):3081–3095

  • Pinto N, Doukhan D, DiCarlo JJ, Cox DD (2009) A high-throughput screening approach to discovering good forms of biologically inspired visual representation. PLoS Comput Biol 5(11):1–12

  • Rahimi A, Recht B (2008) Random features for large-scale kernel machines. In: Platt JC, Koller D, Singer Y, Roweis ST (eds) Advances in neural information processing systems, vol 20. Curran Associates Inc, Red Hook, pp 1177–1184

  • Rahimi A, Recht B (2009) Weighted sums of random kitchen sinks: replacing minimization with randomization in learning. In: Koller D, Schuurmans D, Bengio Y, Bottou L (eds) Advances in neural information processing systems, vol 21. MIT Press, Cambridge, pp 1313–1320

  • Raza A, Kramer S (2019) Accelerating pattern-based time series classification: a linear time and space string mining approach. Knowl Inf Syst 62:1113–1141

  • Renard X, Rifqi M, Erray W, Detyniecki M (2015) Random-shapelet: an algorithm for fast shapelet discovery. In: IEEE international conference on data science and advanced analytics, pp 1–10

  • Rifkin RM, Lippert RA (2007) Notes on regularized least squares. Technical report, MIT

  • Saxe A, Koh PW, Chen Z, Bhand M, Suresh B, Ng A (2011) On random weights and unsupervised feature learning. In: Getoor L, Scheffer T (eds) Proceedings of the 28th international conference on machine learning, Omnipress, USA, pp 1089–1096

  • Schäfer P (2015) The BOSS is concerned with time series classification in the presence of noise. Data Min Knowl Disc 29(6):1505–1530

  • Schäfer P, Leser U (2017) Fast and accurate time series classification with WEASEL. In: Proceedings of the 2017 ACM conference on information and knowledge management, pp 637–646

  • Shifaz A, Pelletier C, Petitjean F, Webb GI (2020) TS-CHIEF: a scalable and accurate forest algorithm for time series classification. Data Min Knowl Disc 34:742–775

  • Wang Z, Yan W, Oates T (2017) Time series classification from scratch with deep neural networks: a strong baseline. In: 2017 international joint conference on neural networks, pp 1578–1585

  • Wistuba M, Grabocka J, Schmidt-Thieme L (2015) Ultra-fast shapelets for time series classification. arXiv:1503.05018

  • Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks? In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems, vol 27. MIT Press, Cambridge, pp 3320–3328

  • Yu F, Koltun V (2016) Multi-scale context aggregation by dilated convolutions. In: Fourth international conference on learning representations. arXiv:1511.07122

  • Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) European conference on computer vision. Springer, Cham, pp 818–833

Acknowledgements

This material is based upon work supported by an Australian Government Research Training Program Scholarship; the Air Force Office of Scientific Research, Asian Office of Aerospace Research and Development (AOARD) under award number FA2386-18-1-4030; and the Australian Research Council under awards DE170100037 and DP190100017. The authors would like to thank Professor Eamonn Keogh and all the people who have contributed to the UCR time series classification archive. Figures showing the ranking of different classifiers and variants of Rocket were produced using code from Ismail Fawaz et al. (2019a).

Author information

Correspondence to Angus Dempster.

Additional information

Responsible editor: Aristides Gionis, Carlotta Domeniconi, Eyke Hüllermeier, Ira Assent.

Appendices

Relative accuracy

1.1 ‘Bake Off’ datasets

See Fig. 13.

Fig. 13

Relative accuracy of Rocket versus state-of-the-art classifiers on the ‘bake off’ datasets

1.2 Additional 2018 datasets

See Fig. 14.

Fig. 14

Relative accuracy of Rocket versus state-of-the-art classifiers, additional 2018 datasets

‘Development’ and ‘Holdout’ datasets

See Figs. 15 and 16.

Fig. 15

Mean rank of Rocket versus state-of-the-art classifiers on the ‘holdout’ datasets

Fig. 16

Mean rank of Rocket versus state-of-the-art classifiers on the ‘development’ datasets

Additional plots for the sensitivity analysis

See Figs. 17, 18, 19, 20, 21, 22, 23 and 24.

Fig. 17

Relative accuracy of \(k=10{,}000\) versus \(k=5000\) on the ‘development’ datasets

Fig. 18

Relative accuracy of \(l \in \{7, 9, 11\}\) versus \(l \in \{5, 7, 9\}\) on the ‘development’ datasets

Fig. 19

Relative accuracy, normally-distributed versus integer weights, ‘development’ datasets

Fig. 20

Relative accuracy of always versus random centering on the ‘development’ datasets

Fig. 21

Relative accuracy, uniformly versus normally-distributed bias, ‘development’ datasets

Fig. 22

Relative accuracy of exponential versus uniform dilation on the ‘development’ datasets

Fig. 23

Relative accuracy of random versus always padding on the ‘development’ datasets

Fig. 24

Relative accuracy of ppv and max versus only ppv on the ‘development’ datasets

Resamples

We also evaluate Rocket on 10 resamples of both the ‘bake off’ and additional 2018 datasets, using the same first 10 resamples (not including the original training/test split) as in Bagnall et al. (2017). Figure 25 shows the mean rank of Rocket versus HIVE-COTE, TS-CHIEF, Shapelet Transform, Proximity Forest and BOSS on the resampled ‘bake off’ datasets. Figure 27 shows the relative accuracy of Rocket and each of the other methods on the resampled ‘bake off’ datasets. The results for HIVE-COTE, Shapelet Transform, and BOSS are taken from Bagnall et al. (2019). Figures 26 and 28 show the mean rank and relative accuracy of Rocket versus Proximity Forest and TS-CHIEF for 10 resamples of the additional 2018 datasets (published results are not available for other methods for these resamples).

The results for the resamples and for the original training/test splits are very similar on both the ‘bake off’ and the additional 2018 datasets. In fact, while HIVE-COTE ranks ahead of Rocket, Rocket appears to be ‘stronger’ against both HIVE-COTE and TS-CHIEF on the resamples of the ‘bake off’ datasets than on the original training/test splits. For the resampled ‘bake off’ datasets, Rocket is ahead of HIVE-COTE in terms of win/draw/loss (47/2/36), as it is for the original training/test split (45/7/33). This confirms that the results for the original training/test split are sound, and representative of the expected performance of Rocket relative to the other methods included in the comparison.
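
For concreteness, win/draw/loss counts and mean ranks of the kind reported throughout this appendix can be computed directly from a table of per-dataset accuracies. A minimal sketch (the function names and the accuracy table are ours, purely illustrative):

```python
import numpy as np
from scipy.stats import rankdata

def win_draw_loss(acc_a, acc_b):
    # Element-wise comparison of two classifiers' accuracies over the
    # same datasets, e.g. Rocket versus HIVE-COTE over 85 datasets.
    a, b = np.asarray(acc_a), np.asarray(acc_b)
    return int((a > b).sum()), int((a == b).sum()), int((a < b).sum())

def mean_ranks(acc_table):
    # acc_table: shape (num_datasets, num_classifiers). Rank the
    # classifiers on each dataset (rank 1 = most accurate, ties get
    # the average rank), then average the ranks over all datasets.
    ranks = np.apply_along_axis(lambda row: rankdata(-row), 1, acc_table)
    return ranks.mean(axis=0)
```

Under this sketch, win_draw_loss(rocket_acc, hive_cote_acc) returning (47, 2, 36) would correspond to the resampled ‘bake off’ comparison above.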

Fig. 25

Mean rank of Rocket versus other classifiers on the resampled ‘bake off’ datasets

Fig. 26

Mean rank of Rocket versus other classifiers, resampled additional 2018 datasets

Fig. 27

Relative accuracy of Rocket versus other classifiers on the resampled ‘bake off’ datasets

Fig. 28

Relative accuracy of Rocket versus other classifiers, resampled additional 2018 datasets

Other methods

We also compare Rocket against four recently-proposed scalable methods for time series classification (see Sect. 2.2), namely, MrSEQL, cBOSS, MiSTiCl, and catch22. We have run each of these four methods on the ‘bake off’ datasets, the additional 2018 datasets, and the scalability experiments in terms of both training set size and time series length. We have run each method with its recommended settings per the relevant papers, and using the same experimental conditions as for Rocket in each case. We have used the Python wrapper for MrSEQL (https://github.com/alan-turing-institute/sktime), the Java version of cBOSS (https://github.com/uea-machine-learning/tsml), the Java version of MiSTiCl (https://github.com/atifraza/MiSTiCl), and the Python wrapper for catch22 (https://github.com/chlubba/catch22).

1.1 ‘Bake Off’ and additional 2018 datasets

Figures 29 and 32 show the mean rank and relative accuracy of Rocket versus MrSEQL, cBOSS, MiSTiCl, and catch22 for the 85 ‘bake off’ datasets. Figures 30 and 33 show the same for the additional 2018 datasets. Figure 31 shows total compute time.

The results show that Rocket is significantly more accurate and, with one exception, more scalable than these methods. Rocket is considerably ahead of the most accurate of these methods, MrSEQL, in terms of win/draw/loss (54/8/23) on the ‘bake off’ datasets, and Rocket is approximately an order of magnitude faster in terms of total compute time than MrSEQL, cBOSS, and MiSTiCl. While catch22 is very fast, it is the least accurate method.

As with the ‘bake off’ datasets, Rocket is significantly more accurate than MrSEQL, cBOSS, MiSTiCl, and catch22 on the additional 2018 datasets and, with the exception of catch22, considerably faster. Again, Rocket is substantially ahead of the most accurate of these methods in terms of win/draw/loss (32/0/11). Rocket is 4 times faster than cBOSS, 16 times faster than MrSEQL, and almost 22 times faster than MiSTiCl on these datasets. Again, catch22 is the fastest but least accurate method.

Note that while catch22 was originally used in conjunction with a single decision tree, we found that this produced very low accuracy and, of several ‘off the shelf’ classifiers, random forest produced the highest accuracy. Accordingly, we have used catch22 in conjunction with a random forest classifier. MiSTiCl would not run on the ElectricDevices dataset in its published configuration. For this dataset we used MiSTiCl in conjunction with AdaBoost, rather than the default extremely randomised trees. MiSTiCl would not run at all on the Chinatown dataset, so it has been ranked behind the other methods for this dataset, and this dataset has been removed from the relevant plot in Fig. 33.
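
To make this configuration concrete, the pipeline amounts to computing the 22 catch22 features per series and fitting a random forest on them. A sketch, assuming the catch22_all entry point of the Python wrapper linked above (the exact API may differ between versions), with toy data standing in for a real dataset:

```python
import numpy as np
import catch22  # Python wrapper from https://github.com/chlubba/catch22
from sklearn.ensemble import RandomForestClassifier

def catch22_features(X):
    # Compute the 22 canonical features for each series in X.
    # catch22_all is assumed to return {"names": [...], "values": [...]}.
    return np.array([catch22.catch22_all(list(x))["values"] for x in X])

# Toy data purely for illustration.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(20, 100)), rng.integers(0, 2, 20)
X_test = rng.normal(size=(5, 100))

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(catch22_features(X_train), y_train)
predictions = clf.predict(catch22_features(X_test))
```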

Fig. 29

Mean rank of Rocket versus other classifiers on the ‘bake off’ datasets

Fig. 30

Mean rank of Rocket versus other classifiers, additional 2018 datasets

Fig. 31

Total compute time for ‘bake off’ datasets (left) and additional 2018 datasets (right)

Fig. 32

Relative accuracy of Rocket versus other classifiers on the ‘bake off’ datasets

Fig. 33

Relative accuracy of Rocket versus other classifiers, additional 2018 datasets

1.2 Scalability

1.2.1 Training set size

Figure 34 shows accuracy and training time versus training set size for Rocket (with 10,000 kernels by default, and with 1000 and 100 kernels) and the other four methods on the Satellite Image Time Series dataset. MrSEQL, cBOSS, and MiSTiCl are all fundamentally less scalable than Rocket in terms of training set size: by approximately 32,000 training examples, MrSEQL is approximately 75 times slower than Rocket, MiSTiCl approximately 200 times slower, and cBOSS more than 300 times slower. Additionally, all four methods are noticeably less accurate than Rocket for the same training set size. We note that there appears to be a problem with MrSEQL beyond approximately 8000 training examples. While Rocket with its default 10,000 kernels is slower than catch22, in contexts where this speed difference is important, Rocket restricted to 100 kernels is an order of magnitude faster than catch22 and still significantly more accurate (see Fig. 34).
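
The timing side of this trade-off can be probed with the illustrative rocket_transform sketched after the abstract (ours, not the authors' optimised Numba implementation, so absolute times are not meaningful; only the scaling with the number of kernels is):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 50))  # toy stand-in for a real dataset

for num_kernels in (100, 1_000, 10_000):
    start = time.perf_counter()
    features = rocket_transform(X_train, num_kernels=num_kernels)
    print(f"k={num_kernels:>6}: {time.perf_counter() - start:.1f} s, "
          f"{features.shape[1]} features per series")
```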

1.2.2 Time series length

Figure 35 shows training time versus time series length for Rocket and the other four methods on the InlineSkate dataset. In practice, the scalability of Rocket, cBOSS, and catch22 in terms of time series length appears to be similar, that is, approximately linear in time series length, while MrSEQL and MiSTiCl are both less scalable. MrSEQL, cBOSS, and MiSTiCl are all slower than Rocket for a given time series length.

Fig. 34

Accuracy (left) and training time (right) versus training set size

Fig. 35

Training time versus time series length

Results for ‘Bake Off’ datasets

See Table 1.

Table 1 Accuracy—‘Bake Off’ datasets

Results for additional 2018 datasets

See Table 2.

Table 2 Accuracy—additional 2018 datasets

Cite this article

Dempster, A., Petitjean, F. & Webb, G.I. ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels. Data Min Knowl Disc 34, 1454–1495 (2020). https://doi.org/10.1007/s10618-020-00701-z
