Abstract
Background
Contemporary data sets are frequently relational in nature. In retail, for example, data sets are more granular than traditional data, often indexing individual products, outlets, or even users, rather than aggregating them at the group level. Tensor extrapolation is used to forecast relational time series data; it combines tensor decompositions and time series extrapolation. However, previous approaches to tensor extrapolation are restricted to complete data sets. This paper adapts tensor extrapolation to situations with missing entries and examines the method’s performance in terms of forecast accuracy.
Findings
To base the evaluation on time series with both diverse and controllable characteristics, the paper develops a synthetic data set closely related to the context of retailing. Calculations performed on these data demonstrate that tensor extrapolation outperforms the univariate baseline. Furthermore, a preparatory completion of the data set is not necessary. The higher the fraction of missing data, the greater the superiority of tensor extrapolation in terms of prediction error.
Conclusions
Forecasting plays a key role in the optimization of business processes and enables data-driven decision making. As such, tensor extrapolation should be part of the forecaster’s toolkit: Even if large parts of the data are missing, the proposed method is able to extract meaningful, latent structure, and to use this information in prediction.
Introduction
In general, variables can be absolute or relational [1]. A person’s age, education, gender, or level of environmental awareness are absolute variables. However, if a variable is defined by the relationship between entities, it is called a relational variable. Examples of relational variables are the difference in status between persons x and y, the intensity of friendships, the power of one person over another, and the existence and non-existence of trade relations between countries. Relational variables are often dynamic, i.e., they evolve over time.
Contemporary “big” data sets frequently exhibit relational variables [2, 3]. In retail, for instance, data sets are more granular than traditional data, often indexing individual products, outlets, or even users, rather than aggregating them at the group level [4, 5]. Consequently, there are different types of dependencies between the variables of interest (e.g., dependencies across products, dependencies among stores). The relational character of the data is, however, often neglected, notably in prediction tasks. Instead, univariate extrapolation approaches are applied; they cannot capture inter-series dependencies. Moreover, existing multivariate forecasting methods (e.g., vector autoregressions) are restricted to low-dimensional settings and are, hence, not suitable for practical use in large-scale forecasting problems [6, 7].
Tensor extrapolation forecasts relational time series data by means of multi-linear algebra. It proceeds as follows: Multi-way data are arranged in the form of multi-dimensional arrays, i.e., tensors. Tensor decompositions are then used to identify periodic patterns in the data. Subsequently, these patterns serve as input for time series methods. Tensor extrapolation originated in the work of Dunlavy et al. [8] and Spiegel et al. [9], but was limited to preselected time series approaches and binary data. Only recently did Schosser [10, 11] lay the foundations for applications in large-scale forecasting problems.
So far, tensor extrapolation has been restricted to complete data sets. However, values are often missing from time series data. Instances of corruption, inattention, or sensor malfunctions all require forecasting processes to be equipped to handle missing elements. To this end, we build on the tensor extrapolation framework introduced by Dunlavy et al. [8] and adapted by Schosser [11].

To evaluate the method, we generate a synthetic data set. Except for the scaling (and, of course, the missing values), the resulting data correspond to those used by Schosser [11]. First, the component matrices are generated. Here, we are not subject to specific limitations; in particular, the CP model does not place orthogonality constraints on the component matrices [8, 29]. The matrices \({\mathbf {A}}\) and \({\mathbf {B}}\) of size \((I \times R)\) and \((J \times R)\), respectively, are “entity participation” matrices: column \({\mathbf {a}}_r\) (resp. \({\mathbf {b}}_r\)) is the vector of participation levels of all the entities in component r. User participation is captured in matrix \({\mathbf {A}}\): the users are assumed to react differently to a specific product-time combination, and the extent of this reaction is measured according to the density function of a Gaussian distribution (cf. Fig. 1a). Product participation is captured in matrix \({\mathbf {B}}\): there are groups of products that respond similarly to the same combination of time and user effects (cf. Fig. 1b). The columns of matrix \({\mathbf {C}}\) of size \((T \times R)\) record different periodic patterns; we create a sinusoidal pattern, a trend, a structural break, and a second break (cf. Fig. 1c). By combining the matrices \({\mathbf {A}}\), \({\mathbf {B}}\), and \({\mathbf {C}}\), a noise-free version of the tensor emerges. Finally, we add Gaussian noise to every entry; the standard deviation of the error term is assumed to equal half of the mean of the noise-free tensor entries.
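The construction just described can be sketched in a few lines of numpy. Shapes follow the text (\(T = 100\) total periods, four components); the Gaussian means for the user mode and the product group sizes are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, T, R = 160, 120, 100, 4  # users, products, periods, components

# A: user participation; reaction strengths follow a Gaussian density
# (centers and widths are illustrative assumptions)
u = np.linspace(-3.0, 3.0, I)
A = np.column_stack([np.exp(-0.5 * (u - m) ** 2) for m in (-1.5, -0.5, 0.5, 1.5)])

# B: product participation; groups of products respond alike
B = np.zeros((J, R))
for r in range(R):
    B[r * (J // R):(r + 1) * (J // R), r] = 1.0

# C: temporal patterns: sinusoid, trend, and two structural breaks
t = np.arange(T)
C = np.column_stack([
    np.sin(2 * np.pi * t / 12.0),      # seasonal pattern
    t / T,                             # linear trend
    (t >= T // 2).astype(float),       # structural break
    (t >= 3 * T // 4).astype(float),   # second break
])

# noise-free tensor from the CP structure, then additive Gaussian noise
# with standard deviation equal to half the mean of the noise-free entries
X0 = np.einsum('ir,jr,tr->ijt', A, B, C)
sigma = 0.5 * X0.mean()
X = X0 + rng.normal(0.0, sigma, size=X0.shape)
```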
To investigate the effect of missing entries, we eliminate differently sized fractions of the data. In the words of Rubin [40], the elements are missing completely at random: there is no relationship between whether an entry is missing and any other entry in the data set, missing or observed. For instance, where the desired level of missingness equals 20%, each element has a probability of 0.2 of being missing. Our implementation uses consecutive draws of a Bernoulli random variable with probability of success equal to the intended level of missingness [41]. Due to the large number of entries, the realized amount of missing values differs only slightly from the intended level. As a consequence of this procedure, the missing elements are randomly scattered across the array without any specific pattern. Note that we do not investigate systematically missing entries (the subject-based literature uses the somewhat confusing terms missing at random and missing not at random; [40]). The CP factorization proposed by Tomasi and Bro [14] can accommodate systematic missingness; for techniques of preparatory completion, this applies only to a limited extent or not at all. A meaningful comparison is, therefore, not possible.
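The amputation step amounts to one Bernoulli draw per tensor entry. A minimal sketch (with stand-in data) might look as follows; with roughly 1.9 million entries, the realized fraction of missing values is indeed very close to the intended one:

```python
import numpy as np

rng = np.random.default_rng(42)
p_miss = 0.2                        # intended level of missingness
shape = (160, 120, 100)             # size of the full data tensor

# one Bernoulli draw per entry: True = observed, False = missing
mask = rng.random(shape) >= p_miss
X = rng.standard_normal(shape)      # stand-in for the synthetic data
X_obs = np.where(mask, X, np.nan)   # missing entries marked as NaN

realized = 1.0 - mask.mean()        # realized fraction of missing values
```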
For each fraction of missing values, we split the data into an estimation sample and a hold-out sample. The latter consists of the 20 most recent observations. We thus obtain \(\underline{{\mathbf {X}}}^{est}\) of size \((160 \times 120 \times 80)\) and \(\underline{{\mathbf {X}}}^{hold}\) of size \((160 \times 120 \times 20)\), respectively. The methods under consideration are implemented, or trained, on the estimation sample. The forecasts are produced for the whole of the hold-out sample and arranged as tensor \(\underline{{\hat{\mathbf {X}}}}\) of size \((160 \times 120 \times 20)\). Finally, forecasts are compared to the actual withheld observations.
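The split along the time mode can be expressed directly via slicing; the shapes match those stated above (placeholder data for illustration):

```python
import numpy as np

T, L = 100, 20                      # total periods, hold-out length
X = np.zeros((160, 120, T))         # placeholder for the full data tensor

# the hold-out sample consists of the L = 20 most recent observations
X_est, X_hold = X[:, :, :T - L], X[:, :, T - L:]
```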
We should be aware of the fact that the included time series are differently scaled. This has two consequences. First, since CP minimizes squared error, differences in scale may lead to distortions. Therefore, data preprocessing is necessary [11]. We choose a simple centering across the time mode [31]. It is carried out by averaging the observations over the (available) elements of the respective time series, i.e., across mode C, and then subtracting each thus obtained average from all the observations that partake in it. Formally, with \(x_{ijt}\) denoting an entry of the data tensor, the preprocessing step (cf. Line 1 in Algorithm 1) implies

\[ y_{ijt} = x_{ijt} - x_{ij\cdot}, \]

where the subscript dot is used to indicate the mean across \(t \in 1, \ldots , T\). During back-transformation (cf. Line 9 in Algorithm 1), the averages previously deducted are added back. This involves

\[ \hat{x}_{ijt} = \hat{y}_{ijt} + x_{ij\cdot} \]

for \(t \in T+1, \ldots , T+L\). Second, only scale-free performance measures can be employed. We use the Mean Absolute Percentage Error (MAPE) and the Symmetric Mean Absolute Percentage Error (sMAPE) [11, 42].
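The centering, its back-transformation, and the two error measures can be sketched in numpy; `np.nanmean` restricts each average to the available entries of a series. The sMAPE variant below uses the common symmetric denominator, which is an assumption on our part rather than a formula quoted from the paper:

```python
import numpy as np

def center_time_mode(X):
    """Subtract each series' mean over its available time points (mode C)."""
    xbar = np.nanmean(X, axis=2, keepdims=True)  # x_{ij.}
    return X - xbar, xbar

def back_transform(Y_hat, xbar):
    """Add the previously deducted averages back to the forecasts."""
    return Y_hat + xbar

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * np.nanmean(np.abs((actual - forecast) / actual))

def smape(actual, forecast):
    """Symmetric Mean Absolute Percentage Error, in percent."""
    return 200.0 * np.nanmean(
        np.abs(forecast - actual) / (np.abs(actual) + np.abs(forecast))
    )
```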
Following Schosser [11], our application of tensor extrapolation connects a state-of-the-art tensor decomposition with an automatic forecasting procedure based on a general class of state-space time series models. For this purpose, we use the programming language Python. The function parafac (library TensorLy; [43]) supports the factorization proposed by Tomasi and Bro [14] and, hence, allows for missing values. When parafac is called, the number of components R must be specified. The function ETSModel (library statsmodels; [44]) offers a Python-based implementation of the automatic exponential smoothing algorithm developed by Hyndman et al. [35] and Hyndman and Khandakar [24]. This algorithm provides a rich set of possible models and has been shown to perform well in comparisons with other forecasting methods [23, 26, 45, 46]. Further, in relation to more complex forecasting techniques, the requirements in terms of data availability and computational resources are fairly low [7, 11]. As a baseline, we use a univariate, i.e., per-series, extrapolation by means of ETSModel. Here, the missing values must be filled in. To this end, we propagate the last valid observation forward. Any remaining gaps are filled in backwards.
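To illustrate how a CP factorization can cope with missing entries, the following is a minimal, self-contained numpy sketch of an EM-style alternating least squares scheme: missing cells are imputed from the current model between sweeps, in the spirit of Tomasi and Bro [14]. It is an illustration only; the study itself relies on TensorLy's parafac (which accepts a mask) and statsmodels' ETSModel:

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: move the given axis to the front and flatten."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(P, Q):
    """Column-wise Khatri-Rao product of two factor matrices."""
    return np.einsum('ir,jr->ijr', P, Q).reshape(-1, P.shape[1])

def cp_als_missing(X, mask, rank, n_iter=300, seed=0):
    """EM-style CP-ALS: impute missing entries from the current model,
    then perform one ALS sweep on the completed tensor."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in X.shape]
    Xf = np.where(mask, X, X[mask].mean())        # initial fill with the mean
    for _ in range(n_iter):
        for mode in range(3):
            P, Q = [factors[m] for m in range(3) if m != mode]
            G = (P.T @ P) * (Q.T @ Q)             # Gram of the Khatri-Rao factor
            factors[mode] = unfold(Xf, mode) @ khatri_rao(P, Q) @ np.linalg.pinv(G)
        # E-step: refill missing cells from the current reconstruction
        X_hat = np.einsum('ir,jr,kr->ijk', *factors)
        Xf = np.where(mask, X, X_hat)
    return factors, X_hat
```

On a noise-free low-rank tensor with entries missing completely at random, this recovers the withheld cells to high accuracy, which is exactly the property tensor extrapolation exploits.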
Results and discussion
Table 1 displays our results. With regard to small amounts of missing data, the situation is as follows: as measured by MAPE, tensor extrapolation outperforms the baseline. If, for instance, 5% of the data are missing, the prediction error is reduced by up to 21.40%. On the basis of sMAPE, no clear ranking can be determined. As the level of missing data increases, tensor extrapolation becomes even more attractive. In the case of MAPE, the distance to univariate extrapolation increases: where half of the data are missing, the prediction error is reduced by up to 26.28%. In terms of sMAPE, our proposed method now also dominates. Our results are largely unaffected by the hyperparameter choice, i.e., the number of components R. One way to circumvent the problem of committing to a specific number of components is to use an (equally weighted) combination, or ensemble, of forecasts. Here, again, the results are encouraging, as is often the case with forecast combinations [47]. Moreover, the signal-to-noise ratio does not influence the hierarchy described. Detailed results are available upon request.
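The equally weighted combination amounts to averaging the forecast tensors produced under the candidate values of R. A small sketch with hypothetical forecast arrays (the names and shapes are ours, for illustration):

```python
import numpy as np

# hypothetical forecast tensors, one per candidate number of components R
rng = np.random.default_rng(7)
forecasts_by_R = {R: rng.standard_normal((4, 3, 2)) for R in (2, 3, 4, 5)}

# equally weighted combination across the candidate component numbers
ensemble = np.mean(np.stack(list(forecasts_by_R.values())), axis=0)
```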
Using the Python library timeit, we quantify the computational burden associated with the methods in question. The measurements refer to a commodity notebook with Intel Core i5-6300 CPU 2x2.40 GHz and 8 GB RAM. By way of example, we assume 20% of the data to be missing. Given 100 executions, the average runtime of tensor extrapolation with \(R=4\) components equals 21.28 s; the baseline, ETSModel, takes 193.16 s on average. The reason for this difference lies in the computational cost of the automatic exponential smoothing algorithm, i.e., the selection and estimation of an adequate exponential smoothing method. Tensor extrapolation requires a tensor decomposition, but significantly reduces the dimension of the forecasting problem: the automatic exponential smoothing algorithm is applied to only \(R=4\) time series, whereas 19,200 function calls are necessary for the baseline approach. Even when forecasts are combined across several values of R, tensor extrapolation should remain computationally cheaper.
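The measurement protocol can be reproduced along these lines; `forecast_step` is a placeholder for one full run of a forecasting method, not code from the study:

```python
import timeit

def forecast_step():
    # placeholder for one full run of a forecasting method
    return sum(i * i for i in range(1000))

# average wall-clock time per execution over 100 runs
n_runs = 100
total = timeit.timeit(forecast_step, number=n_runs)
avg_runtime = total / n_runs
```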
Conclusions
In spite of the possibilities arising from the “big data revolution”, the relational character of many time series is largely neglected in forecasting tasks. Recently, tensor extrapolation has been shown to be effective in forecasting large-scale relational data [11]. However, the results so far are limited to complete data sets. The paper at hand adapts tensor extrapolation to situations with missing entries. The results demonstrate that the method can be successfully applied for up to 50% missing values. Notwithstanding the missing elements, tensor extrapolation is able to extract meaningful, latent structure in the data and to use this information for prediction. A preparatory completion of the data set (e.g., by replacing missing elements) is not required. Given the importance of missing values in practice [48], the findings of this paper provide a compelling argument in favor of tensor extrapolation.
Availability of data and materials
All data generated or analyzed during this study are available in Additional file 1.
References
Wasserman S, Faust K. Social network analysis: Methods and applications. Cambridge: Cambridge University Press; 1994.
Kitchin R. Big Data, new epistemologies and paradigm shifts. Big Data Soc. 2014;1(1):1–12.
Kitchin R, McArdle G. What makes Big Data, Big Data? Exploring the ontological characteristics of 26 datasets. Big Data Soc. 2016;3(1):1–10.
Müller O, Junglas I, vom Brocke J, Debortoli S. Utilizing big data analytics for information systems research: Challenges, promises and guidelines. Eur J Inform Syst. 2016;25(4):289–302.
Fildes R, Ma S, Kolassa S. Retail forecasting: research and practice. Int J Forecasting. 2022. https://doi.org/10.1016/j.ijforecast.2019.06.004.
Hyndman RJ, Athanasopoulos G. Forecasting: Principles and Practice. Melbourne: OTexts; 2018.
De Stefani J, Bontempi G. Factor-based framework for multivariate and multi-step-ahead forecasting of large scale time series. Front Big Data. 2021;4(1):e690267.
Dunlavy DM, Kolda TG, Acar E. Temporal link prediction using matrix and tensor factorizations. ACM T Knowl Discov D. 2011;5(2):e10.
Spiegel S, Clausen J, Albayrak S, Kunegis J. Link prediction on evolving data using tensor factorization. In: New frontiers in applied data mining: PAKDD 2011 International Workshops. Springer; 2012. p. 100–110.
Schosser J. Multivariate extrapolation: A tensor-based approach. In: Neufeld JS, Buscher U, Lasch R, Möst D, Schönberger J, editors. Operations Research Proceedings 2019. New York: Springer; 2020. p. 53–9.
Schosser J. Tensor extrapolation: forecasting large-scale relational data. J Oper Res Soc. 2022. https://doi.org/10.1080/01605682.2021.1892460.
Alexandrov A, Benidis K, Bohlke-Schneider M, Flunkert V, Gasthaus J, Januschowski T, et al. GluonTS: Probabilistic time series models in Python. arXiv:1906.05264; 2019.
Shah SY, Patel D, Vu L, Dang XH, Chen B, Kirchner P, et al. AutoAI-TS: AutoAI for time series forecasting. In: Proceedings of the 2021 International Conference on Management of Data (SIGMOD). ACM; 2021. p. 2584–96.
Tomasi G, Bro R. PARAFAC and missing values. Chemometr Intell Lab. 2005;75(2):163–80.
Bi X, Tang X, Yuan Y, Zhang Y, Qu A. Tensors in statistics. Annu Rev Stat Appl. 2021;8(1):345–68.
Hill C, Li J, Schneider M. The tensor auto-regressive model. J Forecasting. 2021;40(4):636–52.
Hoff PD. Multilinear tensor regression for longitudinal relational data. Ann Appl Stat. 2015;9(3):1169–93.
Minhas S, Hoff PD, Ward MD. A new approach to analyzing coevolving longitudinal networks in international relations. J Peace Res. 2016;53(3):491–505.
Feuerverger A, He Y, Khatri S. Statistical significance of the Netflix challenge. Stat Sci. 2012;27(2):202–31.
Donoho D. 50 years of data science. J Comput Graph Stat. 2017;26(4):745–66.
Liberman M. Obituary: Fred Jelinek. Comput Linguist. 2010;36(4):595–9.
Makridakis S, Spiliotis E, Assimakopoulos V. The M4 competition: results, findings, conclusion and way forward. Int J Forecasting. 2018;34(4):802–8.
Makridakis S, Spiliotis E, Assimakopoulos V. The M5 accuracy competition: results, findings and conclusions. Int J Forecasting. 2022. https://doi.org/10.1016/j.ijforecast.2021.11.013.
Hyndman RJ, Khandakar Y. Automatic time series forecasting: The forecast package for R. J Stat Softw. 2008;27(3):1–22.
Salinas D, Flunkert V, Gasthaus J, Januschowski T. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. Int J Forecasting. 2020;36(3):1181–91.
Gastinger J, Nicolas S, Stepić D, Schmidt M, Schülke A. A study on ensemble learning for time series forecasting and the need for meta-learning. arXiv:2104.11475; 2021.
Cichocki A, Zdunek R, Phan AH, Amari S. Nonnegative matrix and tensor factorizations: Applications to exploratory multiway data analysis and blind source separation. Chichester: Wiley; 2009.
Kolda TG, Bader BW. Tensor decompositions and applications. SIAM Rev. 2009;51(3):455–500.
Papalexakis EE, Faloutsos C, Sidiropoulos ND. Tensors for data mining and data fusion: Models, applications, and scalable algorithms. ACM T Intel Syst Tec. 2016;8(2):e16.
Rabanser S, Shchur O, Günnemann S. Introduction to tensor decompositions and their applications in machine learning. arXiv:1711.10781; 2017.
Kiers HAL. Towards a standardized notation and terminology in multiway analysis. J Chemometr. 2000;14(3):105–22.
Hitchcock FL. The expression of a tensor or polyadic as a sum of products. J Math Phys. 1927;6(1):164–89.
Carroll JD, Chang JJ. Analysis of individual preferences in multidimensional scaling via an N-way generalization of ‘Eckart-Young’ decomposition. Psychometrika. 1970;35(3):283–319.
Harshman RA. Foundations of the PARAFAC procedure: Models and conditions for an ‘explanatory’ multimodal factor analysis. UCLA Working Papers Phonetics. 1970;16:1–84.
Hyndman RJ, Koehler AB, Snyder RD, Grose S. A state space framework for automatic forecasting using exponential smoothing. Int J Forecasting. 2002;18(3):439–54.
Fildes R. Evaluation of aggregate and individual forecast method selection rules. Manage Sci. 1989;35(9):1056–65.
Petropoulos F, Makridakis S, Assimakopoulos V, Nikolopoulos K. ‘Horses for Courses’ in demand forecasting. Eur J Oper Res. 2014;237(1):152–63.
Kang Y, Hyndman RJ, Li F. GRATIS: GeneRAting TIme Series with diverse and controllable characteristics. Stat Anal Data Min. 2020;13(4):354–76.
Schouten RM, Lugtig P, Vink G. Generating missing values for simulation purposes: A multivariate amputation procedure. J Stat Comput Sim. 2018;88(15):2909–30.
Rubin DB. Inference and missing data. Biometrika. 1976;63(3):581–92.
Little RJA, Rubin DB. Statistical analysis with missing data. Hoboken: Wiley; 2002.
Hyndman RJ, Koehler AB. Another look at measures of forecast accuracy. Int J Forecasting. 2006;22(4):679–88.
Kossaifi J, Panagakis Y, Anandkumar A, Pantic M. TensorLy: Tensor learning in Python. J Mach Learn Res. 2019;20(26):1–6.
Seabold S, Perktold J. Statsmodels: Econometric and statistical modeling with Python. In: Proceedings of the 9th Python in Science Conference (SCIPY2010); 2010. p. 57–61
Januschowski T, Gasthaus J, Wang Y, Salinas D, Flunkert V, Bohlke-Schneider M, et al. Criteria for classifying forecasting methods. Int J Forecasting. 2020;36(1):167–77.
Karlsson Rosenblad A. Accuracy of automatic forecasting methods for univariate time series data: A case study predicting the results of the 2018 Swedish general election using decades-long series. Commun Stat Case Stud. 2021;7(3):475–93.
Bates JM, Granger CW. The combination of forecasts. J Oper Res Soc. 1969;20(4):451–68.
Emmanuel T, Maupong T, Mpoeleng D, Semong T, Mphago B, Tabona O. A survey on missing data in machine learning. J Big Data. 2021;8(1): e140.
Author information
Contributions
JS is the sole author. The author read and approved the final manuscript.
Supplementary Information
Additional file 1.
Python code. The notebook highlights core components of the code applied in the study.
Schosser, J. Tensor extrapolation: an adaptation to data sets with missing entries. J Big Data 9, 26 (2022). https://doi.org/10.1186/s40537-022-00574-7