Introduction

In general, variables can be absolute or relational [1]. A person’s age, education, gender, and level of environmental awareness are absolute variables. If, however, a variable is defined by the relationship between entities, it is called a relational variable. Examples of relational variables are the difference in status between persons x and y, the intensity of friendships, the power of one person over another, and the existence or absence of trade relations between countries. Relational variables are often dynamic, i.e., they evolve over time.

Contemporary “big” data sets frequently exhibit relational variables [2, 3]. In retail, for instance, data sets are more granular than traditional data, often indexing individual products, outlets, or even users, rather than aggregating them at the group level [4, 5]. Consequently, there are different types of dependencies between the variables of interest (e.g., dependencies across products, dependencies among stores). The relational character of the data is, however, often neglected, notably in prediction tasks. Instead, univariate extrapolation approaches are applied; they cannot capture inter-series dependencies. Moreover, existing multivariate forecasting methods (e.g., vector autoregressions) are restricted to low-dimensional settings and are, hence, not suitable for practical use in large-scale forecasting problems [6, 7].

Tensor extrapolation aims to forecast relational time series data using multi-linear algebra. It proceeds as follows: Multi-way data are arranged in the form of multi-dimensional arrays, i.e., tensors. Tensor decompositions are then used to identify periodic patterns in the data. Subsequently, these patterns serve as input for time series methods. Tensor extrapolation originated in the work of Dunlavy et al. [8] and Spiegel et al. [9], but was limited to preselected time series approaches and binary data. Only recently has Schosser [10, 11] laid the foundations for applications in large-scale forecasting problems.

So far, tensor extrapolation has been restricted to complete data sets. However, values are often missing from time series data. Instances of corruption, inattention, or sensor malfunctions all require forecasting processes to be equipped to handle missing elements [8]. The paper at hand therefore extends the tensor extrapolation approach adapted by Schosser [11] to situations with missing entries.

To evaluate our approach, we generate synthetic data. Except for the scaling (and, of course, the missing values), the resulting data correspond to those used by Schosser [11]. First, the component matrices are generated. Thereby, we are not subject to specific limitations. In particular, the CP model does not place orthogonality constraints on the component matrices [8, 29]. The matrices \({\mathbf {A}}\) and \({\mathbf {B}}\) of size \((I \times R)\) and \((J \times R)\), respectively, are “entity participation” matrices. In other words, column \({\mathbf {a}}_r\) (resp. \({\mathbf {b}}_r\)) is the vector of participation levels of all the entities in component r. User participation is captured in matrix \({\mathbf {A}}\): The users are assumed to react in different ways to a specific product-time combination. The extent of this reaction is measured according to the density function of a Gaussian distribution (cf. Fig. 1a). Product participation is captured in matrix \({\mathbf {B}}\): There are groups of products that respond similarly to the same time and user effect combination (cf. Fig. 1b). The columns of matrix \({\mathbf {C}}\) of size \((T \times R)\) record different periodic patterns. We create a sinusoidal pattern, a trend, a structural break, and another break (cf. Fig. 1c). Combining the matrices \({\mathbf {A}}\), \({\mathbf {B}}\), and \({\mathbf {C}}\) according to the CP model yields a noise-free version of the tensor. Finally, we add Gaussian noise to every entry; the standard deviation of the error term is assumed to equal half of the mean of the noise-free tensor entries.
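To make the data-generating process concrete, the following is a minimal sketch in Python. The dimensions (I, J, T, R) = (160, 120, 100, 4) and the noise level follow the text; the exact shapes of the participation vectors and temporal patterns (grid positions, break points, seasonal period) are illustrative assumptions, not the original specification.

```python
import numpy as np
import tensorly as tl

rng = np.random.default_rng(0)
I, J, T, R = 160, 120, 100, 4                     # users x products x time, R components

# A: user participation, shaped like a Gaussian density (illustrative parameters)
grid = np.linspace(-3, 3, I)
A = np.column_stack([np.exp(-0.5 * (grid - mu) ** 2) for mu in np.linspace(-2, 2, R)])

# B: groups of products that respond similarly to the same component
B = np.zeros((J, R))
for r, block in enumerate(np.array_split(np.arange(J), R)):
    B[block, r] = 1.0

# C: a sinusoidal pattern, a trend, and two structural breaks (illustrative shapes)
t = np.arange(T)
C = np.column_stack([
    np.sin(2 * np.pi * t / 12),                   # seasonality
    t / T,                                        # trend
    (t >= 40).astype(float),                      # structural break
    0.5 * (t >= 70).astype(float),                # another break
])

# Combine the component matrices according to the CP model, then add Gaussian noise
# whose standard deviation equals half the mean of the noise-free entries.
X_clean = tl.cp_to_tensor((np.ones(R), [A, B, C]))
X = X_clean + rng.normal(scale=0.5 * X_clean.mean(), size=X_clean.shape)
```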

Fig. 1 Generation of synthetic data

Table 1 Forecasting accuracy based on synthetic data in terms of MAPE and sMAPE

To investigate the effect of missing entries, we eliminate fractions of the data of different sizes. In the words of Rubin [40], the elements are missing completely at random: there is no relationship between whether an entry is missing and any other entry in the data set, missing or observed. For instance, where the desired level of missingness equals 20%, each element has a probability of 0.2 of being missing. Our implementation uses consecutive draws of a Bernoulli random variable with probability of success equal to the intended level of missingness [41]. Due to the large number of entries, the realized amount of missing values differs only slightly from the level intended. As a consequence of our procedure, the missing elements are randomly scattered across the array without any specific pattern. Please note that we do not investigate systematically missing entries (the subject-based literature uses the somewhat confusing terms missing at random and missing not at random; [40]). The CP factorization proposed by Tomasi and Bro [14] can cope with systematic missingness; for preparatory completion techniques, this holds only to a limited extent or not at all. A meaningful comparison is, therefore, not possible.
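A sketch of the masking step, continuing the synthetic-data example above. Each entry is removed independently via a Bernoulli draw with success probability equal to the intended level of missingness; the boolean mask (True = observed) is also what the factorization later consumes.

```python
def mcar_mask(shape, p_missing, rng):
    """Bernoulli draw per entry: True marks an observed value,
    False a value that is missing completely at random."""
    return rng.random(shape) >= p_missing

mask = mcar_mask(X.shape, 0.20, rng)                     # e.g., 20% intended missingness
X_obs = np.where(mask, X, np.nan)                        # missing entries flagged as NaN
print(f"realized missingness: {1 - mask.mean():.4f}")    # close to 0.20
```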

For each fraction of missing values, we split the data into an estimation sample and a hold-out sample. The latter consists of the 20 most recent observations. We thus obtain \(\underline{{\mathbf {X}}}^{est}\) of size \((160 \times 120 \times 80)\) and \(\underline{{\mathbf {X}}}^{hold}\) of size \((160 \times 120 \times 20)\), respectively. The methods under consideration are fitted, or trained, on the estimation sample. Forecasts are produced for the whole of the hold-out sample and arranged as tensor \(\underline{{\hat{\mathbf {X}}}}\) of size \((160 \times 120 \times 20)\). Finally, the forecasts are compared to the actual withheld observations.
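Continuing the sketch above, the split is a simple slice along the time mode; the hold-out part is kept complete because it is only used for evaluation.

```python
T_hold = 20
X_est, mask_est = X_obs[:, :, :-T_hold], mask[:, :, :-T_hold]   # (160, 120, 80)
X_hold = X[:, :, -T_hold:]                                      # (160, 120, 20), withheld actuals
```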

Note that the included time series are differently scaled. This has two consequences. First, since CP minimizes squared error, differences in scale may lead to distortions. Therefore, data preprocessing is necessary [11]. We choose a simple centering across the time mode [31]: the observations of each time series are averaged over its (available) elements, i.e., across the mode corresponding to matrix \({\mathbf {C}}\), and the resulting average is then subtracted from all observations that partake in it. Formally, the preprocessing step (please compare Line 1 in Algorithm 1) implies

$$\begin{aligned} x_{ijt,trans} = x_{ijt} - {\bar{x}}_{ij\cdot }, \end{aligned}$$

where the subscript dot is used to indicate the mean across \(t \in 1, \ldots , T\). During back-transformation (please compare Line 9 in Algorithm 1), the averages previously deducted are added back. This involves

$$\begin{aligned} {\hat{x}}_{ijt} = {\hat{x}}_{ijt,trans} + {\bar{x}}_{ij\cdot } \end{aligned}$$

for \(t \in T+1, \ldots , T+L\). Second, only scale-free performance measures can be employed. We use the Mean Absolute Percentage Error (MAPE) and the Symmetric Mean Absolute Percentage Error (sMAPE) [11, 42].
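In code, the centering, the back-transformation, and the two accuracy measures might look as follows. This is a sketch continuing the example above; the sMAPE variant with the factor 2 in the numerator is one common definition.

```python
# Line 1 of Algorithm 1: center each series over its available time points
x_bar = np.nanmean(X_est, axis=2, keepdims=True)
X_trans = X_est - x_bar                          # NaNs stay NaN at missing positions

# ... decomposition and extrapolation yield X_hat_trans of shape (160, 120, 20) ...

# Line 9 of Algorithm 1: add the previously deducted averages back
# X_hat = X_hat_trans + x_bar

def mape(actual, forecast):
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

def smape(actual, forecast):
    return 100.0 * np.mean(2.0 * np.abs(forecast - actual)
                           / (np.abs(actual) + np.abs(forecast)))
```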

Following Schosser [11], our application of tensor extrapolation connects a state-of-the-art tensor decomposition with an automatic forecasting procedure based on a general class of state-space time series models. For this purpose, we use the programming language Python. The function parafac (library TensorLy; [43]) supports the factorization proposed by Tomasi and Bro [14] and, hence, allows for missing values. When parafac is called, the number of components R must be specified. The function ETSModel (library statsmodels; [44]) offers a Python-based implementation of the automatic exponential smoothing algorithm developed by Hyndman et al. [35] and Hyndman and Khandakar [24]. This algorithm provides a rich set of possible models and has been shown to perform well in comparisons with other forecasting methods [23, 26, 45, 46]. Further, in relation to more complex forecasting techniques, its requirements in terms of data availability and computational resources are fairly low [7, 11]. As a baseline, we use univariate, i.e., per-series, extrapolation by means of ETSModel. Here, the missing values must be filled in: we propagate the last valid observation forward, and any remaining gaps are filled by backward propagation.
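The following sketch ties the pieces together, continuing the example above. The mask argument of parafac handles the missing entries; the small loop over candidate ETS specifications is a simplified stand-in for the automatic model selection of Hyndman and Khandakar [24], which we do not reproduce in full here. Names outside the TensorLy, statsmodels, pandas, and NumPy APIs are our own.

```python
import numpy as np
import pandas as pd
from tensorly.decomposition import parafac
from statsmodels.tsa.exponential_smoothing.ets import ETSModel

R, L = 4, 20

def ets_forecast(series, horizon):
    """Simplified automatic ETS: fit a few candidate specifications and keep the
    one with the lowest AIC (the full algorithm searches a larger model space)."""
    best = None
    for trend in (None, "add"):
        fit = ETSModel(np.asarray(series, dtype=float), error="add", trend=trend).fit(disp=False)
        if best is None or fit.aic < best.aic:
            best = fit
    return np.asarray(best.forecast(horizon))

# 1) CP decomposition of the centered estimation tensor; mask marks observed entries
weights, (A_hat, B_hat, C_hat) = parafac(
    np.where(mask_est, X_trans, 0.0), rank=R, mask=mask_est.astype(float)
)

# 2) Extrapolate each temporal component, rebuild the forecast tensor, undo the centering
C_future = np.column_stack([ets_forecast(C_hat[:, r], L) for r in range(R)])
X_hat = np.einsum("r,ir,jr,tr->ijt", weights, A_hat, B_hat, C_future) + x_bar

# Baseline: per-series ETS after forward/backward filling of missing values
def baseline_forecast(series, horizon):
    filled = pd.Series(series).ffill().bfill().to_numpy()
    return np.asarray(ETSModel(filled, error="add").fit(disp=False).forecast(horizon))
```

Forecast accuracy is then obtained by comparing the forecast tensor with the withheld observations, e.g., mape(X_hold, X_hat).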

Results and discussion

Table 1 displays our results. With regard to small amounts of missing data, the situation is as follows: As measured by MAPE, tensor extrapolation outperforms the baseline. If, for instance, 5% of the data are missing, the prediction error is reduced by up to 21.40%. On the basis of sMAPE, no clear ranking can be determined. As the level of missing data increases, tensor extrapolation becomes even more attractive. In the case of MAPE, the gap to univariate extrapolation widens. Where half of the data are missing, the prediction error is reduced by up to 26.28%. In terms of sMAPE, our proposed method now also dominates. Our results are largely unaffected by the hyperparameter choice, i.e., the number of components R. One way to circumvent the problem of committing to a specific number of components is to use an (equally weighted) combination or ensemble of forecasts, as sketched below. Here, again, the results are encouraging, as is often the case with the combination of forecasts [47]. Moreover, the signal-to-noise ratio does not influence the hierarchy described. Detailed results on this are available upon request.
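The forecast combination mentioned above can be as simple as averaging the forecast tensors obtained for several choices of R; forecast_with_rank is a hypothetical wrapper around the pipeline sketched earlier.

```python
# Equally weighted ensemble over several hyperparameter choices (illustrative ranks)
X_hat_ensemble = np.mean([forecast_with_rank(r) for r in (3, 4, 5, 6)], axis=0)
```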

Using the Python library timeit, we quantify the computational burden associated with the methods in question. The measurements refer to a commodity notebook with an Intel Core i5-6300 CPU (2 × 2.40 GHz) and 8 GB RAM. By way of example, we assume 20% of the data to be missing. Averaged over 100 executions, the runtime of tensor extrapolation with \(R=4\) components equals 21.28 s. The baseline, ETSModel, takes 193.16 s on average. The reason for this difference lies in the computational cost of the automatic exponential smoothing algorithm, i.e., the selection and estimation of an adequate exponential smoothing method. Tensor extrapolation requires a tensor decomposition, but significantly reduces the dimension of the forecasting problem: the automatic exponential smoothing algorithm is applied to only \(R=4\) time series, whereas 19,200 per-series function calls are necessary for the baseline approach. Regardless of the computational resources available, tensor extrapolation should be computationally cheaper even with the combination of forecasts.
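A minimal timing sketch with timeit; run_tensor_extrapolation and run_baseline are hypothetical wrappers around the two pipelines above, and the 100 executions mirror the setup reported in the text.

```python
import timeit

n = 100
t_tensor = timeit.timeit(lambda: run_tensor_extrapolation(X_est, mask_est, R=4), number=n) / n
t_base   = timeit.timeit(lambda: run_baseline(X_est), number=n) / n
print(f"tensor extrapolation: {t_tensor:.2f} s per run, baseline: {t_base:.2f} s per run")
```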

Conclusions

In spite of the possibilities arising from the “big data revolution”, the relational character of many time series is largely neglected in forecasting tasks. Recently, tensor extrapolation has been shown to be effective in forecasting large-scale relational data [11]. However, the results so far have been limited to complete data sets. The paper at hand adapts tensor extrapolation to situations with missing entries. The results demonstrate that the method can be successfully applied with up to 50% missing values. Notwithstanding the missing elements, tensor extrapolation is able to extract meaningful latent structure from the data and to use this information for prediction. A preparatory completion of the data set (e.g., by replacing missing elements) is not required. Given the importance of missing values in practice [48], the findings of this paper provide a compelling argument in favor of tensor extrapolation.