Abstract
The existing algorithms for processing skyline queries cannot adapt to big data. This paper proposes two approximate skyline algorithms based on sampling. The first algorithm obtains a fixed size sample and computes the approximate skyline on the sample. The error of the first algorithm is relatively small in most cases, and is almost independent of the input relation size. The second algorithm returns an \((\epsilon ,\delta )\)-approximation for the exact skyline. The size of sample required by the second algorithm can be regarded as a constant relative to the input relation size, so is the running time.
This work was supported by the National Natural Science Foundation of China under grant 61732003, 61832003, 61972110 and U1811461.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Bai, Z.D., Chao, C.C., Hwang, H.K., Liang, W.Q.: On the variance of the number of maxima in random vectors and its applications. In: Advances in Statistics, pp. 164–173. World Scientific (2008)
Bartolini, I., Ciaccia, P., Patella, M.: Efficient sort-based skyline evaluation. ACM Trans. Database Syst. (TODS) 33(4), 1–49 (2008)
Bentley, J.L., Clarkson, K.L., Levine, D.B.: Fast linear expected-time algorithms for computing maxima and convex hulls. Algorithmica 9(2), 168–183 (1993)
Bentley, J.L., Kung, H.T., Schkolnick, M., Thompson, C.D.: On the average number of maxima in a set of vectors and applications. J. ACM (JACM) 25(4), 536–543 (1978)
Borzsony, S., Kossmann, D., Stocker, K.: The skyline operator. In: Proceedings 17th International Conference on Data Engineering, pp. 421–430. IEEE (2001)
Buchta, C.: On the average number of maxima in a set of vectors. Inf. Process. Lett. 33(2), 63–65 (1989)
Cai, Z., Miao, D., Li, Y.: Deletion propagation for multiple key preserving conjunctive queries: approximations and complexity. In: 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 506–517. IEEE (2019)
Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with presorting. In: ICDE, vol. 3, pp. 717–719 (2003)
Devroye, L.: A note on finding convex hulls via maximal vectors. Inf. Process. Lett. 11(1), 53–56 (1980)
Gao, X., Li, J., Miao, D., Liu, X.: Recognizing the tractability in big data computing. Theor. Comput. Sci. 838, 195–207 (2020)
Godfrey, P.: Skyline cardinality for relational processing. In: Seipel, D., Turull-Torres, J.M. (eds.) FoIKS 2004. LNCS, vol. 2942, pp. 78–97. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24627-5_7
Godfrey, P., Shipley, R., Gryz, J., et al.: Maximal vector computation in large data sets. VLDB 5, 229–240 (2005)
Han, X., Li, J., Yang, D., Wang, J.: Efficient skyline computation on big data. IEEE Trans. Knowl. Data Eng. 25(11), 2521–2535 (2012)
Koltun, V., Papadimitriou, C.H.: Approximately dominating representatives. In: Eiter, T., Libkin, L. (eds.) ICDT 2005. LNCS, vol. 3363, pp. 204–214. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30570-5_14
Kossmann, D., Ramsak, F., Rost, S.: Shooting stars in the sky: an online algorithm for skyline queries. In: VLDB 2002: Proceedings of the 28th International Conference on Very Large Databases, pp. 275–286. Elsevier (2002)
Kung, H.T., Luccio, F., Preparata, F.P.: On finding the maxima of a set of vectors. J. ACM (JACM) 22(4), 469–476 (1975)
Lee, K.C., Lee, W.C., Zheng, B., Li, H., Tian, Y.: Z-sky: an efficient skyline query processing framework based on z-order. VLDB J. 19(3), 333–362 (2010)
Lin, X., Yuan, Y., Zhang, Q., Zhang, Y.: Selecting stars: the k most representative skyline operator. In: 2007 IEEE 23rd International Conference on Data Engineering, pp. 86–95. IEEE (2007)
Magnani, M., Assent, I., Mortensen, M.L.: Taking the big picture: representative skylines based on significance and diversity. VLDB J. 23(5), 795–815 (2014)
Miao, D., Cai, Z., Li, J.: On the complexity of bounded view propagation for conjunctive queries. IEEE Trans. Knowl. Data Eng. 30(1), 115–127 (2017)
Miao, D., Cai, Z., Li, J., Gao, X., Liu, X.: The computation of optimal subset repairs. Proc. VLDB Endow. 13(12), 2061–2074 (2020)
Miao, D., Liu, X., Li, J.: On the complexity of sampling query feedback restricted database repair of functional dependency violations. Theor. Comput. Sci. 609, 594–605 (2016)
Miao, D., Yu, J., Cai, Z.: The hardness of resilience for nested aggregation query. Theor. Comput. Sci. 803, 152–159 (2020)
Papadias, D., Tao, Y., Fu, G., Seeger, B.: An optimal and progressive algorithm for skyline queries. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp. 467–478 (2003)
Søholm, M., Chester, S., Assent, I.: Maximum coverage representative skyline. In: EDBT, pp. 702–703 (2016)
Tan, K.L., Eng, P.K., Ooi, B.C., et al.: Efficient progressive skyline computation. VLDB 1, 301–310 (2001)
Tao, Y., Ding, L., Lin, X., Pei, J.: Distance-based representative skyline. In: 2009 IEEE 25th International Conference on Data Engineering, pp. 892–903. IEEE (2009)
**ngxing **ao, J.L.: Sampling based approximate skyline calculation on big data (2020). https://arxiv.org/abs/2008.05103
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
**ao, X., Li, J. (2020). Sampling-Based Approximate Skyline Calculation on Big Data. In: Wu, W., Zhang, Z. (eds) Combinatorial Optimization and Applications. COCOA 2020. Lecture Notes in Computer Science(), vol 12577. Springer, Cham. https://doi.org/10.1007/978-3-030-64843-5_3
Download citation
DOI: https://doi.org/10.1007/978-3-030-64843-5_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64842-8
Online ISBN: 978-3-030-64843-5
eBook Packages: Computer ScienceComputer Science (R0)