Abstract
Cloud computing enables the outsourcing of big data analytics, in which a third-party server is responsible for data management and processing. A major security concern in this paradigm is whether the untrusted server returns correct results. In this paper, we consider approximate record matching in the outsourcing model: given a target record, the service provider should return all records in the outsourced dataset that are similar to the target. Identifying approximately duplicate records in databases plays an important role in information integration and entity resolution. We design ALARM, an Authentication soLution of outsourced Approximate Record Matching, to verify the correctness of the result. The key idea of ALARM is that, besides returning the similar records, the server constructs a verification object (VO) to prove their authenticity, soundness, and completeness. ALARM consists of four authentication approaches, namely \(VS^2\), E-\(VS^2\), G-\(VS^2\) and P-\(VS^2\), each of which aims to reduce the verification cost from a different aspect. We theoretically prove the robustness and security of these approaches and analyze the time and space complexity of each. An extensive set of experiments on real-world datasets demonstrates that ALARM can verify record matching results at low cost.
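The abstract does not describe how the VO is built, so the following is only an illustrative sketch of the general idea, not the \(VS^2\) construction. It assumes a Merkle hash tree over the outsourced records, edit distance as the similarity measure, and a root digest that the client has obtained from the data owner over a trusted channel. All function names are hypothetical. The sketch checks authenticity (each returned record hashes up to the trusted root) and soundness (each returned record is within the threshold); it does not check completeness, which requires additional machinery.

```python
import hashlib


def leaf_hash(record: str) -> bytes:
    # Domain-separate leaves from inner nodes (assumed convention).
    return hashlib.sha256(b"\x00" + record.encode("utf-8")).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(records):
    """Return all tree levels, leaves first; an unpaired node is paired with itself."""
    level = [leaf_hash(r) for r in records]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Sibling path for leaf `index` as (sibling_hash, sibling_is_left) pairs."""
    path = []
    for level in levels[:-1]:
        sib = index ^ 1
        sibling = level[sib] if sib < len(level) else level[index]
        path.append((sibling, sib < index))
        index //= 2
    return path


def verify_path(record, path, root):
    """Recompute the root from a record and its sibling path."""
    digest = leaf_hash(record)
    for sibling, sibling_is_left in path:
        digest = node_hash(sibling, digest) if sibling_is_left else node_hash(digest, sibling)
    return digest == root


def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def server_answer(records, levels, target, threshold):
    """Server side: matching records, each paired with its Merkle path (the VO here)."""
    return [(r, prove(levels, i)) for i, r in enumerate(records)
            if edit_distance(r, target) <= threshold]


def client_verify(answer, target, threshold, root):
    """Client side: authenticity (path reaches root) and soundness (within threshold)."""
    return all(verify_path(r, p, root) and edit_distance(r, target) <= threshold
               for r, p in answer)


# Hypothetical toy dataset for illustration only.
records = ["john smith", "jon smith", "jane doe", "john smyth", "alice wu"]
levels = build_levels(records)
root = levels[-1][0]  # assumed to reach the client via the data owner
answer = server_answer(records, levels, "john smith", 1)
print(client_verify(answer, "john smith", 1, root))
```

In this sketch the VO grows with one logarithmic-size path per returned record, which is the kind of verification cost the four approaches in the abstract are said to reduce.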
Acknowledgements
This material is based upon work supported by the National Science Foundation (NSF) under Grant Nos. 1350324 and 1464800.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Dong, B., Wang, H.W. (2019). Efficient Authentication of Approximate Record Matching for Outsourced Databases. In: Bouabana-Tebibel, T., Bouzar-Benlabiod, L., Rubin, S. (eds) Theory and Application of Reuse, Integration, and Data Science. IEEE IRI 2017. Advances in Intelligent Systems and Computing, vol 838. Springer, Cham. https://doi.org/10.1007/978-3-319-98056-0_6
DOI: https://doi.org/10.1007/978-3-319-98056-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98055-3
Online ISBN: 978-3-319-98056-0
eBook Packages: Intelligent Technologies and Robotics (R0)