Efficient Authentication of Approximate Record Matching for Outsourced Databases

Dong, Boxiang; Wang, Hui Wendy

doi:10.1007/978-3-319-98056-0_6

Boxiang Dong & Hui Wendy Wang

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 838)

Included in the following conference series:

International Conference on Information Reuse and Integration

Abstract

Cloud computing enables the outsourcing of big data analytics, where a third-party server is responsible for data management and processing. A major security concern of the outsourcing paradigm is whether the untrusted server returns correct results. In this paper, we consider approximate record matching in the outsourcing model: given a target record, the service provider should return all records from the outsourced dataset that are similar to the target. Identifying approximately duplicate records in databases plays an important role in information integration and entity resolution. We design ALARM, an Authentication soLution of outsourced Approximate Record Matching, to verify the correctness of the result. The key idea of ALARM is that, besides returning the similar records, the server constructs a verification object (VO) that proves their authenticity, soundness and completeness. ALARM consists of four authentication approaches, namely \(VS^2\), E-\(VS^2\), G-\(VS^2\) and P-\(VS^2\), which aim to reduce the verification cost in different ways. We theoretically prove the robustness and security of these approaches and analyze the time and space complexity of each. We perform an extensive set of experiments on real-world datasets to demonstrate that ALARM can verify record matching results at low cost.
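The abstract does not spell out how the VO is built. As a rough, non-authoritative illustration of the general pattern that authenticated query processing builds on, the Python sketch below has a data owner publish the root of a Merkle hash tree over the records; the untrusted server returns each matching record together with its authentication path; and the client checks authenticity (the path reproduces the trusted root) and soundness (each returned record is within an edit-distance threshold θ of the target). This is not ALARM or any of the \(VS^2\) variants. The function names (`build_levels`, `prove`, `server_answer`, `client_verify`), the use of Levenshtein distance as the similarity measure, SHA-256 with `leaf:`/`node:` prefixes, and the assumption that the root reaches the client over a trusted channel (for example, signed by the data owner) are all hypothetical choices made for the sketch. Completeness, the guarantee that no similar record was omitted, is not covered.

```python
import hashlib
from typing import List, Tuple

Path = List[Tuple[bytes, bool]]  # (sibling digest, sibling is on the right)

def h(data: bytes) -> bytes:
    """SHA-256 digest (assumed hash function) for leaves and internal nodes."""
    return hashlib.sha256(data).digest()

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]

def build_levels(records: List[str]) -> List[List[bytes]]:
    """Data owner: build every level of a Merkle hash tree, leaves first.
    An odd last node is paired with itself (one common convention)."""
    level = [h(b"leaf:" + r.encode()) for r in records]
    levels = [level]
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        level = [h(b"node:" + padded[i] + padded[i + 1])
                 for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels

def prove(levels: List[List[bytes]], index: int) -> Path:
    """Server: collect the sibling digests on the path from a leaf to the root."""
    path: Path = []
    for level in levels[:-1]:
        sib = index ^ 1
        sibling = level[sib] if sib < len(level) else level[index]
        path.append((sibling, sib > index))
        index //= 2
    return path

def verify_path(record: str, path: Path, root: bytes) -> bool:
    """Client: recompute the root from a record and its authentication path."""
    digest = h(b"leaf:" + record.encode())
    for sibling, sibling_is_right in path:
        pair = digest + sibling if sibling_is_right else sibling + digest
        digest = h(b"node:" + pair)
    return digest == root

def server_answer(records, levels, target: str, theta: int):
    """Untrusted server: each matching record plus its membership proof."""
    return [(r, prove(levels, i)) for i, r in enumerate(records)
            if edit_distance(r, target) <= theta]

def client_verify(answer, root: bytes, target: str, theta: int) -> bool:
    """Client: authenticity (Merkle path) and soundness (distance) per record."""
    return all(verify_path(r, path, root) and edit_distance(r, target) <= theta
               for r, path in answer)

if __name__ == "__main__":
    records = ["john smith", "jon smith", "jane smyth", "john smithe", "mary jones"]
    levels = build_levels(records)
    root = levels[-1][0]  # assumed to reach the client signed by the data owner
    answer = server_answer(records, levels, "john smith", 1)
    print([r for r, _ in answer])                   # ['john smith', 'jon smith', 'john smithe']
    print(client_verify(answer, root, "john smith", 1))  # True
```

If the server injects a record that is not in the outsourced dataset, its leaf digest no longer hashes up to the trusted root and `client_verify` rejects the answer; if it returns a genuine but dissimilar record, the distance check fails. Catching omitted records requires additional VO content that this sketch does not model.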


Similar content being viewed by others

Analysis and Improvement of an Efficient and Secure Identity-Based Public Auditing for Dynamic Outsourced Data with Proxy

Verifiable algorithm for outsourced database with updating (Article, 04 September 2017)

RDAS: A Symmetric Key Scheme for Authenticated Query Processing in Outsourced Databases


Acknowledgements

This material is based upon work supported by the National Science Foundation (NSF) under Grant Nos. 1350324 and 1464800.

Author information

Authors and Affiliations

Department of Computer Science, Montclair State University, Montclair, NJ, 07043, USA
Boxiang Dong
Department of Computer Science, Stevens Institute of Technology, Hoboken, NJ, 07030, USA
Boxiang Dong & Hui Wendy Wang


Corresponding author

Correspondence to Boxiang Dong.

Editor information

Editors and Affiliations

Laboratoire de Communication dans les Systèmes Informatiques, Ecole nationale Supérieure d’Informatique, Algiers, Algeria
Thouraya Bouabana-Tebibel
Laboratoire de Communication dans les Systèmes Informatiques, Ecole nationale Supérieure d’Informatique, Algiers, Algeria
Lydia Bouzar-Benlabiod
Space and Naval Warfare Systems Center Pacific, San Diego, CA, USA
Stuart H. Rubin


About this paper

Cite this paper

Dong, B., Wang, H.W. (2019). Efficient Authentication of Approximate Record Matching for Outsourced Databases. In: Bouabana-Tebibel, T., Bouzar-Benlabiod, L., Rubin, S. (eds) Theory and Application of Reuse, Integration, and Data Science. IEEE IRI 2017. Advances in Intelligent Systems and Computing, vol 838. Springer, Cham. https://doi.org/10.1007/978-3-319-98056-0_6


DOI: https://doi.org/10.1007/978-3-319-98056-0_6
Published: 08 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98055-3
Online ISBN: 978-3-319-98056-0
eBook Packages: Intelligent Technologies and Robotics (R0)
