Efficient Authentication of Approximate Record Matching for Outsourced Databases

  • Conference paper
Theory and Application of Reuse, Integration, and Data Science (IEEE IRI 2017 2017)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 838)

Abstract

Cloud computing enables the outsourcing of big data analytics, where a third-party server is responsible for data management and processing. A major security concern of the outsourcing paradigm is whether the untrusted server returns correct results. In this paper, we consider approximate record matching in the outsourcing model: given a target record, the service provider should return all records from the outsourced dataset that are similar to the target. Identifying approximately duplicate records in databases plays an important role in information integration and entity resolution. We design ALARM, an Authentication soLution of outsourced Approximate Record Matching, to verify the correctness of the result. The key idea of ALARM is that, besides returning the similar records, the server constructs a verification object (VO) to prove their authenticity, soundness, and completeness. ALARM consists of four authentication approaches, namely \(VS^2\), E-\(VS^2\), G-\(VS^2\) and P-\(VS^2\), which reduce the verification cost from different aspects. We theoretically prove the robustness and security of these approaches and analyze the time and space complexity of each. An extensive set of experiments on real-world datasets demonstrates that ALARM can verify record matching results at low cost.
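To make the three correctness properties concrete, the following is a minimal sketch and not the ALARM construction. It assumes records are plain strings and that similarity means edit distance at most a threshold `theta`; the abstract does not fix a similarity measure, and the function names `edit_distance`, `match`, and `check_result` are hypothetical. The check shown here rescans the whole dataset on the client, which is exactly the cost a verification object is meant to avoid.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def match(dataset, target, theta):
    """Server side (assumed): every record within distance theta of target."""
    return [r for r in dataset if edit_distance(r, target) <= theta]


def check_result(dataset, target, theta, result):
    """Naive client-side check by full rescan; returns three booleans.

    authentic: every returned record exists in the outsourced dataset
    sound:     every returned record is similar to the target
    complete:  no similar record in the dataset was left out
    """
    returned = set(result)
    authentic = returned <= set(dataset)
    sound = all(edit_distance(r, target) <= theta for r in returned)
    complete = all(r in returned
                   for r in dataset
                   if edit_distance(r, target) <= theta)
    return authentic, sound, complete
```

A server that drops a matching record fails only the completeness check, while one that injects a similar-looking record that was never in the dataset fails only the authenticity check. In the outsourcing setting the client does not hold the dataset, so this rescan is not available to it; the sketch only illustrates what the VO has to prove.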

Acknowledgements

This material is based upon work supported by the National Science Foundation (NSF) under Grant No. 1350324 and 1464800.

Author information

Corresponding author

Correspondence to Boxiang Dong.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Dong, B., Wang, H.W. (2019). Efficient Authentication of Approximate Record Matching for Outsourced Databases. In: Bouabana-Tebibel, T., Bouzar-Benlabiod, L., Rubin, S. (eds) Theory and Application of Reuse, Integration, and Data Science. IEEE IRI 2017 2017. Advances in Intelligent Systems and Computing, vol 838. Springer, Cham. https://doi.org/10.1007/978-3-319-98056-0_6
