Abstract
We live in an era characterized by an abundance of data, much of it conveying personal information. Linking such data is useful for a variety of applications but raises privacy concerns. To address this issue, privacy preserving record linkage has emerged, with techniques that aim to reveal to the matching parties only the records that actually match. Since the linking process usually involves large volumes of data, such procedures could benefit from outsourcing computation to cloud infrastructures and taking advantage of parallel computing platforms such as Apache Spark. In this paper, we extend a phonetic-code-based method for privacy preserving string matching by designing a new protocol specifically tailored to operate in parallel in the cloud, employing the MapReduce model. We theoretically analyze its characteristics and empirically assess its performance, comparing it with the corresponding sequential algorithm.
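To make the general idea concrete, the following is a minimal sketch of phonetic-code matching arranged as a map step and a reduce step. It is not the protocol proposed in the paper: the choice of American Soundex, the HMAC-SHA256 keyed hash, the shared secret `key`, the party labels "A" and "B", and the function names `soundex`, `encode`, `map_phase` and `reduce_phase` are all assumptions made for illustration, and the phases run in plain Python rather than on Apache Spark.

```python
import hashlib
import hmac
from collections import defaultdict

# American Soundex digit for each coded consonant.
_CODES = {}
for _digit, _letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                         "4": "l", "5": "mn", "6": "r"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character American Soundex code of a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (first.upper() + "".join(digits) + "000")[:4]


def encode(name, key):
    """Keyed hash of the phonetic code (assumed scheme, for illustration)."""
    return hmac.new(key, soundex(name).encode(), hashlib.sha256).hexdigest()


def map_phase(party, records, key):
    """Map each (record_id, name) to (hashed code, (party, record_id))."""
    return [(encode(name, key), (party, rid)) for rid, name in records]


def reduce_phase(pairs):
    """Group by hashed code and emit id pairs present for both parties."""
    groups = defaultdict(lambda: defaultdict(list))
    for code, (party, rid) in pairs:
        groups[code][party].append(rid)
    matches = []
    for by_party in groups.values():
        for a in by_party.get("A", []):
            for b in by_party.get("B", []):
                matches.append((a, b))
    return sorted(matches)


key = b"shared-secret"  # hypothetical key agreed on by the two data holders
alice = [(1, "Robert"), (2, "Smith")]
bob = [(10, "Rupert"), (11, "Jones")]
matches = reduce_phase(map_phase("A", alice, key) + map_phase("B", bob, key))
```

In this sketch only hashed phonetic codes and record identifiers reach the reduce step, so names that sound alike, such as "Robert" and "Rupert", fall into the same group without the plaintext being exchanged. On a platform such as Spark, the two phases would roughly correspond to a map over each party's records followed by a grouping on the hashed code.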
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Karakasidis, A., Koloniari, G. (2018). Phonetics-Based Parallel Privacy Preserving Record Linkage. In: Xhafa, F., Caballé, S., Barolli, L. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2017. Lecture Notes on Data Engineering and Communications Technologies, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-319-69835-9_16
DOI: https://doi.org/10.1007/978-3-319-69835-9_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69834-2
Online ISBN: 978-3-319-69835-9
eBook Packages: Engineering, Engineering (R0)