About Leaks of Confidential Data in the Process of Indexing Sites by Search Crawlers

  • Conference paper
Perspectives of System Informatics (PSI 2019)

Abstract

A large number of sites serving very different purposes (online stores, ticketing systems, hotel reservation services, etc.) collect and store personal information about their users, as well as other confidential data such as the history and results of user interaction with these sites. Some of this data, although not intended for open access, nevertheless ends up in search engine output and may become available to unauthorized persons who issue specific queries. This article describes the reasons such incidents occur and gives basic recommendations for technical specialists (developers and administrators) that will help prevent these leaks.
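One of the standard defenses the paper's recommendations touch on is the robot exclusion mechanism (robots.txt). As a minimal sketch, not code from the paper itself, Python's standard `urllib.robotparser` can be used to check whether a well-behaved crawler would be permitted to fetch a given URL; the `example.com` domain and the `/private/` path below are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes a private section from all crawlers.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Public pages remain crawlable; the excluded section is not.
print(rp.can_fetch("*", "https://example.com/index.html"))      # True
print(rp.can_fetch("*", "https://example.com/private/orders"))  # False
```

Note that robots.txt is purely advisory: it keeps compliant crawlers out of the listed paths but provides no access control, so genuinely confidential pages still require authentication rather than exclusion rules alone.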

The research has been supported by the ICMMG SB RAS budget project N 0315-2016-0006.



Author information


Corresponding author

Correspondence to Sergey Kratov.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Kratov, S. (2019). About Leaks of Confidential Data in the Process of Indexing Sites by Search Crawlers. In: Bjørner, N., Virbitskaite, I., Voronkov, A. (eds) Perspectives of System Informatics. PSI 2019. Lecture Notes in Computer Science, vol 11964. Springer, Cham. https://doi.org/10.1007/978-3-030-37487-7_16


  • DOI: https://doi.org/10.1007/978-3-030-37487-7_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37486-0

  • Online ISBN: 978-3-030-37487-7

  • eBook Packages: Computer Science, Computer Science (R0)
