Abstract
A large number of sites serving very different purposes (online stores, ticketing systems, hotel reservations, etc.) collect and store their users' personal information, as well as other confidential data such as the history and results of user interactions with these sites. Some of this data, though not intended for open access, nevertheless ends up in search results and may become available to unauthorized persons who submit specific queries. This article describes why such incidents occur and gives basic recommendations for technical specialists (developers and administrators) that will help prevent leaks.
The research has been supported by the ICMMG SB RAS budget project N 0315-2016-0006.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Kratov, S. (2019). About Leaks of Confidential Data in the Process of Indexing Sites by Search Crawlers. In: Bjørner, N., Virbitskaite, I., Voronkov, A. (eds) Perspectives of System Informatics. PSI 2019. Lecture Notes in Computer Science, vol 11964. Springer, Cham. https://doi.org/10.1007/978-3-030-37487-7_16
DOI: https://doi.org/10.1007/978-3-030-37487-7_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37486-0
Online ISBN: 978-3-030-37487-7
eBook Packages: Computer Science, Computer Science (R0)