
1 Introduction

The rise of digital technologies has led to the emergence of new ways in which physical spaces are perceived, experienced, and mapped. The availability of high-quality satellite imagery, amplified by the unprecedented possibilities for crowdsourcing geospatial data (Crampton 2009), has enabled the emergence of multiple platforms dealing with geographic information. It was followed by the integration of geographically aware computing in the architecture of major social media platforms (Crampton et al. 2013) and the growing capabilities for location tracking embedded into mobile devices (Sansurooah and Keane 2015). Together, these changes have given rise to a global collection of services that use geographic data for applications in different domains. These services are currently known as the “geospatial Web” (Lake and Farley 2009) or simply the “geoweb” (Crampton 2009).

The emergence of the geoweb and associated “neogeographic” (Haklay et al. 2008) practices of publishing, sharing, and visualizing information about places and people has significant implications for academic research. In a large-scale review of studies that use geospatial data, Stock (2018) demonstrates the applicability of these data to a wide range of research fields, including recreation, crisis management, and environmental studies. The reasons for the growing adoption of geospatial data vary from the emergence of geographic datasets of unprecedented size and granularity (Elwood 2010) to the transformation of citizens into geospatial subjects able to produce and employ geospatial data (Wilson 2011). Their use is amplified by innovative possibilities for identifying and mapping spatial relationships enabled by artificial intelligence and big data (VoPham et al. 2018).

Russia is no exception to this trend, as shown by the increasing number of studies applying geospatial data to subjects varying from electoral fraud (Kobak et al. 2016) to Silk Road tourism (Tikunov et al. 2018) to Second World War remembrance (Bernstein 2016). Yet, the use of geospatial data in the context of Digital Russian Studies has its own specifics, attributable both to the general role of digital media in Russia’s media ecologies and to the particular importance of the geoweb in this geopolitical context. The explosive growth of Internet use in Russia in the 2000s has led to profound changes in language and communication in multiple domains, including politics (Gorham et al. 2014). The importance of the digital sphere has increased even further since the beginning of the Ukraine crisis in 2014, which marked an unprecedented level of state-sponsored cynicism toward the media sphere and its growing instrumentalization for propaganda and disinformation (Roudakova 2017). In this “post-truth” (Surowiec 2017) environment, geolocation data that make it possible to (dis)prove the existence of specific phenomena emerge as a pivotal factor for making and refuting knowledge claims (e.g., about the presence of Russian troops in Ukraine [Shim 2018]).

To further contextualize the features of the Russian geoweb and examine how recent studies address the opportunities and challenges it provides, I will start by reviewing different sources of geospatial data available in the Russian context, varying from social media platforms to crowdsourced databases. I will then move toward discussing possible ways of extracting location information; these ways vary from mapping location names provided through metadata to specific geographic coordinates to extracting location from verbal or visual texts or inferring it from users’ activity on social media. Then, I will explore different ways to use geospatial data, such as mapping the spatial distribution of socioeconomic phenomena and analyzing the mediatization of cultural practices. Additionally, I will briefly discuss the ethical aspects of some of these uses, in particular privacy-related issues. Finally, I will conclude by recapping the main arguments of the chapter and scrutinizing possible directions for future uses of geospatial data in Digital Russian Studies.

2 Data Acquisition

The first question to address in research using geoweb analysis is what kind of geospatial data is to be used. As I mentioned in the introduction, the distribution of location tracking devices and geographic crowdsourcing gave rise to multiple platforms dealing with geospatial data; however, the format, scope, and quality of these data vary significantly depending on the platform. To illustrate these differences, I will review below three categories of geospatial data sources, which are of particular relevance for Digital Russian Studies: crowdsourced databases, open datasets, and social media.

2.1 Crowdsourced Databases

The availability of digital technology that makes it possible to collect, visualize, and share geospatial data led to the emergence of multiple projects focused on crowdsourcing “volunteered geographic information” (Goodchild 2007). Unlike established sources of geographic information (e.g., open datasets produced by national mapping agencies), crowdsourced databases rely on the assumption that geospatial content produced and edited by multiple individuals will eventually converge on a consensus (Elwood et al. 2012, 575). While this assumption does not guarantee the same quality of data as in the case of sources produced by certified experts, crowdsourced projects are able to account for attributes which are usually omitted by traditional mapping agencies and to capture fast-changing phenomena (e.g., natural disasters).

The scope and focus of volunteered geographic projects vary significantly. Some of them, such as OpenStreetMap (OSM) (https://www.openstreetmap.org), HERE Maps (https://mapcreator.here.com/), or Yandex People’s Map (https://n.maps.yandex.ru/), pursue the goal of creating and sustaining free digital maps or gazetteers. Other projects have a limited temporal and thematic focus. Both in Russia and in the West (Footnote 1), the latter projects often arise as part of volunteered reporting in the context of natural disasters (Footnote 2) or armed conflicts (Footnote 3).

Both categories of crowdsourced databases can be of use in the context of Digital Russian Studies. Many global initiatives provide relevant geospatial information, which can be used for Russia-centered research. For instance, Quinn and Tucker (2017) used OSM and Wikimapia (https://wikimapia.org/) to trace how crowdsourced maps are used to represent disputed areas such as Crimea and found substantial differences in the ways geopolitical disagreements were visualized and addressed. These differences were attributed to OSM hosting more contributions from Western editors, whereas Wikimapia was more eager to transmit the Russian official discourse. Other examples include the study by Kulakov, Petrina, and Pavlova (2016), who used Wikimapia for evaluating digital smart services utilized for cultural heritage tourism planning, and the research by Karbovskii et al. (2014), who employed Wikimapia for simulating the process of decision making based on the 2012 Krymsk flooding.

Additionally, the Russian digital landscape features a number of crowdsourced projects dealing with specific domains or topics. Despite their variety and rich data, these projects have so far received limited acknowledgement in academic scholarship. A few exceptions include, for instance, Pomnite nas (Remember Us) (http://www.pomnite-nas.ru/), a project that collects geospatial data about Second World War monuments devoted to Soviet soldiers (Bernstein 2016). Another example is RosYama (Russian Pit) (https://rosyama.ru/), a civic project initiated by Alexei Navalny, a Russian anti-systemic opposition leader and activist, who created an online crowdsourced service for reporting road potholes (Ermoshina 2014). Many of these projects are not necessarily designed as sources of geolocation data for academic research and are instead intended to facilitate social activities (e.g., collective remembrance of the Second World War in the case of Pomnite nas). Despite these non-academic goals, these projects can still be a valuable asset to researchers who approach their data creatively. For instance, geolocation data offered by RosYama can be used not only for research focused on the quality of Russian roads but also for visualizing geographic networks of activists or detecting the misappropriation of funds allocated by specific regions for road repairs (for more projects like this, see Chap. 8).

The major challenge of using crowdsourced databases is related to the quality of the data they provide. Because of the lack of authoritative control over their content, the possibility of encountering errors or conscious distortions of geographic facts is higher than in the case of open datasets. In the larger crowdsourced databases such as Wikimapia or Yandex People’s Map, this probability is lower because of the large number of contributors, which leads to faster error correction. The situation with small databases is more challenging: often, these projects are curated by small groups of users with limited time and financial resources. While the data offered by them can still be valuable (or even unavailable by other means), it is important to critically assess their quality and identify (as far as possible) who contributes to the database and for what ends.

2.2 Open Datasets

Besides the rise of volunteered geographic initiatives, the unprecedented ease of accumulating and sharing geospatial data resulted in the distribution of open datasets produced by certified actors such as state institutions and mapping agencies. Generated using authoritative geographic sources, these datasets are characterized by higher data quality when compared with crowdsourced databases. While the turn toward open data that are made available through official portals (for instance, data.gov or europeandataportal.eu) originated in the West, where these datasets are often employed in academic research on the subjects varying from earthquakes to government institutions’ budgets (Ding et al. 2018) […]

[…] way of detecting location is by using geographic coordinates included in the document (meta)data. Such an approach is particularly applicable for data available from open datasets as well as crowdsourced databases, which often include specific geographic coordinates. Additionally, some platforms such as Twitter and VK provide geographic coordinates for some types of their content (Footnote 6). The question of validity of these data, however, is an open one: especially in the case of geotagged content from social media platforms, there is also a need to differentiate between the place in which the content was published and the place to which it actually refers.

Location name extraction from document metadata. In the cases when geographic coordinates are not provided, one of the alternatives is to extract place names from the metadata. This process usually consists of two steps: (1) toponym recognition, that is, identification of the toponym in the body of the metadata (Sagcan and Karagoz 2015), and (2) toponym resolution, that is, assigning geographic coordinates to the recognized toponyms (Lieberman and Samet 2012). An example of a platform for which this approach can be highly beneficial is VK, which allows users to report their place of residence in their profiles. While the platform itself does not connect these data to a geographic information system, the location names can be retrieved via the VK API and then connected to a geocoding service (e.g., Google Maps) to generate geographic coordinates […]

[…] (Lee et al. 2013; Baucom et al. 2009) to color and texton histograms employed in the domain of computer vision (Gallagher et al. 2009). After these features are identified for the image in question, they can be compared with large image datasets (e.g., coming from Flickr) to identify similarities.
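The two-step process of toponym recognition and toponym resolution described above can be illustrated with a minimal Python sketch. It is not the method of any of the cited studies: the tiny in-memory gazetteer, its approximate coordinates, and the function names are all assumptions made for illustration, and a real study would replace the dictionary lookup with a proper named entity recognizer and a geocoding service.

```python
import re

# Hypothetical miniature gazetteer: lowercase place name -> approximate
# (latitude, longitude). A real study would query a geocoding service
# or a full gazetteer instead of this illustrative dictionary.
GAZETTEER = {
    "moscow": (55.7558, 37.6173),
    "saint petersburg": (59.9343, 30.3351),
    "kazan": (55.7887, 49.1221),
}


def recognize_toponyms(text, gazetteer=GAZETTEER):
    """Step 1 (toponym recognition): return gazetteer names found in the text."""
    # Lowercase and collapse whitespace so "Saint  Petersburg" still matches.
    normalized = re.sub(r"\s+", " ", text.lower())
    found = []
    # Check longer names first; every name is matched on word boundaries.
    for name in sorted(gazetteer, key=len, reverse=True):
        if re.search(r"\b" + re.escape(name) + r"\b", normalized):
            found.append(name)
    return found


def resolve_toponyms(names, gazetteer=GAZETTEER):
    """Step 2 (toponym resolution): attach coordinates to each recognized name."""
    return {name: gazetteer[name] for name in names}


def geocode_profile_field(text):
    """Run both steps on a free-text field, such as a self-reported residence."""
    return resolve_toponyms(recognize_toponyms(text))


if __name__ == "__main__":
    print(geocode_profile_field("Born in Kazan, now living in Saint  Petersburg"))
```

Exact string matching of this kind does not handle inflected Russian place names, transliteration variants, or ambiguous toponyms, which is one reason why better Russian-language gazetteers and named entity recognition matter for this step.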

Location name extraction from video. As with location extraction from images, several major approaches to location extraction from video can be identified. The first of them involves the use of video metadata (e.g., geographic coordinates produced by Global Positioning System [GPS] and compass sensors, which are embedded into video descriptions). This information can be used to identify the region in which the video was produced. Then geoinformation services (e.g., OSM) can be used to extract data about visible objects in the region (e.g., monuments or office buildings) in 2D or 3D (Footnote 9). Using OSM data, descriptive tags can be generated for different objects in the area (e.g., their addresses and names), and then the object models can be compared with objects from the videos. Then, the relevance of each tag for a specific video frame is calculated (i.e., to detect whether the tagged object is present in or absent from the frame) (Shen et al. 2011). While currently there are no papers applying this approach to the Russian context, it is language-agnostic and can be implemented for any video independently of the language in which it is produced, as long as some metadata is available.
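The idea of deciding which mapped objects are relevant for a given frame can be sketched in Python under strong simplifying assumptions. The sketch below is not the procedure of Shen et al. (2011): it treats the camera's view as a flat sector defined by a GPS position, a compass heading, a hypothetical 60-degree field of view, and a hypothetical 500-meter range, and it ignores occlusion and 3D object models. All function names, parameters, and sample coordinates are illustrative.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Return great-circle distance (m) and initial bearing (degrees from north)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine formula for the distance between two points on a sphere.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    distance = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    # Initial bearing from point 1 to point 2, normalized to [0, 360).
    y = math.sin(dlmb) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb))
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return distance, bearing


def visible_tags(camera, objects, fov_deg=60.0, max_range_m=500.0):
    """Return names of objects that fall inside the camera's viewing sector.

    camera: dict with "lat", "lon", and "heading" (compass degrees).
    objects: list of dicts with "name", "lat", "lon" (e.g., taken from a map).
    fov_deg and max_range_m are assumed values, not taken from the source.
    """
    tags = []
    for obj in objects:
        dist, bearing = distance_and_bearing(
            camera["lat"], camera["lon"], obj["lat"], obj["lon"])
        # Smallest angle between the compass heading and the bearing to the object.
        offset = abs((bearing - camera["heading"] + 180.0) % 360.0 - 180.0)
        if dist <= max_range_m and offset <= fov_deg / 2:
            tags.append(obj["name"])
    return tags


if __name__ == "__main__":
    frame_camera = {"lat": 55.750, "lon": 37.610, "heading": 0.0}  # facing north
    nearby = [
        {"name": "monument", "lat": 55.752, "lon": 37.610},
        {"name": "office building", "lat": 55.748, "lon": 37.610},
    ]
    print(visible_tags(frame_camera, nearby))
```

Calling `visible_tags` once per frame, with the sensor readings recorded for that frame, would give a per-frame list of candidate tags that could then be checked against the visual content.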

The second approach, which can also be employed when no video metadata is present, combines audio and visual features of videos to identify the location shown in them. For this purpose, a geotagged collection of videos is required; this collection is then used for calculating the audiovisual similarity with non-geotagged content. Specifically, visual frames and the soundtrack are extracted from the videos, and then visual and acoustic features are computed for each of them. Following the extraction, the k-nearest neighbor algorithm (a classification algorithm which classifies unknown objects according to the classes of their k closest neighbors) is employed to identify the geotagged videos which look and sound most similar to the non-geotagged content (Sevillano et al. 2015).
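The k-nearest neighbor step described above can be sketched in a few lines of Python. This is a generic illustration rather than the pipeline of Sevillano et al. (2015): the feature vectors are hypothetical placeholders for whatever visual and acoustic descriptors a study computes, and plain Euclidean distance with majority voting is an assumed design choice.

```python
import math
from collections import Counter


def knn_location(query, geotagged, k=3):
    """Predict a location label by majority vote among the k nearest videos.

    query: feature vector of the non-geotagged video (list of floats).
    geotagged: list of (feature_vector, location_label) pairs.
    """
    if not geotagged:
        raise ValueError("at least one geotagged example is required")
    # Rank geotagged videos by Euclidean distance to the query features.
    ranked = sorted(geotagged, key=lambda item: math.dist(query, item[0]))
    # Count the location labels of the k closest videos and take the most common.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical two-dimensional audiovisual features with location labels.
    collection = [
        ([0.10, 0.20], "Moscow"),
        ([0.15, 0.25], "Moscow"),
        ([0.20, 0.10], "Moscow"),
        ([0.90, 0.80], "Kazan"),
        ([0.85, 0.90], "Kazan"),
    ]
    print(knn_location([0.12, 0.22], collection, k=3))
```

In practice the feature vectors would have many more dimensions, and the choice of k and of the distance measure would need to be validated against a held-out set of geotagged videos.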

4 Location Use

After the location is extracted and identified, it can be used for actual analysis. As I noted earlier, the advantage of geospatial data is their versatility and applicability for addressing a wide range of research questions. In this section, I scrutinize some of the uses of the geoweb in the context of Digital Russian Studies, from mapping the spatial distribution of phenomena and specifying actors’ identities and relationships to scrutinizing the role of location in online cultural practices.

Mapping the spatial distribution of phenomena. An important feature of geospatial data is their rich potential for mapping socioeconomic and (geo)political phenomena. These phenomena vary from tourist mobility (e.g., spatial and temporal dimensions of tourist flows [Lu and Stepchenkova 2015; Kirilenko and Stepchenkova 2017]) to electoral fraud during Russia’s federal elections (Kobak et al. 2016) and migration patterns (Zamyatina and Piliasov 2013). Geotag data can also be used for mapping contested phenomena for which official reports are often subject to censorship or disinformation, such as the involvement of Russian troops in the conflict in Eastern Ukraine, which was traced using Instagram data (Czuperski et al. 2015). While the use of geospatial data for studying such contested cases often raises multiple concerns (e.g., concerning the reproducibility and the quality of available data), it can still provide valuable insights for researchers.

Specifying actor identities and relationships. Another common use of geospatial data is for identifying specific actors and tracking connections between them. Such tasks are particularly common for studies of political communication and/or online disinformation: for instance, Zelenkauskaite and Balduccini (2017) used geospatial data to specify the origins of users commenting on Russian-language news portals in Lithuania, whereas Helmus et al. (2018) employed the geoweb to track the identities of users involved in Russian propaganda and counter-propaganda efforts on Twitter. Disinformation, however, is not the only subject which can be investigated in this context, as shown by Smirnov et al. (2016), who used geospatial data for identifying friendship networks among young people on VK.

Scrutinizing digitization of cultural practices. The use of geospatial data increasingly becomes part of the mediatization of cultural practices, varying from war remembrance to tourism. Bernstein (2016), in his research on Second World War memory in Russia, showed how the formation of a geotagged database of Soviet monuments enriches existing memory practices by producing virtual embodiments of existing memorials and reiterating the mainstream Soviet narrative of the war. Another example is the use of geotagged images as part of sharing (and shaping) travel experiences, as shown by several studies focused on the use of geospatial information to examine vacation culture in Russia (Kirilenko and Stepchenkova 2017; Tikunov et al. 2018).

Exploring identity narration. Besides extensive possibilities for tracking phenomena, digital platforms also enable new ways of (re)imagining individual and collective identities. A number of studies (Stefanidis et al. 2013; Croitoru et al. 2015) suggest that geospatial data can serve as a strong identifier of group belonging and individual self-expression. Examples of such identifications are, for instance, elements of individual user profiles on Wikipedia, where userboxes are employed for declaring individuals’ interests, preferences, and personal details (Neff et al. 2013). In the context of Digital Russian Studies, these means of self-expression often deal with geospatial data (e.g., place of residence [Dounaevsky 2014]) or geopolitical aspects of territoriality (e.g., whether South Ossetia belongs to Georgia). Another example is the use of geolocation data for producing digital maps of the conflict in Eastern Ukraine (e.g., MilitaryMaps or Liveuamap), which are used to visualize the borders of imagined communities (e.g., of the self-declared confederation of Novorossiya [Makhortykh 2018]).

5 Geospatial Data and Research Ethics

The advent of big data research opens unprecedented possibilities for studying different phenomena, but it also raises multiple ethical concerns. Some of these concerns are related to the general considerations of using big data for research purposes (e.g., acquiring proper permissions for data use [Richards and King 2014]), but some are rather specific to geospatial data, in particular in the Russian context. In this penultimate section, I will briefly discuss three of these concerns: privacy, validity, and reliability.

Privacy. Security and privacy are two key concerns of using geospatial data for research purposes (Li et al. 2016). The use of portable GPS receivers in mobile devices, together with the enrichment of social media data with geospatial information, raises concerns about the use of these data for tracking individuals’ actions and movements (Loebel 2012). While such data can be beneficial for many types of research, their use also requires the researcher to recognize the potential consequences for the privacy of users. Such consequences are particularly important in cases dealing with highly sensitive and/or polarizing subjects, where the use of geotag data can cause material or immaterial harm to research participants.

The privacy risks are even greater when geotag data are used for studying phenomena occurring in authoritarian states. An example of a highly privacy-sensitive subject is research on anti-government protests, where geospatial data can be (ab)used to identify the location of individual protesters and expose their involvement in the protests, thus leading to legal repercussions from the state. To address this concern, the use of personal data should be minimized and (pseudo)anonymization techniques should be used. On the official level, however, Russian legislation is still catching up with the notion of big data and their uses for research purposes (for an overview, see Zharova and Elin 2017). Consequently, the protection of the data rights of individuals in Russia is still significantly less strict than in the European Union (EU) countries, where it is regulated by the EU General Data Protection Regulation (GDPR).
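The recommendation to minimize personal data and apply pseudonymization can be illustrated with a short Python sketch. It shows one possible arrangement, not a prescribed standard: the record fields (`user_id`, `lat`, `lon`), the function names, and the choice of rounding coordinates to two decimal places (roughly one kilometer in latitude) are assumptions made for the example.

```python
import hashlib
import hmac


def pseudonymize_user(user_id, secret_key):
    """Replace a user identifier with a keyed hash (HMAC-SHA256).

    The same user always maps to the same pseudonym, so records can still be
    linked, but the mapping cannot be recomputed without the secret key.
    """
    message = str(user_id).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def coarsen_coordinate(value, decimals=2):
    """Reduce spatial precision by rounding; the precision level is an assumption."""
    return round(value, decimals)


def minimize_record(record, secret_key, decimals=2):
    """Keep only the fields needed for analysis, in pseudonymized, coarsened form."""
    return {
        "user": pseudonymize_user(record["user_id"], secret_key),
        "lat": coarsen_coordinate(record["lat"], decimals),
        "lon": coarsen_coordinate(record["lon"], decimals),
    }


if __name__ == "__main__":
    key = b"replace-with-a-randomly-generated-secret"  # hypothetical key
    raw = {"user_id": 12345, "name": "Example User", "lat": 55.75583, "lon": 37.61730}
    print(minimize_record(raw, key))
```

Rounding coordinates and hashing identifiers reduce, but do not eliminate, the risk of re-identification: a sequence of coarse locations can still single out an individual, so such measures would need to be combined with restricted data access and, for sensitive subjects, aggregation across users.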

Validity. Sheppard (2005, 74) defines validity as the degree to which the use of a specific instrument or finding is sound, defensible, and well-grounded for the issue at hand. The question of validity is of particular relevance for the use of geospatial data, because of their significant potential for being used for manipulation: both through the data and their visualization (Sheppard and Cizek 2009). In some cases, the use of data can be invalidated by their wrong interpretation (i.e., when geospatial information is used to prove a point which is incorrect), whereas in other cases obscure visualizations of data can mislead the public.

An example of the invalid use of geographic data is the contrasting reporting of the 2018 clashes near Chigari village in Eastern Ukraine. Both the Ukrainian authorities and pro-Russian insurgents produced video records showing them controlling certain landmarks, which were claimed to be related to the village in question. Despite these claims, not all of the shown landmarks were related to Chigari, and eventually it was proven that the village was controlled by the Ukrainian army, but not before significant confusion had been caused. A possible way of increasing validity, according to Sheppard and Cizek (2009, 2112), is to use more flexible and interactive approaches for geospatial data analysis, thus allowing end users more control over how results are reported.

Reliability. Sheppard (2005) argues that reliability is another major concern of using geospatial data. Unlike validity, which focuses on the possible (ab)uses of geospatial data for drawing invalid conclusions, reliability concerns the internal consistency of analysis and the possibility of producing the same results under similar conditions. The issue of reliability is of particular importance for analyses produced via crowdsourced databases and social media, as both data sources are subject to frequent changes and often provide limited possibilities for consistent data access.

An example of the reliability issues which accompany the use of geospatial data is MilitaryMaps, mentioned earlier. This crowdsourced database aggregates updates from conflicts in the post-Soviet space as well as in the Middle East and provides geotags indicating the movement of troops and outbursts of violence. Since September 2018, however, the previously open project has switched to a paid subscription, which has made it harder to recreate analyses based on MilitaryMaps data. Another reliability-related limitation of the project is its reliance on the Google Maps framework, which stores markers that are added to the map only for a one-year period. Sheppard and Cizek (2009) suggest that the main way to amend these and other reliability issues is the use of more prescriptive approaches to data analysis and presentation based on recognized quality standards.

6 Conclusions

In this chapter, I discussed the possible uses of data available through the geoweb, the integrated and discoverable collection of geographically related web services and data (Lake and Farley 2009), in the context of Digital Russian Studies. Increasingly employed for academic studies worldwide, geoweb data are of particular importance for Russia-centered digital research, serving both as a pivotal factor for making and verifying knowledge claims by regional actors and as an integral means of producing individual and collective narratives on subjects varying from international conflicts (Shim 2018) to presidential elections (Kobak et al. 2016).

The use of geoweb for Digital Russian Studies is facilitated by the large volume of geospatial data available today. As I discussed above, these data can be divided into three broad categories according to their source: (a) crowdsourced databases, (b) open datasets, and (c) social media. Out of these three, social media data are the hardest to get and often require extensive pre-processing; however, they are also applicable to a wide range of research questions, in particular the ones related to inter-user interactions. Furthermore, the largest Russian social media platform, VK, provides public access to multiple forms of geospatial data (e.g. users’ self-declared place of residence/work and check-in data), thus enabling more possibilities for data collection than many Western platforms.

The research possibilities provided by geospatial data are amplified by the quickly developing toolkit of analytical techniques used to extract geographic location from different data formats. The complexity of techniques varies depending on the data format. In the simplest scenarios, geographic coordinates or the location’s administrative address are provided in the metadata and only have to be matched with data from existing geographic information systems. In the more difficult scenarios, the location has to be extracted from the content or inferred from the user’s earlier activity using a combination of machine learning and geographic gazetteers. Much still can be done to better adapt these techniques to the Russophone context, in particular in terms of improving named entity recognition techniques and developing better gazetteers. Yet, even in the current state of research, there are plenty of possibilities for using the mentioned techniques for different types of Russia-centered studies.

The importance of location extraction techniques is exemplified by the wide range of research questions to which Russian geospatial data are applicable. These research questions vary from the spatial distribution of socioeconomic and political phenomena, such as migration and electoral fraud, to the verification of knowledge claims about the presence of Russian troops in Eastern Ukraine to the analysis of mediatization of cultural practices of war remembrance and the exploration of narrative uses of geospatial data for communicating individual and collective identities.

Despite their significant potential for Digital Russian Studies, the future of geospatial data is not fully clear. The existing concerns about complex interrelations between privacy and geospatial data are amplified by the current calls for tightening the government’s control over the Internet in Russia, leading to increasing restrictions on data retrieval from Russian platforms’ APIs, including VK. These limitations might curb the amount of geospatial data available from social media; however, the growing number of open datasets and crowdsourced databases suggests that Russia’s geoweb will remain a valuable research venue for Digital Russian Studies for years to come.