1 Introduction

Traditionally, multimedia referred to a form of communication that combines data forms such as text, audio, images, animations or video in an interactive presentation. In this article, the term multimedia refers to new and emerging forms of interactive communications, built upon new and emerging types of data, analytics and visualisation techniques.

Traditional media data structures followed a simple, purpose-specific design, storing data in relational databases as tabular rows and columns. Modern multimedia data structures follow a polymorphic design, where data entries from IoT apps, websites, social networks and mobile devices, enhanced with artificial intelligence (AI), are coupled with object-oriented programming. Polymorphic data structures, stored as objects with nested elements, can change quickly as new analytical features are built into dynamic new media for data analytics.
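To make the contrast concrete, the sketch below places a fixed relational row next to a polymorphic, nested entry of the kind described above. The field names and values are purely illustrative, not taken from any system discussed in this article.

```python
# Traditional relational row: fixed columns, one value per cell.
row = ("user_42", "2021-03-01", 36.6)

# Modern multimedia entry: nested objects whose shape can evolve as new
# analytical features are added, without a schema migration.
entry = {
    "user": "user_42",
    "timestamp": "2021-03-01T08:15:00Z",
    "sources": {
        "wearable": {"heart_rate": 72, "steps": 4210},
        "social": {"posts": 3, "sentiment": 0.41},
    },
    "ai_annotations": [{"model": "activity-classifier", "label": "walking"}],
}

# A new analytical feature can be attached to existing entries on the fly.
entry["sources"]["wearable"]["spo2"] = 98
print(entry["sources"]["wearable"])
```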

Big data usually includes terabytes or even zettabytes of structured, semi-structured and unstructured data, generated by human-machine interactions and processes. AI with deep learning (DL) and machine learning (ML) algorithms can analyse big data to derive understanding of events that have occurred and/or as they happen in real time, and even forecast future events, enabling decisions on strategic directions. Traditional relational databases would struggle to fit, capture, manage, and process big data with low latency. There are many examples where low latency is desired [12]: low-latency online gaming enables more realistic environments (e.g. the metaverse), and in high-frequency trading, trades are executed automatically with optimised algorithms capturing changing market prices. A more recent example is the use of different media (e.g. wearables, social platforms) for COVID-19 digital monitoring and decision making.

The aim of this article is to present a comparative study combining a qualitative and a quantitative review of new and emerging forms of data, and of the application of such data with new and emerging technologies. The first objective is to identify the recent state of the art in spatiotemporal data, time-stamped data, open data, real-time data, and high-dimensional data. The second objective is to identify how new types of data are used with AI (ML/DL). The gap identified and researched in this article is the application of new technologies to new types of data, to resolve contemporary problems such as COVID-19.

This article consists of a literature review section, which also includes a review of research methodologies; a bibliometric study; a comparison of results between the qualitative review and the quantitative study; a results and analysis section; and a conclusion section.

2 Literature review – Qualitative survey of literature

2.1 Processes that generate new forms of data: Data acquisition

New and emerging forms of data (NEFD) encompass very different types of multimedia data (e.g. transactions, registrations, internet activity, tracking, images) that are broadly grouped into the NEFD category. NEFD are collected over different types of connectivity. The coverage and bandwidth capabilities of these connectivity types mean that some (e.g. 5G) have longer range but require high energy consumption at handsets, while others (e.g. 6LoWPAN and WirelessHART) cover only short communication ranges [16]. This means that such technologies would not be sustainable in the Industrial Internet of Things (IIoT), because of low area coverage or high energy consumption. Internet of Things (IoT) communications are therefore based on a very diverse set of multimedia data-sharing technologies:

  • Low Power Wide Area Networks (LPWANs) provide long-range communication and rely on small, cheap batteries that can last for years. LPWANs can connect to most IoT sensors and are used for remote monitoring, smart metering, and many other functions that operate with small blocks of data at a low rate. There are licensed LPWANs (NB-IoT, LTE-M) and unlicensed LPWANs (e.g. MYTHINGS, LoRa, Sigfox).

  • Mobile (5G) is a high-speed and ultra-low latency connection, best suited for autonomous vehicles, augmented reality, real-time video surveillance, real-time mobile connected health, and time-sensitive industrial automation.

  • Zigbee (IEEE 802.15.4) is a short-range (<100 m), low-power wireless mesh protocol (similar to e.g. Z-Wave and Thread).

  • Bluetooth Low-Energy (BLE) is commonly integrated into fitness and medical wearables and smart home devices, with data visualised on smartphones.

  • Radio Frequency Identification (RFID) uses radio waves to transmit small amounts of data within a very short distance. RFID is mostly used in retail, logistics and supply chains.

In Table 1, LPWAN (Sigfox, LoRa, Narrowband IoT (NB-IoT)) is the most represented technology. LPWANs are low-power technologies with communication ranges from 1 km to 10 km; an NB-IoT node can last up to 10 years on a single battery and support 52,547 connections [16]. The weakness of LPWAN technology is the low data rate of up to 250 kbps, hence it is most suitable as a complementary technology. With the rise of connected devices using different technologies, distributing code data is becoming an issue. Code data dissemination methods are even considering vehicles in smart cities as communication systems for maximising coverage at low cost [77]. Another issue is that the analytical methods and challenges for managing NEFD are also very different. It seems inevitable that artificial intelligence will need to be deployed to resolve many of these issues, especially in smart cities [2]. Deep learning algorithms can process big data at speed and high accuracy, but when the number of hidden layers increases beyond six, the deep learning process cannot be solved [42]. Some research methods are discussed briefly in the text below, to illustrate how specific data types require very different approaches for analytics.
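As a rough back-of-envelope illustration of why a 250 kbps data rate confines LPWAN to small, infrequent payloads rather than bulk transfer, the following sketch estimates transfer times; the payload sizes are illustrative assumptions and protocol overhead is ignored.

```python
def transfer_seconds(payload_bytes: int, rate_kbps: float) -> float:
    """Time to move a payload over a link of the given rate (no overhead)."""
    return payload_bytes * 8 / (rate_kbps * 1000)

# A small sensor reading is nearly instantaneous; a bulk transfer is not.
print(f"12-byte sensor reading: {transfer_seconds(12, 250):.4f} s")        # ~0.4 ms
print(f"1 MB firmware image:    {transfer_seconds(1_000_000, 250):.1f} s") # ~32 s
```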

Table 1 Wireless connections for IoT endpoints

2.2 Types of social science research done using NEFD

2.2.1 Social science research using NEFD from IoT devices

The IoT generates, stores and processes real-time big data that is offloaded and executed in centralised and decentralised data centres, e.g. cloud and edge servers. One of the main challenges of this process is optimising the execution time (i.e. achieving low latency) and lowering energy consumption. To resolve these challenges, a computational offloading method has been proposed for IoT cloud-edge computing [82]. Similar methods are designed to improve energy efficiency and increase trust in IoT data, based on avoidance of unnecessary and untrustworthy data [79]. Similar technological solutions are enhancing the role of big data analytics from IoT and the Industrial IoT (IIoT), with enhancements for (1) industrial time series modelling; (2) intelligent shop floor monitoring; (3) industrial microgrids; (4) monitoring machine health; (5) intelligent predictive and preventive maintenance [68].
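To illustrate the kind of offloading decision described in [82] (this is a hedged sketch, not the authors' actual algorithm), the snippet below compares local, edge and cloud execution under a weighted latency/energy cost. All device and network parameters, and the weights, are illustrative assumptions.

```python
def offload_cost(cycles, data_bits, cpu_hz, uplink_bps, tx_joule_per_bit,
                 cpu_joule_per_cycle, w_latency=0.5, w_energy=0.5):
    """Weighted cost of running a task at a given execution site."""
    latency = cycles / cpu_hz + data_bits / uplink_bps   # compute + transfer time
    energy = data_bits * tx_joule_per_bit + cycles * cpu_joule_per_cycle
    return w_latency * latency + w_energy * energy

task = {"cycles": 2e9, "data_bits": 8e6}   # hypothetical IoT workload
options = {
    # Local: no transfer, but slow CPU and the device pays compute energy.
    "local": offload_cost(task["cycles"], 0, cpu_hz=1e9, uplink_bps=1,
                          tx_joule_per_bit=0, cpu_joule_per_cycle=1e-9),
    # Edge: fast nearby CPU; device pays only radio energy.
    "edge":  offload_cost(task["cycles"], task["data_bits"], cpu_hz=5e9,
                          uplink_bps=50e6, tx_joule_per_bit=5e-8,
                          cpu_joule_per_cycle=0),
    # Cloud: fastest CPU but slower uplink.
    "cloud": offload_cost(task["cycles"], task["data_bits"], cpu_hz=20e9,
                          uplink_bps=10e6, tx_joule_per_bit=5e-8,
                          cpu_joule_per_cycle=0),
}
print(min(options, key=options.get), options)   # edge wins under these numbers
```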

2.3 New methods involving NEFD as a tool for public engagement

2.3.1 Social media data as an alternative to traditional survey data

There is increasing interest in data from social media, e.g. Twitter, being used to supplement and/or substitute for survey data. Studies as early as 2008 showed a strong correlation between the ‘sentiment of tweets containing the word “jobs”’ and survey-based measures of consumer confidence, but a more recent study on tweets as an alternative to survey responses showed a lack of evidence and concluded that the 2008 data was a ‘chance occurrence’ [13]. One possible explanation presented in the study was that Twitter has become mainstream and its text has evolved into a modern language that diverges from the original sentiment. But even after adjustments for the language differences, the correlation from 2008 could not be reproduced for recent years.
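For readers unfamiliar with the underlying computation, the following minimal sketch shows how such a sentiment/survey correlation is measured. The two series are fabricated stand-ins purely to demonstrate the calculation, not the data analysed in [13].

```python
import numpy as np

# Hypothetical monthly series: mean sentiment of tweets containing "jobs",
# and a survey-based consumer confidence index for the same months.
tweet_sentiment = np.array([0.12, 0.08, -0.05, 0.03, 0.10, -0.02])
survey_index    = np.array([98.2, 97.5, 95.1, 96.0, 97.9, 95.6])

r = np.corrcoef(tweet_sentiment, survey_index)[0, 1]
print(f"Pearson r = {r:.2f}")  # the 2008 data showed a strong r; later years did not
```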

Other studies show an alternative viewpoint. A different methodological approach, using Facebook and Twitter data, has shown that data collection from 1000 participants, without any survey data, can provide effective forecast scores for millions of people [83]. Twitter ‘sentiment’ has also been used to replace customer satisfaction surveys, and the social media approach was found advantageous over survey data [29]. Social media showed stronger potential than survey data because of dynamic feedback showing reactions to news and product releases. The social media approach was determined to offer a continuous, automated and lower-cost feedback process that added new insights not recorded in annual survey data [29].

Social media data was also found to be more beneficial than survey data in visitor monitoring of a national park in Finland [30]. The main benefits of social media data over survey data are the continuous real-time data on visitors’ behaviour and preferences. Even without asking any questions, geotagged social media content provides sufficient data about who the visitors are, their activities, and where in the park they went, among many other things. Geotagged social media data was also found useful for identifying spatiotemporal activity patterns in areas where visitor monitoring was not taking place.

2.3.2 Big data as an alternative to traditional survey data

Big data is less costly and readily available, while survey data collection is slow and expensive, but the most promising outcome is to integrate both data sources [44]. The argument for integration is that low-cost big data can be combined with the ability of surveys to address specific questions, producing precise official statistics [20]. Nonetheless, the traditional value of survey data is diminishing with declining survey participation, and in the future it might be just one element of information data, comprising different sources, including records and big data [55]. There is an argument that the survey research method, and the discipline of survey methodology, have adapted to technological innovations and will continue to evolve [15], because they represent strong and adaptable tools, supported by established theory and extensive evidence. Some studies have applied a manual grounded theory approach to improve computationally intensive, data-driven theory [8]. It is possible that survey methods would also evolve into the digital world by applying a similar approach.

2.4 Background study for ML and DL in healthcare

In recent years (especially since the emergence of COVID-19), ML/DL approaches have been used extensively in healthcare settings. In this section we list some of the unexpected emerging innovations from the COVID-19 pandemic (Tables 2 and 3).

Table 2 Background study/review of recent ML/DL approaches used in healthcare settings
Table 3 NEFD and new technologies

3 Types of NEFD – Bibliometric (statistical) review of data records

3.1 Spatiotemporal data

Spatiotemporal data includes the location and time of individual events, enabling analysis of how events change across physical locations, e.g. changes in population over time, or tracking objects in motion. Spatiotemporal data can be used to identify past causes of anomalies that occur or become visible at a later time, e.g. comparing the timestamp of a product anomaly with the production time of the same product. Spatiotemporal data can be analysed to build interactive visual analytics that show abstract views of data points in similar and different regions. It can be used to analyse data from localised areas, to map spatial distribution in individual regions, or to compare regions; GeoBrick [61] is one integrative technique that does all of that. Other studies use spatiotemporal data to predict: urban flow using machine learning [81]; air quality interpolation and visualisation in real time [46]; the cloud-terminal ‘SuperMap’ big data engine [78]; climate summer temperature zones [80]; and even cholera hotspots in Zambia [58]. Spatiotemporal data models are used in various studies, including self-sustainable IoT networks that harvest energy from the cellular network [7], and vehicle trajectory data in smart cities [85] (Figs. 1, 2 and 3).
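A minimal sketch of the anomaly-tracing use case mentioned above: joining anomaly timestamps back to the production records of the same products to locate when and where each fault originated. The DataFrames and their contents are illustrative assumptions.

```python
import pandas as pd

production = pd.DataFrame({
    "product_id": ["A1", "A2", "A3"],
    "produced_at": pd.to_datetime(["2021-05-01 08:00", "2021-05-01 09:30",
                                   "2021-05-02 07:45"]),
    "line": ["L1", "L2", "L1"],
})
anomalies = pd.DataFrame({
    "product_id": ["A2", "A3"],
    "detected_at": pd.to_datetime(["2021-06-11 14:02", "2021-06-12 10:40"]),
})

# Join each detected anomaly to the production record of the same product,
# revealing which line produced it and how long the fault stayed hidden.
traced = anomalies.merge(production, on="product_id")
traced["latency"] = traced["detected_at"] - traced["produced_at"]
print(traced[["product_id", "line", "produced_at", "detected_at", "latency"]])
```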

Fig. 1 By country research output by key topics – search parameters (social networks AND spatiotemporal data)

Fig. 2 Same search parameters – the number of publications is dropping in 2018, 2019 and 2020

Fig. 3 By country research output by key topics – search parameters (social networks AND time-stamped data)

3.2 Time-stamped data

Time-stamped data is common in collecting user-behaviour data and contains sequences of event times (when a data point was captured) or processing times (when a data point was recorded). Time-stamped data enables understanding, predicting, and estimating individual actions over time, through journey analysis of the steps individuals take and how their responses change over time. Time-series databases require data storage that can serve the execution of smart infrastructure queries, but cloud-based time-series storage can be expensive. With the increasing computing power and memory in connected devices, time-series data storage and analytics can be moved to the edge with distributed hash tables (DHT) [48]. Edge computing enables different types of predictive analytics, including big-data-driven predictive caching in mobile wireless networks (MWN) at the wireless edge [10]. Different types of time-stamped data analytics (e.g. IoT, edge, fog and cloud [22]) can also be integrated to analyse spatiotemporal mobility data (GPS logs) and predict the locations of moving agents in real time, e.g. Mobi-IoST [24]. Edge computation is considered a speed-up over cloud computation, which can be an advantage in assessments based on street-level data for on-edge traffic congestion [51]. Cloud computations, on the other hand, have proven abilities to obtain a ‘time synchronization accuracy below 0.1 ms’ when estimating the accuracy of remote virtual machines for accurate time synchronisation in the Industrial IoT [69].
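A minimal sketch of the DHT idea behind edge time-series storage [48] (not that system's actual implementation): time-series keys are hashed onto a ring so that storage and queries are spread deterministically across edge nodes. Node names and series keys are illustrative.

```python
import bisect
import hashlib

def ring_position(key: str) -> int:
    """Map any key onto a 32-bit hash ring."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2**32)

nodes = ["edge-a", "edge-b", "edge-c"]
ring = sorted((ring_position(n), n) for n in nodes)

def node_for(series_key: str) -> str:
    """Clockwise lookup: first node at or after the key's ring position."""
    pos = ring_position(series_key)
    idx = bisect.bisect(ring, (pos,)) % len(ring)
    return ring[idx][1]

# Each (sensor, hour) bucket lands deterministically on one edge node.
for key in ["sensor-17:2021-06-01T10", "sensor-17:2021-06-01T11",
            "sensor-42:2021-06-01T10"]:
    print(key, "->", node_for(key))
```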

3.3 Open data

Open data is free for everyone to use, and it has been used for many different purposes. For example, the Open Spending App (Footnote 1) tracks government spending worldwide in a standardised format, Elgin (Footnote 2) provides real-time data on roadworks in the UK, and DataViva (Footnote 3) delivers multiple visualisations of the Brazilian economy. There are many examples of open data usage. But so far, the open aggregation of anonymised values, trends or analytics from private data, without disclosing sources, presented as open data and used for commercial purposes, is only present in academic techniques. Given the increased EU-wide regulations on personal data usage, tech companies that are built upon data security seem reluctant to engage in public commercial usage of anonymised private data presented as open data. In addition, some data is extremely difficult to use, such as unverified outdated data. Unverified data is classified as collected data entries without an understanding of their relevance, value, accuracy, or even whether they are the correct data entries. One way of describing outdated data is when the evidence changes but the data entries do not. Using unverified outdated data could cause more damage than conducting analysis with no data (Figs. 4 and 5).

Fig. 4 Same search parameters as in Fig. 3 – collaboration network

Fig. 5 By country research output by key topics – search parameters (social networks AND open data) – from the 500 most relevant articles on WoS

In addition to unverified outdated data, there is also dark data. Dark data is dormant data that is collected, processed, and stored during some form of regular activity, but is not used for any purpose and is kept as a dormant digital information asset. Dark data is often collected and stored as operational data: companies collect and store application logs and metrics, events data, and information from third parties and microservices applications. Operational analytics is usually focused on how to turn that data into insights. The real question should be how to reverse-engineer operational data metrics into a data-strategy mindset of getting the right information from the start. A data strategy should include rethinking what constitutes a dataset and creating new possibilities for working with data. According to IBM, companies analyse only 1% of their data (Footnote 4), and over 60% of the data collected loses value immediately (Footnote 5). Dark data is different from unverified outdated data, but most dark data is unstructured and difficult to analyse.

The Open Data Institute (Footnote 6) argues that open data is only useful if it is easy to understand and can be traced back to its origin, hence it should be shared in a standardised format. However, for open data to become more valuable, people need to trust different interpretations of open data. Technologies such as secure multi-party computation (also known as privacy-preserving computation) (Footnote 7) and differential privacy (Footnote 8) could enable broader interpretation, based on trust in privacy-preserving technologies.
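As a toy illustration of the multi-party computation idea (not any specific product or protocol mentioned above), the sketch below uses additive secret sharing: each party holds only a random-looking share, and only the sum of all shares reveals a value. The two-hospital scenario is a hypothetical example.

```python
import secrets

MOD = 2**61 - 1  # arithmetic is done modulo a large prime

def share(value: int, n_parties: int) -> list:
    """Split a value into n additive shares that sum to it modulo MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

def reconstruct(shares: list) -> int:
    return sum(shares) % MOD

# Two hospitals can publish a joint patient count without revealing either input:
a_shares, b_shares = share(120, 2), share(95, 2)
sum_shares = [(a + b) % MOD for a, b in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))  # 215, with no party ever seeing 120 or 95
```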

More work needs to be done on identifying and promoting the benefits for private companies of sharing open data (Footnote 9). One example of how this could be taken forward is the PETRAS-IoT Data Management and Sharing Infrastructure (PEDASI), a concept for a secure and legally trustworthy brokerage framework [26]. PEDASI provides an architecture for secure user, data and application access to decentralised, combined edge data sources. Similar privacy-preserving computation and differential privacy architectures could result in a significant increase in new data from IoT devices and edge computing. Edge devices with high computational power can be enhanced with AI-embedded autonomous and remote-controlled edge processing, while for devices with low computational power, data could be sent to a server as an image, or through a real-time media stream [56].

A different example of promoting open data sharing is to identify open-access machine data, and to promote the development of a ‘digital single market’ [25]. Machine data can be explained as a byproduct of everyday activities, e.g. data from mobile phone calls, driving connected cars, computer logs, etc., produced in unpredictable formats and often ignored. Although the term has been in use for over 50 years, the rise of IoT sheds new light on analysing such data in near real-time. Edge analytics, for example, enables automated analytical computation at the point of collection, as in the sketch below.
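The following is a minimal sketch of analytics at the point of collection, assuming a simple rolling-window aggregator on the device; the window size and readings are illustrative. The device forwards only the compact summary instead of every raw sample.

```python
from collections import deque

class EdgeAggregator:
    """Reduce a raw reading stream to rolling statistics on the device."""

    def __init__(self, window: int = 60):
        self.buf = deque(maxlen=window)   # only the last `window` samples kept

    def ingest(self, reading: float) -> None:
        self.buf.append(reading)

    def summary(self) -> dict:
        n = len(self.buf)
        return {"n": n, "mean": sum(self.buf) / n,
                "min": min(self.buf), "max": max(self.buf)}

agg = EdgeAggregator(window=5)
for r in [21.0, 21.2, 20.9, 35.0, 21.1]:   # one spike in the raw stream
    agg.ingest(r)
print(agg.summary())   # only this compact summary leaves the device
```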

3.4 Real-time data

Real-time data is becoming ever more relevant with the rise of edge analytics and 5G technology. Instant and immediate analytics of data creates value in different domains, mostly in the domain of smart cities. Real-time data is most valuable in critical infrastructure, e.g. emergency services and traffic control. But real-time data also has strong value in commercial settings, e.g. marketing and advertising delivered at the precise moment, based on location and preferences (Fig. 6).

Fig. 6 Same search parameters as in Fig. 5 – collaboration network – from the 500 most relevant articles on WoS

Real-time data is also used for monitoring and securing critical systems producing high-dimensional data, where timely detection of abnormal data is crucial. To this end, a scalable algorithm based on ‘Geometric Entropy Minimization’ has been proposed, enabling real-time nonparametric anomaly detection in high-dimensional settings [49] (Figs. 7 and 8).
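To make the idea concrete, here is a hedged stand-in for nonparametric anomaly detection in high dimensions, using a simple k-nearest-neighbour distance score rather than the Geometric Entropy Minimization method of [49] itself; all data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(500, 20))              # nominal 20-dimensional data
stream = np.vstack([rng.normal(0, 1, size=(5, 20)),    # five nominal points
                    rng.normal(6, 1, size=(2, 20))])   # two anomalous points

def knn_score(x, reference, k=10):
    """Distance to the k-th nearest reference point; large means unusual."""
    d = np.linalg.norm(reference - x, axis=1)
    return np.sort(d)[k - 1]

# Calibrate a threshold on the nominal data, then score the live stream.
threshold = np.quantile([knn_score(p, normal) for p in normal], 0.99)
for i, x in enumerate(stream):
    print(i, "anomaly" if knn_score(x, normal) > threshold else "ok")
```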

Fig. 7 Analysis – the UK seems more represented than China in the 500 most relevant articles on WoS

Fig. 8 The USA has a strong presence in the collaboration network on real-time data analysis

3.5 High-dimensional data

High-dimensional data is data with a large number of dimensions, in which the number of features exceeds the number of observations. For example, the reliability of industrial high-dimensional big data has been studied and compared using a multi-method approach [52]. If we can analyse high-dimensional data, the research potential is enormous. High-dimensional data has recently been used for IoT-based smart farming in India [45], by applying an improved genetic algorithm for the extreme learning machine (IGA-ELM). High-dimensional data is also used in finance, high-resolution imaging, facial recognition technologies, etc.

High-dimensional data contains rich information, but also presents challenges in analysis and visualisation, while mapping high-dimensional data into lower-dimensional spaces often leads to information loss, unless multidimensional scaling (MDS) is performed [76]. A new data visualisation method called TMAP enables visualisation of very large high-dimensional data sets as minimum spanning trees [63]. For predictive analytics, a Bayesian framework for function-on-scalars regression with many predictors has been proposed [47].
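A short sketch of MDS in practice, projecting synthetic 50-dimensional points into two dimensions with scikit-learn; the data and its two-cluster structure are illustrative assumptions.

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(1)
# Two well-separated clusters in 50 dimensions.
X = np.vstack([rng.normal(0, 1, (30, 50)), rng.normal(4, 1, (30, 50))])

mds = MDS(n_components=2, random_state=1)
embedding = mds.fit_transform(X)       # tries to preserve pairwise distances
print(embedding.shape)                 # (60, 2): same points, two coordinates each
print(f"stress: {mds.stress_:.1f}")    # residual distortion = information lost
```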

The increased application of high-dimensional data from IoT, fog and cloud computing has created many privacy concerns. Differential privacy has emerged as a method that introduces noise to confuse adversaries by mixing sensitive inputs with noisy results. However, differential privacy has been criticised for poor utility and high complexity, caused by introducing noise into already complex high-dimensional data. A new compressed sensing mechanism (CSM) has been proposed to provide more accurate differentially private results [84] (Figs. 9 and 10).
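As a toy illustration of the noise-injection trade-off described above (this is the standard Laplace mechanism, not the CSM of [84]), the following releases a single count query under differential privacy; the count and the epsilon values are illustrative.

```python
import numpy as np

def laplace_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise scaled to sensitivity/epsilon."""
    noise = np.random.default_rng().laplace(0.0, sensitivity / epsilon)
    return true_count + noise

true_count = 1_283                      # e.g. records matching a query
for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps:>4}: released {laplace_count(true_count, eps):.1f}")
# Smaller epsilon -> more noise -> stronger privacy but poorer utility,
# which is exactly the utility/complexity trade-off criticised in the text.
```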

Fig. 9 By country, China seems to lead in high-dimensional data analysis – from the 500 most relevant articles on WoS

Fig. 10 Collaboration network with at least one connection – analysis: China seems to be leading, followed by the US, while the UK is missing, with 0 connections on this topic (from the 500 most relevant articles on WoS – search parameters: social media AND high-dimensional data)

4 Comparison results and discussion: An overview of the international landscape

The response to NEFD around the world is conflicted between their value and the erosion of privacy. Some authors have expressed significant concerns about the appropriation of big data, and have even compared the capture of social data to ‘nothing less than a new social order, based on continuous tracking, and offering unprecedented new opportunities for social discrimination and behavioral influence’ [14]. Similar texts call for disconnecting from the ‘cybernetic loop’, because even a Google search reveals our intentions, and we should ‘fight…against the rise of intelligent machines’ [31].

Other authors have focused on the more practical aspects and value of NEFD, such as smart health and monitoring, and have promoted advancements towards ‘autonomous wearable sensing for Internet of Things using big data analytics’ [19]. Educational research has called for the implementation of big data in education [17].

The collaboration map in Fig. 11 shows a very weak connection between the US and UK in this area. The results are derived with the following search terms (i.e. we searched the Web of Science Core Collection): ALL FIELDS: (“Emerging Data”). This search produced 147 results, refined by: RESEARCH AREAS: (AUTOMATION CONTROL SYSTEMS OR BEHAVIORAL SCIENCES OR COMPUTER SCIENCE OR ARTS HUMANITIES OTHER TOPICS OR INSTRUMENTS INSTRUMENTATION OR PHYSICS OR SOCIOLOGY OR MATHEMATICS OR INFORMATION SCIENCE LIBRARY SCIENCE OR TELECOMMUNICATIONS OR COMMUNICATION). To check whether these results were caused by the search terms, we searched the Web of Science Core Collection again for: ALL FIELDS: (“Emerging Data”), Timespan: Last 5 years, but without the filtering. This produced 1775 results (Figs. 12 and 13).

Fig. 11 Surprisingly weak global collaboration with the US and UK (much stronger connection with China)

Fig. 12 Analysis of the updated results – the US seems to be leading in NEFD

Fig. 13 A different search with similar results to Fig. 11 – weak collaboration between the UK and US; despite the news media coverage, the US seems to be working more closely with China in this area

4.1 IoT data marketplace

IoT data marketplaces with built-in artificial intelligence, machine learning and edge computing enable device owners to sell their data [73]. IoT data is securely collected, stored, shared and sold in marketplaces (Footnote 10), with an increased focus on IoT data quality [53]. Concepts for blockchain-enhanced global data marketplaces, based on smart contracts, are already in development [6, 9, 50]. IoT and blockchain are the two disruptive technologies associated most frequently in recent literature on smart cities [62]. Relevant questions have also been answered on the abilities of data providers to fulfil data collection in on-demand decentralised marketplaces [60]. IoT data in marketplaces is usually distributed in real time, or stored in the cloud for future sale, and cloud server storage creates a single point of failure. To prevent the risk from this single point of failure, the concept of ‘jointcloud’ has been proposed, with data trade supported by blockchain [32]. To achieve trust, transparency and non-repudiation in a decentralised IoT marketplace with limited trust, smart contracts can be used to mediate among trading brokers, data producers and consumers [6]. IoT data marketplaces are trying to securely monetise data, and that requires the development of strong reputation systems, hence the increased reference to blockchain in IoT data marketplaces [43]. Monetising various IoT data also requires different pricing mechanisms that ensure maximum value in different market settings [54]. Smart cities generate vast amounts of sensor data, among various other types of data from IoT devices, captured in diverse data formats; this data needs to be transformed before sharing and loading [65]. Creating a marketplace of services in a smart community could be one method for synthesising and aggregating data resources that could be shared among different sets of open communities [21].

However, there are ethical limits to blockchain-enabled marketplaces for IoT private data, and a lack of careful consideration of the techno-economic impact could result in the opposite effect, leading to ‘the erosion of privacy for IoT users’ [33]. The concern is that even in a transparent private-data market, we cannot be certain whether the valuation of diminishing data privacy reflects the established norms on privacy. The limitations of traditional IoT and blockchain can be partially addressed with permissioned, demand-driven analytics, enabling data democracy in the data supply chain [70].

4.2 Major NEFD research infrastructure initiatives and their impact

Existing enterprise-based NEFD infrastructures include (1) supervisory control and data acquisition (SCADA) systems; (2) manufacturing execution systems (MES); (3) enterprise resource planning (ERP) systems; and (4) customer relationship management (CRM) systems [68].

5 Results and analysis

5.1 Privacy-preserving data mechanisms for IoT data

Industrial IoT (IIoT) sensing-as-a-service models suggest that industries are losing significant value through inadequate data sharing, which is usually caused by a lack of property-rights enforcement and relevant pricing models [75]. A new dynamic pricing mechanism, based on reinforcement learning, has been proposed for intelligent IoT data pricing [74]. A blockchain technology called the ‘sensor data protection system’ (SDPS) has proven successful in tamper-resistant IoT sensor data gathering, processing and exchange [11]. The main evaluation criteria for the SDPS are (1) tamper resistance in all stages of processing; (2) privacy preservation for the data owner; (3) capability to handle big data; and (4) economic feasibility, with protection outweighing the cost. The use-case study found that blockchain technology for securing IoT sensor data cannot assure tamper-proofing without additional cross-validation, nor can it assure data privacy without additional measures such as IoT access control management [64]. Therefore, data protection should be implemented early in the processing, and a hybrid blockchain is needed for certified data scaling, but economic feasibility is possible.
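A minimal sketch of the tamper-evidence principle behind systems such as the SDPS (not its actual design): each sensor record commits to the hash of the previous record, so any retroactive edit breaks verification of the chain. The sensor readings are illustrative.

```python
import hashlib
import json

def append_block(chain: list, reading: dict) -> None:
    """Append a record that commits to the hash of the previous record."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"reading": reading, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "hash": digest})

def verify(chain: list) -> bool:
    """Recompute every hash and check each link back to its predecessor."""
    for i, block in enumerate(chain):
        body = {"reading": block["reading"], "prev": block["prev"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        linked = block["prev"] == (chain[i - 1]["hash"] if i else "0" * 64)
        if digest != block["hash"] or not linked:
            return False
    return True

chain = []
append_block(chain, {"sensor": "s1", "t": "2021-06-01T10:00", "temp": 21.3})
append_block(chain, {"sensor": "s1", "t": "2021-06-01T10:05", "temp": 21.5})
print(verify(chain))                 # True
chain[0]["reading"]["temp"] = 99.9   # tamper with history
print(verify(chain))                 # False: the edit is detected
```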

5.2 Findings on NEFD privacy preserving

With the rise of Bitcoin in 2009, we have seen a fast emergence of various blockchain technologies, which currently stand at over 17,000 on CoinMarketCap (Footnote 11). While some analysts have been sceptical of blockchains and crypto in the past, the volume of new technologies can no longer be ignored. Since COVID-19, the world has increased its adoption of new technologies, such as virtual reality and the metaverse, which support the further adoption of blockchain technologies. In Table 4 below, we list some of the recent security solutions based on blockchain technologies.

Table 4 NEFD and security

5.3 Findings on NEFD analytics methods

The marketplace in different countries can be compared with the following NEFD categories, as listed in Table 5. We included some of the existing solutions, but we expect such solutions to be constantly on the rise. In the not-very-distant future, we can imagine blockchain technologies contributing to the rise of digital decentralised marketplaces for NEFD.

Table 5 NEFD and applied solutions

6 Conclusion

This article conducted a literature review and a bibliometric analysis of NEFD. The qualitative literature review engages with a search of the most prominent journal and conference publications on this topic and produces some interesting insights into new methods involving NEFD as a tool for public engagement. The qualitative review outlines the processes that generate new forms of data (i.e. data acquisition) and reviews the types of social science research done using NEFD, including social science research using NEFD from IoT endpoints. It also derives new methods involving NEFD as a tool for public engagement, with the use of social media data as an alternative to traditional survey data.

The second part of this article engages with statistical analysis in R of Web of Science data records. The data records were searched for several different types of NEFD, including spatiotemporal data, time-stamped data, open data, real-time data, and high-dimensional data. The results are somewhat surprising. By-country research output by key topics (search parameters: social networks AND spatiotemporal data) confirms that the US and China are leading the research efforts in this area, but the number of publications is dropping in 2018, 2019 and 2020. A second unexpected result emerged from the analysis of country research output by key topics with slightly edited search parameters: social networks AND time-stamped data. This analysis showed much lower research output from China, with the US and Japan in the lead. Since this result seems in conflict with the first, we conducted further analysis of the same data records, and we discovered that the US, China and the UK have individual and isolated collaboration networks. Could this be interpreted as a lack of collaboration between the US and China affecting Chinese scientific output more than that of the US? We need further data to investigate that question.

The analysis of open data seems to provide some insights into this question. Considering the Chinese policies for open data sharing, we would expect China to be at the forefront of this area. However, the Web of Science data records show that the US is leading in scientific output on open data and the UK is leading in collaborative research on open data. Could this be caused by a lack of regulatory compliance in analysing open data between different countries (e.g. the EU’s GDPR)? Again, we need further data to investigate that question. The statistical analysis of real-time data records showed closer collaboration between the US and China, but it also showed that the UK is performing strongly in this research area, while not collaborating strongly with either the US or China. US research dominance changes in the area of high-dimensional data, where China seems to lead strongly, in both research output and collaboration.

Finally, we conducted a statistical analysis of the international landscape of research collaborations on NEFD, and the results were again unexpected. The research collaboration between the US and China seems to have grown stronger over the past few years. To eliminate doubts and data bias, we conducted an alternative search and reanalysed the data, only to reach the same results. It appears that politics is not affecting science as much as we expected. However, scientific results (i.e. output) can often take many years, and these results need to be reanalysed with updated data records in a few years’ time, to check whether they remain the same.