1 Introduction

Big data represents the leading edge of innovation, competition, and productivity [1]. A multitude of advanced analytical algorithms and applications harness big data to pioneer novel theories and technologies, such as artificial intelligence and edge computing. In the midst of the big data surge, the processes of data sharing and exchanging occur ubiquitously and continuously. Such sharing and exchanging take place between specific entities, be they individuals, devices, or databases. These entities relay information amongst themselves, with the mechanisms of transmission spanning electronic methods or specialized systems [2]. Notably, while data exchanging entails a bidirectional transfer, data sharing is a unidirectional process. In recent decades, the paradigm of hosting, sharing, and exchanging data in the cloud has emerged as the predominant design choice. This has led to the rise of third-party platforms as the preferred means for participants in data sharing and exchanging. For instance, Amazon introduced the “Amazon Web Services (AWS) Data Exchange”, a platform that allows customers to tap into third-party data sources within the AWS marketplace. This service ensures reliable access for customers on an unprecedented scale and doubles as a streamlined tool for data ingestion and utilization [3].

Data sharing and exchanging offer a plethora of benefits, including fee-less transactions, tamper resistance, enhanced services, high transparency, and real-time engagement for all involved parties. A pertinent example is Google Drive’s collaboration with WhatsApp [4], allowing users to back up their chat histories and media to the cloud, ensuring data portability and recovery without transaction fees. Nevertheless, this paradigm faces a multitude of challenges:

  • A predominant challenge in data sharing and exchanging concerns the willingness of ordinary individuals to engage and share their data resources. Concomitant issues of privacy, security, and costs (e.g., energy consumption, network bandwidth) might deter participants, especially if the rewards are not deemed sufficient. Thus, crafting mechanisms to incentivize participation becomes a pressing priority.

  • There exists an inherent trade-off between data privacy and accessibility. For cloud-based data-sharing and exchanging platforms, striking a balance between security and efficiency becomes pivotal during mechanism design.

  • As the digital market for data sharing and exchanging evolves, devising an equitable data pricing strategy emerges as a new challenge. The quest for an efficient digital market necessitates mechanisms that price data transparently while safeguarding data privacy.

Given these challenges, designing incentive-based mechanisms stands out as a pivotal research area within the realm of data sharing and exchanging.

In recent years, the design of incentive mechanisms has become increasingly prevalent in crowdsensing applications within the realm of computing. One illustrative case is Waze, a crowdsourcing-based navigation application. The platform introduced a mechanism termed “Awazeing Race” to motivate both existing and prospective users to engage with the Major Traffic Event (MTE) tool in the Waze Map Editor (WME). This initiative was aimed at enhancing the volume of user-contributed MTEs and closures, thereby enriching the overall Waze experience for local users. A review of pertinent literature reveals that incentive mechanisms in computing can be broadly classified into three categories: entertainment, service, and monetary incentives [5]. Entertainment-centric incentives predominantly employ location-based mobile games to spur participation [6,7,8]. Service-oriented incentives, on the other hand, leverage the promise of enhanced service benefits as a motivational strategy. For instance, in GPS applications, users not only consume data but also contribute to its generation, driven by the aspiration for superior service quality [9, 10].

Monetary-based incentives have emerged as a prevalent strategy to motivate mobile sensors to participate. Within this domain, price determination and the criteria for winner selection have piqued the interest of numerous researchers. For instance, ride-sharing apps like Uber use dynamic pricing algorithms that incentivize drivers (mobile sensors) by increasing fares during peak demand times, effectively balancing supply and demand [11]. Nevertheless, the intricacies of designing incentive mechanisms escalate when applied to the data-sharing and exchanging process. Our analysis reveals that, relative to the aforementioned categories, monetary incentives garner more extensive attention in computing research. As delineated in Table 2, a significant fraction of researchers have gravitated towards leveraging game theory algorithms in computing to pursue objectives such as utility maximization [12,13,14], profit maximization [15,16,17], and social welfare maximization [13, 18,19,20,21]. Concurrently, there are studies employing economic incentives to attain analogous goals. Notwithstanding this proliferation, it is noteworthy that only a limited number of researchers have delved into the holistic design of incentive mechanisms within the entire data-sharing and exchanging platform. A real-world example is the use of cashback rewards by credit card companies to encourage consumers to share transaction data, which is then utilized for personalized marketing and data analysis [22].

We decompose the data sharing and exchanging process into four principal components: data creation, data storage, data access, and data privacy preservation. Contrary to the classifications of earlier researchers, we posit that it’s redundant to segregate incentive mechanisms into entertainment-based, service-based, and money-based categories. Instead, an amalgamation of both monetary and non-monetary incentives is imperative to galvanize holistic participation in the data-sharing and exchanging ecosystem. For instance, on such a platform, integrating service-based with monetary incentives can be an efficacious strategy. This would entail providing participants with both service credits and direct monetary rewards. Notably, even though providers in the data-sharing and exchanging paradigm might concurrently serve as requesters, the allure of service credits remains undiminished, proving invaluable when they seek access to future data resources. Take Microsoft Azure, for instance, which provides credits to users who contribute to its machine learning datasets, encouraging a reciprocal data-sharing ecosystem [23].

In the ensuing sections of this survey, we commence by presenting a preliminary definition of data sharing, data exchanging, and the underlying incentive mechanisms. Subsequently, we delve into a thorough review and discourse on the associated incentive mechanisms and optimization algorithms that underpin the life cycles of data sharing and exchanging. Ultimately, we shed light on the prevailing challenges and opportunities encompassing data creation, storage, access, and privacy preservation in the context of data exchange and sharing.

Our primary contributions to this domain can be distilled as follows:

  • We put forth a nuanced taxonomy of the incentive-driven processes in data sharing and exchanging, predicated on its lifecycle. Concurrently, we encapsulate the challenges inherent to each phase.

  • Our discourse extends to a meticulous examination of incentive mechanisms pivotal to data sharing and exchanging. Although we bifurcate these mechanisms into monetary and non-monetary classifications, our stance diverges from preceding researchers; we advocate for a synergistic integration of both categories to stimulate greater participation in data sharing and exchanging.

  • For the first time, we systematically deconstruct the lifecycle of data sharing and exchanging into its quartet of elements: data creation, data storage, data access, and data privacy preservation. Each segment is underpinned by a comprehensive exploration to serve as a point of reference. Additionally, we provide an exhaustive analysis of the nuances of privacy preservation spanning the entire lifecycle.

  • We highlight five emergent research trajectories in the ambit of incentive mechanisms for data sharing and exchanging. These span computational efficiency, trustworthiness, data privacy and security, data management system intricacies, and data quality, among others. Within each avenue, we discern current lacunae and prospective directions. A notable proposition is the conceptualization of a system rooted in blockchain technology for data sharing and exchange, synergized with diverse incentive mechanisms. The integration of such mechanisms with deep learning algorithms, we posit, will pave the way for the next generation of incentive-centric data-sharing and exchanging frameworks.

The remainder of this paper is organized in a systematic fashion. Section 2 delineates the preliminary definitions central to our discussion. In Sect. 3, we present a review of the pertinent existing literature. Section 4 offers a comprehensive analysis of the lifecycle associated with data sharing and exchanging. Challenges inherent to the field are highlighted in Sect. 5, while potential research avenues are explored in Sect. 6. Finally, Sect. 7 elucidates the research opportunities present within various incentive-based data-sharing and exchanging applications.

2 Preliminary definition

2.1 Data sharing

Data sharing occurs among n distinct entities, which can be represented by the list \(\mathbb {B}=(\beta _1, \beta _2, \beta _3,..., \beta _n)\). This sharing can take various forms: it can transpire between individuals with extensive databases, between individuals and public organizations, or between public and private entities. We can denote a collection of datasets as \(\mathbb {D}=(d_1, d_2, d_3,..., d_m)\). In the context of data sharing, an entity represented by \(\beta _i\) can access the dataset represented by \(d_j\) provided they obtain the requisite authority from another entity. Database access grants have traditionally been utilized as a mechanism for data sharing. To access such a granted database, user accounts must belong to one or more user groups, upon which authorization to the databases is conferred. For instance, the SQL GRANT statement is a widely recognized mechanism employed across various database systems, such as SQL Server [24].
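
To make this group-based grant logic concrete, the following database-agnostic Python sketch models it in miniature; the user, group, and dataset names are purely illustrative, and a real system would enforce this inside the database engine (e.g., via SQL GRANT statements):

```python
# Group-based access grants in miniature: users belong to groups, and
# authorization to a dataset is conferred on groups rather than individuals.
user_groups = {"alice": {"analysts"}, "bob": {"engineers"}}
grants = {"d_1": {"analysts"}, "d_2": {"analysts", "engineers"}}

def can_access(user: str, dataset: str) -> bool:
    """An entity may access a dataset iff one of its groups holds a grant on it."""
    return bool(user_groups.get(user, set()) & grants.get(dataset, set()))

print(can_access("alice", "d_1"))  # True: analysts hold a grant on d_1
print(can_access("bob", "d_1"))    # False: engineers hold no grant on d_1
```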

Cloud-based data sharing has become a ubiquitous method in the contemporary era of data dissemination. Cloud storage and computing serve as pivotal elements in the domains of data sharing and exchange. These cloud infrastructures utilize standard protocols to provide access to a myriad of configurable resources, encompassing applications, storage solutions, networks, servers, and various services. The concept of using the cloud for data sharing can trace its origins to an internal document of Compaq in 1996. This idea matured over the subsequent decade, culminating in the advent of cloud computing. In 2006, Amazon took a significant step in this direction by launching the Elastic Compute Cloud (EC2) to bolster its Amazon Web Services. Following suit, Google introduced the Google App Engine in 2008. The 2010s witnessed the emergence of sophisticated Smarter Computing frameworks, subsequent to IBM’s unveiling of IBM SmartCloud. Within these advanced data sharing and computing architectures, cloud-based data sharing is an integral component [25].

Utilizing cloud-based platforms for data sharing offers a multitude of advantages, notably reducing costs and infrastructural management overheads. Users benefit from the “pay-as-you-go” model, incurring costs only for data processing and storage, be it in a public or private cloud. Furthermore, cloud services are renowned for their scalability, adeptly adjusting to varying demands, expediting development tasks, and delivering efficient computing solutions [26, 27]. Data sharing via the cloud empowers entities to seamlessly access data remotely [28]. Numerous applications leverage cloud data-sharing capabilities, enhancing quality of life and productivity. For instance, Google Docs [29] furnishes a collaborative environment for users to disseminate diverse data types like documents and images. Similarly, DocuSign [30] facilitates the sharing of documents requiring signatures. However, with the exponential proliferation of IoT devices and the advent of 5G technology, the demands on data sharing have intensified. Critical questions arise, such as the cloud’s capacity to process and store voluminous data from myriad IoT devices, and whether latency issues can be effectively managed for time-sensitive applications like autonomous vehicles. Addressing these concerns, recent years have seen the emergence of cutting-edge edge computing paradigms, including fog computing (FC) [31], mobile edge computing (MEC) [32], and mobile cloud computing (MCC) [33]. Shifting data sharing to the edge has proven increasingly efficient and prevalent. These avant-garde distributed computing frameworks share a core principle: rather than relying on centralized cloud resources, they harness computational power closer to the end-users, typically through smart or edge devices. Such a configuration optimizes data sharing by processing data at the edge, markedly reducing transmission times. A practical example of MEC is the deployment of edge servers by telecom operators to provide low-latency gaming experiences on mobile devices. MCC’s real-world application can be seen in services like iCloud, which seamlessly integrate edge devices with cloud storage to optimize data accessibility and processing [34].

2.2 Data exchanging

Data exchange, while falling under the umbrella of data sharing, is distinct in its bidirectional nature. In this paradigm, entities engaged in the data-sharing ecosystem reciprocally exchange their resources to fulfill their respective data requirements.

Table 1 Data exchanging highlights

Similar to data sharing, data exchanging takes place among n entities and is denoted by the list \(\mathbb {B}=(\beta _1, \beta _2, \beta _3,..., \beta _n)\). These exchanges can involve various entities, from individuals with vast databases to interactions between individuals and public organizations, and between public and private organizations. The datasets involved are represented as \(\mathbb {D}=(d_1, d_2, d_3,..., d_m)\). When data sharing occurs, an entity \(\beta _i\) can access a dataset \(d_j\) if it obtains the authority from another entity. However, in data exchanging, there is a defined set of goals or targets, represented as \(\mathbb {T}=(t_1, t_2, t_3,..., t_i)\). To realize a data exchanging target \(t_i\), an appropriate incentive schema should be in place to spur it to completion. Data sharing historically took place in centralized databases and the cloud. However, the trend is increasingly shifting towards decentralization. Table 1 elucidates the significant milestones in the evolution of languages used for data exchange.

Data exchange enables the transfer of data between various systems and organizations while maintaining its integrity and meaning, ensuring that no modifications or alterations are made to the content [35]. This process often involves incentives to foster participation. The data requesters can compensate the data owners through various means, including monetary rewards or alternative data resources. Several algorithms, drawing from fields such as economics and game theory, have been developed to determine the optimal compensation or reward in the data-exchanging scenario. A practical example of this mechanism in action is the platform Airbnb [36]. Airbnb, a peer-to-peer service for people to list, discover, and book accommodations around the world, embodies the essence of data exchange. Property owners list their homes for travelers to rent, essentially sharing their data (property details, availability, price, etc.) with potential guests. In return, they receive monetary compensation when travelers book their spaces. Simultaneously, Airbnb uses optimization and ranking algorithms to gauge the success of each property listing. Properties that fare well, receive positive reviews, or fit specific criteria are then prioritized and given more visibility in the platform’s search results, benefiting the homeowners further. This system of rewards, both in visibility and monetary compensation, exemplifies the principles of data exchange.

Fig. 1 Lifecycle of data sharing and exchanging

Participants in data exchange might be hesitant to share their data if they believe the data they receive in return falls short of their expectations. A significant incident occurred in 2021 with the Microsoft Exchange Server data breach, in which attackers gained access to user emails and passwords, eroding trust in secure data exchange [37]. Participants might have believed that it wasn’t worthwhile to share their data without adequate incentives. Thus, establishing a fair data valuation and incentivizing participants to engage in the exchange becomes paramount. Within the sphere of data pricing, numerous factors require optimization. Challenges such as determining the right price point and allocating value based on data quality are pivotal. Consequently, the development of robust and precise pricing algorithms is integral to the success of data exchange.

2.3 Data sharing and exchanging life-cycle

The process of data sharing and exchanging hinges on four primary components: data creation, data storage, data access, and data privacy preservation. The genesis of this process, data creation or collection, involves pivotal decisions regarding the nature, method, and volume of data collection. Central questions include: What kind of data should be collated? What are the optimal methods for its collection? How extensive should the data pool be? Once created, the data’s preservation requires both secure and efficient storage solutions. Data access, the subsequent phase, revolves around granting permissions to various stakeholders involved in the data sharing and exchange process. Meanwhile, data privacy preservation is not merely an isolated component but an omnipresent factor throughout the data lifecycle, ensuring the integrity and confidentiality of shared data. The entire lifecycle of data sharing and exchange can be visualized in Fig. 1.

The distinction between data exchanging and data sharing lies in the transactional nature of the former. Data exchange embodies a two-way mechanism characterized by a reciprocal trading process. Hence, in the context of data exchange, it becomes paramount to motivate multiple entities to actively engage in the exchange while ensuring that the process remains robustly secure. These considerations underscore the key research themes in this realm.

2.4 Incentive mechanisms

Incentive mechanisms have traditionally played a pivotal role in the realm of human resources management, acting as catalysts to drive employee motivation, performance, and overall achievement [38]. A notable example can be seen in Google’s work environment, which has garnered a reputation for being exceptionally gratifying. The tech giant has ingeniously embedded incentive-driven strategies into its human resource management framework. Through an intricate system of incentives, ranging from peer bonuses to performance-based rewards, Google has fostered an organizational climate ripe with trust. This ecosystem not only promotes collaboration and teamwork but also empowers employees within similar departments to synergize their efforts and aid one another [39]. As the digital landscape evolved, especially with the proliferation of the Internet of Things (IoT) and the ubiquity of big data, these incentive mechanisms have found their application extended to the domain of data science.

The surge in mobile device usage has catalyzed the development of a myriad of mobile crowdsensing applications. These tools harness the power of collective intelligence, leveraging mobile users to share data for various sensing tasks in a crowdsourced fashion. Challenge.gov stands as a testament to this trend—a digital platform where the public collaboratively addresses pressing issues faced by federal agencies. Through this platform, innovative solutions are crowd-sourced, facilitating more informed and effective governmental decisions [40]. Existing research classifies incentive mechanisms within crowdsensing applications into three primary categories [5]:

  • Entertainment-based mechanisms: These are designed to pique user interest by integrating elements of fun and engagement. Specifically, they encourage participation through location-based mobile games. Such gamified mechanisms have been explored and discussed in various studies, highlighting their effectiveness in promoting user engagement in crowdsensing tasks [6,7,8].

  • Service-based mechanisms: Such incentives offer tangible service benefits to users in return for their participation, capitalizing on the mutual relationship where both the provider and the participant stand to gain. A prime example can be observed in GPS applications. Here, users, while benefiting from the service, also act as data providers. The underlying principle is that a collective effort from all users ensures a more refined and accurate service [9, 10].

  • Money-based mechanisms: Monetary rewards remain a tried-and-true incentive. Within the realm of crowdsensing, the intricacies lie in determining the appropriate pricing strategy and selecting winners. These components are pivotal and have piqued the interest of many researchers aiming to optimize and refine monetary incentive systems.

In summary, as the digital landscape grows more interconnected, the potential of mobile crowdsensing applications continues to expand. Harnessing this potential effectively necessitates the design and implementation of robust incentive mechanisms that cater to a diverse user base.

In recent times, the burgeoning field of blockchain technology has emerged as a pivotal solution for safeguarding data privacy. Numerous scholars have delved into the realm of incentive mechanisms within the blockchain environment. Broadly, these mechanisms can be categorized into two predominant types: those rooted in game theory and external incentives [41].

Consensus algorithms, intrinsic to blockchain operations, necessitate incentives to galvanize miners to compute the hash functions, subsequently facilitating the creation of new transactions. The overarching objective of achieving consensus within blockchain networks is to ensure a unanimous agreement among all participating nodes. This process empowers even the untrusted nodes, enabling them to select an individual or a cluster of nodes responsible for instigating new transactions. Various incentive strategies have been formulated in the blockchain context, including but not limited to, Proof of Work (PoW), Proof of Stake (PoS), and Zero-Knowledge Proof.
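
As a minimal illustration of the computational puzzle that PoW incentives reward, the sketch below searches for a nonce whose SHA-256 hash meets a difficulty target; the block data and difficulty value are illustrative assumptions:

```python
import hashlib

def proof_of_work(block_data: str, difficulty: int = 4) -> int:
    """Find a nonce whose hash has `difficulty` leading zero hex digits."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce  # the miner presenting this nonce earns the reward
        nonce += 1

print(proof_of_work("tx-batch-001"))  # first nonce satisfying the target
```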

Beyond blockchain, federated learning has emerged as another privacy-preserving paradigm whose effectiveness hinges on well-designed incentives. During the COVID-19 pandemic, federated learning was instrumental at UCSF in developing AI models to predict the need for supplemental oxygen in patients, leveraging data across 20 hospitals without compromising patient privacy [42].

The intricate process underlying federated learning is illustrated in Fig. 2.

Fig. 2 Federated learning process

Consequently, introducing incentives in federated learning becomes imperative to counteract potential challenges posed by selfish nodes and participants of suboptimal quality.

3 Existing data sharing and exchanging incentive mechanisms

This section categorizes the prevailing incentives in data sharing and exchanging into two distinct types: monetary and non-monetary incentives. As depicted in Fig. 4, these incentives structure the landscape of existing research. Notably, from our analysis, a synergistic approach combining both monetary and non-monetary incentives could be more effective in motivating participants to actively engage in the data-sharing and exchanging processes.

3.1 Monetary incentives

In the realm of computing, the emphasis predominantly falls on monetary incentives, as evidenced by a majority of the research in this domain. As illustrated in Table 2, we have encapsulated 24 notable studies from the computing sector. These studies provide insights into the performance metrics, types of mechanisms employed, applications, and optimization objectives of each respective paper. A discernible trend from our analysis indicates that game theory remains the quintessential algorithmic approach for designing incentive mechanisms. A majority of these works pivot around objectives of utility maximization and social cost minimization during the formulation of their optimization strategies.

3.1.1 Game theory-based incentives

Numerous game theory algorithms have prominently featured in the incentive mechanisms for data sharing and exchanging. According to a survey by Liang et al. [43], the dominant algorithms in this realm include the Stackelberg game, non-cooperative game, bargaining game, and the Vickrey-Clarke-Groves (VCG) game.

Table 2 Related works for monetary incentives

In the Stackelberg game, the decision-making process is divided into two periods: the leader commits to its quantity first, and the follower responds. Detailed formulations of the game can be found in Machado’s work [60]. Each node n within the network selects its respective quantity, denoted as \(\mathcal {Q}_n\), with an associated production cost of \(\varsigma _n \mathcal {Q}_n\). For a scenario involving one leader and one follower in the Stackelberg game, the demand curve is defined as:

$$\begin{aligned} P(\mathcal {Q}_1+\mathcal {Q}_2)=a-b(\mathcal {Q}_1+\mathcal {Q}_2) \end{aligned}$$
(1)

The profit of node n, denoted \(\Pi _n(\mathcal {Q}_1,\mathcal {Q}_2)\), can be calculated by:

$$\begin{aligned} \Pi _n(\mathcal {Q}_1,\mathcal {Q}_2)=P(\mathcal {Q}_1+\mathcal {Q}_2)\mathcal {Q}_n-\varsigma _n \mathcal {Q}_n \end{aligned}$$
(2)

In the second period, the maximum profit or revenue can be defined as:

$$\begin{aligned} \max _{\mathcal {Q}_2}\Pi _2=(P(\mathcal {Q}_1+\mathcal {Q}_2)-\varsigma )\mathcal {Q}_2=(a-b(\mathcal {Q}_1+\mathcal {Q}_2)-\varsigma )\mathcal {Q}_2 \end{aligned}$$
(3)

In the initial period, the leader maximizes its profit by anticipating the follower’s reaction function \(R_2(\mathcal {Q}_1)\):

$$\begin{aligned} \max _{\mathcal {Q}_1}\Pi _1=(P(\mathcal {Q}_1+R_2(\mathcal {Q}_1))-\varsigma )\mathcal {Q}_1=(a-b(\mathcal {Q}_1+R_2(\mathcal {Q}_1))-\varsigma )\mathcal {Q}_1 \end{aligned}$$
(4)
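
To make Eqs. (1)-(4) concrete, the following minimal Python sketch computes the closed-form outcome of the one-leader, one-follower game under a common unit cost \(\varsigma\); the parameter values are illustrative:

```python
def follower_best_response(q1: float, a: float, b: float, cost: float) -> float:
    # Second period: maximize (a - b*(q1 + q2) - cost) * q2 over q2,
    # giving the reaction function R2(q1) from the first-order condition.
    return max((a - cost - b * q1) / (2 * b), 0.0)

def leader_quantity(a: float, b: float, cost: float) -> float:
    # First period: substitute R2(q1) into the leader's profit and maximize over q1.
    return max((a - cost) / (2 * b), 0.0)

a, b, cost = 10.0, 1.0, 2.0                  # illustrative demand/cost parameters
q1 = leader_quantity(a, b, cost)             # leader commits first: 4.0
q2 = follower_best_response(q1, a, b, cost)  # follower reacts: 2.0
price = a - b * (q1 + q2)                    # resulting market price: 4.0
print(q1, q2, price)
```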

The increasing demand for more efficient and advanced data-sharing and exchanging mechanisms has led researchers to explore various game theory models. The Stackelberg game, in particular, has been at the forefront of such explorations due to its effectiveness in handling hierarchical decision-making processes. A closer look at recent literature sheds light on its widespread application across multiple domains: Li et al. [13] ventured into the domain of WiFi-based indoor localization systems. Recognizing the challenges of constructing a radio map via conventional site surveys, they turned to crowdsourcing as a remedy. Mobile users were incentivized to contribute their indoor trajectories. Employing a two-stage Stackelberg game, the authors ensured the dual goals of maximizing mobile users’ utility while ensuring profitability for the crowdsourcing platform. Such applications are shaping the future of data sharing and exchanging mechanisms.

The Vickrey-Clarke-Groves (VCG) mechanism stands out in the realm of mechanism design for its truth-inducing properties, encouraging participants to reveal their genuine valuations. This mechanism ensures an outcome that optimizes social welfare. In VCG, each winner’s payment \(\rho _i\) is the difference between the total cost incurred by the others when verifier i does not participate and their total cost when verifier i joins. It can be defined as:

$$\begin{aligned} \rho _i=\sum _{{\nu _j}\ne {\nu _i}}\zeta _j(W^*_{-i})-\sum _{{\nu _j}\ne {\nu _i}}\zeta _j(W^*_{i}) \end{aligned}$$
(5)
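
As a hedged illustration of Eq. (5), the sketch below computes VCG payments for a simple reverse auction in which the k lowest-cost verifiers win; the bids and the selection rule are illustrative assumptions rather than a mechanism from the cited works:

```python
def vcg_payments(bids: dict, k: int) -> dict:
    """Pay each winner the externality it imposes on the others (Eq. 5)."""
    def cheapest(pool: dict) -> list:
        return sorted(pool, key=pool.get)[:k]

    winners = cheapest(bids)
    payments = {}
    for i in winners:
        others = {v: c for v, c in bids.items() if v != i}
        cost_without_i = sum(others[w] for w in cheapest(others))  # others' cost if i abstains
        cost_with_i = sum(bids[w] for w in winners if w != i)      # others' cost when i wins
        payments[i] = cost_without_i - cost_with_i
    return payments

# Four verifiers bid their costs; the two cheapest win and are each paid 8.0.
print(vcg_payments({"v1": 3.0, "v2": 5.0, "v3": 8.0, "v4": 10.0}, k=2))
```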

Several research endeavors have adopted the VCG mechanism within the context of blockchain ecosystems. Notably, these studies predominantly center around the allocation of computational resources between miners and edge service providers. For instance, Jiao et al. [18] formulated an auction game between edge computing service providers and miners requiring computational resources. Through their proposed auction mechanism, they managed to optimize social welfare. Furthermore, their methodology ensures individual rationality, truthfulness, and computational efficiency. In another study, Gu et al. [19] leveraged the VCG auction mechanism to address issues related to storage transactions. Implementing their model on the Ethereum platform, they were able to demonstrate that their approach fosters secure, efficient, and cost-effective resource trading.

The VCG mechanism has also been embraced in a myriad of domains including edge computing, wireless networks, crowdsourcing, and crowdsensing, among others. For instance, in the sphere of mobile crowdsensing, Li et al. [20] leveraged the VCG mechanism. Their theoretical algorithms aimed to enhance the efficiency of platforms while making them more appealing for prospective participants. Similarly, Zhou et al. [21] pioneered a novel framework within the crowdsensing domain. Their methodology combined the rewarding potential of the VCG mechanism with edge computing to alleviate computational traffic and workload. Moreover, they integrated advanced deep learning algorithms, such as Convolutional Neural Networks (CNN), to sieve out spurious and irrelevant information that could be disseminated by inauthentic participants. Their empirical case study further reinforced the robustness of their proposed framework. Liu [64], venturing into the realm of ridesharing systems, harnessed the VCG mechanism to conceive a cost-sharing architecture. He meticulously devised two VCG-centric mechanisms tailored for both rudimentary and intricate scenarios. His model notably underscored the potential of minimizing societal costs. Lastly, Borjigin et al. [65] melded VCG algorithms into their innovative multiple-Walrasian auction mechanism, particularly for the valuation service of trees in the network function virtualization market. Their primary objective in utilizing the VCG mechanism was to accentuate and maximize societal effectiveness.

In non-cooperative games, players act independently, making decisions based on predictions of other players’ strategies and payoffs, with the aim of identifying a Nash Equilibrium [66]. Such games are characterized by four fundamental components: players, actions, strategies, and payoffs. Assume a set of players \(\mathbb {P}=\{\rho _1,\rho _2, ..., \rho _n\}\) participates in the game, and a set of strategies \(\mathbb {S}=\{\phi _1,\phi _2, ..., \phi _m\}\) represents how a player will act in every possible distinguishable circumstance. The payoffs are the utilities of the players: if the utility of player i is denoted \(\mu _i(\phi _i, \overrightarrow{\phi }_{-i})\), then the other players’ strategies are \(\overrightarrow{\phi }_{-i}=\{\phi _1, \phi _2, ..., \phi _{i-1}, \phi _{i+1}, ..., \phi _m \}\). To find the optimal utility, player i’s strategy \({\phi _i}^*\) must be the best response to the strategies specified for the other \(n-1\) players. The Nash Equilibrium can be defined as follows:

$$\begin{aligned} {\phi _i}^*= \mathop {\textrm{argmax}}\limits _{\phi _i}\mu _i(\phi _i, \overrightarrow{\phi }_{-i}) \end{aligned}$$
(6)
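
A pure-strategy Nash Equilibrium satisfying Eq. (6) can be located by best-response iteration, sketched below on an illustrative 2x2 game with a prisoner's-dilemma-style payoff structure (convergence is only guaranteed when a pure equilibrium exists):

```python
import numpy as np

U1 = np.array([[3, 0], [5, 1]])  # mu_1(phi_1, phi_2): rows are player 1's strategies
U2 = np.array([[3, 5], [0, 1]])  # mu_2(phi_1, phi_2): columns are player 2's strategies

s1, s2 = 0, 0
for _ in range(100):
    n1 = int(U1[:, s2].argmax())  # player 1's best response to phi_2
    n2 = int(U2[n1, :].argmax())  # player 2's best response to phi_1
    if (n1, n2) == (s1, s2):
        break                     # fixed point: mutual best responses
    s1, s2 = n1, n2
print(s1, s2)  # (1, 1): the defect/defect equilibrium of this game
```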

Zhang et al. [45] introduced a game-theoretic model tailored to enhance the outcomes of the non-cooperative equilibria observed in crowdsourcing applications. Their research identified a delicate balance between social welfare and non-cooperative equilibria. In response, they developed incentive mechanisms rooted in non-cooperative games, pinpointing an optimized solution that maximizes social welfare. Zhan et al. [14] highlighted that as the Internet of Things (IoT) continues to evolve, federated learning emerges as an adept solution to address issues related to network bandwidth, storage, and most pertinently, privacy. Yet, the federated learning landscape is devoid of robust incentive mechanisms, primarily due to the challenges posed by the reluctance to share information and the complexities of contribution evaluation. Addressing this, they introduced a two-tiered incentive mechanism, with the latter stage anchored in a non-cooperative game. This mechanism aimed to galvanize edge nodes, motivating them to more actively and efficiently participate in the training process. Hossain et al. [67] utilized a non-cooperative game approach to address the challenge of resource constraints within a vehicular edge computing setting. In their model, each vehicle autonomously devises its strategy, determining whether to offload a task to a multi-access edge computing server or a cloud server, with the objective of optimizing its benefits.

A bargaining game pertains to a scenario wherein players negotiate to decide the division of benefits derived from cooperation. An illustrative example of this is the negotiation between a seller and a buyer over the price of an automobile. Let the set of players’ strategies be \(\mathbb {S}=\{\phi _1,\phi _2, ..., \phi _m\}\). For any two players, where \(\phi _i\) is the seller’s strategy and \(\phi _j\) the buyer’s, the seller determines a selling price \({\phi _i}^*\) with expected utility \({\mu _i}^*\), and the buyer likewise determines his or her utility \({\mu _j}^*\). If \({\mu _i}^*>{\mu _j}^*\), there is disagreement between the two players, and negotiations must continue. When \({\mu _i}^*\le {\mu _j}^*\), the bargain is struck, and the price strategy \(({\phi _i}^*,{\phi _j}^* )\) is the Nash Equilibrium of this game [43].
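
As a simple illustration of this negotiation, the sketch below reduces each party's utility to a reservation price and, when agreement is possible, splits the surplus evenly (the symmetric Nash bargaining outcome); the automobile figures are illustrative:

```python
def bargain(seller_reserve: float, buyer_reserve: float):
    """Return the agreed price, or None while disagreement persists."""
    if seller_reserve > buyer_reserve:
        return None  # the seller asks more than the buyer will pay: keep negotiating
    return (seller_reserve + buyer_reserve) / 2  # split the cooperative surplus evenly

print(bargain(8000.0, 10000.0))   # 9000.0: the equilibrium price for the car
print(bargain(12000.0, 10000.0))  # None: no mutually beneficial trade exists
```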

Recent research has delved into the application of bargaining games in various sectors: Magerkurth et al. [47] crafted a multi-stage bargaining game tailored for crowdfunding platforms. Their primary objective was to navigate the challenges of crowdfunding benefit allocation, with the ultimate goal of optimizing social welfare. In another study, Lu et al. [48] advanced an incentive mechanism that integrated a bargaining game. Recognizing the constraints of non-cooperative games, they introduced a two-sided rating protocol. Through systematic rating, they devised strategies anchored on intrinsic parameters, aiming for the pinnacle of social welfare maximization. Wang et al. [49] ingeniously melded a Nash bargaining game with deep reinforcement learning methodologies, focusing on enhancing communication in heterogeneous vehicular networks. The core of their approach lies in optimizing the network’s overall performance, striving for the zenith of total reward maximization. Kim [68], on the other hand, conceived a resource management model for pervasive edge computing infrastructure, founded on a bargaining game. He embarked on a comprehensive exploration of the allocation challenges related to computation and communication resources, offering solutions via his proposed model.

3.1.2 Demand and supply model-based incentives

The challenge of determining appropriate reward pricing in incentive mechanisms is perennial. A renowned economic model, known as the demand and supply model, offers insight into determining the price associated with data sharing and exchanging. The demand and supply model elucidates the interplay between data owners and data requesters. At a specific point, an equilibrium price emerges when the quantity demanded aligns with the supply. Such an equilibrium enables efficient resource allocation. Let’s consider the subsequent equations for demand and supply functions. In these equations, P symbolizes the price corresponding to each quantity:

$$\begin{aligned} Q_d=a-bP \end{aligned}$$
(7)
$$\begin{aligned} Q_s=-c+dP \end{aligned}$$
(8)
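
Setting \(Q_d=Q_s\) in Eqs. (7)-(8) yields the equilibrium price \(P^*=(a+c)/(b+d)\); the sketch below solves for it with illustrative coefficients:

```python
def equilibrium(a: float, b: float, c: float, d: float):
    """Solve a - b*P = -c + d*P for the market-clearing price and quantity."""
    p_star = (a + c) / (b + d)
    q_star = a - b * p_star  # demanded quantity equals supplied quantity here
    return p_star, q_star

print(equilibrium(a=100.0, b=2.0, c=20.0, d=4.0))  # (20.0, 60.0)
```
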
Fig. 3 Demand and supply model

In a recent study by Ma et al. [51], a time and location correlation incentive mechanism was introduced for deep data collection in crowdsourcing networks. They established a metric termed “Quality of Information Satisfaction Degree” (QoISD) to assess the adequacy of collected sensing data. By designing two demand-based incentive mechanisms, they aimed to optimize the QoISD and the associated rewards. Simulations affirmed their method’s efficacy, reducing costs and enhancing QoISD. Sun et al. [69] proposed a dynamic digital twin-based incentive mechanism for resource allocation in aerial-assisted Internet of Vehicles. This two-stage algorithm adeptly handles fluctuating resource supply and demands, ensuring efficient resource scheduling and allocation. Meanwhile, Esfandiari et al. [70] leveraged demand-supply theory to counteract nodes’ selfish behaviors in disruption-tolerant networks, enhancing criteria such as delivery ratio, delay, dropped messages, and overhead ratio (Fig. 3).

3.1.3 Cost model-based incentives

The cost model allows for the determination of the final price of a product by taking into account the total production cost and adding the intended profit margin. When applied to incentive mechanisms, this model provides a means to establish the appropriate reward or price. Let the desired income be represented by \(\eta\), the total cost be \(\varsigma\), and a predefined profit percentage be \(\rho\). The relationship between the cost and income can then be expressed as follows:

$$\begin{aligned} \eta =\varsigma (1+\rho ) \end{aligned}$$
(9)
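
Eq. (9) amounts to straightforward cost-plus pricing, as the one-function sketch below shows with illustrative figures:

```python
def cost_plus_price(total_cost: float, profit_margin: float) -> float:
    """Cost model: desired income eta = total cost * (1 + profit percentage rho)."""
    return total_cost * (1 + profit_margin)

print(cost_plus_price(total_cost=250.0, profit_margin=0.20))  # 300.0
```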

Cost models offer a straightforward and cost-effective approach when compared to other economic-based models. The implementation of a cost model as an incentive mechanism results in efficient computation due to its relative simplicity. However, it’s important to note that cost models have limitations as they tend to overlook elements like competition and replacement costs. They primarily consider internal factors while neglecting external ones, as highlighted in prior research [43, 71].

Cheng et al. [54] identified a challenge in the context of crowdsourcing platforms, particularly when these platforms sent location-based requests to workers. The challenge revolved around optimizing the assignment of workers to tasks. To address this challenge, they devised three effective heuristic methods: the greedy approach, g-divide and conquer, and cost model-based adaptive algorithms. Experimental results demonstrated the efficiency and effectiveness of these methods in maximizing workers’ rewards within a limited budget. Xue et al. [72] applied both public and private cost models for rational miners in a Bitcoin mining pool. They introduced a Budget Feasible Reward Optimization (BFRO) model aimed at maximizing the reward function while adhering to budget constraints. To solve the BFRO problem, they developed a budget-feasible reverse auction mechanism.

3.1.4 Competition model-based incentives

Competition-based models assist organizations in formulating their pricing strategies by taking into account the pricing strategies of their competitors. In contrast to the cost model, competition-based models consider an external factor: competition within the market. Prices or rewards are determined by assessing market information. In these models, participants establish their prices by benchmarking against similar tasks, aiming to align with a leader’s pricing decisions, which are then followed by others.
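
A hedged sketch of this benchmarking logic follows; the median benchmark and the positioning factor are illustrative modeling choices, not drawn from the cited works:

```python
def competition_based_price(competitor_prices: list, positioning: float = 0.97) -> float:
    """Benchmark against the market: follow the prevailing (median) price,
    scaled by a positioning factor (<1 undercuts the leader, >1 signals premium)."""
    ranked = sorted(competitor_prices)
    market_benchmark = ranked[len(ranked) // 2]
    return market_benchmark * positioning

print(competition_based_price([9.5, 10.0, 10.5, 11.0, 12.0]))  # ~10.185
```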

Dong et al. [55] employed a competition-based model to establish QoE-ensured pricing in mobile networks. They combined game theory and the competition model to depict social behavior and understand the relationships among devices, service organizers, and users. Damien et al. [56] highlighted the common implementation of cooperation and competition modes in crowdsourcing platforms. They introduced a hybrid model called “coopetition,” which blends both approaches. Their experiments demonstrated that the hybrid model outperformed the two traditional ones. Ghasemi et al. [73] designed a competition-based pricing strategy for the cloud market environment. Their experimental results showcased a significant increase in profits for providers compared to other pricing policies discussed in previous literature.

Fig. 4 Related incentive mechanisms

3.2 Non-monetary incentives

Non-monetary incentives in previous research can be categorized into two main types: entertainment-based incentives and service-based incentives. One study [77] employed a service-based incentive mechanism in a global crowdsourcing platform named “mClerk,” through which low-income workers could gain new employment opportunities; users of mClerk both sent and received tasks via SMS. Huang et al. [82] highlighted security concerns in traditional cloud data management systems and introduced a secure multi-owner data sharing management scheme named “Mona,” which utilized group signature and dynamic broadcast encryption techniques to enhance security. The majority of research in data management systems has concentrated on structured data and cloud-based systems.

4 Data sharing and exchanging lifecycle

4.1 Data creation

Before data sharing and exchange can occur, it is imperative to ensure high data quality during the data creation step. Data quality plays a critical role in determining the effectiveness and efficiency of data sharing and exchange processes. In recent years, an increasing number of researchers have considered data quality as a significant parameter when designing incentive mechanisms. For example, Yang et al. [83] observed that data quality was often overlooked in the mobile crowdsensing domain. To address this issue, they integrated quality estimation and monetary incentives into their model to support data sharing. Additionally, they employed an unsupervised learning approach to quantify data quality and implemented outlier detection techniques to filter out anomalous data. Similarly, Luo et al. [84] identified limitations in using data mining techniques to control data quality. They introduced a cross-validation approach to identify a validating crowd capable of verifying the contributions made by sensor data providers. Furthermore, they employed weighted oversampling methods and privacy-aware trust algorithms to enhance the services of mobile crowdsensing systems. However, it’s worth noting that many researchers continue to rely on traditional machine learning methods for data quality filtering; a generic illustration of such outlier-based filtering appears in the sketch below.
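
The following minimal sketch uses a robust, median-based outlier test to drop anomalous contributions; the readings and threshold are illustrative, and the approach is a generic stand-in rather than the cited authors' algorithms:

```python
import numpy as np

def filter_outliers(readings: np.ndarray, threshold: float = 3.5) -> np.ndarray:
    """Remove readings whose modified z-score (median/MAD based) exceeds the threshold."""
    median = np.median(readings)
    mad = np.median(np.abs(readings - median))
    if mad == 0:
        return readings  # no spread: nothing can be flagged
    modified_z = 0.6745 * np.abs(readings - median) / mad
    return readings[modified_z <= threshold]

data = np.array([21.0, 20.5, 22.1, 21.3, 95.0])  # one implausible sensor reading
print(filter_outliers(data))  # the 95.0 contribution is filtered out
```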

Data processing and data transformation are crucial steps aimed at converting raw data into meaningful and structured information. When creating or collecting data, it’s essential to establish a standardized data structure for efficient big data management. This involves tasks such as data format conversion, data cleaning, and factor extraction, among others. To facilitate data creation and integration, unified data processing and transformation formats become essential. These unified data integration frameworks can significantly reduce the time spent on data wrangling and help save costs. For instance, Ma et al. [85] introduced a novel graph-based data integration framework built upon a unified conceptual model. They applied this framework to address a real-world refueling problem and demonstrated improved precision and recall results. Given the diversity of data types in the realm of big data, some researchers have developed data integration frameworks tailored for unstructured data. Williams et al. [86] designed an image data integration platform for bioimages sourced from various channels, including high-content screening, multi-dimensional microscopy, and digital pathology. They also established a computational resource for remote access to their system, enabling users to re-analyze the data. Nevertheless, unified data integration frameworks may still face challenges, such as data security concerns and process efficiency optimization.

4.2 Data storage

Within the data storage process, several key components play critical roles: data backups, data replication, data deduplication, and cloud storage. Data backups serve as a crucial means of ensuring data protection and mitigating costs in the event of data loss. Some organizations still employ tape backup as their method of choice for safeguarding against data loss. This involves storing data on magnetic media. However, it’s important to note that tape backups can be vulnerable to corruption. Even when organizations opt for cloud storage or other backup solutions, the possibility of disasters leading to system shutdowns remains a concern. In modern data management practices, a secure approach involves the combination of full backups and partial backups. This strategy enhances data protection and resilience against data loss scenarios. A full backup corresponds to a specific moment in time, involving the capture of a comprehensive system image, which is then stored on a secondary device. In contrast, partial backups encompass differential and incremental methods. However, regardless of the traditional backup strategies implemented, a persistent risk remains: the potential for system corruption [87, 88].
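
The three strategies differ only in which reference point defines “changed”. A simplified sketch, keyed on modification times (real systems track block- or file-level change journals), follows:

```python
from datetime import datetime

def files_to_back_up(files: dict, last_full: datetime, last_backup: datetime, mode: str) -> set:
    """Select files for the next backup run under the three classic strategies."""
    if mode == "full":
        return set(files)                                        # complete point-in-time image
    if mode == "differential":
        return {f for f, m in files.items() if m > last_full}    # changed since the last full backup
    if mode == "incremental":
        return {f for f, m in files.items() if m > last_backup}  # changed since the last backup of any kind
    raise ValueError(f"unknown mode: {mode}")

files = {"a.txt": datetime(2024, 1, 5), "b.txt": datetime(2024, 1, 9)}
print(files_to_back_up(files, datetime(2024, 1, 1), datetime(2024, 1, 7), "incremental"))  # {'b.txt'}
```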

Data replication The key distinction between backups and replication lies in the accessibility of replicas, which are more readily available to production systems.

Data deduplication is a vital data cleaning process in data storage, serving to mitigate data redundancy and optimize storage space utilization. The primary objective of data deduplication algorithms is to enhance the efficiency of databases by eliminating redundancies without compromising data accuracy or integrity. In recent research, there has been a significant focus on developing secure data deduplication mechanisms. For instance, Fan et al. [89] introduced a hybrid data deduplication mechanism tailored for cloud storage systems, addressing security concerns associated with the deduplication process. Their experimental results demonstrated the effectiveness of their approach in resolving security issues within data deduplication. Similarly, Rashid et al. [90] proposed a two-level data deduplication framework designed for cloud storage systems. The framework comprised two tiers: the enterprise level and the cloud storage provider level. At the enterprise level, data deduplication was performed, and the deduplicated data was stored in the cloud. Subsequently, at the cloud storage provider level, duplicate data was systematically removed to optimize storage space while ensuring data security and control. The authors showcased the advantages of their framework in terms of security, control, space efficiency, and reduced storage costs.
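
A minimal sketch of the hash-based core shared by such schemes follows (illustrative only; the cited frameworks add encryption and tiering on top of this idea):

```python
import hashlib

def deduplicate(chunks: list) -> tuple:
    """Content-addressed deduplication: store each unique chunk once and
    keep an ordered list of fingerprints to reconstruct the original stream."""
    store, recipe = {}, []
    for chunk in chunks:
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # a repeated chunk is never stored twice
        recipe.append(fingerprint)
    return store, recipe

store, recipe = deduplicate([b"alpha", b"beta", b"alpha"])
print(len(store), len(recipe))  # 2 unique chunks backing 3 logical references
```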

Cloud storage and edge storage Data storage is an essential method for preserving data, and much research attention has been devoted to develo** incentive mechanisms for this purpose. Conventional data storage relies on established mechanisms for accessing multiple configurable resources. Over the past few decades, numerous researchers have dedicated their efforts to enhancing cloud storage systems through various incentive mechanisms. However, with the advent of 5G technology, the Internet of Things (IoT), and the proliferation of big data, cloud-based data storage has exhibited certain limitations. Cloud computing, in its current form, lacks some crucial functionalities required to cope with the surging volumes of big data effectively. These shortcomings include challenges related to low latency and jitter, ensuring high availability, and scalability. Consequently, several transformative changes are poised to impact our daily lives. Key questions arise, such as “Can services be delivered closer to end-users through distributed computing?” “Can your smartphone serve as your primary data repository?” “Can your vehicle monitor machine health, facilitate software updates, and identify real-time maintenance issues promptly?” “What if smart edge devices could offer deterministic latency and support time-sensitive applications while analyzing real-time and streaming data at the edge?” These questions present formidable challenges for data storage as we design incentive mechanisms for data sharing and exchange.

In response to these challenges, recent studies have focused on the development of edge data storage and processing solutions aimed at addressing the aforementioned questions. Ge et al. [91] investigated the data caching resource allocation problem in a fog radio access network environment. They employed a Stackelberg game to incentivize the data providers to participate in the resource allocation process, and applied a simple search method to solve the optimization problem of data caching resource allocation. Alioua et al. [92] developed an incentive mechanism of edge caching for the Internet of Vehicles. Their incentive mechanism focused on the economic side of caching by considering the competitive cache-enablers market. They employed a Stackelberg game between the data provider and the multiple mobile network operators and found a Nash equilibrium to reduce the caching cost. However, we still have many opportunities to improve data storage in the data sharing and data exchanging process.

4.3 Data access

In the context of incentive mechanisms for data sharing and exchanging, “data access” is a broad concept that encompasses the authorization to access the data. This area comprises several critical data access components, including identity and authentication, access control, encryption, and trust management.

Identity and authentication is a term used to describe the process of granting different parties access to the data. In the past, authentication protocols were primarily designed for single-server environments, which are ill-suited for the new architecture of big data and IoT environments. Around 2015, an increasing amount of sensitive data, such as healthcare records, began to transition into digital formats. Consequently, many researchers began develo** more efficient authentication schemes to safeguard E-healthcare databases. For instance, both Wu et al. and Jiang et al. concentrated on devising three-factor authentication protocols to mitigate various types of attacks [93, 94]. Recognizing the limitations of single-server authentication schemes, some researchers began to explore the creation of inter-cloud identity management systems like OpenID and SAML, which offer Single-Sign-On (SSO) authentication capabilities.

Access control is employed to prevent unauthorized entities from accessing devices and sharing or exchanging data. Historically, the majority of research has been concentrated on designing access control systems for the cloud. However, as edge computing architectures have evolved, there have been relatively few developments in edge access control mechanisms. Yu et al. designed an access control system by leveraging techniques from various encryption schemes, establishing efficient fine-grained data access control [95]. Additionally, they introduced a novel framework for access control within the healthcare domain in a cloud computing environment [96].

Encryption has been a popular research topic for many years. However, traditional encryption methods like the Triple Data Encryption Algorithm (TDEA), also known as the Triple Data Encryption Standard (3DES), have their limitations. They require devices to have prior knowledge of information recipients’ identities and share credentials, which may not be feasible in many data-sharing and exchanging scenarios where recipients are often unknown. To address these challenges, encryption methods tailored for data sharing and exchanging environments have been developed, providing solutions for scenarios where traditional algorithms fall short. For instance, Attribute-Based Encryption (ABE) is one such encryption algorithm that involves a key authority between a data sender and recipient [97]. This approach offers more flexibility and adaptability in complex data-sharing and exchanging systems.

Encryption methods have evolved to address the security needs of various computing environments, including centralized cloud servers and emerging edge paradigms. Researchers have proposed innovative encryption schemes to protect data in these diverse settings. In centralized cloud environments, Wu et al. combined hierarchical identity-based encryption (HIBE) with ciphertext-policy attribute-based encryption (CP-ABE) to create an efficient encryption scheme for sharing confidential data [98]. Li et al. extended this approach to safeguard healthcare data on cloud servers, utilizing attribute-based encryption (ABE) techniques to encrypt patients’ personal health record (PHR) files [99].

With the advent of edge computing paradigms, encryption methods have been adapted to suit these environments. Alrawais et al. introduced an efficient key exchange protocol based on CP-ABE and digital signature techniques in fog computing environments, achieving improved performance in terms of confidentiality, authentication, verifiability, and access control [128, 129]. Yet, even these sophisticated frameworks are not without their constraints. Crucially, as we advance these systems, considerations surrounding data privacy and security cannot be sidelined. Moreover, there’s an evident merit in harnessing deep learning methodologies. By doing so, the framework could potentially auto-adjust its structure, accommodating the multifaceted nature of data.

Augmenting the data management system with capabilities for real-time data processing is another pivotal aspect. The rapid proliferation of big data and the IoT has ushered in an era where numerous applications are tethered to the immediacy of data sharing and exchange. Paradigmatic instances include autonomous vehicles [130], emergency fire response [131], and medical emergency services [132]. Yet, it’s evident that real-time data sharing in these sectors is fraught with limitations. Take, for instance, the unfortunate event of a vehicular accident. Present protocols necessitate a phone call and awaiting police intervention, a process that inadvertently prolongs the accident’s aftermath and could potentially delay critical medical interventions. Hence, it becomes paramount for the data management system to evolve, equipping itself with the agility to seamlessly handle real-time data streams.

6.3 Employ artificial intelligence algorithms to improve data quality

Data quality directly impacts the efficiency and accuracy of data sharing and exchange processes [133]. Leveraging artificial intelligence techniques to discern and eliminate inauthentic and subpar data is an invaluable research trajectory.

While many researchers have hitherto applied conventional machine learning and deep learning algorithms to this domain [134, 135], relying solely on these traditional methods to sieve out low-quality data can be resource-intensive. As a remedy, federated learning emerges as a promising paradigm: by adopting a distributed approach to deep learning, computation resources can be conserved. Additionally, the presence of counterfeit data undermines data quality, diminishing the efficacy of data sharing and exchange; devising mechanisms to detect such spurious data is an avenue worth exploring in future research. Integrating anomaly detection systems into federated learning architectures can significantly bolster the integrity of data quality: utilizing the distributed topology of the network, these systems are adept at pinpointing and segregating questionable data entries, enhancing the robustness of the dataset across the collective nodes. Moving beyond traditional deep learning methodologies, reinforcement learning presents a compelling alternative. It can streamline the model development process by eliminating the extensive requirement for precompiled training and testing datasets, and its algorithms are designed to adapt to evolving data trends autonomously, providing a scalable and adaptive solution for maintaining data quality amidst the complexities of expansive network environments. A minimal sketch of the federated aggregation step is shown below.
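
The sketch implements federated averaging with illustrative local weights and client sizes:

```python
import numpy as np

def federated_average(client_weights: list, client_sizes: list) -> np.ndarray:
    """FedAvg: combine local model weights, weighted by local dataset size,
    so that raw data never leaves the participating nodes."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# One aggregation round across three edge nodes holding different amounts of data
local_models = [np.array([0.2, 0.5]), np.array([0.4, 0.3]), np.array([0.1, 0.6])]
print(federated_average(local_models, client_sizes=[100, 300, 600]))  # [0.2 0.5]
```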

6.4 Using distributed data storage techniques to ensure data security and privacy

In contrast to centralized data storage solutions, distributed storage methods have gained increasing prominence. Recently, blockchain has emerged as a widely researched approach for data storage. Blockchain, characterized as a decentralized, digital, secure, and transparent ledger for cryptographic data transactions, has begun to revolutionize numerous sectors [137, 138], including cyber-physical systems [139], education [136], supply chain management [140], and crowdsourcing and crowdsensing [141], among others.

Figure 6 provides an illustrative depiction of how blockchain can enhance the data sharing and exchange paradigm [118]. The figure delineates n entities, represented as \(\mathbb{B} = (\beta_1, \beta_2, \beta_3, \ldots, \beta_n)\). Each of these entities can assume the role of either a data provider or a requester. The datasets, denoted by \(\mathbb{D} = (d_1, d_2, d_3, \ldots, d_m)\), constitute the content intended for sharing and exchange between entities. To safeguard data privacy throughout these operations, various encryption techniques, represented by \(\varepsilon\), are invoked. To spur entities' participation, an incentive algorithm, \(\Gamma\), is integrated. This algorithm may encompass both monetary and non-monetary rewards. Considering the data storage and processing phases, a suite of distributed data storage and processing strategies can be implemented to bolster data privacy and efficiency. We advocate for the incorporation of blockchain and smart contracts as robust mechanisms for secure data storage.
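As a minimal, self-contained illustration of this design, the toy ledger below hash-chains data-sharing transactions between entities \(\beta_i\) over datasets \(d_j\); the record fields, including a reward standing in for the incentive \(\Gamma\), are illustrative assumptions rather than a production smart-contract design.

```python
import hashlib, json, time

# Minimal sketch of a hash-chained ledger recording data-sharing transactions
# between entities (the beta_i) over datasets (the d_j). Field names and the
# reward field are illustrative assumptions.

def block_hash(block: dict) -> str:
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

chain = [{"index": 0, "prev": "0" * 64, "tx": None, "ts": 0.0}]  # genesis block

def record_share(provider: str, requester: str, dataset: str, reward: float) -> None:
    tx = {"provider": provider, "requester": requester,
          "dataset": dataset, "reward": reward}
    block = {"index": len(chain), "prev": block_hash(chain[-1]),
             "tx": tx, "ts": time.time()}
    chain.append(block)

record_share("beta_1", "beta_2", "d_3", reward=5.0)
record_share("beta_2", "beta_4", "d_1", reward=2.5)

# Tamper evidence: each block commits to its predecessor's hash.
ok = all(chain[i]["prev"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))
print("chain valid:", ok)
```

Because each block commits to the hash of its predecessor, altering any recorded transaction invalidates every subsequent block, which is precisely the tamper-evidence property the paradigm relies on.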

Fig. 7 Reinforcement learning in data sharing and exchanging

6.5 Design authentication and encryption mechanisms

Efficient and private authentication schemes [142, 143] have become paramount in the realm of data sharing and exchange. Historically, authentication protocols within the cloud ecosystem were tailored primarily to single-server environments. This model, however, is increasingly incongruent with the emergent architectures of 5G and the IoT [144], which champion distributed service environments. With the introduction of cloud data sharing in 2009 came an uptick in user growth and a pronounced demand for shared services, leading researchers to pursue robust trust and security authentication schemes that seamlessly link cloud users to services. Established frameworks such as the SSL Authentication Protocol (SAP) [145] were often perceived as cumbersome and unintuitive by a vast swath of users.

Recognizing the constraints of single-server authentication systems, several scholars veered towards the development of inter-cloud identity management solutions. This journey saw the emergence of protocols such as OpenID [146] and SAML [147], both championing Single Sign-On (SSO) authentication. Yet these systems inherently hinge on third-party intermediaries, potentially ushering in unforeseen security vulnerabilities. Consequently, crafting efficient, private authentication schemes suited to a distributed service environment remains an ongoing academic challenge. In today's data exchange ecosystems, the vast majority of IoT devices require users to establish personal accounts, often divulging sensitive information, which makes the dual challenge of guaranteeing user anonymity while ensuring efficient authentication evident. Several burgeoning research opportunities have been identified:

  • Synergies: Most of the cutting-edge research delving into efficient, privacy-centric authentication [148, 149] is anchored in the domains of cloud computing and mobile cloud computing. As the locus of future data sharing and exchange is likely to shift towards edge computing, it is imperative to discern potential collaborations between mobile cloud computing and other edge paradigms.

  • Security vs. privacy trade-offs: In the process of devising novel authentication protocols, it becomes essential to strike an equilibrium between security and privacy. For instance, within the paradigm of lightweight authentication [150], the assurance of rigorous user anonymity takes precedence. Furthermore, for devices tethered to batteries, the nexus between energy conservation and security emerges as a captivating area of research; a minimal sketch of this lightweight, anonymity-aware direction follows this list.
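As a sketch of that direction, the example below pairs a per-epoch pseudonym with a single HMAC challenge-response, assuming a key pre-shared out of band; the pseudonym derivation is an illustrative assumption, not an established protocol.

```python
import hmac, hashlib, secrets

# Minimal sketch of a lightweight challenge-response login using a pre-shared
# key and a rotating pseudonym, so the device authenticates without revealing
# a stable identity. The pseudonym scheme is an illustrative assumption.

def new_pseudonym(key: bytes, epoch: int) -> str:
    """Unlinkable per-epoch identifier derived from the shared key."""
    return hmac.new(key, f"pseudonym|{epoch}".encode(), hashlib.sha256).hexdigest()[:16]

def respond(key: bytes, challenge: bytes) -> bytes:
    """Device proves key possession with a single cheap HMAC (battery friendly)."""
    return hmac.new(key, challenge, hashlib.sha256).digest()

# Server and device share a key provisioned out of band.
key = secrets.token_bytes(32)
epoch = 42

challenge = secrets.token_bytes(16)                            # server -> device
pid, tag = new_pseudonym(key, epoch), respond(key, challenge)  # device -> server

expected = respond(key, challenge)                             # server recomputes
print("pseudonym:", pid, "| authenticated:", hmac.compare_digest(tag, expected))
```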

Historically, the bulk of research efforts has been funneled into crafting access control systems [151] that integrate seamlessly with cloud computing. By contrast, research exploring access control mechanisms within the context of edge computing has been sparse. The development and implementation of pragmatic access control algorithms tailored to the edge environment therefore stand out as a promising research trajectory. Given the anticipated surge in edge devices in the near future, a new set of challenges centered on efficient access model identification and the optimization of finite resources comes to the fore, especially for battery-dependent devices.

In the nascent stages of trust management within cloud computing, Service Level Agreements (SLAs) [152] emerged as the foundational technique. However, these were not universally consistent across cloud providers, leading to potential trust issues. Most scholarly work on trust management has historically been rooted in centralized services; around 2016, a modest shift was observed, with more research gravitating toward distributed computing services. Trust management, viewed through the lens of distributed computing within data sharing and exchange, thus emerges as a potential research frontier.

Regarding encryption mechanisms integral to data sharing and exchange processes, it’s evident that a majority of these mechanisms are tailored for cloud-based data sharing and fog data sharing ecosystems. However, the landscape is replete with opportunities to architect more efficient encryption algorithms that dovetail with mobile edge computing and cloud computing paradigms. While a significant chunk of the academic community is engrossed with the CP-ABE algorithms [153], alternative encryption strategies that seamlessly integrate with CP-ABE, such as fully homomorphic encryption (FHE) [154] and ciphertext policy attribute-based proxy re-encryption (CP-ABPRE) [155], also hold promise. As a result, refining encryption mechanisms in the data sharing and exchange space remains a paramount academic pursuit.
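To make the homomorphic direction concrete, the following is a deliberately insecure toy implementation of the Paillier cryptosystem, an additively homomorphic scheme whose ciphertext-level computation FHE generalizes to arbitrary functions; the tiny hardcoded primes are for illustration only and must never be used in practice.

```python
import math, random

# Toy Paillier cryptosystem with small hardcoded primes (insecure; for
# illustration of the additive homomorphism only).
p, q = 293, 433
n = p * q
n2 = n * n
g = n + 1                                       # standard simplification g = n + 1
lam = math.lcm(p - 1, q - 1)                    # lambda = lcm(p-1, q-1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)     # mu = L(g^lambda mod n^2)^-1 mod n

def encrypt(m: int) -> int:
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n * mu) % n

c1, c2 = encrypt(20), encrypt(22)
assert decrypt((c1 * c2) % n2) == 42            # Enc(a) * Enc(b) decrypts to a + b
print("homomorphic sum recovered:", decrypt((c1 * c2) % n2))
```

The final assertion checks the defining property: multiplying two ciphertexts yields an encryption of the sum of the underlying plaintexts, allowing limited computation on data without ever decrypting it.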

6.6 Use reinforcement learning to solve the un-shared decision problem

In data sharing and exchanging processes, it is commonplace for multiple nodes to collaboratively complete a task. For instance, within the federated learning paradigm [183], distributed nodes jointly train a shared model, yet each node must decide on its own whether to share its local data or updates. Because these decisions are not shared among nodes, each participant must learn a sharing policy from the rewards it observes, a setting to which reinforcement learning is naturally suited (see Fig. 7).
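A minimal sketch of this idea, framed as stateless Q-learning (a two-armed bandit) over the binary share/withhold decision, is given below; the reward, cost, and exploration values are illustrative assumptions, not calibrated to any real platform.

```python
import random

# Minimal sketch: a node learns via stateless Q-learning whether to share
# (action 1) or withhold (action 0) its local updates, given that the
# platform pays an incentive for sharing but sharing incurs a cost.

reward_if_share, cost_of_sharing = 1.0, 0.4   # platform incentive vs. sharing cost
q = [0.0, 0.0]                                # Q-values for withhold / share
alpha, epsilon = 0.1, 0.2                     # learning rate, exploration rate

for step in range(2000):
    if random.random() < epsilon:             # explore occasionally
        action = random.randrange(2)
    else:                                     # otherwise act greedily
        action = 0 if q[0] >= q[1] else 1
    # Environment: sharing yields the incentive minus its cost (plus noise);
    # withholding yields nothing.
    if action == 1:
        r = reward_if_share - cost_of_sharing + random.gauss(0, 0.05)
    else:
        r = 0.0
    q[action] += alpha * (r - q[action])      # one-step update toward the reward

policy = "share" if q[1] > q[0] else "withhold"
print(f"Q(withhold)={q[0]:.3f}  Q(share)={q[1]:.3f}  -> learned policy: {policy}")
```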

7.5 Automotive industry

The advent of data sharing and exchange has been instrumental in advancing the automotive sector, especially in the realm of autonomous vehicle technology. The capability for real-time data exchange is essential, enabling vehicular and infrastructural intercommunications to enhance safety and traffic management. Notwithstanding the advantages, the integrity of Vehicle-to-Everything (V2X) communication systems is of paramount concern. It is imperative to establish robust incentive mechanisms to motivate all parties, including vehicle owners and manufacturers, to contribute data, while concurrently ensuring stringent cybersecurity protocols. Strategic partnerships between automakers and cybersecurity enterprises have culminated in the fortification of V2X systems, with Tesla and BMW emerging as pioneers in this technological integration [184].

Nevertheless, the exchange of data within the automotive industry raises significant privacy concerns. Such data could potentially disclose an individual’s location, daily patterns, and private conversations, especially with the presence of camera recording devices in vehicles. It is, therefore, essential to implement stringent data security protocols that bolster user confidence in sharing their information. Furthermore, sophisticated incentive algorithms can be employed to refine data sharing and exchange systems, ensuring the automotive industry’s advancement without compromising personal privacy.

7.6 Financial services

Data sharing has revolutionized the financial sector, enabling banks and fintech entities to offer bespoke services. For example, Plaid [185] is a platform that links bank accounts to financial apps, streamlining transactions and enhancing user experiences. Similarly, Yodlee [186] offers data aggregation and analytics services, providing insights to both consumers and financial institutions. Nonetheless, the sector is navigating a labyrinth of rigorous data privacy and security regulations. Incentive-driven data-sharing frameworks have the potential to catalyze the secure exchange of data, aligning with regulatory mandates such as GDPR and CCPA. The drive towards open banking, propelled by API technology, has facilitated the creation of novel applications that elevate service delivery, ranging from unified financial interfaces to advanced fraud detection mechanisms [187].

However, breaches of financial records pose considerable risks. Stakeholders are often reticent to share financial data without assurance of robust security measures. Additionally, within financial data-sharing platforms, participants may also be competitors from different institutions, naturally cautious about disclosing customer data without substantial incentives and algorithms to safeguard their clients’ information. Consequently, incentive-based data sharing and exchange platforms are of paramount importance in the financial sector, balancing competitive interests with collaborative imperatives.

8 Conclusion

In this comprehensive survey, we explored various incentive mechanisms and optimization algorithms related to data sharing and exchanging, offering foundational definitions and related concepts. We segmented the lifecycle of data sharing and exchanging into four distinct parts, presenting in-depth insights on associated works within each category. Among the challenges identified in the design of incentive mechanisms, two primary concerns stand out in the majority of incentive-based applications: the challenge of motivating different users, especially competitors, to engage in data sharing and exchanging; and the imperative to protect sensitive user data. Addressing the former, combining both monetary and non-monetary incentives appears to be an effective approach to stimulate user participation in the sharing process. For ensuring data security, the integration of tailored encryption algorithms and the use of distributed techniques, such as blockchain-based storage and federated learning, emerge as sound strategies. In scenarios where data quality is paramount, deep learning presents a potential solution to both identify fake users and anticipate user behavior. In our rapidly evolving digital landscape, the crafting of trustworthy, efficient, and economical incentive mechanisms for data sharing and exchanging holds significant importance across numerous domains.