1 Introduction

Online social media platforms such as Facebook, Twitter, and WeChat have become essential channels for individuals to interact, access information, and share posts. Nevertheless, social media has also become a convenient platform for the rapid spread of false information, rumors, and terrorist statements [1]. This phenomenon poses a significant challenge to society and to the supervision of online media. A classical rumor event occurred during the Fukushima earthquake in Japan (2011), which raised fears of a nuclear leak [2]. Iodized salt was believed to protect the body from nuclear radiation; hence, many Chinese consumers rushed to buy salt and supermarkets quickly sold out, causing widespread panic and confusion among the public. Another example is the belief concerning COVID-19 that ingesting pure alcohol could eradicate the virus in an infected body, which led to 800 fatalities in Iran and an additional 5876 hospitalizations for methanol poisoning [3]. These cases demonstrate that public opinion is adversely affected by the spread of malicious rumors, which disrupt the regular social order and weaken government credibility [4]. It is therefore important to take proactive measures to minimize the negative impact of network rumors as soon as they are discovered on social media.

Rumor control is essential for providers of social platform services to supply accurate and truthful information, preventing rumors from spreading further and potentially causing more harm. The techniques for blocking rumors discussed in previous research can be divided into three categories:

  • Control rumor propagation by blocking nodes [5,6,7,8,9,10,11]: The purpose of these approaches is to reduce the spread of rumors by identifying influential nodes in a social network and blocking them when rumors are spread;

  • Control rumor propagation by blocking key edges [12,13,14]: These techniques restrict rumor propagation by obstructing a particular set of edges that are useful for rumor propagation;

  • Clarifying rumors by spreading the truth [15,16,17]: The assumption behind these approaches is that once individuals come to understand the truth, they will no longer believe the rumor. Their primary idea is to propagate the truth by identifying a set of nodes that users can trust.

Previous studies have shown that limiting the influence of key users in the dissemination of rumors can be an effective way of controlling rumor propagation. However, these works treat rumor blocking as a static process and use greedy techniques to solve it; they do not consider how blocking nodes influences rumor propagation over multiple propagation rounds. This paper investigates a new problem in minimizing the influence of rumors on social networks, referred to as the dynamic rumor influence minimization (DRIM) problem, with the goal of blocking rumor propagation.

To keep the rumor propagation process comprehensible, the process represented by an independent cascade model is divided into multiple time steps. At every time step, our aim is to discover an appropriate group of message-blocking individuals (blockers), denoted by B and comprising k members. The messages sent by the blockers are filtered or blocked to prevent rumors from propagating from the blockers to other nodes, which forms the basis of rumor control.

Our strategy for addressing the DRIM problem involves two components. First, a static rumor propagation model (SRPM) is developed based on rumor popularity and the independent cascade pattern, and a dynamic rumor propagation model (DRPM) is then constructed by treating rumor popularity as a dynamic variable that evolves over time steps. Second, we propose a rumor-blocking model based on deep reinforcement learning, which selects the most suitable blockers to control rumor propagation by interacting with the DRPM. Finally, experimental results demonstrate that the models learned by deep reinforcement learning achieve better results in a variety of situations. The main contributions of this paper can be summarized as follows:

  • We formally introduce the dynamic rumor influence minimization (DRIM) problem, which incorporates the dynamic changes resulting from rumor propagation in social networks better than its predecessor, the static RIM problem.

  • The popularity of rumors is determined according to the characteristics of information propagation in a social network. This paper presents two types of rumor propagation models: a static model (SRPM) that assumes that popularity remains constant and a dynamic model (DRPM) that considers changes in popularity. The models obtained from this study may be beneficial in simulating real-world rumors.

  • We propose a deep reinforcement learning-based rumor blocking model to control the dissemination of rumors. The model has the ability to modify the control policy depending on the state evolution of a social network. Analyzing the blocking model can yield insights into rumor control by providing a dynamic perspective on network evolution.

The rest of this paper is organized as follows. Section 2 reviews the work related to rumor influence minimization and reinforcement learning. Section 3 introduces the preliminaries of social networks, rumor propagation models, and reinforcement learning. Section 4 formalizes the dynamic rumor influence minimization problem, and its solution is provided in Section 5. The experimental results and the conclusion are reported in Sections 6 and 7, respectively.

2 Related work

In this section, we examine existing research concerning the influence minimization of rumors and the application of reinforcement learning.

2.1 Rumor influence minimization

A great deal of research has been conducted on ways to reduce the influence of rumors. The first serious exploration of the influence between users in a social network was made in the work of Domingos et al. [18]. Kempe et al. [19] cast viral marketing as an optimization problem, referred to as influence maximization (IM). Inspired by the influence maximization problem, Fan et al. [5] explored the opposite problem, least-cost rumor blocking. They attempted to identify a minimum set of nodes that would act as protectors; protectors are actively involved in limiting the negative impact of a rumor, i.e., reducing the number of individuals affected by it. User experience in minimizing rumor influence was studied by Wang et al. [6], who provided a method for addressing IM issues while maintaining a high level of user experience. Among the various propagation models proposed for rumor analysis, Adil et al. [16, 17] investigated the problem of multiple-rumor dissemination in a social network and offered the HISBM model to address this issue. The dynamic programming approach of Yan et al. [7] showed how to solve the rumor influence minimization problem in a tree network. This paper proposes a reinforcement learning-based approach to reduce the influence of rumors, which is distinct from previous methods: it can adjust its blocking strategy dynamically in response to the propagation of rumors in a social network, and the outcomes of this process are used to refine the strategy.

2.2 Deep reinforcement learning

Recent advances in computer vision and natural language processing have been driven by improvements in deep learning techniques. Deep learning has also enabled remarkable progress on reinforcement learning challenges, such as the game of Go [20] and Atari games [21]. Reinforcement learning is a type of machine learning in which an agent learns to interact with its environment to maximize a reward, improving its performance over time. Unlike other types of machine learning, reinforcement learning relies on a reward function, which helps the agent determine the value of its actions and guides its decision-making. Reinforcement learning algorithms can generally be divided into two categories:

  • Value function-based approaches. These approaches optimize the strategy while maintaining a value function. An example is the Q-learning algorithm proposed by Watkins et al. [22]. The algorithm of Mnih et al. [21] outperformed all previous algorithms by combining Q-learning with deep learning in their study of Atari games. The authors referred to the improved algorithm as the deep Q-network (DQN); its stability and efficiency were improved by using frozen target networks and experience replay.

  • Policy-based approaches. These approaches directly model and optimize strategies without maintaining a value function. Early work on policy-based approaches was undertaken by Williams et al. [23], who sampled trajectories and proposed the REINFORCE algorithm. Policy-based training suffers from high variance across trajectories, which makes it challenging. Silver et al. [24] developed the deterministic policy gradient (DPG) algorithm, and Lillicrap et al. [25] employed a DQN to estimate the value function on top of the DPG algorithm, producing the deep deterministic policy gradient (DDPG) algorithm.

Combining a value function with an explicit representation of a policy yields an actor-critic method, in which the value function is used as a baseline to compute policy gradients. Actor-critic methods differ from plain baseline methods only in that they employ a learned value function.

Motivated by the reinforcement learning research on influence maximization [26], we pioneer the use of reinforcement learning to solve the influence minimization problem. Compared with the previous RIM methods, the reinforcement learning-based method achieves more successful outcomes in a variety of scenarios.

3 Preliminaries

This section briefly introduces three concepts and their related definitions: social networks, rumor propagation models, and reinforcement learning.

3.1 Social networks

A social network is usually represented by a directed graph \(G = (\mathcal {V}, \mathcal {E})\), where the set \(\mathcal {V}\) of nodes and the set \(\mathcal {E}\) of edges denote the users and the relationships between users (e.g., following or being followed), respectively. Figure 1 illustrates the social network of Zachary’s karate club, where different-colored nodes represent distinct communities and the size of each node indicates its influence within the community. Nodes with a strong influence play a prominent role in spreading rumors. It has been observed that nodes with more connections have a stronger capacity to spread information; consequently, nodes with a higher degree generally tend to be more influential.

Fig. 1

Illustration of Zachary’s karate club social network. Zachary’s karate club is a university club whose network consists of 34 nodes, 78 edges, and two opinion leaders. The two opinion leaders have attracted distinct followings, resulting in the establishment of distinct communities

An edge \((u,v) \in \mathcal {E}\) in a rumor propagation scenario represents the fact that user v follows user u. As a result, user u is permitted to share a rumor with user v. Let \(p_{uv} \in [0,1]\) denote the probability that node u activates v, i.e., the probability that the rumor is passed from user u to user v. Specifically, we have \(p_{uv} = 0\) when edge \((u,v) \notin \mathcal {E}\).

3.2 Rumor propagation models

With the development of deep learning, social networks have been studied in greater depth [27,28,29]. Rumors share many similarities with ordinary information in terms of how they spread, and most rumor propagation solutions can be regarded as a particular type of information propagation model. Models for simulating rumors have been explored in several studies [6, 16, 17, 30], many of which recognize the central role played by the linear threshold and independent cascade patterns.

Linear threshold models are characterized by a predetermined threshold \(\theta\) for each node. A node is activated when the accumulated influence of its neighbors exceeds the fixed value \(\theta\), and activation continues until no inactive node satisfies the threshold condition. In contrast to linear threshold models, which use fixed threshold parameters, independent cascade models propagate information probabilistically. Propagation is typically depicted as a sequence of node-state distributions over discrete time steps starting at an initial time t = 0. Let \(S_{u}(t)\) represent the activation state of a node u at moment t; node u is active at moment t if \(S_{u}(t) = 1\) and inactive if \(S_{u}(t) = 0\). A node u that is activated at moment \(t_{i}\) will activate its inactive neighbor v at moment \(t_{i+1}\) with probability \(p_{uv}\). The propagation process simulated by an independent cascade pattern terminates if no node is activated at moment \(t_{i}\). In this paper, independent cascade models serve as the foundation of our models, into which rumor propagation characteristics are incorporated.
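To make the independent cascade process concrete, the following minimal Python sketch simulates one diffusion over a NetworkX directed graph. The edge attribute name 'p' for the propagation probability and the function interface are illustrative assumptions, not the paper's implementation.

```python
import random
import networkx as nx

def independent_cascade(G: nx.DiGraph, seeds, rng=random.Random(0)):
    """Simulate one rumor diffusion under the independent cascade model.

    Each edge (u, v) is assumed to carry a propagation probability G[u][v]['p'];
    an activated node gets exactly one chance to activate each inactive successor.
    Returns the set of all nodes ever infected.
    """
    infected = set(seeds)          # nodes that have been exposed to the rumor
    active = set(seeds)            # nodes that can still spread it at this step
    while active:
        newly_active = set()
        for u in active:
            for v in G.successors(u):
                if v not in infected and rng.random() < G[u][v].get('p', 0.0):
                    newly_active.add(v)
        infected |= newly_active
        active = newly_active      # previously active nodes become merely infected
    return infected
```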

An independent cascade model is illustrated in Fig. 2 to represent the propagation of rumors. The user states of a network can be divided into three categories: uninfected, infected, and activated. An uninfected node has not been affected by the rumor. An infected node has already been exposed to the rumor, but is unable to propagate it. An activated node has just been affected by the rumor and can spread it. Each edge has a weight value that indicates the probability of spreading the rumor along that edge. For example, a weight of 0.7 on edge (1,3) implies that the probability of rumor propagation from node 1 to node 3 is 0.7. Figure 2(a) shows the situation at the initial point of rumor propagation with node 1 as the original node. According to Fig. 2(b), the rumor spreads from node 1 to node 3, but not to node 2. A node in an independent cascade model has only one opportunity to activate its adjacent nodes. Therefore, at time step 2, node 1 switches from the active to the infected state, thereby losing its ability to activate other nodes. Instead, node 3 continues to activate nodes 2 and 4. The sole active node 6 has no successor that can be activated at the last time step, as shown in Fig. 2(d). Eventually, the propagation process ends.

Fig. 2

Illustration of the independent cascade model. The decomposition diagrams of four time steps shows how probabilities affect propagation in an independent cascade model

3.3 Reinforcement learning

Recent developments in AI have heightened the need for reinforcement learning [31,32,33]. Reinforcement learning adjusts strategies to maximize expected reward based on continuous interaction with the environment and involves two main objects: the agent and the environment. The agent perceives changes in its environment and acts accordingly. The environment reacts to the agent’s actions by changing its state and offers feedback to the agent in the form of rewards. In addition to the agent and environment, there are several other crucial components.

  • States: The environment description at a moment is called a state, denoted by s, which refers to the discrete situations of rumor spreading in a social network.

  • Actions: The node blocking behavior of an agent is referred to as an action, denoted as a.

  • Policies: A policy in rumor analysis is a function, denoted by π, describing the agent’s behavior: it determines which action a the agent takes in a particular state s. For example, the policy π(s) = a represents an action a that decides which nodes should be blocked from spreading the rumor in state s.

  • Markov Decision Processes: “Markovianity” is the property that a “future” state is independent of the “past”. Markov decision processes (MDPs) are stochastic processes that exhibit this property, and almost all reinforcement learning problems can be formulated as MDPs. Figure 3 provides a visual representation of the interaction between the agent and its environment in an MDP. Every interaction can be divided into three steps: (1) the agent perceives state \(s_{t}\) and reward \(r_{t}\) from the environment; (2) the agent takes action \(a_{t}\) according to state \(s_{t}\); and (3) the interaction updates the state \(s_{t}\) and the reward \(r_{t}\) to \(s_{t+1}\) and \(r_{t+1}\), respectively. The interaction is repeated multiple times to form a trajectory τ.

    $$ \tau = s_{0}, a_{0}, r_{1}, s_{1}, a_{1}, \cdots, s_{t} $$
    (1)
  • Value Function: The future reward that the agent can expect to receive from state s is represented by the value function, denoted by \(V_{\pi}(s)\). Value functions can be employed to evaluate the benefits and drawbacks of strategies. The sum of future rewards at different time steps is denoted as:

    $$ R_{t} = r_{t+1} + \gamma r_{t+2} + {\cdots} = \sum\limits_{k=0}^{T} \gamma^{k} r_{t+k+1} $$
    (2)

    where γ denotes the discount factor. Similarly, we can define the action-value function \(Q_{\pi}(s,a)\), which gives the expected future reward R conditioned on state s and action a (its standard form is recalled below).
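Written out under the definitions above, this action-value function is the expected discounted return obtained when starting from state s, taking action a, and thereafter following policy π (the standard textbook form, stated here for completeness):

$$ Q_{\pi}(s,a) = \mathbb{E}_{\pi}\left[ R_{t} \mid s_{t} = s, a_{t} = a \right] $$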

Fig. 3

Illustration of agent-environment interaction in a Markov decision process. The agent perceives state \(s_{t}\) in the environment and then takes action \(a_{t}\) according to \(s_{t}\). Due to action \(a_{t}\), the state of the environment changes to \(s_{t+1}\), which the agent then perceives. Once the environment reaches a terminal state, the process is complete

One of the breakthroughs in reinforcement learning was the development of Q-learning [22], defined by

$$ Q(s_{t}, a_{t}) \leftarrow Q(s_{t}, a_{t}) + \alpha\left[R_{t+1} + \gamma \max\limits_{a} Q(s_{t+1}, a) - Q(s_{t}, a_{t})\right] $$
(3)

The Q-learning approach learns an action-value function Q that closely approximates the optimal action-value function \(Q^{*}\). Algorithm 1 shows the process of Q-learning. In line 5 of the algorithm, the action a with the maximum Q value must be determined, which makes Q-learning unsuitable for environments with continuous state or action spaces.

Algorithm 1

Q-learning.
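Since Algorithm 1 is reproduced as a figure, the following minimal tabular Q-learning sketch is given for reference. The environment interface (reset, step, actions) and the epsilon-greedy exploration rate are illustrative assumptions, not part of the original algorithm listing.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, eps=0.1, rng=random.Random(0)):
    """Tabular Q-learning (cf. Algorithm 1). env is assumed to expose
    reset() -> state, step(action) -> (next_state, reward, done), and actions(state)."""
    Q = defaultdict(float)                                   # Q[(state, action)], zero-initialized
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            acts = env.actions(s)
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda x: Q[(s, x)])
            s2, r, done = env.step(a)
            # TD target uses the greedy value of the next state, as in (3)
            best_next = 0.0 if done else max(Q[(s2, x)] for x in env.actions(s2))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```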

With the rise of deep learning, the concept of deep reinforcement learning has also been proposed. Mnih et al. [21] proposed the deep Q-network (DQN) algorithm, which approximates Q(s,a) with a neural network \(Q_{\theta}(s,a)\) parameterized by θ. The DQN thereby overcomes Q-learning’s inability to deal with continuous or very large state spaces. Figure 4 shows how the DQN model plays an Atari game.

Fig. 4

DQN model for Atari game playing. The Atari game has a screen resolution of 210 × 160, with each pixel having two states: black and white. Therefore, the total number of states is \(2^{210 \times 160}\). Q-learning does not have the capacity to retain all of these states; consequently, the value function approximation technique was devised, which approximates the optimal Q function by utilizing a neural network. The DQN model process for the Atari game involves the following four steps: (1) divide the screen into individual pixels; (2) calculate the gray value of each pixel and output it as a vector; (3) input the vector into the neural network and obtain the Q value of the state s; and (4) select the action a with the largest Q value and execute it

4 Problem formulation

This section describes the dynamic rumor influence minimization (DRIM) problem in two respects: the dynamic rumor propagation model (DRPM) and the formalization of the DRIM problem based on this propagation model.

4.1 Dynamic rumor propagation model

Many studies of rumor propagation adopt a fixed propagation probability \(p_{uv}\). In a real social network, however, the probability of information propagation generally varies with the evolution of time and the number of participants. Hence, this study proposes a dynamic rumor propagation model (DRPM) that calculates and updates the rumor propagation probability \(p_{uv}\) dynamically. Three main factors influence \(p_{uv}\):

  (1) Credibility of spreaders. Rumors are more likely to be believed if the spreader has credibility. For example, in online social networks, bloggers who post more reliable information tend to attract more followers. This study measures the credibility of a spreader by the number of followers of each user.

  (2) Probability that an inactive node will believe a rumor. Typically, only a small percentage of the content in a social network consists of rumors. Users who follow many accounts may not notice a rumor because of their limited interaction time, and following more bloggers expands a user’s sources of information, enabling better judgement of rumors. Therefore, the more users an inactive node follows, the lower the probability that the node will ultimately believe a rumor.

  (3) Popularity of rumors. Users are more likely to believe rumors that are popular on social networks. The mechanisms of social platforms focus people’s attention on new hot topics, so the popularity of these topics changes dynamically. As a result, a rumor becomes popular when enough people share information about it.

The present study first describes a static rumor propagation model (SRPM) that combines factors (1) and (2), where the probability \(p_{uv}\) is calculated as follows:

$$ p_{uv} = \frac{\alpha * \log(1 + \textup{OUT}(u))}{\alpha * \log(1+\textup{OUT}(u)) + \beta * \log(1+\textup{IN}(v))} $$
(4)

where α and β are balance coefficients satisfying α,β ∈ (0,1) and α + β = 1. The notation OUT(u) represents the out-degree of node u. The influence of a spreader is calculated using the function \(\log (1 + \textup {OUT}(u))\); this form avoids the effect of extreme values better than using the out-degree directly. For example, a large network may contain bloggers with millions of followers, and with linearly growing influence the probability of rumor spreading by these bloggers would be very close to 1. Similarly, we use IN(v) to represent the in-degree of node v and apply \(\log (1+\textup {IN}(v))\) to capture the probability that user v receives and believes a rumor posted by user u. Since α + β = 1, the expression for \(p_{uv}\) can be simplified as follows.

$$ p_{uv} = \frac{\alpha*\log(1+\textup{OUT}(u))}{\alpha*\log(1+\textup{OUT}(u)) +(1-\alpha)*\log(1+\textup{IN}(v))} $$
(5)

The static parameter α adjusts the distribution of the propagation probability p. Unfortunately, it is difficult to simulate changes in popularity using a static parameter. Hence, we replace the static α in the SRPM with a dynamically varying popularity \(\alpha^{t}\), i.e., the popularity at moment t given by (6).

$$ \alpha^{t} = \frac{\lambda * \vert A^{t} \vert +c_{1}}{\vert I^{t} \vert +c_{2}} $$
(6)

where \(c_{1}\) and \(c_{2}\) are constants used to smooth the change in popularity, and the parameter λ is a scaling factor that controls how strongly the dynamic popularity is affected by the current number of infected users. \(\vert A^{t} \vert\) and \(\vert I^{t} \vert\) denote the numbers of activated nodes and rumor-infected nodes at moment t, respectively. The static model (SRPM) updated by (6) is called the dynamic rumor propagation model (DRPM). The popularity increases when the number of activated nodes at moment t grows significantly compared with historical moments. Over time, the popularity \(\alpha^{t}\) decreases as \(\vert I^{t} \vert\) accumulates, until the end of the propagation process. According to (4) and (6), the propagation probability \(p_{uv}^{t}\) in the DRPM is formalized as follows:

$$ p_{uv}^{t} = \frac{\alpha^{t-1} * \log(1+\textup{OUT}(u))}{\alpha^{t-1} *\log(1+\textup{OUT}(u)) + (1-\alpha^{t-1}) * \log(1+\textup{IN}(v))} $$
(7)
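For concreteness, the following is a minimal sketch of how (5)-(7) can be computed over a NetworkX directed graph. The function and parameter names are illustrative assumptions, not taken from the paper's implementation.

```python
import math
import networkx as nx

def srpm_probability(G: nx.DiGraph, u, v, alpha: float) -> float:
    """Static propagation probability p_uv of Eq. (5)."""
    out_u = math.log(1 + G.out_degree(u))
    in_v = math.log(1 + G.in_degree(v))
    denom = alpha * out_u + (1 - alpha) * in_v
    return alpha * out_u / denom if denom > 0 else 0.0

def dynamic_popularity(num_activated: int, num_infected: int,
                       lam: float = 1.0, c1: float = 1.0, c2: float = 1.0) -> float:
    """Popularity alpha^t of Eq. (6): grows with |A^t|, decays as |I^t| accumulates."""
    return (lam * num_activated + c1) / (num_infected + c2)

def drpm_probability(G: nx.DiGraph, u, v, alpha_prev: float) -> float:
    """Dynamic propagation probability p_uv^t of Eq. (7), driven by alpha^{t-1}."""
    return srpm_probability(G, u, v, alpha_prev)
```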

4.2 Dynamic rumor influence minimization

DRPMs have been established to simulate rumors dynamically. A systematic understanding of how DRPMs contribute to rumor influence analysis is still lacking. This section introduces the dynamic rumor influence minimization problem based on the proposed DRPMs.

Definition 1

Blockers: If an inactive node u is selected as a blocker at time step t, it will not be activated at time step t + 1.

Definition 2

Dynamic Rumor Influence Minimization (DRIM): DRIM for a social network \(G=(\mathcal {V}, \mathcal {E})\) aims to find the blocker set \(B^{t}\) (containing k blocker nodes) at each moment so as to minimize the final number of rumor-infected users \(\vert I^{T} \vert\), where T is the moment when rumor propagation terminates. The DRIM problem can be formalized as

$$ B^{*} = \mathop{\arg\min}\limits_{B} \vert I^{T} \vert $$
(8)

where B is the sequence \(\{B^{1}, B^{2}, \cdots, B^{T}\}\).

Let the positive integer k denote the blocker budget, i.e., the number of blockers allowed to be selected at each time step. Figure 5 illustrates the dynamic rumor influence minimization process for a social network when the blocker budget equals one. The initial state of the network is shown in Fig. 5(a). Suppose that node 1 is the only seed in the initial state. If node 2 is selected as the blocker at the initial time step, i.e., \(B^{1} = \{2\}\), it will remain unaffected while the rumor activates nodes 4 and 5. The rumor-blocking behavior at time step 2, again selecting a single blocker (k = 1), is similar to that shown in Fig. 5(a). After node 6 is chosen as the blocker at time step 2, i.e., \(B^{2} = \{6\}\), the rumor is prevented from spreading from node 5 to node 6, whereas node 8 is activated by the rumor diffusing from node 4. Finally, node 9 is selected as the blocker at time step 3, which stops the rumor propagation from node 8. As a result, no active node remains, and the propagation process is complete.

Fig. 5

Illustration of the dynamic rumor influence minimization problem

5 Methodology

Recently, investigators [6, 16, 17] have examined the use of survival theory to calculate the likelihood that nodes are activated during each time step. One of the main disadvantages of survival theory is that it ignores the future implications of blocked nodes. The main advantage of utilizing reinforcement learning for the selection of blocking nodes is that it can anticipate the results after multiple rounds of propagation, allowing us to choose the optimal nodes to block according to the final rumor propagation results. To this end, we propose a novel reinforcement learning for dynamic blocking (RLDB) model, which obtains excellent performance by considering the role of blockers in an integrated manner. The workflow of the proposed model comprises two core processes: training the model and utilizing the trained model to identify blockers. The first stage, training the model, includes gathering and analyzing data, initializing parameters, and developing Algorithm 3. The second stage utilizes the trained model to select blockers using Algorithm 2. With this workflow, the proposed model can accurately and efficiently identify blockers, thus allowing the DRIM problem to be solved.

Algorithm 2

Blocker selection using the RLDB model.

5.1 Reinforcement learning model for blocker selection

The primary difficulty encountered when using an RLDB model is determining effective blocking strategies. Figure 6 illustrates how the RLDB model selects blockers when the blocker budget satisfies k = 1. The selection process can be represented by the following three steps:

  (1) Determine which nodes in the given social network can potentially be activated in the upcoming time step and add them to the candidate set C. For example, the candidate set is C = {4,6} in Fig. 6.

  (2) Calculate the future reward R of blocking each candidate in the candidate set C using the deep Q-network.

  (3) Take the action with the greatest future reward.

Fig. 6

Reinforcement learning model for blocker selection

The algorithm for choosing the blockers of the RLDB model is outlined in Algorithm 2. In particular, between Lines 7 and 10 in Algorithm 2, we employ a neural network to predict the number of individuals influenced by the rumor after we have selected a single blocker. Predicting and selecting multiple blockers simultaneously drastically decreases the amount of computation involved and has a negligible impact on performance.
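As a rough illustration of this selection step (not the authors' exact implementation), the sketch below scores every candidate with a learned Q-network and keeps the k best. The Q-network call signature q_net(state, node) and the state encoding are assumptions.

```python
import torch

def select_blockers(q_net, state_vec: torch.Tensor, candidates: list, k: int) -> list:
    """Pick k blockers from the candidate set C (nodes reachable by the rumor
    in the next step) by scoring each candidate with the learned Q-network.

    q_net(state, node) is assumed to return the predicted future reward
    (related to the number of eventually infected users) of blocking that node.
    """
    scores = []
    with torch.no_grad():
        for node in candidates:
            scores.append((q_net(state_vec, node).item(), node))
    # a higher predicted reward means fewer users are eventually infected
    scores.sort(key=lambda t: t[0], reverse=True)
    return [node for _, node in scores[:k]]
```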

5.2 Parameter learning

Figure 7 depicts the overall framework of the deep Q-network (DQN) [21] with experience replay and objective function freezing. An application of deep Q-learning with experience replay will be explored in this section. This learning methodology is characterized by the fact that the agent’s experience at each time step is stored in a replay memory for parameter updates. The loss function for model training can be represented as:

$$ \mathcal{L}(s_{t},a_{t},s_{t+1} \vert \theta) = \left(r^{t} + \gamma\max_{a}\hat{Q}_{\theta^{-}}(s_{t+1}, a) - Q_{\theta}(s_{t}, a_{t})\right)^{2} $$
(9)

where \(r^{t} = -\vert A^{t} \vert\), and \(\theta^{-}\) denotes the parameters of the target network \(\hat{Q}\). Similarly, θ denotes the parameters of the policy network Q; the policy network \(Q_{\theta}\) is the one the DQN uses to make decisions. Every \(T^{\prime}\) time steps, the value of θ is copied to \(\theta^{-}\), and \(\theta^{-}\) stays unchanged at other times. Hence, the optimization process updates only θ.

Fig. 7

Illustration of the deep Q-network with experience replay and objective function freezing

The parameter training for the DQN is demonstrated in Algorithm 3, which is based on (9) and utilizes experience replay. It is important to recognize that there are two neural networks in the DQN, i.e., \(Q_{\theta^{-}}\) and \(Q_{\theta}\), where \(\theta^{-}\) and θ are their parameters. Every \(T^{\prime}\) time steps, we set \(\theta^{-}\) to be the same as θ. This approach allows the network to be trained more effectively and to reach a steady state more quickly.

Algorithm 3

DQN with experience replay.
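Since Algorithm 3 appears only as a figure, the following compact PyTorch sketch shows one way to implement DQN training with experience replay and a periodically synchronized target network, in the spirit of (9). The environment interface, the discount factor γ, and the network call signature q_net(state, action) are assumptions for illustration.

```python
import copy
import random
from collections import deque
import torch
import torch.nn as nn

def train_dqn(env, q_net: nn.Module, episodes=200, gamma=0.99,
              batch_size=32, sync_every=100, lr=1e-3, eps=0.1):
    """DQN with experience replay and target-network freezing (cf. Algorithm 3).
    env is assumed to expose reset(), step(action), and candidate_actions(state)."""
    target_net = copy.deepcopy(q_net)            # frozen copy, parameters theta^-
    opt = torch.optim.Adam(q_net.parameters(), lr=lr)
    replay = deque(maxlen=10_000)                # replay memory of transitions
    step_count = 0
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy over the candidate actions
            if random.random() < eps:
                a = random.choice(env.candidate_actions(s))
            else:
                with torch.no_grad():
                    a = max(env.candidate_actions(s), key=lambda x: q_net(s, x).item())
            s2, r, done = env.step(a)            # reward r^t = -|A^t| in this paper
            replay.append((s, a, r, s2, done))
            s = s2
            if len(replay) >= batch_size:
                batch = random.sample(replay, batch_size)
                loss = 0.0
                for bs, ba, br, bs2, bdone in batch:
                    with torch.no_grad():
                        target = br if bdone else br + gamma * max(
                            target_net(bs2, x) for x in env.candidate_actions(bs2))
                    loss = loss + (target - q_net(bs, ba)) ** 2   # squared TD error, Eq. (9)
                opt.zero_grad()
                (loss / batch_size).backward()
                opt.step()
            step_count += 1
            if step_count % sync_every == 0:
                target_net.load_state_dict(q_net.state_dict())    # theta^- <- theta
    return q_net
```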

6 Experiment

This section evaluates the effectiveness of the developed rumor influence minimization method for rumor control under a given DRPM. First, we provide an overview of the datasets and experimental settings. Second, a thorough assessment is performed by examining the outcomes of the experiments and interpreting them from various perspectives. Finally, we compare reinforcement learning for dynamic blocking with the other baselines. The experiments were implemented with PyTorch as the deep learning framework and NetworkX for manipulating the graph structure; the reinforcement learning environment was built with reference to the gym library. Model training was conducted on a server equipped with an RTX 3090 GPU.

6.1 Datasets

We chose four real-world social networks to evaluate the feasibility and performance of the proposed method.

  1. Zachary’s karate club [34]. This dataset, reported by Wayne W. Zachary, concerns the social network of a karate club on a university campus and is frequently employed as an illustration in community structure analysis.

  2. Facebook [35]. This dataset consists of “circles” (or “friends lists”) from Facebook. The dataset was gathered from participants who used a certain app to access Facebook.

  3. Cora [36]. Cora is a collection of academic papers related to machine learning that is available in a citation network format. The citation relationships between the papers are extracted and utilized to form the network topology.

  4. Email [37]. Email data obtained from a major European research institution were utilized to construct the network. The e-mails only represent communication between institution members (the core); the dataset does not include any messages sent to or received from external sources.

Rumor control research requires only community structure information. Therefore, two versions of the dataset, namely Facebook-s and Cora-s, were created by taking the community structures from the original Facebook and Cora. The details of the datasets are shown in Table 1.

Table 1 The statistics of experimental datasets

6.2 Evaluation criteria

To evaluate the performance of our proposed method, we use the infection rate [6, 16, 17], i.e., the proportion of people affected by the rumor relative to the total number of people, as the most intuitive measure of the outcome of rumor propagation. A lower infection rate indicates that rumor control is effective.

$$ Infection\_Rate = \frac{\vert I^{T} \vert}{\vert \mathcal{V} \vert} $$
(10)

This study conducts a thorough assessment of the method that includes not only the infection rate but also the precision, recall, and F1 score. These indicators assess the accuracy of the prediction, whereas the infection rate alone cannot give a full account of the effect on rumor control.

6.3 Hyperparameter setting

Hyperparameters, such as the number of neural network layers, the batch size, and the learning rate, are settings of our approach that cannot be learned from the data. They are chosen by the practitioner, are typically specific to the problem at hand, and are set before training the model; they can significantly affect model performance. Let n represent the number of nodes in the dataset under consideration. First, we attempt to reduce the size of the neural network to prevent overfitting; the size of each neural network layer, expressed in terms of n, is given in Table 2. Second, increasing the batch size during the training process can achieve better training results [38]; therefore, we use a batch size that increases as the experiment progresses. Third, the learning rate depends on the size of the dataset, with larger datasets requiring lower learning rates. As a result, the training process can be dramatically improved by doubling or halving the learning rate as shown in Table 2.

Table 2 Hyperparameter settings of the RLDB model

The ReLU function is used to activate the hidden layers, and Adam [39] is used as the optimizer. Determining the hyperparameters requires a combination of expertise and trial and error to identify the best configuration. The number of neurons should be at most 210, and the neural network should contain at most 4 layers. The batch size should be chosen according to the memory capacity of the GPU; a larger batch size can help make the training process more stable. It is worth mentioning that using Dropout in the deep reinforcement learning model can prevent the training loss from converging.

6.4 Baseline methods

We chose four baseline methods against which to compare the performance of the proposed RLDB model. Our experiments use an optimal setting for the algorithms with tunable parameters. (A minimal sketch of how the baseline scores can be computed with NetworkX follows the list.)

  (1) Random. Randomly select a node as a blocker from the set of candidate nodes C.

  (2) Out-Degree (OD) [19]. The out-degree of a node u in a network is the number of outgoing edges from u. Compared with other centrality-based approaches, out-degree gives a more precise estimate of an individual’s influence in a social network.

  (3) Betweenness Centrality (BC) [40]. The betweenness of node u equals the number of shortest paths between all pairs of other nodes that pass through node u. Social network research has increasingly emphasized the importance of betweenness centrality.

  (4) PageRank (PR) [41]. Google commonly uses the PageRank score to determine the importance of a web page. The PageRank damping factor is set to 0.85 in all our experiments.
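For reference, the three centrality-based baseline scores can be computed directly with NetworkX, for example as follows; in the experiments, blockers are then chosen from the candidate set C according to these scores.

```python
import networkx as nx

def baseline_scores(G: nx.DiGraph):
    """Node scores used by the centrality-based baselines; the top-scoring
    candidates are chosen as blockers at each time step."""
    out_degree = dict(G.out_degree())                # OD baseline
    betweenness = nx.betweenness_centrality(G)       # BC baseline
    pagerank = nx.pagerank(G, alpha=0.85)            # PR baseline (damping factor 0.85)
    return out_degree, betweenness, pagerank
```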

6.5 Results

6.5.1 Study of parameter α

The experiment aims to analyze the influence of popularity α on the propagation of rumors in SRPMs. The graph in Fig. 8 displays the time elapsed from the start of a rumor to its termination and the infection rate of the whole social network. A rumor with low popularity will be difficult to spread on a social network. Consequently, its life expectancy will be brief. The time for a rumor to spread in a social network is roughly equal to the diameter of the entire network when the popularity is high. Such rumors can also be disseminated quickly.

Fig. 8

Variation of propagation time and infection rate with parameter α

A rumor has the longest spreading time when the popularity is moderate. We can determine a suitable level of popularity α according to the time needed for the rumor to spread in a social network. The optimal popularity, represented by \(\alpha ^{\prime }\), is achieved when the propagation time is greatest. Rumor infection rates can similarly be examined to analyze the spread of rumors within social networks.

Figure 8 reveals similar trends in popularity α across the four datasets: the rumor propagation time first increases and then decreases as the popularity α rises. This consistent trend reinforces our findings regarding the influence of rumor popularity on the duration of propagation. Nevertheless, each dataset requires a distinct level of popularity for optimal performance. The Email dataset has an optimal popularity of \(\alpha ^{\prime }=0.1\), and the best popularity in the Cora dataset is likewise \(\alpha ^{\prime }=0.1\). Sparse networks tend to be preferable to dense networks with regard to achieving the optimal popularity. The rumor infection rate is estimated to be in the range of 0.5 to 0.7 when \(\alpha =\alpha ^{\prime }\).

6.5.2 Study of the parameter λ

This section examines the effects of the scaling factor λ in the DRPM, aiming to discover its role in determining the propagation time and infection rate. Figure 9 shows the relationships between λ and the propagation time or infection rate in the four datasets. An interesting observation is that a value of λ that is too large or too small reduces the time required for a rumor to propagate in a social network. Accordingly, we can determine an appropriate scaling factor λ based on the propagation time of a rumor in social networks; the factor λ associated with the longest propagation time is designated as the optimal scaling factor \(\lambda ^{\prime }\).

Fig. 9

Variation in the propagation time and infection rate with parameter λ

6.5.3 Performance comparison

This section empirically compares the RLDB method with the baselines, i.e., Random, OD, BC, and PR, under two rumor propagation models, an SRPM and a DRPM. The infection rates for the five blocker budgets are recorded in Tables 3 and 4 for the SRPM and DRPM, respectively. The trend comparisons across the four datasets are depicted in Figs. 10 and 11.

Table 3 Infection rate comparison under the SRPM
Table 4 Infection rate comparison under the DRPM
Fig. 10

Comparison with other methods under SRPM

Fig. 11

Infection rate comparison under DRPM

Figure 10 presents the baseline comparison under the SRPM. Overall, the RLDB method maintains a significant rumor control effect on all datasets, especially under the smallest blocker set size k = 1, where it achieves a markedly lower infection rate than the baseline methods. The gap between RLDB and the other comparison methods narrows as the value of k increases. A control boundary can be observed on the Zachary’s karate club and Cora datasets: when k equals 10, rumor infection rates are kept to an extremely low level, and the performance difference between RLDB and the comparison methods becomes small. The RLDB method still retains an advantage over the other approaches on the Facebook and Email datasets as the value of k increases.

An interesting observation can be drawn from the Facebook-s dataset. The infection rates of PR and BC methods reverse as the blocker set size k increases. The PR strategy is more effective at control than the BC method when k < 4. Increasing the value of k improves the BC technique’s capability, eventually leading to better results than those of PR. The reversal may be explained by the complexity of the network topology. The performance of techniques based on a single metric can be adversely affected by fluctuations in the dataset parameters. In contrast, the data-driven RLDB method achieves superior performance across multiple scenarios through the strategies derived from learning processes.

Figure 11 presents the infection rates under the DRPM. In contrast to the SRPM, the DRPM introduces dynamic popularity, resulting in a more complex propagation process. The stronger control effect of the RLDB model attests to its generalizability, especially for intricate propagation models and network structures.

Figures 12 and 13 illustrate the precision, recall, and F1 scores for the SRPM and DRPM, respectively. RLDB performs better than the other methods on every metric under both the SRPM and the DRPM, and it stands out as particularly superior when analyzed under the DRPM on multiple datasets. It is noteworthy that, in terms of precision, recall, and F1 score, Random outperforms all comparison methods except RLDB, whereas OD fares the worst, in contrast to its infection-rate results. These scores arise because the OD, BC, and PR methods, being based on node statistical attributes, tend to select highly influential nodes. Despite their large influence, the opinions of high-influence nodes are usually not swayed by other nodes, which makes these methods less successful than randomly choosing nodes.

Fig. 12

Precision, Recall and F1 comparison under SRPMs

Fig. 13

Precision, Recall and F1 comparison under DRPMs

7 Conclusion and future work

This paper is an initial exploration of the potential of reinforcement learning to control the spread of rumors on social media. The insights on model construction gained from this study may assist in reducing rumor infection rates in an integrated manner. First, we propose the static and dynamic rumor propagation models, SRPM and DRPM, based on the independent cascade model. Second, this research advances the understanding of rumor propagation and presents the dynamic rumor influence minimization problem, which offers more control over the spread of rumors than the traditional static rumor influence minimization problem by breaking the blocking process down into multiple time steps. Another significant accomplishment is the implementation of reinforcement learning for dynamic blocking (RLDB) as a practical strategy for preventing the spread of rumors across multiple sources and blocking rounds.

Testing our approach on real-world social networks has proven it to be effective. However, the results on large-scale artificial datasets show that RLDB takes significantly more time when dealing with networks of more than 1500 nodes. A potential way to address this issue is to split a large network into smaller networks by identifying distinct communities. In future research, we will optimize the efficiency of RLDB on large networks to improve its applicability.