Supervised Group Embedding for Rumor Detection in Social Media

Liu, Yuwei; Chen, Xingming; Rao, Yanghui; Xie, Haoran; Li, Qing; Zhang, Jun; Zhao, Yingchao; Wang, Fu Lee

doi:10.1007/978-3-030-19274-7_11

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11496)

Included in the following conference series:

International Conference on Web Engineering

Abstract

To detect rumors automatically in social media, methods based on recurrent neural network and convolutional neural network have been proposed. These methods split a stream of posts related to an event into several groups along time, and represent each group using unsupervised methods such as paragraph vector. However, many posts in a group (e.g., retweeted posts) do not contribute much to rumor detection, which deteriorates the performance of rumor detection based on unsupervised group embedding. In this paper, we propose a Supervised Group Embedding based Rumor Detection (SGERD) model that considers both textual and temporal information. Particularly, SGERD exploits post-level textual information to generate group embeddings, and is able to identify salient posts for further analysis. Experimental results on two real-world datasets demonstrate the effectiveness of our proposed model.

1 Introduction

A rumor is defined as a story or a statement whose truth value is unverified or deliberately false [18]. Nowadays, with the rapid growth of social media, large numbers of rumors spread easily across the Internet, which has negative effects on society (e.g., public panic). For example, on April 23rd, 2013, a rumor about an “explosion” that injured Barack Obama in the White House spread through Twitter and wiped out $130 billion in stock value^{Footnote 1}. Therefore, it is crucial to detect rumors on social media effectively and as early as possible, before they spread widely.

In previous studies, several methods have aimed to detect rumors at the level of individual posts [18, 25]. Individual posts typically contain limited context, and rumors may be written in the same truth-telling manner as non-rumors. Besides, a single post reflects little about the temporal properties of a claim spreading on social media. Therefore, current studies on rumor detection aim to classify an aggregation of posts, i.e., to identify whether an event is a rumor or not [8, 11, 27, 28]. An event, which may carry true or false information, is defined as a set of posts (e.g., microblogs, tweets, and WeChat messages) related to some specific claim [11]. Most previous approaches to rumor detection apply conventional supervised learning algorithms with manually designed features. A wide variety of handcrafted features, such as content-based, user-based, and propagation-based features [1, 8, 12, 25], have been incorporated. Some other rumor detection methods exploit more complicated features, e.g., users’ feedback [19], variation of features along the event lifecycle [12], signal posts reflecting skepticism about factual claims [28], and conflicting viewpoints [4]. Recently, Gated Recurrent Unit (GRU) and Convolutional Neural Network (CNN) based methods [11, 27] have been shown to be competitive for rumor detection over events. Both methods view an event as a series of posts and split the posts into groups along time. GRU is particularly suited to modeling sequential phenomena and can capture the variation of contextual characteristics over rumor diffusion, while CNN can extract both local and global features through the convolution operation and reveal high-level interactions [26]. However, these models have some drawbacks. Firstly, GRU is biased towards the latest groups [16], and CNN is not inherently equipped with a sense of time series [2].
Secondly, they identify rumors according to content features of groups represented by tf-idf or unsupervised paragraph vector [9]. We observe that social media streams are full of redundant posts, and many posts about an event contribute little to rumor detection. Therefore, it is unsuitable to generate group embeddings by tf-idf or unsupervised paragraph vector. Thirdly, since these models use group representations as the input, salient posts that contribute to rumor classification are hard to identify, yet they are important for further analysis in real-world tasks such as public opinion monitoring, where picking out salient posts is crucial for experts to verify conclusions drawn by automatic methods. Furthermore, the above-mentioned models only use content information, while other useful features of groups (e.g., temporal information) are ignored. Recently, Liu et al. [10] incorporated an attention mechanism to model content and dynamic information of individual posts for rumor detection. However, without grouping the aggregated posts, their model cannot utilize the variation of features along an event’s lifecycle, which is helpful for rumor detection [12]. In addition, the model may become very complex when an event consists of a large number of posts.

To address the issues mentioned above, we propose a model named Supervised Group Embedding based Rumor Detection (SGERD) in this paper. First, in order to make each group contain as many correlated posts as possible, we split the posts of each event into several groups with an equal time interval, following [27]. Our intuition is that representing groups with unsupervised methods can hardly alleviate the negative effects of redundant posts in a group. Thus, we directly take the content of posts as the input to learn a task-oriented representation for each post, and extract the local features of nearby posts to generate group embeddings. Furthermore, considering that groups in different time windows have different influence, we model the temporal information of groups and equip SGERD with a sense of group order.

The main contributions of our work are as follows:

We conduct rumor detection at the post-level by proposing a supervised method to learn group embeddings, which significantly improves the model performance. Moreover, we can conveniently pick out meaningful posts from each event.
We incorporate temporal information besides textual features into neural networks, which is shown to be helpful for rumor detection.
Experiments are conducted on two real-world datasets, and the results show that SGERD is effective and outperforms state-of-the-art methods.

The remainder of this paper is organized as follows. We review related work in Sect. 2, and present the SGERD for rumor detection in Sect. 3. We detail the dataset, experimental results, and discussion in Sect. 4. Finally, we present conclusions in Sect. 5.

2 Related Work

In recent years, the task of detecting rumors on the Internet has received considerable attention. As for research objectives, most studies attempted to detect rumors at the post level, i.e., to classify a single post as a rumor or not [18, 25], or to identify whether the aggregation of posts under an event is a rumor [8, 11, 27, 28]. Other studies aimed to detect fake images [3] and to identify hoax articles in Wikipedia [7]. With respect to rumor detection methods, many previous studies employed traditional classifiers using different sets of hand-crafted features. For instance, various features are extracted from the content, user characteristics, and the propagation pattern [1, 8, 12, 25]. Moreover, some rumor detection methods exploited more complicated features, such as users’ feedback [19], variation of features along an event’s lifecycle [12], signal posts reflecting skepticism about factual claims [28], and conflicting viewpoints [4].

Recently, Gated Recurrent Unit (GRU) [11, 14], Convolutional Neural Network (CNN) [27], and attention-based [10] methods have been proposed for rumor detection. Different from prior works, they exploited the content of posts rather than typical features of events (e.g., the retweet number of an event and the information used to evaluate a user’s credibility). Yu et al. [27] adopted a two-layer CNN model named CAMI to extract both local and global features of events. First, the posts of an event are split into twenty groups according to an equal time interval. Then, the groups are embedded into fixed-size representations by paragraph vector [9]. Last, the model takes the group embeddings of the event as input and detects whether the event is a rumor. Liu et al. [10] proposed an attention-based approach called AIM to detect rumors using content and dynamic information. However, their approach does not utilize a grouping method, and the number of model parameters is proportional to the number of posts in each event. As a result, classification becomes intractable when events contain great numbers of posts. Ma et al. [14] regarded rumor detection and stance classification as highly relevant tasks, and proposed a model that uses a multi-task learning scheme to model the features shared by the two tasks. Some studies modeled the propagation structures of different events by exploiting tree kernels [13] and recursive neural networks [24] in order to capture the patterns of propagation trees.

Among the above, methods that integrate various hand-crafted features into traditional classifiers rely only on limited context and cannot capture high-level features, so they fail to adapt to the complicated scenarios of social media. Methods based on GRU have a preference for the latest group of an event, while the latest group may not play a key role in rumor detection [27]. Though CAMI, which uses CNN, achieves state-of-the-art performance, it has the drawback of not being equipped with a sense of time series. AIM does not utilize a grouping method; thus it cannot model the variation of features along an event’s lifecycle, which is useful for rumor detection [12], and it cannot be applied to events with a large number of posts.

3 Proposed Rumor Detection Model

3.1 Definitions

According to [11, 27], an event is defined as a set of posts related to a specific claim, e.g., “Trump Campaign colluded with Russia during 2016 presidential election”, and each post is associated with a timestamp. In this way, an event contains much more information than a single post. We denote an event instance as $e = \{(post_{i}, timestamp_{i})\}$, consisting of ideally all relevant posts $post_{i}$ at $timestamp_{i}$, where the $timestamp_{i}$ are in chronological order and $timestamp_{1}$ is the start time of e, i.e., the timestamp of the first post of the corresponding event e. The total number of posts under e is denoted as $\vert e \vert $, and thus $i \in \left[ 1, \vert e \vert \right] $. Based on this definition, our task is to detect whether a sequence of relevant posts associated with an event is a rumor. Following the previous work [27], we split the posts of each event into n groups according to an equal time interval and set n to twenty.
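The equal-interval grouping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `split_into_groups`, the `(text, timestamp)` tuple layout, and the choice to leave empty time windows as empty groups are all assumptions made for this example.

```python
from typing import List, Tuple

Post = Tuple[str, float]  # (text, timestamp); the time unit is arbitrary but consistent


def split_into_groups(event: List[Post], n: int = 20) -> List[List[Post]]:
    """Split a chronologically ordered event into n equal-width time windows."""
    groups: List[List[Post]] = [[] for _ in range(n)]
    if not event:
        return groups
    start = event[0][1]   # timestamp_1, the start time of the event
    end = event[-1][1]    # timestamp of the last post
    width = (end - start) / n
    for text, ts in event:
        if width == 0:
            idx = 0  # all posts share one timestamp
        else:
            # The last post falls exactly on the right edge, so clamp to n - 1.
            idx = min(int((ts - start) / width), n - 1)
        groups[idx].append((text, ts))
    return groups
```

With this layout, a window that receives no posts simply yields an empty group; how such windows are treated in [27] is not stated in this text.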

3.2 Model Structure

The overall model architecture is illustrated in Fig. 1. It contains four modules: splitting posts into n groups and learning task-oriented post embeddings, constructing the group embedding G over a variable number of posts, learning the temporal embedding $T_{emb}$ to equip the model with a sense of group order, and employing a series of convolution operations for classification.

3.2.1 Representation of Posts in Each Group

For an event instance e with n groups, our input consists of streams of posts, which can be interpreted as a time series where nearby posts are likely to be correlated. Shen et al. [20] showed that a model using only simple operations (e.g., parameter-free pooling) on word embeddings may achieve comparable performance on some tasks. Inspired by their work, and different from the procedure used in [27] for rumor detection, we represent posts by average word embeddings instead of paragraph vector [9], which gives a simpler procedure. Concretely, let ($v_1$, $v_2$, ..., $v_L$) denote the sequence of words of a post, where each word $v_i$ is represented by a d-dimensional word embedding trained by Word2Vec [15], and L is the number of words in this post. We represent each post as the average of its word embeddings: $\frac{1}{L}(\sum _{i=1}^{L}v_i)$. This operation can be viewed as average pooling and results in an embedding with the same dimension d as the word embedding $v_i$.
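As a small illustration of the average pooling above, the following sketch computes a post embedding from a word-embedding lookup table. The function name `average_embedding`, the dictionary-based table, and the handling of unknown words (skipped) and empty posts (zero vector) are assumptions for this example; the text does not specify these cases.

```python
from typing import Dict, List


def average_embedding(words: List[str],
                      table: Dict[str, List[float]],
                      d: int) -> List[float]:
    """Average the d-dimensional embeddings of the words in one post."""
    # Assumption: words without an embedding are skipped.
    vectors = [table[w] for w in words if w in table]
    if not vectors:
        # Assumption: a post with no known words maps to the zero vector.
        return [0.0] * d
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(d)]
```

For example, with a toy table `{"is": [1.0, 0.0], "it": [0.0, 1.0], "true": [2.0, 2.0]}`, the post `["is", "it", "true"]` is mapped to `[1.0, 1.0]`.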

However, a post representation built from average word embeddings is not task-oriented: it is produced by an unsupervised method and is thus not well suited to rumor detection. To generate a representation of each post that fits rumor detection, we apply a fully connected feed-forward network (FFN) [22] to each post. It consists of two linear transformations with a Rectified Linear Unit (ReLU) [17] nonlinearity in between, as follows:

$$\begin{aligned} \mathrm{FFN}(x) = \mathrm{ReLU}(xW_{1})W_{2}, \end{aligned}$$

(1)

where $W_{1} \in \mathbb {R}^{d \times h}$ and $W_{2} \in \mathbb {R}^{h \times d}$ are parameter matrices. The FFN uses trainable weight matrices to attend to different dimensions of the input, so after applying it we obtain a representation of each post that is better suited to the task of rumor detection.
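Eq. (1) can be made concrete with a dependency-free sketch. The helper names `matvec`, `relu`, and `ffn` are chosen for this example only, and the weights below are toy values rather than trained parameters.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(x: Vector, W: Matrix) -> Vector:
    """Row vector times matrix: x (length a) times W (a x b) gives length b."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def relu(v: Vector) -> Vector:
    """Element-wise Rectified Linear Unit."""
    return [max(0.0, a) for a in v]


def ffn(x: Vector, W1: Matrix, W2: Matrix) -> Vector:
    """FFN(x) = ReLU(x W1) W2, with W1 of shape d x h and W2 of shape h x d."""
    return matvec(relu(matvec(x, W1)), W2)
```

Because $W_{2}$ maps back to dimension d, the output has the same size as the input post embedding, so later stages can treat supervised and unsupervised post embeddings interchangeably.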

3.2.2 Generation of Group Embedding

In this section, we propose a supervised method to generate the group embedding over a variable number of posts. We define $Group_{i} \in \mathbb {R}^{len_{i} \times d}$ as the i-th group of event e with a sequence of $len_{i}$ posts, where each post is allocated by the grouping method proposed in [27], $Group_{i}[j] \in \mathbb {R}^{d}$ as the embedding of the j-th post in $Group_{i}$, obtained by average pooling of word embeddings followed by Eq. (1), and $Group_{i}[j:j+len] \in \mathbb {R}^{(len+1) \times d}$ as the concatenation of the embeddings from post j up to post $j+len$. We apply the convolution operation to combine nearby posts within temporal windows of the filter size, and extract local features for group embeddings. We denote a one-dimensional convolution filter F as a weight matrix $W_{F} \in \mathbb {R}^{ws \times d}$, where ws is the size of filter F. When F is applied to $Group_i$, the dot product is calculated between $W_{F}$ and each possible window of ws successive post representations, then a bias $b_{F}$ is added and an activation function f is applied. This results in a feature map $p \in \mathbb {R}^{len_i-ws+1}$ with entry j as

$$\begin{aligned} p[j] = f(W_{F} \cdot Group_i[j:j+ws-1] + b_F), \end{aligned}$$

(2)

where $j \in [1, 1+len_i-ws]$, $b_F \in \mathbb {R}$, and f is a non-linear function such as ReLU. Note that the weights of the convolutional kernels are shared across different groups. After the convolution operation, a sequence of local features of nearby posts is extracted, each of which corresponds to the posts of one time window. Since many posts do not contribute to rumor detection, we want to suppress the non-salient local features and keep in the group embedding the important features that are helpful for rumor detection. For this purpose, we apply a max-over-time pooling operation [5] over the feature map p and take the maximum value as the salient feature. The general idea is to mine the single most significant feature, i.e., the highest value, of the feature map corresponding to each filter, while ignoring less important information. After the pooling operation, we aggregate the local features to obtain a global representation for groups, i.e., each group is represented by a fixed-length vector $g_i \in \mathbb {R}^{m}$, whose size is equal to the number of filters.
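The convolution of Eq. (2) followed by max-over-time pooling can be sketched in plain Python as below. The names `feature_map` and `group_embedding` are hypothetical, ReLU is used for f, and padding groups shorter than ws with zero vectors is an assumption of this sketch; the text does not say how such groups are handled.

```python
from typing import List, Tuple

Vector = List[float]
Filter = Tuple[List[Vector], float]  # (W_F with shape ws x d, bias b_F)


def feature_map(group: List[Vector], W: List[Vector], b: float) -> List[float]:
    """p[j] = ReLU(W_F . Group[j : j+ws-1] + b_F) for every window of ws posts."""
    ws, d = len(W), len(W[0])
    # Assumption: groups with fewer than ws posts are padded with zero vectors.
    padded = group + [[0.0] * d for _ in range(max(0, ws - len(group)))]
    p = []
    for j in range(len(padded) - ws + 1):
        s = b
        for r in range(ws):
            for k in range(d):
                s += W[r][k] * padded[j + r][k]
        p.append(max(0.0, s))
    return p


def group_embedding(group: List[Vector], filters: List[Filter]) -> List[float]:
    """Max-over-time pooling: keep one salient value per filter, giving g_i in R^m."""
    return [max(feature_map(group, W, b)) for W, b in filters]
```

The output length equals the number of filters m regardless of how many posts the group contains, which is what lets groups of different sizes share one fixed-size representation.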

To equip our SGERD with a sense of group order, as well as to model the influence of each group within different time windows, we incorporate temporal information $T = [t_1, t_2, ..., t_n]$ of e into the generated group embeddings, where each entry $t_i$ is the min-max normalized time interval between the end time of the i-th group and the start time of e, i.e., the latest timestamp of a post in the i-th group minus the start time of e. This is similar to exploiting the position of tokens in a sequence through position embeddings [2, 23]. In particular, we embed T by a weight vector $V \in \mathbb {R}^{n}$ and a bias $b_T$, followed by a non-linear hyperbolic tangent (tanh), which results in $T_{emb}$ with each row as:

$$\begin{aligned} T_{emb}[i] = \mathrm{tanh}(T \circ V^\mathrm {T} + b_T), \end{aligned}$$

(3)

where $i \in [1, m]$, $b_{T} \in \mathbb {R}$, and $\circ $ represents element-wise multiplication.

Finally, the group embedding and the temporal embedding are combined to obtain a temporal-aware group embedding: $\tilde{G} = G + T_{emb}$, where $G=[g_1, g_2, ..., g_n]$ and the columns of $\tilde{G}$ can be viewed as group embeddings tuned with the temporal information. Temporal embedding is useful in our architecture since it gives SGERD a sense of which part of the event it is currently dealing with and reflects the different influence of each group (see Sect. 4.3).
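The temporal embedding of Eq. (3) and its combination with G can be illustrated with the following sketch. It assumes that G is stored as m rows of n values (so that its columns are the group embeddings), that every row of $T_{emb}$ is computed from the same V and $b_T$ as Eq. (3) is written, and that the helper names `min_max`, `temporal_row`, and `add_temporal` are illustrative only.

```python
import math
from typing import List


def min_max(values: List[float]) -> List[float]:
    """Min-max normalize the per-group time intervals to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # degenerate case: all intervals identical
    return [(v - lo) / (hi - lo) for v in values]


def temporal_row(T: List[float], V: List[float], b_T: float) -> List[float]:
    """One row of T_emb: tanh(T o V + b_T), with o the element-wise product."""
    return [math.tanh(t * v + b_T) for t, v in zip(T, V)]


def add_temporal(G: List[List[float]], T: List[float],
                 V: List[float], b_T: float) -> List[List[float]]:
    """Temporal-aware embedding: add the same temporal row to each of the m rows of G."""
    row = temporal_row(T, V, b_T)
    return [[g + r for g, r in zip(g_row, row)] for g_row in G]
```

Here the input to `min_max` would be, for each group, the latest post timestamp in that group minus the start time of the event.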

3.2.3 Group Embedding-Based Rumor Detection

After constructing the embedding for each group, we repeat the above convolution operation twice, with different filter settings, to extract low-level and high-level group features from $\tilde{G}$. Then, a fully connected layer followed by softmax produces the final output $\hat{l}_{e}$, where $\hat{l}_{e}$ is the predicted probability that event e is a rumor.

Our model is trained end-to-end by minimizing the following error over the training set D:

$$\begin{aligned} J = -\sum _{\forall e \in D}l_{e}\ln {\hat{l}_{e}} - \sum _{\forall e \in D}(1-l_{e})\ln {(1-\hat{l}_{e})} + \frac{\lambda }{2}||\theta ||_{2}, \end{aligned}$$

(4)

where $l_{e}$ is the ground truth label of e, $\lambda $ is the regularization coefficient, and $\theta $ is the set of parameters to be trained. Training is done through stochastic gradient descent over shuffled mini-batches with the Adam [6] update rule.
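For concreteness, the objective in Eq. (4) can be evaluated on toy values as in the sketch below. The function name `objective` and the clipping constant `eps` are assumptions added for numerical safety. The penalty follows Eq. (4) as printed, i.e., the L2 norm of the parameters; many implementations use the squared norm instead.

```python
import math
from typing import List


def objective(labels: List[int], probs: List[float],
              theta: List[float], lam: float) -> float:
    """Binary cross-entropy over events plus (lambda / 2) * ||theta||_2."""
    eps = 1e-12  # assumption: clip probabilities to avoid log(0)
    cross_entropy = 0.0
    for l, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        cross_entropy -= l * math.log(p) + (1 - l) * math.log(1.0 - p)
    l2_norm = math.sqrt(sum(w * w for w in theta))
    return cross_entropy + lam / 2.0 * l2_norm
```

With two events predicted at probability 0.5 each, parameters `[3.0, 4.0]`, and `lam = 0.001`, the value is $2\ln 2 + 0.0025$.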

4 Experiment

In this section, we evaluate the performance of the proposed model for rumor detection. We have designed the experiments to achieve the following goals: (i) to compare the performance of different methods in detecting rumors, (ii) to evaluate the function of different components for learning group embedding, (iii) to evaluate the effectiveness of mainstream methods in early detection of rumors, and (iv) to validate the model performance by extracting salient posts which contribute more to detect rumors.

4.1 Dataset

Following previous works on rumor detection [12, 27], we evaluate the effectiveness of our SGERD on two real-world datasets: Weibo and Twitter. Weibo contains 2,313 rumor events and 2,351 non-rumor events, and Twitter contains 498 rumor events and 494 non-rumor events. As for temporal information, the average time intervals of events are 2,460.7 h and 1,582.6 h for Weibo and Twitter, respectively. The rumor events from Weibo were obtained from the Sina community management center^{Footnote 2}, and a similar number of non-rumor events were gathered by crawling the posts of general threads that are not reported as rumors. For Twitter, rumor and non-rumor events were confirmed by Snopes^{Footnote 3}, an online rumor debunking service, and combined with some non-rumor events from two public datasets [1, 8].

4.2 Experimental Settings

To demonstrate the effectiveness of our proposed SGERD on rumor detection, we have implemented the following baselines for comparison:

AIM is an attention-based method which utilizes both content and dynamic information of posts [10].

CAMI is based on two CNN hidden layers [27]. Its input consists of content features of groups learned by paragraph vector [9], and the number of groups is fixed to twenty.

GRU-2 is based on two GRU hidden layers [11]. Its input consists of content features represented by tf-idf, and the time span of each group has variable length.

SVM-TS is an SVM classifier with a linear kernel that uses time-series structures to model the variation of social context features [12]. These features are manually designed and based on contents, users, and propagation patterns.

RFC is a Random Forest Classifier which aims to fit the temporal tweets volume curve with three parameters [8].

DT-Rank is a ranking model implemented by decision tree method to detect trending rumors, which ranks the clustered results by focusing on rumors with enquiry phrases and cluster disputed factual claims [28].

DTC is a Decision Tree Classifier, which models information credibility based on overall statistic handcrafted features [1].

SVM-RBF is an SVM-based classifier with an RBF kernel, which models information credibility based on overall statistical handcrafted features [25].

Note that although the method proposed by Ma et al. [14] can also detect rumors, it jointly optimizes rumor detection and stance classification, which makes it unsuitable for comparison here. Methods that model propagation structures [13, 24] need the propagation trees of posts for each event, so they cannot be compared on Weibo and Twitter. Following the setting of previous works [11, 27], we select 10% of the data for validation, and split the remaining 90% into training and testing sets in a 3:1 ratio for both datasets. Note that the validation, training, and testing sets are obtained by stratified shuffling according to classes. We employ Accuracy, Precision, Recall, and $F_{1}$ to evaluate the performance on rumor detection [27].
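The data split described above (10% validation, then 3:1 training to testing, stratified by class) can be sketched with the standard library alone. The function name `stratified_split`, the fixed seed, and the rounding of per-class counts are assumptions of this example rather than details given in the text.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[int],
                     seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Return (train, validation, test) index lists, stratified by label."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, val, test = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)                    # shuffle within each class
        n_val = round(0.1 * len(idxs))       # 10% of the class for validation
        rest = idxs[n_val:]
        n_train = round(0.75 * len(rest))    # 3:1 split of the remaining 90%
        val += idxs[:n_val]
        train += rest[:n_train]
        test += rest[n_train:]
    return train, val, test
```

For a balanced toy dataset with 40 events per class, this yields 54 training, 8 validation, and 18 testing events.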

Our proposed SGERD is implemented based on Keras^{Footnote 4}. For each dataset, we set the regularization coefficient $\lambda $ to 0.001, the dimensionality of word embedding d to 100, the inner-layer dimensionality h of the FFN to 50, and the filter size ws of each convolution layer to (3, 3, 3). The corresponding filter numbers m for the three layers are (50, 20, 20) and (50, 10, 10) for Weibo and Twitter, respectively. The above hyperparameters are tuned on the validation set.

4.3 Comparison with Baselines

Table 1 presents the performance of different methods in terms of Accuracy, Precision, Recall, and $F_{1}$. The accuracy on Twitter is generally much lower than on Weibo because the Twitter dataset is smaller than the Weibo dataset and has a higher ratio of reposts. We can observe the performance ranking of these methods on rumor detection as follows: SGERD, CAMI, AIM, GRU-2, SVM-TS, RFC, DTC, SVM-RBF, and DT-Rank. All methods based on deep neural networks (DNN) perform better than the conventional ones. The classical methods, i.e., SVM-TS, RFC, DTC, SVM-RBF, and DT-Rank, are mainly implemented by combining traditional classifiers with different sets of handcrafted features. They are not able to capture high-level features and thus fail to be effective in the complicated scenarios of social media. In contrast, methods based on DNN, i.e., SGERD, CAMI, AIM, and GRU-2, can dig out deep latent semantic features and tend to adapt to complicated scenarios. Furthermore, DT-Rank extracts organized expressions from posts with enquiry phrases, which are not common in the datasets we used. Relying on limited enquiry phrases, DT-Rank therefore has rather restricted adaptability to different datasets. Compared to SVM-RBF and DTC, both SVM-TS and RFC achieve better performance by integrating time-series structure, which confirms that temporal features are significant for rumor detection.

Table 1. Rumor detection results (R: Rumor; N: Non-rumor)

Among the DNN based methods, our proposed SGERD outperforms all previously published ones for rumor detection. All these methods can model high-level features and dig out deep latent semantic information of posts, and thus reach a high performance. However, the previous methods all have limitations. GRU-2 is capable of capturing the variation of contextual characteristics over rumor diffusion, but it is biased towards the latest group, which usually does not play a key role. CAMI has been shown to extract both local and global features through the convolution operation and reveal high-level interactions, but its main model structure is a simple CNN, which is not inherently equipped with a sense of group order. Both GRU-2 and CAMI use unsupervised methods to generate group embeddings, which do not consider the different importance of each post. AIM employs two kinds of attention mechanisms to help exploit the dynamic information of each post; however, the potential interactions among nearby groups are discarded since grouping methods are not utilized. By overcoming these shortcomings, SGERD achieves a considerable performance improvement on rumor detection.

4.4 Ablation Experiments

To evaluate the function of different components for learning group embedding, we implement three variations of SGERD, denoted as SGERD - P, SGERD - G, and SGERD - $T_{emb}$, respectively. Compared with SGERD, SGERD - P directly uses the unsupervised average word embeddings to generate the group embedding, rather than learning the representation of each post with supervision. Similarly, SGERD - G represents a group by applying the max-pooling operation over the supervised post embeddings, without the convolution operation that learns local features for the group embedding. Note that these two variations of our model do not use temporal embedding. To investigate the contribution of temporal embedding, SGERD - $T_{emb}$ is implemented by removing temporal embedding from SGERD. The results of the ablation experiments on rumor detection for each dataset are shown in Table 2.

Table 2. Rumor detection results (R: Rumor; N: Non-rumor)

Compared to SGERD - $T_{emb}$, SGERD - P performs worse on both datasets: its performance decreases by 0.3% and 0.4% on the Weibo and Twitter datasets, respectively. This shows that learning the representation of each post with supervision is helpful for the rumor detection task, because it generates more task-oriented post embeddings. Similarly, SGERD - G decreases by 0.5% and 0.9% on the Weibo and Twitter datasets, respectively. This shows that the convolution operation for extracting local features of nearby posts is important for detecting rumors. Furthermore, the performance drop of SGERD - G is larger than that of SGERD - P. We assume that this is because the convolution operation can combine features of multiple posts in the same time window, which may be discarded when max-pooling is applied directly over the post embeddings. Compared to SGERD in Table 1, SGERD - $T_{emb}$ decreases by 0.6% and 0.9% on Weibo and Twitter, respectively, which indicates that the temporal information of posts plays an important part in deciding whether or not an event is a rumor, and that our temporal embeddings successfully model the temporal features. Finally, all these variations without specific components still perform better than state-of-the-art methods, which indicates that it is better to model the post-level content using our supervised method than to use the unsupervised aggregation of groups.

4.5 Early Detection of Rumors

In practice, rumors usually need to be detected as early as possible, and thus early detection of rumors is a crucial task. To investigate the performance of SGERD on early detection of rumors, we set several detection deadlines; posts published after a deadline are not considered in early detection. The mean official report time (ORT) of rumors given by the debunking services of Snopes and the Sina community management center is taken as a reference. We conduct early detection experiments on AIM, CAMI, GRU-2, and SVM-TS for comparison, since these methods have the best performance among all mainstream methods. Although DT-Rank is mainly adopted in the early detection task, its performance on rumor detection is much poorer than that of the other baseline methods, and thus we do not take it into consideration.
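The deadline-based truncation used for early detection can be sketched as follows. The function name `truncate_at_deadline`, the use of seconds for timestamps, and the choice to measure the deadline from the first post of the event are assumptions for this illustration; the text only states that posts after the deadline are not considered.

```python
from typing import List, Tuple

Post = Tuple[str, float]  # (text, timestamp in seconds); assumed layout


def truncate_at_deadline(event: List[Post], deadline_hours: float) -> List[Post]:
    """Keep only the posts published within deadline_hours of the first post."""
    if not event:
        return []
    start = event[0][1]  # events are assumed to be in chronological order
    cutoff = start + deadline_hours * 3600.0
    return [(text, ts) for text, ts in event if ts <= cutoff]
```

The truncated event can then be passed through the same grouping and classification pipeline as a full event, once per detection deadline.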

Accuracy of different methods during different detection deadlines is presented in Fig. 2, from which we can observe that the accuracy curve of most methods will climb from a small value and gradually converge to a certain accuracy. During first several hours, the accuracy of SGERD climbs rapidly and tends to converge to a relatively high value at the earliest time, while other methods take longer time to converge and cannot reach such a high accuracy. The accuracy of SGERD will reach 91.4% for Weibo and 77.3% for Twitter within 12 h, which is much earlier than the official report time of rumor.

When the temporal embedding is discarded, SGERD - $T_{emb}$ achieves an accuracy curve nearly coinciding with that of SGERD, which indicates that temporal embedding has a rather limited impact on early detection. This is because the proposed temporal embedding is designed to model temporal information for sequences of posts, which requires the input sequence to be long enough. However, for early detection of rumors, the detection deadlines are restricted to a certain range and the number of posts is limited. As a result, the temporal features of posts cannot be captured, and temporal embedding cannot be as effective for early detection as for usual rumor detection. Similarly, SVM-TS and GRU-2 model time-series information of the input sequences in a way that conflicts with the requirement of early detection. Therefore, SVM-TS and GRU-2 are unsuitable when the detection deadline is early: their accuracy curves climb slowly and converge to low accuracies. Without integrating the temporal information, CAMI can extract key features even from a short sequence of posts, so its accuracy curves are steadier and stay at higher accuracy than those of GRU-2 and SVM-TS. However, CAMI does not consider the different importance of each post in the same group, and AIM ignores the variation among nearby groups as it does not group the posts. Therefore, their accuracy curves converge to lower levels than that of SGERD. With the benefit of modeling from the post level, our SGERD can mine useful posts and alleviate the negative effect of redundant posts, and it ranks first for early detection at every stage.

4.6 Samples of the Salient Posts

Similar to the visualization work in information retrieval [21], we present samples of the salient posts extracted by SGERD. Firstly, for each post, we evaluate the output of CNN before the max-over-time pooling operation, i.e., picking out the largest output value $p^{*}$ as in Eq. (2) among all the different filters to represent this post. Secondly, we sort all the posts according to their output value $p^{*}$, and trace back to posts that have large output value for each event. Finally, we get the salient posts making significant contribution to rumor detection. Figure 3 presents several events, from which we can visualize posts with large output values in blue that contribute more to rumor detection. Similarly, we illustrate trivial posts with small output values in green for comparison.
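The ranking procedure above can be sketched as follows. The sketch assumes that the per-filter feature maps of Eq. (2) are already available as lists and that feature-map entry j is attributed to the post at position j (the first post of the window); this attribution, like the names `post_scores` and `rank_posts`, is an assumption of the example rather than a detail given in the text.

```python
from typing import List


def post_scores(feature_maps: List[List[float]]) -> List[float]:
    """p* for each window position: the largest output among all filters."""
    n_windows = len(feature_maps[0])
    return [max(fm[j] for fm in feature_maps) for j in range(n_windows)]


def rank_posts(posts: List[str], feature_maps: List[List[float]],
               top_k: int) -> List[str]:
    """Sort posts by p* in descending order and return the top_k texts."""
    scores = post_scores(feature_maps)
    # Only posts that start a full window have a score under this attribution.
    scored = list(zip(scores, posts[:len(scores)]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

Posts at the bottom of the same ranking would correspond to the trivial posts shown for comparison.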

Sample (a) is an identified rumor on Weibo about “Enter reversed password when robbers threaten you to take money from ATM and it will call the police for help secretly”, sample (b) is an incorrect opinion about beer spreading on Weibo, which claims that beer causes male feminization, sample (c) is a false news story on Twitter about hairbands exported from China, and sample (d) is a fabricated claim on Twitter that vegetarian hot dogs contain meat and human DNA. From the above examples, we can observe that many posts expressing doubt about and opposition to these events, such as “is it true?”, have a relatively large output value, while the posts with small values are redundant (e.g., “repost”) or just make a factual description. This visualization step is especially useful for rumor detection because it provides explanatory information about how the model works, and helps to better understand what is learned by SGERD.

5 Conclusions

In this paper, we have proposed a supervised group embedding based rumor detection model named SGERD. SGERD processes each event starting from its posts and leverages the max-pooling operation to alleviate the effect of redundant posts, which allows it to pick out salient posts. Furthermore, SGERD incorporates temporal information to equip the CNN with a sense of group order, and it models the influence of different groups. To demonstrate the effectiveness of SGERD, we conducted experiments on two real-world datasets, and the results show that it outperforms traditional models based on handcrafted features as well as deep neural network models (i.e., GRU-2, CAMI, and AIM). Finally, visualizing the salient posts that contribute more to rumor detection helps us better comprehend how SGERD works and helps domain experts more easily verify the conclusions drawn by this automatic method.

References

1. Castillo, C., Mendoza, M., Poblete, B.: Information credibility on Twitter. In: WWW, pp. 675–684 (2011)
2. Gehring, J., Auli, M., Grangier, D., Yarats, D., Dauphin, Y.N.: Convolutional sequence to sequence learning. In: ICML, pp. 1243–1252 (2017)
3. Gupta, A., Lamba, H., Kumaraguru, P., Joshi, A.: Faking sandy: characterizing and identifying fake images on Twitter during hurricane sandy. In: WWW, pp. 729–736 (2013)
4. Jin, Z., Cao, J., Zhang, Y., Luo, J.: News verification by exploiting conflicting social viewpoints in microblogs. In: AAAI, pp. 2972–2978 (2016)
5. Kim, Y.: Convolutional neural networks for sentence classification. In: EMNLP, pp. 1746–1751 (2014)
6. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. CoRR abs/1412.6980 (2014)
7. Kumar, S., West, R., Leskovec, J.: Disinformation on the web: impact, characteristics, and detection of Wikipedia hoaxes. In: WWW, pp. 591–602 (2016)
8. Kwon, S., Cha, M., Jung, K., Chen, W., Wang, Y.: Prominent features of rumor propagation in online social media. In: ICDM, pp. 1103–1108 (2013)
9. Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: ICML, pp. 1188–1196 (2014)
10. Liu, Q., Yu, F., Wu, S., Wang, L.: Mining significant microblogs for misinformation identification: an attention-based approach. ACM Trans. Intell. Syst. Technol. 9(5), 50:1–50:20 (2018)
11. Ma, J., et al.: Detecting rumors from microblogs with recurrent neural networks. In: IJCAI, pp. 3818–3824 (2016)
12. Ma, J., Gao, W., Wei, Z., Lu, Y., Wong, K.: Detect rumors using time series of social context information on microblogging websites. In: CIKM, pp. 1751–1754 (2015)
13. Ma, J., Gao, W., Wong, K.: Detect rumors in microblog posts using propagation structure via kernel learning. In: ACL, pp. 708–717 (2017)
14. Ma, J., Gao, W., Wong, K.: Detect rumor and stance jointly by neural multi-task learning. In: WWW Companion, pp. 585–593 (2018)
15. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. CoRR abs/1301.3781 (2013)
16. Mikolov, T., Kombrink, S., Burget, L., Cernocký, J., Khudanpur, S.: Extensions of recurrent neural network language model. In: ICASSP, pp. 5528–5531 (2011)
17. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML, pp. 807–814 (2010)
18. Qazvinian, V., Rosengren, E., Radev, D.R., Mei, Q.: Rumor has it: identifying misinformation in microblogs. In: EMNLP, pp. 1589–1599 (2011)
19. Rieh, S.Y., Jeon, G.Y., Yang, J.Y., Lampe, C.: Audience-aware credibility: from understanding audience to establishing credible blogs. In: ICWSM, pp. 436–445 (2014)
20. Shen, D., et al.: Baseline needs more love: on simple word-embedding-based models and associated pooling mechanisms. In: ACL, pp. 440–450 (2018)
21. Shen, Y., He, X., Gao, J., Deng, L., Mesnil, G.: A latent semantic model with convolutional-pooling structure for information retrieval. In: CIKM, pp. 101–110 (2014)
22. Tan, Z., Wang, M., Xie, J., Chen, Y., Shi, X.: Deep semantic role labeling with self-attention. In: AAAI, pp. 4929–4936 (2018)
23. Vaswani, A., et al.: Attention is all you need. In: NIPS, pp. 6000–6010 (2017)
24. Wong, K., Gao, W., Ma, J.: Rumor detection on twitter with tree-structured recursive neural networks. In: ACL, pp. 1980–1989 (2018)
25. Yang, F., Liu, Y., Yu, X., Yang, M.: Automatic detection of rumor on Sina Weibo. In: SIGKDD Workshop, pp. 13:1–13:7 (2012)
26. Yin, W., Schütze, H.: Convolutional neural network for paraphrase identification. In: HLT-NAACL, pp. 901–911 (2015)
27. Yu, F., Liu, Q., Wu, S., Wang, L., Tan, T.: A convolutional approach for misinformation identification. In: IJCAI, pp. 3901–3907 (2017)
28. Zhao, Z., Resnick, P., Mei, Q.: Enquiring minds: early detection of rumors in social media from enquiry posts. In: WWW, pp. 1395–1405 (2015)

Acknowledgments

We are grateful to the anonymous reviewers for their valuable comments on this manuscript. This research has been supported in part by the National Natural Science Foundation of China (U1611264), a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (UGC/FDS11/E03/16), the Individual Research Scheme of the Dean’s Research Fund 2017–2018 (FLASS/DRF/IRS-8), Top-up Fund for General Research Fund/Early Career Scheme (TFG-3) and Seed Fund for General Research Fund/Early Career Scheme (SFG-6) of the 2018 Dean’s Research Fund to MIT Department, Small Grant for Academic Staff (MIT/SGA05/18-19) of The Education University of Hong Kong, and a Collaborative Research Grant (project no. C1031-18G) from the Research Grants Council of Hong Kong SAR.

Author information

Authors and Affiliations

School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
Yuwei Liu, Xingming Chen & Yanghui Rao
Department of Mathematics and Information Technology, The Education University of Hong Kong, Tai Po, Hong Kong
Haoran Xie
Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Qing Li
School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
Jun Zhang
School of Computing and Information Sciences, Caritas Institute of Higher Education, Tseung Kwan O, Hong Kong
Yingchao Zhao
School of Science and Technology, The Open University of Hong Kong, Kowloon, Hong Kong
Fu Lee Wang

Corresponding author

Correspondence to Yanghui Rao.

Editor information

Editors and Affiliations

Novosibirsk State Technical University, Novosibirsk, Russia
Maxim Bakaev
Erasmus University Rotterdam, Rotterdam, The Netherlands
Flavius Frasincar
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
In-Young Ko

Cite this paper

Liu, Y. et al. (2019). Supervised Group Embedding for Rumor Detection in Social Media. In: Bakaev, M., Frasincar, F., Ko, IY. (eds) Web Engineering. ICWE 2019. Lecture Notes in Computer Science, vol 11496. Springer, Cham. https://doi.org/10.1007/978-3-030-19274-7_11

DOI: https://doi.org/10.1007/978-3-030-19274-7_11
Published: 26 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19273-0
Online ISBN: 978-3-030-19274-7
eBook Packages: Computer Science, Computer Science (R0)