Abstract
Given the retweeting activity for the posts of several Twitter users, how can we distinguish organic activity from spammy retweets made by paid followers to boost a post's apparent popularity? More generally, given groups of observations, can we spot the strange groups? Our main intuition is that organic behavior has more variability, while fraudulent behavior, like retweets by botnet members, is more synchronized. We refer to the detection of such synchronized observations as the Synchronization Fraud problem, and we study a specific instance of it, Retweet Fraud Detection, as manifested on Twitter. Here, we propose: (A) ND-Sync, an efficient method for detecting group fraud, and (B) a set of carefully designed features for characterizing retweet threads. ND-Sync is effective in spotting retweet fraudsters, robust to different types of abnormal activity, and adaptable, as it can easily incorporate additional features. Our method achieves 97% accuracy on a real dataset of 12 million retweets crawled from Twitter.
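The intuition that synchronized groups show less variability than organic ones can be illustrated with a minimal sketch. This is not ND-Sync itself: the two per-thread features (retweet count and mean seconds between retweets), the coefficient-of-variation score, the `threshold` parameter, and the toy data are all assumptions made for illustration only.

```python
from statistics import mean, median, pstdev


def group_spread(threads):
    """Average coefficient of variation over a user's per-thread features.

    `threads` is a list of equal-length tuples of positive numbers,
    one tuple per retweet thread (hypothetical features).
    """
    columns = list(zip(*threads))
    return sum(pstdev(col) / mean(col) for col in columns) / len(columns)


def flag_synchronized(groups, threshold=0.5):
    """Flag users whose spread is below `threshold` times the median spread."""
    spreads = {user: group_spread(threads) for user, threads in groups.items()}
    cutoff = threshold * median(spreads.values())
    return sorted(user for user, spread in spreads.items() if spread < cutoff)


# Toy data: (retweet count, mean seconds between retweets) per thread.
groups = {
    "organic_a": [(12, 340.0), (45, 95.0), (7, 800.0), (30, 150.0)],
    "organic_b": [(20, 500.0), (3, 1200.0), (60, 60.0), (15, 310.0)],
    "organic_c": [(5, 700.0), (25, 200.0), (40, 90.0), (9, 450.0)],
    "suspect": [(50, 10.0), (50, 11.0), (51, 10.0), (50, 10.5)],
}

print(flag_synchronized(groups))
```

In this toy data the threads of `suspect` are nearly identical to one another, so its spread falls far below that of the other users. A real detector would need richer features and a more robust notion of group abnormality than a single cutoff.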
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Giatsoglou, M., Chatzakou, D., Shah, N., Beutel, A., Faloutsos, C., Vakali, A. (2015). ND-Sync: Detecting Synchronized Fraud Activities. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_16
DOI: https://doi.org/10.1007/978-3-319-18032-8_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18031-1
Online ISBN: 978-3-319-18032-8
eBook Packages: Computer Science, Computer Science (R0)