Abstract
Users of the Twitter microblogging platform share a considerable amount of information through short messages on a daily basis. Some of these so-called tweets discuss issues related to software and could include information that is relevant to the companies developing these applications. Such tweets have the potential to help requirements engineers better understand user needs and therefore provide important information for software evolution. However, little is known about the nature of tweets discussing software-related issues. In this paper, we report on the usage characteristics, content and automatic classification potential of tweets about software applications. Our results are based on an exploratory study in which we used descriptive statistics, content analysis, machine learning and lexical sentiment analysis to explore a dataset of 10,986,495 tweets about 30 different software applications. Our results show that searching for relevant information on software applications within the vast stream of tweets can be compared to looking for a needle in a haystack. However, this relevant information can provide valuable input for software companies and support the continuous evolution of the applications discussed in these tweets. Furthermore, our results show that it is possible to use machine learning and lexical sentiment analysis techniques to automatically classify tweets according to their relevance, author type and sentiment polarity.
Notes
The search query is: tweepy.Cursor(api.search, q='APP_NAME -filter:retweets', lang='en'), where APP_NAME is the name of the software application. We ran the query iteratively (every 7–9 days) for each software application in our dataset.
We consider tweets that do not belong to the unrelated, unclear or noise categories as fulfilling these criteria. A definition of each category can be found in Table 3.
Following Twitter's convention, we count each link as 23 characters and do not consider media and photographs in the count.
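This counting rule can be sketched in a few lines of Python (the regex and function name are our own illustrative assumptions, not part of the study's tooling):

```python
import re

# Twitter wraps every link with t.co, so each URL counts as a fixed
# 23 characters regardless of its actual length.
URL_LENGTH = 23
URL_PATTERN = re.compile(r'https?://\S+')

def tweet_length(text: str) -> int:
    """Count tweet characters, treating every link as 23 characters."""
    urls = URL_PATTERN.findall(text)
    stripped = URL_PATTERN.sub('', text)
    return len(stripped) + URL_LENGTH * len(urls)
```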
Previous research [35] found that 75% of re-tweets occur less than a day after the concerned tweet has been posted. Thus, we consider the number of re-tweets and likes to be complete in most cases, since they were collected after each tweet had been visible for at most 7–9 days (because of the Twitter API restrictions described in Sect. 2.2).
This includes all different hashtag combinations of the software application name. For example, for Adobe Photoshop: #AdobePhotoshop, #Adobe #Photoshop, etc.
Windows 7 and Windows 10 share the common Twitter account @Windows.
We do not count the categories unrelated, unclear, noise and other in this final count.
We performed stratification during our tenfold cross-validation.
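A minimal sketch of how stratified fold assignment can be done (the function name and the round-robin strategy are illustrative assumptions; Weka's implementation differs in its details):

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Assign each instance index to one of k folds so that every fold
    approximately preserves the overall label distribution."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    folds = [[] for _ in range(k)]
    # Deal out the instances of each class round-robin across the folds.
    for indices in by_label.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds
```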
http://meka.sourceforge.net, default configuration. Details in "Appendix".
The name that uniquely identifies the user in Twitter; it appears after the @ sign (e.g., @twitterapi).
Referred to as friends by Twitter.
We performed stratification during our tenfold cross-validation.
http://www.cs.waikato.ac.nz/ml/weka, default configuration. Details in "Appendix".
The Android and Windows stores allow for bidirectional communication. However, the Apple and Blackberry stores do not.
References
Twitter API Rate Limits. https://dev.twitter.com/rest/public/rate-limiting. Accessed 04 April 2017
Achananuparp P, Lubis IN, Tian Y, Lo D, Lim EP (2012) Observatory of trends in software related microblogs. In: Proceedings of the 27th IEEE/ACM international conference on automated software engineering, ASE’12, pp 334–337
Almaatouq A, Alabdulkareem A, Nouh M, Shmueli E, Alsaleh M, Singh VK, Alarifi A, Alfaris A, Pentland AS (2014) Twitter: who gets caught? Observed trends in social micro-blogging spam. In: Proceedings of the 2014 ACM conference on Web Science, WebSci’14, pp 33–41
Almaatouq A, Shmueli E, Nouh M, Alabdulkareem A, Singh VK, Alsaleh M, Alarifi A, Alfaris A, Pentland AS (2016) If it looks like a spammer and behaves like a spammer, it must be a spammer: analysis and detection of microblogging spam accounts. Int J Inf Secur 15(5):475–491
Alsaleh M, Alarifi A, Al-Salman AM, Alfayez M, Almuhaysin A (2014) TSD: detecting sybil accounts in Twitter. In: Proceedings of the 13th international conference on machine learning and applications, pp 463–469
Asur S, Huberman BA (2010) Predicting the future with social media. Proc IEEE/WIC/ACM Int Conf Web Intell Intell Agent Technol 1:492–499
Benevenuto F, Magno G, Rodrigues T, Almeida V (2010) Detecting spammers on Twitter. In: Proceedings of collaboration, electronic messaging, anti-abuse and spam conference, CEAS’10
Berry DM (2017) Evaluation of tools for hairy requirements engineering and software engineering tasks. In: Technical report, University of Waterloo
Bollen J, Mao H, Zeng X (2011) Twitter mood predicts the stock market. J Comput Sci 2(1):1–8
Bougie G, Starke J, Storey MA, German DM (2011) Towards understanding Twitter use in software engineering: preliminary findings, ongoing challenges and future questions. In: Proceedings of the 2nd international workshop on Web 2.0 for software engineering, Web2SE’11. ACM, pp 31–36
Chen N, Lin J, Hoi SC, Xiao X, Zhang B (2014) AR-miner: mining informative reviews for developers from mobile app marketplace. In: Proceedings of the 36th international conference on software engineering, ICSE’14, pp 767–778
Cheng Z, Caverlee J, Lee K (2010) You are where you Tweet: a content-based approach to geo-locating Twitter users. In: Proceedings of the 19th ACM international conference on information and knowledge management, CIKM’10, pp 759–768
Chu Z, Gianvecchio S, Wang H, Jajodia S (2012) Detecting automation of Twitter accounts: Are you a human, bot, or cyborg? IEEE Trans Dependable Secure Comput 9(6):811–824
Davis CA, Varol O, Ferrara E, Flammini A, Menczer F (2016) BotOrNot: a system to evaluate social bots. In: Proceedings of the 25th international conference companion on world wide web, WWW’16 Companion, pp 273–274
De Choudhury M, Counts S (2013) Understanding affect in the workplace via social media. In: Proceedings of the 2013 conference on computer supported cooperative work, CSCW’13, pp 303–316
Di Sorbo A, Panichella S, Alexandru CV, Shimagaki J, Visaggio CA, Canfora G, Gall HC (2016) What would users change in my app? Summarizing app reviews for recommending software changes. In: Proceedings of the international symposium on foundations of software engineering, pp 499–510
Dietterich TG (2000) Ensemble methods in machine learning. In: Proceedings of the international workshop on multiple classifier systems, pp 1–15
Galvis Carreño LV, Winbladh K (2013) Analysis of user comments: an approach for software requirements evolution. In: Proceedings of the 2013 international conference on software engineering, ICSE’13, pp 582–591
Grier C, Thomas K, Paxson V, Zhang M (2010) @Spam: the underground on 140 characters or less. In: Proceedings of the 17th ACM conference on computer and communications security, CCS’10, pp 27–37
Groen EC, Doerr J, Adam S (2015) Towards crowd-based requirements engineering: a research preview. In: Requirements engineering: foundation for software quality. Springer, pp 247–253
Gu X, Kim S (2015) “What Parts of Your Apps are Loved by Users?” (T). In: Proceedings of the 2015 30th IEEE/ACM international conference on automated software engineering, ASE’15, pp 760–770
Guzman E, Alkadhi R, Seyff N (2016) A needle in a haystack: What do Twitter users say about software? In: Proceedings of the international conference on requirements engineering, RE’16
Guzman E, Aly O, Bruegge B (2015) Retrieving diverse opinions from app reviews. In: Proceedings of the 2015 ACM/IEEE international symposium on empirical software engineering and measurement, ESEM’15, pp 1–10
Guzman E, Azócar D, Li Y (2014) Sentiment analysis of commit comments in GitHub: an empirical study. In: Proceedings of the 11th working conference on mining software repositories, MSR’14, pp 352–355
Guzman E, Bruegge B (2013) Towards emotional awareness in software development teams. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering, ESEC/FSE’13, pp 671–674
Guzman E, El-Haliby M, Bruegge B (2015) Ensemble methods for app review classification: an approach for software evolution (N). In: Proceedings of the 2015 30th IEEE/ACM international conference on automated software engineering, ASE’15, pp 771–776
Guzman E, Maalej W (2014) How do users like this feature? A fine grained sentiment analysis of app reviews. In: Proceedings of the 2014 IEEE 22nd international requirements engineering conference, RE’14, pp 153–162
Hong L, Dan O, Davison BD (2011) Predicting popular messages in Twitter. In: Proceedings of the 20th international conference companion on world wide web, WWW ’11, pp 57–58
Hoon L, Vasa R, Schneider JG, Grundy J et al (2013) An analysis of the mobile app review landscape: trends and implications. Faculty of information and communication technologies, Swinburne University of Technology, Technical report
Iacob C, Harrison R (2013) Retrieving and analyzing mobile apps feature requests from online reviews. In: Proceedings of the working conference on mining software repositories, MSR’13, pp 41–44
Jansen BJ, Zhang M, Sobel K, Chowdury A (2009) Twitter power: Tweets as electronic word of mouth. J Am Soc Inf Sci Technol 60(11):2169–2188
Johann T, Maalej W (2015) Democratic mass participation of users in requirements engineering? In: Proceedings of the 2015 IEEE 23rd international requirements engineering conference, RE’15, pp 256–261
Kucuktunc O, Cambazoglu BB, Weber I, Ferhatosmanoglu H (2012) A large-scale sentiment analysis for Yahoo! answers. In: Proceedings of the 5th ACM international conference on web search and data mining, WSDM ’12, pp 633–642
Kukreja N, Boehm B (2012) Process implications of social networking-based requirements negotiation tools. In: Proceedings of the international conference on software and system process, ICSSP’12, pp 68–72
Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of the 19th international conference on world wide web, WWW’10, pp 591–600
Lim SL, Damian D, Finkelstein A (2011) StakeSource2.0: using social networks of stakeholders to identify and prioritise requirements. In: Proceedings of the 33rd international conference on software engineering, ICSE’11, pp 1022–1024
Lim SL, Finkelstein A (2012) StakeRare: using social networks and collaborative filtering for large-scale requirements elicitation. IEEE Trans Softw Eng 38(3):707–735
Lim SL, Quercia D, Finkelstein A (2010) StakeNet: using social networks to analyse the stakeholders of large-scale software projects. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering, ICSE’10, pp 295–304
Luaces O, Díez J, Barranquero J, del Coz JJ, Bahamonde A (2012) Binary relevance efficacy for multilabel classification. Prog Artif Intell 1(4):303–313
Maalej W, Nabil H (2015) Bug report, feature request, or simply praise? On automatically classifying app reviews. In: Proceedings of the IEEE 23rd international requirements engineering conference, RE’15, pp 116–125
Martin W, Sarro F, Jia Y, Zhang Y, Harman M (2016) A survey of app store analysis for software engineering. IEEE Trans Softw Eng
Martinez-Romo J, Araujo L (2013) Detecting malicious Tweets in trending topics using a statistical analysis of language. Expert Syst Appl 40(8):2992–3000
McCord M, Chuah M (2011) Spam detection on Twitter using traditional classifiers. In: Proceedings of the 8th international conference on autonomic and trusted computing, ATC’11, pp 175–186
Mitchell TM (1997) Machine Learning, volume 4 of McGraw-Hill Series in Computer Science. McGraw-Hill, New York
Neuendorf K (2002) The content analysis guidebook. Sage Publications, Thousand Oaks
Novielli N, Calefato F, Lanubile F (2014) Towards discovering the role of emotions in stack overflow. In: Proceedings of the 6th international workshop on social software engineering, SSE’14, pp 33–36
Ortu M, Adams B, Destefanis G, Tourani P, Marchesi M, Tonelli R (2015) Are bullies more productive? Empirical study of affectiveness vs. issue fixing time. In: Proceedings of the 12th working conference on mining software repositories, MSR’15, pp 303–313
Pagano D, Bruegge B (2013) User involvement in software evolution practice: a case study. In: Proceedings of the international conference on software engineering, ICSE’13
Pagano D, Maalej W (2013) User feedback in the appstore: an empirical study. In: Proceedings of the 21st IEEE international requirements engineering conference, RE’13, pp 125–134
Palomba F, Linares-Vásquez M, Bavota G, Oliveto R, Di Penta M, Poshyvanyk D, De Lucia A (2015) User reviews matter! tracking crowdsourced reviews to support evolution of successful apps. In: Proceedings of the IEEE international conference on software maintenance and evolution, ICSME’15, pp 291–300
Panichella S, Di Sorbo A, Guzman E, Visaggio C, Canfora G, Gall H (2015) How can I improve my app? Classifying user reviews for software maintenance and evolution. In: Proceedings of the 31st international conference on software maintenance and evolution, ICSME’15, pp 281–290
Pennebaker J, Chung C, Ireland M (2007) The development and psychometric properties of LIWC2007
Pfitzner R, Garas A, Schweitzer F (2012) Emotional divergence influences information spreading in Twitter. In: Proceedings of the 6th international AAAI conference on weblogs and social media, ICWSM’12
Prasetyo PK, Lo D, Achananuparp P, Tian Y, Lim EP (2012) Automatic classification of software related microblogs. In: Proceedings of the 28th IEEE international conference on software maintenance, ICSM’12, pp 596–599
Seyff N, Ollmann G, Bortenschlager M (2014) AppEcho: a user-driven, in situ feedback approach for mobile platforms and applications. In: Proceedings of the 1st international conference on mobile software engineering and systems, MOBILESoft’14, pp 99–108
Seyff N, Todoran I, Caluser K, Singer L, Glinz M (2015) Using popular social network sites to support requirements elicitation, prioritization and negotiation. J Internet Serv Appl 6(1):1–16
Sharma A, Tian Y, Lo D (2015) NIRMAL: automatic identification of software relevant Tweets leveraging language model. In: Proceedings of the IEEE 22nd international conference on software analysis, evolution and reengineering, SANER’15, pp 449–458
Sharma A, Tian Y, Lo D (2015) What’s hot in software engineering Twitter space? In: Proceedings of the IEEE international conference on software maintenance and evolution, ICSME’15. IEEE, pp 541–545
Singer L, Figueira Filho F, Storey MA (2014) Software engineering at the speed of light: how developers stay current using Twitter. In: Proceedings of the 36th international conference on software engineering, ICSE’14, pp 211–221
Singh M, Bansal D, Sofat S (2014) Detecting malicious users in Twitter using classifiers. In: Proceedings of the 7th international conference on security of information and networks, SIN’14, pp 247–253
Sriram B, Fuhry D, Demir E, Ferhatosmanoglu H, Demirbas M (2010) Short text classification in Twitter to improve information filtering. In: Proceedings of the 33rd international ACM SIGIR conference on research and development in information retrieval, SIGIR’10, pp 841–842
Stieglitz S, Dang-Xuan L (2013) Emotions and information diffusion in social media—sentiment of microblogs and sharing behavior. J Manag Inf Syst 29(4):217–248
Suh B, Hong L, Pirolli P, Chi EH (2010) Want to be retweeted? Large scale analytics on factors impacting retweet in Twitter network. In: Proceedings of the IEEE 2nd international conference on social computing, SocialCom’10, pp 177–184
Thelwall M, Buckley K, Paltoglou G (2011) Sentiment in Twitter events. J Am Soc Inf Sci Technol 62(2):406–418
Thelwall M, Buckley K, Paltoglou G (2012) Sentiment strength detection for the social web. J Am Soc Inf Sci Technol 63(1):163–173
Thelwall M, Buckley K, Paltoglou G, Cai D, Kappas A (2010) Sentiment strength detection in short informal text. J Am Soc Inf Sci Technol 61(12):2544–2558
Thomas K, Grier C, Song D, Paxson V (2011) Suspended accounts in retrospect: an analysis of Twitter spam. In: Proceedings of the 2011 ACM SIGCOMM conference on internet measurement conference, IMC’11, pp 243–258
Tian Y, Achananuparp P, Lubis IN, Lo D, Lim EP (2012) What does software engineering community microblog about? In: Proceedings of the 9th IEEE working conference on mining software repositories, MSR’12, pp 247–250
Tian Y, Lo D (2014) An exploratory study on software microblogger behaviors. In: Proceedings of the IEEE 4th workshop on mining unstructured data, MUD’14. IEEE, pp 1–5
Tsoumakas G, Katakis I (2007) Multi-label classification: an overview. Int J Data Warehous Min 3:1–13
Tumasjan A, Sprenger T, Sandner P, Welpe I (2010) Predicting elections with Twitter: What 140 characters reveal about political sentiment. In: Proceedings of the 4th international AAAI conference on weblogs and social media, ICWSM’10, pp 178–185
Villarroel L, Bavota G, Russo B, Oliveto R, Di Penta M (2016) Release planning of mobile apps based on user reviews. In: Proceedings of the 38th international conference on software engineering, ICSE’16, pp 14–24
Wehrmaker T, Gärtner S, Schneider K (2012) Contexter feedback system. In: Proceedings of the 34th international conference on software engineering, ICSE’12, pp 1459–1460
Wilson T, Hoffmann P, Somasundaran S, Kessler J, Wiebe J, Choi Y, Cardie C, Riloff E, Patwardhan S (2005) OpinionFinder: a system for subjectivity analysis. In: Proceedings of HLT/EMNLP on interactive demonstrations, HLT-Demo’05, pp 34–35
Yang T, Lee D, Yan S (2013) Steeler Nation, 12th man, and boo birds: classifying Twitter user interests using time series. In: Proceedings of the 2013 IEEE/ACM international conference on advances in social networks analysis and mining, ASONAM’13, pp 684–691
Acknowledgements
We thank Martin Glinz, Dustin Wüest, Melanie Stade, Bernd Brügge, Eya Ben Charrada, Kim Lauenroth, Marjo Kauppinen and Fabiano Dalpiaz for their feedback and discussions. This work was partially supported by the European Commission within the SUPERSEDE project (ID 644018) and a PhD scholarship provided by King Saud University for the second author.
Appendix
1.1 Weka configurations
In our work, we applied the default configurations of the classification algorithms as set by Weka (http://www.cs.waikato.ac.nz/ml/weka).
1.1.1 Naive Bayes
weka.classifiers.bayes.NaiveBayes: Class for a Naive Bayes classifier using estimator classes.

- NumDecimalPlaces = 2. The number of decimal places to be used for the output of numbers in the model.
- DoNotCheckCapabilities = False. If set, the classifier capabilities are not checked before the classifier is built.
- UseKernelEstimator = False. If set, a kernel density estimator is used rather than a normal distribution for numeric attributes.
- UseSupervisedDiscretization = False. If set, supervised discretization is used to convert numeric attributes to nominal ones.
1.1.2 Multinomial Naive Bayes
weka.classifiers.bayes.NaiveBayesMultinomial: Class for building and using a multinomial Naive Bayes classifier.

- NumDecimalPlaces = 2. The number of decimal places to be used for the output of numbers in the model.
- DoNotCheckCapabilities = False. If set, the classifier capabilities are not checked before the classifier is built.
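To make the model concrete, the following is a minimal pure-Python multinomial Naive Bayes with Laplace smoothing. It is a sketch of the underlying model only, not the Weka implementation; class and method names are our own, and tokens outside the training vocabulary are simply ignored.

```python
import math
from collections import Counter

class MultinomialNB:
    """Minimal multinomial Naive Bayes over token lists."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = sorted({t for d in docs for t in d})
        # Log class priors from label frequencies.
        self.prior = {c: math.log(labels.count(c) / len(labels))
                      for c in self.classes}
        counts = {c: Counter() for c in self.classes}
        for d, y in zip(docs, labels):
            counts[y].update(d)
        # Laplace-smoothed log-likelihoods per token and class.
        v = len(self.vocab)
        self.loglik = {}
        for c in self.classes:
            total = sum(counts[c].values())
            self.loglik[c] = {t: math.log((counts[c][t] + 1) / (total + v))
                              for t in self.vocab}
        return self

    def predict(self, doc):
        def score(c):
            # Out-of-vocabulary tokens contribute nothing to the score.
            return self.prior[c] + sum(self.loglik[c].get(t, 0.0) for t in doc)
        return max(self.classes, key=score)
```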
1.1.3 SVM
weka.classifiers.functions.SMO: Class that implements John Platt’s sequential minimal optimization algorithm for training a support vector classifier.

- BuildCalibrationModels = False. If set, calibration models are fitted to the outputs of the support vector machine.
- C = 1. The complexity constant.
- Epsilon = 1.0e-12. The epsilon for round-off error.
- FilterType = Normalize training data.
- Kernel = weka.classifiers.functions.supportVector.PolyKernel. The polynomial kernel: K(x, y) = \(\langle x, y\rangle^p\) or K(x, y) = \((\langle x, y\rangle + 1)^p\), with exponent p = 1.
- RandomSeed = 1. The random number seed for the cross-validation.
- toleranceParameter = 0.001. The tolerance parameter.
- NumDecimalPlaces = 2. The number of decimal places to be used for the output of numbers in the model.
- DoNotCheckCapabilities = False. If set, the classifier capabilities are not checked before the classifier is built.
1.1.4 J48
weka.classifiers.trees.J48: Class for generating a pruned or unpruned C4.5 decision tree.

- BinarySplits = False. If set, binary splits are used on nominal attributes when building the trees.
- CollapseTree = True. If set, parts are removed that do not reduce training error.
- ConfidenceFactor = 0.25. The confidence threshold for pruning.
- DoNotMakeSplitPointActualValue = False. If set, the split point is not relocated to an actual data value.
- minNumObj = 2. The minimum number of instances per leaf.
- NumDecimalPlaces = 2. The number of decimal places to be used for the output of numbers in the model.
- NumFolds = 3. The number of folds for reduced error pruning. One fold is used as the pruning set.
- reducedErrorPruning = False. If set, reduced error pruning is used instead of C4.5 pruning.
- SubtreeRaising = True. If set, subtree raising is used when pruning.
- Unpruned = False. If set, pruning is not performed.
- UseLaplace = False. If set, Laplace smoothing is used for predicted probabilities.
- UseMDLcorrection = True. If set, MDL correction is used when finding splits on numeric attributes.
1.1.5 Random forest
weka.classifiers.trees.RandomForest: Class for constructing a forest of random trees.

- BagSizePercent = 100. Size of each bag, as a percentage of the training set size.
- BreakTiesRandomly = False. If set, ties are broken randomly when several attributes look equally good.
- CalcOutOfBag = False. Whether to calculate the out-of-bag error.
- MaxDepth = 0. The maximum depth of the tree, 0 for unlimited.
- NumDecimalPlaces = 2. The number of decimal places to be used for the output of numbers in the model.
- NumExecutionSlots = 1. Number of execution slots.
- NumFeatures = 0. The number of features used in random selection.
- NumIterations = 100. The number of iterations to be performed.
- Seed = 1. The random number seed to be used.
1.2 Meka configurations
In our work, we applied the default configurations of the multi-label classification methods as set by Meka (http://meka.sourceforge.net). For the base classifiers, we applied the same settings as in Weka.
1.2.1 Binary relevance method
meka.classifiers.multilabel.BR: Class implementing the binary relevance method.
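The binary relevance transformation trains one independent binary classifier per label. It can be sketched in a few lines of Python (the `base_factory` interface with `fit`/`predict` is a hypothetical stand-in for any binary learner; this is not Meka's API):

```python
class BinaryRelevance:
    """Train one independent binary classifier per label."""

    def __init__(self, base_factory):
        # base_factory() must return an object with fit(X, y) and predict(x).
        self.base_factory = base_factory

    def fit(self, X, labelsets):
        self.labels = sorted({l for s in labelsets for l in s})
        self.models = {}
        for label in self.labels:
            # One binary problem per label: present vs. absent.
            y = [label in s for s in labelsets]
            self.models[label] = self.base_factory().fit(X, y)
        return self

    def predict(self, x):
        # Union of all labels whose binary classifier fires.
        return {l for l, m in self.models.items() if m.predict(x)}
```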
1.2.2 Label powerset method
meka.classifiers.multilabel.LP: Class implementing the label powerset (LP) method.
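In contrast to binary relevance, label powerset treats each distinct label combination as one atomic class and trains a single multi-class learner. A minimal sketch (again with a hypothetical `base_factory` learner interface, not Meka's API):

```python
class LabelPowerset:
    """Map each distinct label set to one class for a multi-class learner."""

    def __init__(self, base_factory):
        # base_factory() must return an object with fit(X, y) and predict(x).
        self.base_factory = base_factory

    def fit(self, X, labelsets):
        # Encode each label set as a hashable frozenset "class".
        y = [frozenset(s) for s in labelsets]
        self.model = self.base_factory().fit(X, y)
        return self

    def predict(self, x):
        # Decode the predicted class back into a set of labels.
        return set(self.model.predict(x))
```

Unlike binary relevance, this transformation can capture label correlations, but it can only ever predict label combinations observed during training.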
Cite this article
Guzman, E., Alkadhi, R. & Seyff, N. An exploratory study of Twitter messages about software applications. Requirements Eng 22, 387–412 (2017). https://doi.org/10.1007/s00766-017-0274-x