Abstract
The flood of scholarly articles available online necessitates the automated organization of documents according to their most descriptive attributes. In this paper, an agglomerative similarity measure based on common features associated with research articles, such as the number of references, authors, citations, and the contents, is used for automated clustering of scholarly articles. The agglomerative similarity matrix combines a citation matrix, an author matrix, and a content matrix into a single feature vector representation. The experiments are performed on the agglomerative feature vector derived from the wiki20 dataset using different unsupervised learning algorithms, namely K-Means, K-medoids, and Fuzzy C-means. The clustering results obtained with the modified feature vector are compared with those of the existing bag-of-words model, using separation and cohesion as performance metrics. Dunn's index is used to find the optimal number of clusters.
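As a rough illustration of the idea summarized above, the sketch below combines three pairwise similarity matrices and scores candidate clusterings with Dunn's index. The abstract does not state how the matrices are combined, so the equal-weight sum, the conversion of similarity to distance as 1 - s, the function names, and the four-document toy data are all assumptions made for illustration, not the authors' implementation. The clustering step itself (K-Means, K-medoids, or Fuzzy C-means) is not shown; the labels are supplied by hand.

```python
from itertools import combinations


def combine_similarity(citation, author, content, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted element-wise sum of three n x n similarity matrices.

    The equal default weights are an assumption, not taken from the paper.
    """
    wc, wa, wt = weights
    n = len(content)
    return [
        [wc * citation[i][j] + wa * author[i][j] + wt * content[i][j] for j in range(n)]
        for i in range(n)
    ]


def to_distance(similarity):
    """Turn similarities in [0, 1] into dissimilarities as 1 - s (assumed)."""
    return [[1.0 - s for s in row] for row in similarity]


def dunn_index(distance, labels):
    """Smallest between-cluster distance divided by largest cluster diameter."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("Dunn's index needs at least two clusters")
    # Diameter: largest pairwise distance inside any single cluster.
    max_diameter = max(
        (distance[i][j] for members in clusters.values() for i, j in combinations(members, 2)),
        default=0.0,
    )
    # Separation: smallest distance between documents in different clusters.
    min_separation = min(
        distance[i][j]
        for a, b in combinations(clusters.values(), 2)
        for i in a
        for j in b
    )
    if max_diameter == 0.0:
        return float("inf")
    return min_separation / max_diameter


# Hypothetical toy data: documents 0 and 1 are related, as are 2 and 3.
citation = [[1.0, 0.8, 0.1, 0.0], [0.8, 1.0, 0.2, 0.1], [0.1, 0.2, 1.0, 0.9], [0.0, 0.1, 0.9, 1.0]]
author = [[1.0, 0.5, 0.0, 0.0], [0.5, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.5], [0.0, 0.0, 0.5, 1.0]]
content = [[1.0, 0.7, 0.2, 0.1], [0.7, 1.0, 0.3, 0.2], [0.2, 0.3, 1.0, 0.8], [0.1, 0.2, 0.8, 1.0]]

distance = to_distance(combine_similarity(citation, author, content))
good = dunn_index(distance, [0, 0, 1, 1])  # groups the related documents together
bad = dunn_index(distance, [0, 1, 0, 1])   # splits the related documents apart
```

A higher Dunn's index indicates more compact and better separated clusters, so in this toy setup the first labeling scores higher than the second. Applying the same comparison across clusterings with different numbers of clusters is one way the index can guide the choice of cluster count.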
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sisodia, D.S., Choudhary, M., Vandana, T., Rai, R. (2019). Agglomerative Similarity Measure Based Automated Clustering of Scholarly Articles. In: Tanveer, M., Pachori, R. (eds) Machine Intelligence and Signal Analysis. Advances in Intelligent Systems and Computing, vol 748. Springer, Singapore. https://doi.org/10.1007/978-981-13-0923-6_46
DOI: https://doi.org/10.1007/978-981-13-0923-6_46
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0922-9
Online ISBN: 978-981-13-0923-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)