Agglomerative Similarity Measure Based Automated Clustering of Scholarly Articles

  • Conference paper
Machine Intelligence and Signal Analysis

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 748)

Abstract

The flood of online scholarly articles necessitates the automated organization of documents according to their most descriptive attributes. In this paper, an agglomerative similarity measure based on common features of research articles, such as the number of references, authors, citations, and contents, is used for automated clustering of scholarly articles. The agglomerative similarity matrix combines a citation matrix, an author matrix, and a content matrix into a single feature vector representation. The experiments are performed on agglomerative feature vectors derived from the wiki20 dataset using several unsupervised learning algorithms: K-means, K-medoids, and fuzzy C-means. The clustering results obtained with the modified feature vectors are compared with those of the existing bag-of-words model, using separation and cohesion as performance metrics. Dunn's index is used to find the optimal number of clusters.
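To make the idea of a combined similarity matrix and Dunn's index concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not say how the three matrices are merged or which per-feature similarity is used, so the Jaccard similarity, the equal-weight average in `combine`, the toy articles, and all function names are assumptions made for illustration. Dunn's index is computed in its usual form, the smallest between-cluster distance divided by the largest within-cluster distance, with distance taken as 1 minus similarity.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure; empty vs empty -> 0.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(items):
    """Pairwise Jaccard similarity for a list of sets."""
    return [[jaccard(x, y) for y in items] for x in items]


def combine(matrices, weights):
    """Weighted average of equally sized similarity matrices (assumed rule)."""
    total = sum(weights)
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
         for j in range(n)]
        for i in range(n)
    ]


def dunn_index(sim, labels):
    """Min between-cluster distance over max within-cluster distance,
    where distance = 1 - similarity. Higher values mean better clusters."""
    if len(set(labels)) < 2:
        raise ValueError("Dunn's index needs at least two clusters")
    inter, intra = [], []
    for i, j in combinations(range(len(labels)), 2):
        d = 1.0 - sim[i][j]
        (intra if labels[i] == labels[j] else inter).append(d)
    max_intra = max(intra, default=0.0)
    return float("inf") if max_intra == 0.0 else min(inter) / max_intra


# Hypothetical toy articles: author names, cited references, content terms.
articles = [
    {"authors": {"A", "B"}, "refs": {"r1", "r2", "r3"},
     "terms": {"cluster", "kmeans", "text"}},
    {"authors": {"A"}, "refs": {"r1", "r2"},
     "terms": {"cluster", "kmeans", "fuzzy"}},
    {"authors": {"C"}, "refs": {"r7", "r8"},
     "terms": {"signal", "filter", "noise"}},
    {"authors": {"C", "D"}, "refs": {"r8", "r9"},
     "terms": {"signal", "filter", "wavelet"}},
]

author_sim = similarity_matrix([a["authors"] for a in articles])
citation_sim = similarity_matrix([a["refs"] for a in articles])
content_sim = similarity_matrix([a["terms"] for a in articles])

# Equal weights are an assumption; the abstract gives no weighting scheme.
combined = combine([author_sim, citation_sim, content_sim], [1.0, 1.0, 1.0])

good = dunn_index(combined, [0, 0, 1, 1])  # groups the related articles
bad = dunn_index(combined, [0, 1, 0, 1])   # splits the related articles
print(good, bad)
```

In this toy setup the labeling that keeps related articles together scores a higher Dunn's index than the one that splits them, which is the property used when comparing candidate numbers of clusters. In practice the labels would come from a clustering algorithm such as K-means, K-medoids, or fuzzy C-means run on the combined representation.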



Author information

Corresponding author

Correspondence to Dilip Singh Sisodia.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sisodia, D.S., Choudhary, M., Vandana, T., Rai, R. (2019). Agglomerative Similarity Measure Based Automated Clustering of Scholarly Articles. In: Tanveer, M., Pachori, R. (eds) Machine Intelligence and Signal Analysis. Advances in Intelligent Systems and Computing, vol 748. Springer, Singapore. https://doi.org/10.1007/978-981-13-0923-6_46
