Agglomerative Similarity Measure Based Automated Clustering of Scholarly Articles

Sisodia, Dilip Singh; Choudhary, Manjula; Vandana, Tummala; Rai, Rishi

doi:10.1007/978-981-13-0923-6_46

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 748)

Abstract

The flood of online scholarly articles necessitates automated organization of documents according to their most descriptive attributes. In this paper, an agglomerative similarity measure based on common features of research articles, such as the number of references, authors, citations, and contents, is used for automated clustering of scholarly articles. The agglomerative similarity matrix combines a citation matrix, an author matrix, and a content matrix for feature vector representation. Experiments are performed on the agglomerative feature vector derived from the wiki20 dataset using unsupervised learning algorithms such as K-means, K-medoids, and fuzzy C-means. The clustering results obtained with the modified feature vector are compared with those of the existing bag-of-words model, using separation and cohesion as performance metrics. Dunn's index is used to find the optimal number of clusters.
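
The idea of combining citation, author, and content matrices can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: Jaccard overlap for reference lists and author sets, cosine similarity over raw term counts for content, equal weights for the three parts, and hypothetical names such as `combined_similarity` and the toy `docs` records. The Dunn index follows its standard definition (smallest between-cluster distance divided by largest within-cluster diameter).

```python
import math
from collections import Counter
from itertools import combinations

def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(text_a, text_b):
    """Cosine similarity of whitespace-tokenized bag-of-words counts."""
    ca = Counter(text_a.lower().split())
    cb = Counter(text_b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def combined_similarity(docs, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of citation, author, and content similarity.

    Equal weights are an illustrative assumption, not the paper's values.
    """
    w_cit, w_auth, w_text = weights
    n = len(docs)
    sim = [[1.0] * n for _ in range(n)]  # self-similarity fixed at 1
    for i, j in combinations(range(n), 2):
        s = (w_cit * jaccard(docs[i]["refs"], docs[j]["refs"])
             + w_auth * jaccard(docs[i]["authors"], docs[j]["authors"])
             + w_text * cosine(docs[i]["text"], docs[j]["text"]))
        sim[i][j] = sim[j][i] = s
    return sim

def dunn_index(dist, labels):
    """Min between-cluster distance over max within-cluster diameter."""
    pairs = list(combinations(range(len(labels)), 2))
    inter = [dist[i][j] for i, j in pairs if labels[i] != labels[j]]
    intra = [dist[i][j] for i, j in pairs if labels[i] == labels[j]]
    max_diameter = max(intra, default=0.0)
    if not inter or max_diameter == 0.0:
        return float("inf")  # degenerate partition: index is undefined
    return min(inter) / max_diameter

# Toy records (hypothetical data, not taken from wiki20).
docs = [
    {"refs": {"r1", "r2"}, "authors": {"A", "B"},
     "text": "fuzzy clustering of documents"},
    {"refs": {"r1", "r2", "r3"}, "authors": {"A"},
     "text": "document clustering with fuzzy c-means"},
    {"refs": {"r7", "r8"}, "authors": {"C"},
     "text": "protein folding simulation"},
    {"refs": {"r8", "r9"}, "authors": {"C", "D"},
     "text": "simulation of protein structure"},
]

sim = combined_similarity(docs)
dist = [[1.0 - s for s in row] for row in sim]  # similarity -> dissimilarity

good = dunn_index(dist, [0, 0, 1, 1])  # groups related articles together
bad = dunn_index(dist, [0, 1, 0, 1])   # mixes unrelated articles
print(f"Dunn index: good partition {good:.3f}, bad partition {bad:.3f}")
```

A larger Dunn index indicates compact, well-separated clusters, so in practice one would run a clustering algorithm for several candidate cluster counts on `dist` and keep the count that gives the highest index.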

Author information

Authors and Affiliations

National Institute of Technology Raipur, Raipur, India
Dilip Singh Sisodia, Manjula Choudhary, Tummala Vandana & Rishi Rai

Corresponding author

Correspondence to Dilip Singh Sisodia.

Editor information

Editors and Affiliations

Discipline of Mathematics, Indian Institute of Technology Indore, Simrol, Madhya Pradesh, India
M. Tanveer
Discipline of Electrical Engineering, Indian Institute of Technology Indore, Simrol, Madhya Pradesh, India
Ram Bilas Pachori

Cite this paper

Sisodia, D.S., Choudhary, M., Vandana, T., Rai, R. (2019). Agglomerative Similarity Measure Based Automated Clustering of Scholarly Articles. In: Tanveer, M., Pachori, R. (eds) Machine Intelligence and Signal Analysis. Advances in Intelligent Systems and Computing, vol 748. Springer, Singapore. https://doi.org/10.1007/978-981-13-0923-6_46

DOI: https://doi.org/10.1007/978-981-13-0923-6_46
Published: 08 August 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0922-9
Online ISBN: 978-981-13-0923-6
eBook Packages: Intelligent Technologies and Robotics (R0)
