On Computing Similarity in Academic Literature Data: Methods and Evaluation

Hamedani, Masoud Reyhani; Kim, Sang-Wook

doi:10.1007/978-3-319-11538-2_37

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8597)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

Similarity computation for academic literature data is an active research topic in information retrieval and data mining, and a variety of methods have been proposed to compute the similarity of scientific papers. In this paper, we present various similarity methods and evaluate their effectiveness through extensive experiments on a real-world dataset of scientific papers.
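The abstract does not name the individual measures, so the sketch below is only a hedged illustration of what "similarity of scientific papers" commonly means. It is not the chapter's own method, code, or dataset. It computes three classic scores on a hypothetical toy corpus: bibliographic coupling (the number of references two papers share, after Kessler, 1963), co-citation (the number of papers that cite both, after Small, 1973), and cosine similarity between raw term-frequency vectors of two texts. The graph `CITES`, the paper IDs, and all function names are assumptions invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy citation graph: paper ID -> set of paper IDs it cites.
CITES = {
    "p1": {"a", "b", "c"},
    "p2": {"b", "c", "d"},
    "p3": {"e"},
    "p4": {"p1", "p2"},
    "p5": {"p1", "p2", "p3"},
}


def bibliographic_coupling(graph, x, y):
    """Count the references shared by papers x and y (raw coupling strength)."""
    return len(graph.get(x, set()) & graph.get(y, set()))


def co_citation(graph, x, y):
    """Count the papers whose reference lists contain both x and y."""
    return sum(1 for refs in graph.values() if x in refs and y in refs)


def cosine_text_similarity(text_a, text_b):
    """Cosine of raw term-frequency vectors; no stemming, stop words, or TF-IDF weighting."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    # Dot product over the terms the two texts have in common.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Print coupling and co-citation counts for each pair of the first three papers.
    for x, y in combinations(["p1", "p2", "p3"], 2):
        print(x, y, bibliographic_coupling(CITES, x, y), co_citation(CITES, x, y))
    print(cosine_text_similarity("link based similarity", "text based similarity"))
```

In this toy graph, p1 and p2 share two references (b and c) and are co-cited by two papers (p4 and p5), while p1 and p3 share no references but are co-cited once (by p5). Practical systems usually normalize these raw counts and weight terms (for example with TF-IDF); the sketch omits that to keep the definitions visible.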

Similar content being viewed by others

Finding Relatedness between Research Papers Using Similarity and Dissimilarity Scores

Explorations of Cross-Disciplinary Term Similarity

Paper Co-citation Analysis Using Semantic Similarity Measures

Acknowledgments

This work was supported by (1) Business for Cooperative R&D between Industry, Academy, and Research Institute, funded by the Korea Small and Medium Business Administration in 2013 (Grant No. C0006278), (2) the MSIP (Ministry of Science, ICT, and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2013-H0301-13-4009) supervised by the NIPA (National IT Industry Promotion Agency), and (3) the Seoul Creative Human Development Program (HM120006).

Author information

Authors and Affiliations

Department of Computer Software, Hanyang University, Seoul, Korea
Masoud Reyhani Hamedani & Sang-Wook Kim

Corresponding author

Correspondence to Masoud Reyhani Hamedani.

Editor information

Editors and Affiliations

DEKE Lab., Renmin University of China, Beijing, China
Yueguo Chen
Institute for Information Systems, Technical University Braunschweig, Braunschweig, Germany
Wolf-Tilo Balke
Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong SAR
Jianliang Xu
School of Information, Renmin University of China, Beijing, China
Wei Xu
School of Computer Science and Technology, Hefei, China
Peiquan **
Department of Computer Science, East China Normal University, Shanghai, China
**n Lin
Department of Computer Science, Kean University, Wenzhou, China
Tiffany Tang
School of Electrical Engineering, Korea University, Seoul, Korea
Eenjun Hwang

About this paper

Cite this paper

Hamedani, M.R., Kim, SW. (2014). On Computing Similarity in Academic Literature Data: Methods and Evaluation. In: Chen, Y., et al. Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8597. Springer, Cham. https://doi.org/10.1007/978-3-319-11538-2_37

DOI: https://doi.org/10.1007/978-3-319-11538-2_37
Published: 10 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11537-5
Online ISBN: 978-3-319-11538-2
eBook Packages: Computer Science, Computer Science (R0)
