Abstract
Computing the similarity of academic literature data has recently attracted considerable attention in information retrieval and data mining, and a variety of methods have been proposed to measure the similarity of scientific papers. In this paper, we present various similarity methods and evaluate their effectiveness through extensive experiments on a real-world dataset of scientific papers.
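Similarity measures for scientific papers are commonly grouped into text-based measures, which compare paper content, and link-based measures, which compare citation structure. The sketch below illustrates one classic measure of each kind: cosine similarity over raw term-frequency vectors, bibliographic coupling (the number of references two papers share), and co-citation (the number of papers that cite both). It is a minimal illustration under stated assumptions, not the authors' implementation: the toy citation graph `CITES` and the function names are hypothetical, and tokenization is a plain whitespace split with no stemming, stop-word removal, or TF-IDF weighting.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy citation graph: paper id -> set of paper ids it cites.
CITES = {
    "p1": {"a", "b", "c"},
    "p2": {"b", "c", "d"},
    "p3": {"p1", "p2"},
    "p4": {"p1", "p2", "a"},
    "p5": {"p1"},
}


def cited_by(cites):
    """Invert the citation graph: paper id -> set of papers that cite it."""
    inverse = defaultdict(set)
    for src, targets in cites.items():
        for dst in targets:
            inverse[dst].add(src)
    return inverse


def bibliographic_coupling(cites, x, y):
    """Link-based: count the references shared by papers x and y."""
    return len(cites.get(x, set()) & cites.get(y, set()))


def co_citation(cites, x, y):
    """Link-based: count the papers that cite both x and y."""
    inverse = cited_by(cites)
    return len(inverse.get(x, set()) & inverse.get(y, set()))


def cosine_tf(text_x, text_y):
    """Text-based: cosine similarity of raw term-frequency vectors."""
    tx = Counter(text_x.lower().split())
    ty = Counter(text_y.lower().split())
    dot = sum(tx[t] * ty[t] for t in tx.keys() & ty.keys())
    norm = math.sqrt(sum(v * v for v in tx.values())) * math.sqrt(
        sum(v * v for v in ty.values())
    )
    return dot / norm if norm else 0.0


print(bibliographic_coupling(CITES, "p1", "p2"))  # 2: both cite b and c
print(co_citation(CITES, "p1", "p2"))  # 2: both are cited by p3 and p4
```

The two link-based scores are unnormalized counts; in practice they are often normalized (for example, by the size of the union of the two sets) so that highly cited or reference-heavy papers do not dominate.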
Acknowledgments
This work was supported by (1) the Business for Cooperative R&D between Industry, Academy, and Research Institute program funded by the Korea Small and Medium Business Administration in 2013 (Grant No. C0006278), (2) the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2013-H0301-13-4009) supervised by the NIPA (National IT Industry Promotion Agency), and (3) the Seoul Creative Human Development Program (HM120006).
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hamedani, M.R., Kim, SW. (2014). On Computing Similarity in Academic Literature Data: Methods and Evaluation. In: Chen, Y., et al. Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8597. Springer, Cham. https://doi.org/10.1007/978-3-319-11538-2_37
DOI: https://doi.org/10.1007/978-3-319-11538-2_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11537-5
Online ISBN: 978-3-319-11538-2
eBook Packages: Computer Science (R0)