On Computing Similarity in Academic Literature Data: Methods and Evaluation

  • Conference paper
Web-Age Information Management (WAIM 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8597)

Abstract

Computing the similarity of academic literature data is a topic that has recently received attention in information retrieval and data mining, and a variety of methods has been proposed to compute the similarity of scientific papers. In this paper, we present various similarity methods and evaluate their effectiveness through extensive experiments on a real-world dataset of scientific papers.
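The abstract does not list the specific methods compared. As a hedged illustration only, and not the authors' implementation, the sketch below computes three classic kinds of paper similarity: bibliographic coupling (number of references two papers share), co-citation (number of papers that cite both), and cosine similarity over raw term-frequency vectors. The toy citation data and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: each paper maps to the set of papers it cites.
citations = {
    "A": {"X", "Y", "Z"},
    "B": {"Y", "Z"},
    "C": {"A", "B"},
    "D": {"A", "B", "X"},
}


def bibliographic_coupling(p, q, refs):
    """Count the references shared by papers p and q."""
    return len(refs.get(p, set()) & refs.get(q, set()))


def co_citation(p, q, refs):
    """Count the papers whose reference lists contain both p and q."""
    return sum(1 for cited in refs.values() if p in cited and q in cited)


def cosine(text_a, text_b):
    """Cosine similarity of raw term-frequency vectors (no stemming, no IDF)."""
    ta = Counter(text_a.lower().split())
    tb = Counter(text_b.lower().split())
    dot = sum(ta[t] * tb[t] for t in ta)
    norm_a = math.sqrt(sum(v * v for v in ta.values()))
    norm_b = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    print(bibliographic_coupling("A", "B", citations))  # shares Y and Z
    print(co_citation("A", "B", citations))  # both cited by C and D
    print(cosine("graph similarity measures", "similarity measures for text"))
```

Link-based measures such as these depend only on the citation graph, while the cosine measure depends only on text; real systems typically weight terms (for example with TF-IDF) and may normalize the raw counts, which this sketch deliberately omits.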

Notes

  1. http://www.informatik.uni-trier.de.

  2. http://academic.research.microsoft.com/.

Acknowledgments

This work was supported by (1) the Business for Cooperative R&D between Industry, Academy, and Research Institute program funded by the Korea Small and Medium Business Administration in 2013 (Grant No. C0006278), (2) the MSIP (Ministry of Science, ICT, and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2013-H0301-13-4009) supervised by the NIPA (National IT Industry Promotion Agency), and (3) the Seoul Creative Human Development Program (HM120006).

Author information

Correspondence to Masoud Reyhani Hamedani.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Hamedani, M.R., Kim, SW. (2014). On Computing Similarity in Academic Literature Data: Methods and Evaluation. In: Chen, Y., et al. Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8597. Springer, Cham. https://doi.org/10.1007/978-3-319-11538-2_37

  • DOI: https://doi.org/10.1007/978-3-319-11538-2_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11537-5

  • Online ISBN: 978-3-319-11538-2

  • eBook Packages: Computer Science, Computer Science (R0)
