An open-source tool for merging data from multiple citation databases

Scientometrics

Abstract

A bibliometric analysis based on records from a single citation database may be limited in its comprehensiveness and, therefore, in the reliability of its results. Combining and deduplicating records from multiple citation index databases for a bibliometric analysis is often a manual process that requires significant effort, especially for larger datasets. This paper presents an open-source tool that automatically preprocesses and deduplicates records based on similarity and user-configurable strategies. To validate the tool, the authors first manually deduplicated records from Scopus and Web of Science (WoS) in a use case comprising 11,307 records. The performance of the tool was then evaluated against the manually deduplicated results. With the best-performing similarity configuration on this deduplication use case, the tool achieved up to 99% precision and a 98% F-measure when combining Scopus and WoS records, minimizing the time researchers would otherwise spend on data wrangling. The tool has practical implications for bibliometric studies. For instance, we conducted a bibliometric analysis of the most productive researchers at a university, once using a single citation database and once using merged data from multiple citation databases. The study used the VOSviewer tool and showed that merged data may produce different outcomes from those obtained from a single citation database.
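
To illustrate similarity-based deduplication and how precision and F-measure can be computed against a manually deduplicated gold standard, here is a minimal Python sketch. It is not the tool described in this paper. The record fields (`title`, `year`, `doi`), the function names, the 0.9 threshold, the DOI-first strategy, and the sample records (including the made-up DOI) are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def is_duplicate(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Hypothetical strategy: trust DOIs when both exist, else same year + similar title."""
    doi_a, doi_b = a.get("doi"), b.get("doi")
    if doi_a and doi_b:
        return doi_a.lower() == doi_b.lower()
    if a.get("year") != b.get("year"):
        return False
    ratio = SequenceMatcher(None, normalize(a["title"]), normalize(b["title"])).ratio()
    return ratio >= threshold


def find_duplicates(scopus: list, wos: list, threshold: float = 0.9) -> set:
    """Return (scopus_index, wos_index) pairs judged to be duplicates.

    Naive all-pairs comparison; a real tool would need blocking or indexing
    to handle thousands of records efficiently.
    """
    return {
        (i, j)
        for i, a in enumerate(scopus)
        for j, b in enumerate(wos)
        if is_duplicate(a, b, threshold)
    }


def precision_recall_f1(predicted: set, gold: set) -> tuple:
    """Score predicted duplicate pairs against manually identified pairs."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up sample records; the DOI below is a placeholder, not a real identifier.
scopus = [
    {"title": "An Open-Source Tool for Merging Data", "year": 2024, "doi": "10.1000/ABC"},
    {"title": "Deep learning for citation analysis", "year": 2020, "doi": None},
    {"title": "Graph methods in bibliometrics", "year": 2019, "doi": None},
]
wos = [
    {"title": "An open source tool for merging data", "year": 2024, "doi": "10.1000/abc"},
    {"title": "Deep Learning for Citation Analysis.", "year": 2020, "doi": None},
    {"title": "Unrelated paper on chemistry", "year": 2019, "doi": None},
]

predicted = find_duplicates(scopus, wos)
gold = {(0, 0), (1, 1)}  # pairs a human reviewer would mark as duplicates
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(sorted(predicted), precision, recall, f1)
```

Raising the threshold in a scheme like this trades recall for precision, which is why a configurable strategy is best tuned against a manually deduplicated sample.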

Funding

This research has been supported by the Ministry of Science, Technological Development and Innovation (Contract No. 451-03-65/2024-03/200156) and the Faculty of Technical Sciences, University of Novi Sad through project “Scientific and Artistic Research Work of Researchers in Teaching and Associate Positions at the Faculty of Technical Sciences, University of Novi Sad” (No. 01-3394/1).

Author information

Corresponding author

Correspondence to Dušan Nikolić.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Nikolić, D., Ivanović, D. & Ivanović, L. An open-source tool for merging data from multiple citation databases. Scientometrics (2024). https://doi.org/10.1007/s11192-024-05076-2

  • DOI: https://doi.org/10.1007/s11192-024-05076-2
