Abstract
The US National Library of Medicine (NLM) uses Medical Subject Headings (MeSH) (see Note 1) to index almost all of the 24 million citations in MEDLINE, which greatly facilitates biomedical information retrieval and text mining. Large-scale automatic MeSH indexing is challenging on two sides: the MeSH side and the citation side. On the MeSH side, each citation is annotated with only 12 (on average) of the 28,000 MeSH terms. On the citation side, all existing methods, including NLM's Medical Text Indexer (MTI), represent text as a bag of words, which cannot capture semantic and context-dependent information well. To address these two challenges, we developed MeSHLabeler and DeepMeSH. Using a "learning to rank" (LTR) framework, MeSHLabeler integrates multiple types of information to address the challenge on the MeSH side, while DeepMeSH incorporates deep semantic representation to address the challenge on the citation side. MeSHLabeler achieved first place in both the BioASQ2 and BioASQ3 challenges, and DeepMeSH achieved first place in both the BioASQ4 and BioASQ5 challenges. DeepMeSH is available at http://datamining-iip.fudan.edu.cn/deepmesh.
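As a rough illustration of the idea of integrating several types of evidence to rank candidate MeSH terms, the sketch below scores each candidate by a weighted sum of evidence scores and keeps the top-ranked terms. It is a toy example under stated assumptions, not MeSHLabeler's actual model: the evidence names, weights, scores, and function names are all hypothetical, and in a real learning-to-rank system the ranking function would be learned from previously indexed citations rather than set by hand.

```python
from typing import Dict, List

# Hypothetical evidence sources and hand-set weights (assumptions for
# illustration only; a real LTR model would learn the ranking function).
WEIGHTS: Dict[str, float] = {
    "classifier": 0.5,      # score from a per-term classifier
    "neighbors": 0.3,       # score from similar, already-indexed citations
    "pattern_match": 0.2,   # score from matching the term in the text
}

# Toy evidence scores for three candidate MeSH terms of one citation.
EVIDENCE: Dict[str, Dict[str, float]] = {
    "Humans": {"classifier": 0.9, "neighbors": 0.8, "pattern_match": 0.0},
    "Neoplasms": {"classifier": 0.6, "neighbors": 0.7, "pattern_match": 1.0},
    "Mice": {"classifier": 0.2, "neighbors": 0.1, "pattern_match": 0.0},
}


def combined_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of evidence scores; unknown evidence types get weight 0."""
    return sum(weights.get(name, 0.0) * value for name, value in scores.items())


def rank_candidates(
    evidence: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[str]:
    """Return candidate terms sorted from highest to lowest combined score."""
    return sorted(
        evidence,
        key=lambda term: combined_score(evidence[term], weights),
        reverse=True,
    )


def top_k(ranked: List[str], k: int) -> List[str]:
    """Keep only the k best-ranked terms as the predicted annotation."""
    return ranked[:k]


if __name__ == "__main__":
    ranked = rank_candidates(EVIDENCE, WEIGHTS)
    print(top_k(ranked, 2))
```

Because each citation receives only a handful of the available MeSH terms, the cut-off `k` matters as much as the ranking itself; here it is a fixed, hypothetical parameter.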
Acknowledgments
This work has been partially supported by the National Natural Science Foundation of China (Grant No. 61572139), MEXT KAKENHI #16H02868, and FiDiPro by Tekes.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Peng, S., Mamitsuka, H., Zhu, S. (2018). MeSHLabeler and DeepMeSH: Recent Progress in Large-Scale MeSH Indexing. In: Mamitsuka, H. (eds) Data Mining for Systems Biology. Methods in Molecular Biology, vol 1807. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8561-6_15
DOI: https://doi.org/10.1007/978-1-4939-8561-6_15
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8560-9
Online ISBN: 978-1-4939-8561-6
eBook Packages: Springer Protocols