  1. Article

    Open Access

    Deep learning with sentence embeddings pre-trained on biomedical corpora improves the performance of finding similar sentences in electronic medical records

    Capturing sentence semantics plays a vital role in a range of text mining applications. Despite continuous efforts on the development of related datasets and models in the general domain, both datasets and mod...

    Qingyu Chen, Jingcheng Du, Sun Kim in BMC Medical Informatics and Decision Making (2020)

  2. Article

    Open Access

    Discovering themes in biomedical literature using a projection-based algorithm

    The need to organize any large document collection in a manner that facilitates human comprehension has become crucial with the increasing volume of information available. Two common approaches to provide a br...

    Lana Yeganova, Sun Kim, Grigory Balasanov, W. John Wilbur in BMC Bioinformatics (2018)

  3. Article

    Open Access

    PubMed Phrases, an open set of coherent phrases for searching biomedical literature

    In biomedicine, key concepts are often expressed by multiple words (e.g., ‘zinc finger protein’). Previous work has shown treating a sequence of words as a meaningful unit, where applicable, is not only import...

    Sun Kim, Lana Yeganova, Donald C. Comeau, W. John Wilbur, Zhiyong Lu in Scientific Data (2018)

  4. Article

    Open Access

    Optimizing graph-based patterns to extract biomedical events from the literature

    We participated in the BioNLP 2013 shared tasks on event extraction. Our extraction method is based on the search for an approximate subgraph isomorphism between key context dependencies of events and graphs o...

    Haibin Liu, Karin Verspoor, Donald C Comeau, Andrew D MacKinlay in BMC Bioinformatics (2015)

  5. Article

    Open Access

    Identifying named entities from PubMed® for enriching semantic categories

    Controlled vocabularies such as the Unified Medical Language System (UMLS®) and Medical Subject Headings (MeSH®) are widely used for biomedical natural language processing (NLP) tasks. However, the standard te...

    Sun Kim, Zhiyong Lu, W John Wilbur in BMC Bioinformatics (2015)

  6. Article

    Open Access

    Finding biomedical categories in Medline®

    There are several humanly defined ontologies relevant to Medline. However, Medline is a fast growing collection of biomedical documents which creates difficulties in updating and expanding these humanly define...

    Lana Yeganova, Won Kim, Donald C Comeau, W John Wilbur in Journal of Biomedical Semantics (2012)

  7. Article

    Open Access

    Thematic clustering of text documents using an EM-based approach

    Clustering textual contents is an important step in mining useful information on the web or other text-based resources. The common task in text clustering is to handle text in a multi-dimensional space, and to...

    Sun Kim, W John Wilbur in Journal of Biomedical Semantics (2012)

  8. Article

    Open Access

    The gene normalization task in BioCreative III

    We report the Gene Normalization (GN) challenge in BioCreative III where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 full...

    Zhiyong Lu, Hung-Yu Kao, Chih-Hsuan Wei, Minlie Huang, Jingchen Liu in BMC Bioinformatics (2011)

  9. Article

    Open Access

    Overview of the BioCreative III Workshop

    The overall goal of the BioCreative Workshops is to promote the development of text mining and text processing tools which are useful to the communities of researchers and database curators in the biological s...

    Cecilia N Arighi, Zhiyong Lu, Martin Krallinger, Kevin B Cohen in BMC Bioinformatics (2011)

  10. Article

    Open Access

    The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text

    Determining usefulness of biomedical text mining systems requires realistic task definition and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional ...

    Martin Krallinger, Miguel Vazquez, Florian Leitner, David Salgado in BMC Bioinformatics (2011)

  11. Article

    Open Access

    Classifying protein-protein interaction articles using word and syntactic features

    Identifying protein-protein interactions (PPIs) from literature is an important step in mining the function of individual proteins as well as their biological network. Since it is known that PPIs have distinct...

    Sun Kim, W John Wilbur in BMC Bioinformatics (2011)

  12. Article

    Open Access

    Machine learning with naturally labeled data for identifying abbreviation definitions

    The rapid growth of biomedical literature requires accurate text analysis and text processing tools. Detecting abbreviations and identifying their definitions is an important component of such tools. Most exis...

    Lana Yeganova, Donald C Comeau, W John Wilbur in BMC Bioinformatics (2011)

  13. Article

    Open Access

    Improving a gold standard: treating human relevance judgments of MEDLINE document pairs

    Given prior human judgments of the condition of an object it is possible to use these judgments to make a maximal likelihood estimate of what future human judgments of the condition of that object will be. How...

    W John Wilbur, Won Kim in BMC Bioinformatics (2011)

  14. Article

    Open Access

    Finding related sentence pairs in MEDLINE

    We explore the feasibility of automatically identifying sentences in different MEDLINE abstracts that are related in meaning. We compared traditional vector space models with machine learning methods for detec...

    Larry H. Smith, W. John Wilbur in Information Retrieval (2010)

  15. Article

    Open Access

    The ineffectiveness of within-document term frequency in text classification

    For the purposes of classification it is common to represent a document as a bag of words. Such a representation consists of the individual terms making up the document together with the number of times each t...

    W. John Wilbur, Won Kim in Information Retrieval (2009)

  16. Article

    Open Access

    Modeling actions of PubMed users with n-gram language models

    Transaction logs from online search engines are valuable for two reasons: First, they provide insight into human information-seeking behavior. Second, log data can be used to train user models, which can then ...

    Jimmy Lin, W. John Wilbur in Information Retrieval (2009)

  17. Article

    Open Access

    Evaluation of query expansion using MeSH in PubMed

    This paper investigates the effectiveness of using MeSH® in PubMed through its automatic query expansion process: Automatic Term Mapping (ATM). We run Boolean searches based on a collection of 55 topics and about...

    Zhiyong Lu, Won Kim, W. John Wilbur in Information Retrieval (2009)

  18. Article

    Open Access

    Abbreviation definition identification based on automatic precision estimates

    The rapid growth of biomedical literature presents challenges for automatic text processing, and one of the challenges is abbreviation identification. The presence of unrecognized abbreviations in text hinders...

    Sunghwan Sohn, Donald C Comeau, Won Kim, W John Wilbur in BMC Bioinformatics (2008)

  19. Article

    Open Access

    Overview of BioCreative II gene mention recognition

    Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task participants designed systems to identify substrings in sentences corresponding to gene name mentions. A ...

    Larry Smith, Lorraine K Tanabe, Rie Johnson nee Ando, Cheng-Ju Kuo in Genome Biology (2008)

  20. Article

    Open Access

    PubMed related articles: a probabilistic topic-based model for content similarity

    We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether or not a document is about a particular topic is computed from ...

    Jimmy Lin, W John Wilbur in BMC Bioinformatics (2007)
