-
Article
Open Access
Deep learning with sentence embeddings pre-trained on biomedical corpora improves the performance of finding similar sentences in electronic medical records
Capturing sentence semantics plays a vital role in a range of text mining applications. Despite continuous efforts on the development of related datasets and models in the general domain, both datasets and mod...
-
Article
Open Access
Discovering themes in biomedical literature using a projection-based algorithm
The need to organize any large document collection in a manner that facilitates human comprehension has become crucial with the increasing volume of information available. Two common approaches to provide a br...
-
Article
Open Access
PubMed Phrases, an open set of coherent phrases for searching biomedical literature
In biomedicine, key concepts are often expressed by multiple words (e.g., ‘zinc finger protein’). Previous work has shown treating a sequence of words as a meaningful unit, where applicable, is not only import...
-
Article
Open Access
Optimizing graph-based patterns to extract biomedical events from the literature
We participated in the BioNLP 2013 shared tasks on event extraction. Our extraction method is based on the search for an approximate subgraph isomorphism between key context dependencies of events and graphs o...
-
Article
Open Access
Identifying named entities from PubMed® for enriching semantic categories
Controlled vocabularies such as the Unified Medical Language System (UMLS®) and Medical Subject Headings (MeSH®) are widely used for biomedical natural language processing (NLP) tasks. However, the standard te...
-
Article
Open Access
Finding biomedical categories in Medline®
There are several humanly defined ontologies relevant to Medline. However, Medline is a fast growing collection of biomedical documents which creates difficulties in updating and expanding these humanly define...
-
Article
Open Access
Thematic clustering of text documents using an EM-based approach
Clustering textual contents is an important step in mining useful information on the web or other text-based resources. The common task in text clustering is to handle text in a multi-dimensional space, and to...
-
Article
Open Access
The gene normalization task in BioCreative III
We report the Gene Normalization (GN) challenge in BioCreative III where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 full...
-
Article
Open Access
Overview of the BioCreative III Workshop
The overall goal of the BioCreative Workshops is to promote the development of text mining and text processing tools which are useful to the communities of researchers and database curators in the biological s...
-
Article
Open Access
The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text
Determining usefulness of biomedical text mining systems requires realistic task definition and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional ...
-
Article
Open Access
Classifying protein-protein interaction articles using word and syntactic features
Identifying protein-protein interactions (PPIs) from literature is an important step in mining the function of individual proteins as well as their biological network. Since it is known that PPIs have distinct...
-
Article
Open Access
Machine learning with naturally labeled data for identifying abbreviation definitions
The rapid growth of biomedical literature requires accurate text analysis and text processing tools. Detecting abbreviations and identifying their definitions is an important component of such tools. Most exis...
-
Article
Open Access
Improving a gold standard: treating human relevance judgments of MEDLINE document pairs
Given prior human judgments of the condition of an object it is possible to use these judgments to make a maximal likelihood estimate of what future human judgments of the condition of that object will be. How...
-
Article
Open Access
Finding related sentence pairs in MEDLINE
We explore the feasibility of automatically identifying sentences in different MEDLINE abstracts that are related in meaning. We compared traditional vector space models with machine learning methods for detec...
-
Article
Open Access
The ineffectiveness of within-document term frequency in text classification
For the purposes of classification it is common to represent a document as a bag of words. Such a representation consists of the individual terms making up the document together with the number of times each t...
-
Article
Open Access
Modeling actions of PubMed users with n-gram language models
Transaction logs from online search engines are valuable for two reasons: First, they provide insight into human information-seeking behavior. Second, log data can be used to train user models, which can then ...
-
Article
Open Access
Evaluation of query expansion using MeSH in PubMed
This paper investigates the effectiveness of using MeSH® in PubMed through its automatic query expansion process: Automatic Term Mapping (ATM). We run Boolean searches based on a collection of 55 topics and about...
-
Article
Open Access
Abbreviation definition identification based on automatic precision estimates
The rapid growth of biomedical literature presents challenges for automatic text processing, and one of the challenges is abbreviation identification. The presence of unrecognized abbreviations in text hinders...
-
Article
Open Access
Overview of BioCreative II gene mention recognition
Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task participants designed systems to identify substrings in sentences corresponding to gene name mentions. A ...
-
Article
Open Access
PubMed related articles: a probabilistic topic-based model for content similarity
We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether or not a document is about a particular topic is computed from ...