-
Article (Open Access)
From archive to analysis: accessing web archives at scale through a cloud-based interface
This paper introduces the Archives Unleashed Cloud, a web-based interface for working with web archives at scale. Current access paradigms, largely driven by the scope and scale of web archives, generally invo...
-
Chapter and Conference Paper
From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance
The latest major release of Lucene (version 8) in March 2019 incorporates block-max indexes and exploits the block-max variant of Wand for query evaluation, which are innovations that originated from academia. Th...
-
Chapter and Conference Paper
Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants
When researchers speak of BM25, it is not entirely clear which variant they mean, since many tweaks to Robertson et al.’s original formulation have been proposed. When practitioners speak of BM25, they most li...
-
Chapter and Conference Paper
Reproducibility is a Process, Not an Achievement: The Replicability of IR Reproducibility Experiments
This paper espouses a view of reproducibility in the computational sciences as a process and not just a point-in-time “achievement”. As a concrete case study, we revisit the Open-Source IR Reproducibility Challen...
-
Article
The role of index compression in score-at-a-time query evaluation
This paper explores the performance of top k document retrieval with score-at-a-time query evaluation on impact-ordered indexes in main memory. To better understand execution efficiency in the context of modern p...
-
Article
Document vector representations for feature extraction in multi-stage document ranking
We consider a multi-stage retrieval architecture consisting of a fast, “cheap” candidate generation stage, a feature extraction stage, and a more “expensive” reranking stage using machine-learned models. In th...
-
Article (Open Access)
Herpesviruses control the DNA damage response through TIP60
-
Article (Open Access)
Searching for SNPs with cloud computing
As DNA sequencing outpaces improvements in computer speed, there is a critical need to accelerate tasks like alignment and SNP calling. Crossbow is a cloud-computing software tool that combines the aligner Bow...
-
Article (Open Access)
Modeling actions of PubMed users with n-gram language models
Transaction logs from online search engines are valuable for two reasons: First, they provide insight into human information-seeking behavior. Second, log data can be used to train user models, which can then ...
-
Article (Open Access)
Is searching full text more effective than searching abstracts?
With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, me...
-
Article (Open Access)
PageRank without hyperlinks: Reranking with PubMed related article networks for biomedical text retrieval
Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships from manually-created hyperlinks. We conside...
-
Article (Open Access)
Identification of tissue-specific cis-regulatory modules based on interactions between transcription factors
Evolutionary conservation has been used successfully to help identify cis-acting DNA regions that are important in regulating tissue-specific gene expression. Motivated by increasing evidence that some DNA reg...
-
Article (Open Access)
PubMed related articles: a probabilistic topic-based model for content similarity
We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether or not a document is about a particular topic is computed from ...
-
Article (Open Access)
Syntactic sentence compression in the biomedical domain: facilitating access to related articles
We explore a syntactic approach to sentence compression in the biomedical domain, grounded in the context of result presentation for related article search in the PubMed search engine. By automatically trimmin...
-
Article
Methods for automatically evaluating answers to complex questions
Evaluation is a major driving force in advancing the state of the art in language technologies. In particular, automatically assessing the quality of machine output is the preferred method for meas...