Search Results - Springer

Sort By Newest First Oldest First

Chapter and Conference Paper

Towards Automated End-to-End Health Misinformation Free Search with a Large Language Model

In the information age, health misinformation remains a notable challenge to public welfare. Integral to addressing this issue is the development of search systems adept at identifying and filtering out mislea...

Ronak Pradeep, Jimmy Lin in Advances in Information Retrieval (2024)
Chapter and Conference Paper

PyGaggle: A Gaggle of Resources for Open-Domain Question Answering

Text retrieval using dense–sparse hybrids has been gaining popularity because of their effectiveness. Improvements to both sparse and dense models have also been noted, in the context of open-domain question a...

Ronak Pradeep, Haonan Chen, Lingwei Gu… in Advances in Information Retrieval (2023)
Chapter and Conference Paper

Answer Retrieval for Math Questions Using Structural and Dense Retrieval

Answer retrieval for math questions is a challenging task due to the complex and structured nature of mathematical expressions. In this paper, we combine a structure retriever and a domain-adapted ColBERT retr...

Wei Zhong, Yuqing **e, Jimmy Lin in Experimental IR Meets Multilinguality, Mul… (2023)
Chapter and Conference Paper

Pre-processing Matters! Improved Wikipedia Corpora for Open-Domain Question Answering

One of the contributions of the landmark Dense Passage Retriever (DPR) work is the curation of a corpus of passages generated from Wikipedia articles that have been segmented into non-overlap** passages of 1...

Manveer Singh Tamber, Ronak Pradeep, Jimmy Lin in Advances in Information Retrieval (2023)
Chapter and Conference Paper

Squeezing Water from a Stone: A Bag of Tricks for Further Improving Cross-Encoder Effectiveness for Reranking

While much recent work has demonstrated that hard negative mining can be used to train better bi-encoder models, few have considered it in the context of cross-encoders, which are key ingredients in modern re...

Ronak Pradeep, Yuqi Liu, **nyu Zhang, Yilin Li… in Advances in Information Retrieval (2022)
Chapter

Setting the Stage

This section begins by more formally characterizing the text ranking problem, explicitly enumerating our assumptions about characteristics of the input and output, and more precisely circumscribing the scope o...

Jimmy Lin, Rodrigo Nogueira, Andrew Yates in Pretrained Transformers for Text Ranking (2022)
Chapter

Refining Query and Document Representations

The vocabulary mismatch problem [Furnas et al., 1987]—where searchers and the authors of the texts to be searched use different words to describe the same concepts—was introduced in Section 1.2.2 as a core pro...

Jimmy Lin, Rodrigo Nogueira, Andrew Yates in Pretrained Transformers for Text Ranking (2022)
Chapter

Future Directions and Conclusions

It is quite remarkable that BERT debuted in October 2018, only around three years ago. Taking a step back and reflecting, the field has seen an incredible amount of progress in a short amount of time. As we ha...

Jimmy Lin, Rodrigo Nogueira, Andrew Yates in Pretrained Transformers for Text Ranking (2022)
Chapter and Conference Paper

Another Look at DPR: Reproduction of Training and Replication of Retrieval

Text retrieval using learned dense representations has recently emerged as a promising alternative to “traditional” text retrieval using sparse bag-of-words representations. One foundational work that has garn...

Xueguang Ma, Kai Sun, Ronak Pradeep, Minghan Li… in Advances in Information Retrieval (2022)
Chapter

Introduction

The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query for a particular task. The most common formulation of text ranking is search, where the search en...

Jimmy Lin, Rodrigo Nogueira, Andrew Yates in Pretrained Transformers for Text Ranking (2022)
Chapter

Multi-Stage Architectures for Reranking

The simplest and most straightforward formulation of text ranking is to convert the task into a text classification problem, and then sort the texts to be ranked based on the probability that each item belongs...

Jimmy Lin, Rodrigo Nogueira, Andrew Yates in Pretrained Transformers for Text Ranking (2022)
Chapter

Learned Dense Representations for Ranking

Arguably, the single biggest benefit brought about by modern deep learning techniques to text ranking is the move away from sparse signals, mostly limited to exact matches, to continuous dense representations ...

Jimmy Lin, Rodrigo Nogueira, Andrew Yates in Pretrained Transformers for Text Ranking (2022)
Book

Pretrained Transformers for Text Ranking

BERT and Beyond

Jimmy Lin, Rodrigo Nogueira… in Synthesis Lectures on Human Language Technologies (2022)
Chapter and Conference Paper

Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study

Pseudo-Relevance Feedback (PRF) utilises the relevance signals from the top-k passages from the first round of retrieval to perform a second round of retrieval aiming to improve search effectiveness. A recent res...

Hang Li, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma… in Advances in Information Retrieval (2022)
Article

Open Access

From archive to analysis: accessing web archives at scale through a cloud-based interface

This paper introduces the Archives Unleashed Cloud, a web-based interface for working with web archives at scale. Current access paradigms, largely driven by the scope and scale of web archives, generally invo...

Nick Ruest, Samantha Fritz, Ryan Deschamps… in International Journal of Digital Humanities (2021)

Download PDF (1162 KB) View Article
Chapter and Conference Paper

Comparing Score Aggregation Approaches for Document Retrieval with Pretrained Transformers

While BERT has been shown to be effective for passage retrieval, its maximum input length limitation poses a challenge when applying the model to document retrieval. In this work, we reproduce three passage sc...

**nyu Zhang, Andrew Yates, Jimmy Lin in Advances in Information Retrieval (2021)
Article

Navigation-based candidate expansion and pretrained language models for citation recommendation

Citation recommendation systems for the scientific literature, to help authors find papers that should be cited, have the potential to speed up discoveries and uncover new routes for scientific exploration. We...

Rodrigo Nogueira, Zhiying Jiang, Kyunghyun Cho, Jimmy Lin in Scientometrics (2020)
Article

The ubiquity of large graphs and surprising challenges of graph processing: extended survey

Graph processing is becoming increasingly prevalent across many application domains. In spite of this prevalence, there is little research about how graphs are actually used in practice. We performed an extens...

Siddhartha Sahu, Amine Mhedhbi, Semih Salihoglu, Jimmy Lin… in The VLDB Journal (2020)
Chapter and Conference Paper

From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance

The latest major release of Lucene (version 8) in March 2019 incorporates block-max indexes and exploits the block-max variant of Wand for query evaluation, which are innovations that originated from academia. Th...

Adrien Grand, Robert Muir, Jim Ferenczi, Jimmy Lin in Advances in Information Retrieval (2020)

Download PDF (223 KB) View Chapter
Chapter and Conference Paper

Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants

When researchers speak of BM25, it is not entirely clear which variant they mean, since many tweaks to Robertson et al.’s original formulation have been proposed. When practitioners speak of BM25, they most li...

Chris Kamphuis, Arjen P. de Vries, Leonid Boytsov… in Advances in Information Retrieval (2020)

Download PDF (225 KB) View Chapter

60 Result(s)

Towards Automated End-to-End Health Misinformation Free Search with a Large Language Model

PyGaggle: A Gaggle of Resources for Open-Domain Question Answering

Answer Retrieval for Math Questions Using Structural and Dense Retrieval

Pre-processing Matters! Improved Wikipedia Corpora for Open-Domain Question Answering

Squeezing Water from a Stone: A Bag of Tricks for Further Improving Cross-Encoder Effectiveness for Reranking

Setting the Stage

Refining Query and Document Representations

Future Directions and Conclusions

Another Look at DPR: Reproduction of Training and Replication of Retrieval

Introduction

Multi-Stage Architectures for Reranking

Learned Dense Representations for Ranking

Pretrained Transformers for Text Ranking

Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study

From archive to analysis: accessing web archives at scale through a cloud-based interface

Comparing Score Aggregation Approaches for Document Retrieval with Pretrained Transformers

Navigation-based candidate expansion and pretrained language models for citation recommendation

The ubiquity of large graphs and surprising challenges of graph processing: extended survey

From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance

Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants

Our Content

Other Sites

Help & Contacts