Pretrained Transformers for Text Ranking
BERT and Beyond
Chapter and Conference Paper
One common trend in recent studies of language models (LMs) is the use of standardized tests for evaluation. However, despite being the fifth most spoken language worldwide, few such evaluations have been con...
Chapter and Conference Paper
This paper proposes a question-answering system that can answer questions whose supporting evidence is spread over multiple (potentially long) documents. The system, called Visconde, uses a three-step pipeline...
Chapter and Conference Paper
In recent years, there has been considerable growth in the volume of legal proceedings in Brazil. In this context, there is a lot of potential in using recent advances in Natural Language Processing to automat...
Chapter and Conference Paper
As the capabilities of language models continue to advance, it is conceivable that a “one-size-fits-all” model will remain as the main paradigm. For instance, given the vast number of languages worldwide, many o...
Chapter and Conference Paper
A typical information extraction pipeline consists of token- or span-level classification models coupled with a series of pre- and post-processing scripts. In a production pipeline, requirements often change, ...
Chapter
This section begins by more formally characterizing the text ranking problem, explicitly enumerating our assumptions about characteristics of the input and output, and more precisely circumscribing the scope o...
Chapter
The vocabulary mismatch problem [Furnas et al., 1987]—where searchers and the authors of the texts to be searched use different words to describe the same concepts—was introduced in Section 1.2.2 as a core pro...
Chapter
It is quite remarkable that BERT debuted in October 2018, only around three years ago. Taking a step back and reflecting, the field has seen an incredible amount of progress in a short amount of time. As we ha...
Chapter
The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query for a particular task. The most common formulation of text ranking is search, where the search en...
Chapter
The simplest and most straightforward formulation of text ranking is to convert the task into a text classification problem, and then sort the texts to be ranked based on the probability that each item belongs...
Chapter
Arguably, the single biggest benefit brought about by modern deep learning techniques to text ranking is the move away from sparse signals, mostly limited to exact matches, to continuous dense representations ...
Book
Article
Citation recommendation systems for the scientific literature, to help authors find papers that should be cited, have the potential to speed up discoveries and uncover new routes for scientific exploration. We...
Chapter
Epistasis (gene-gene interaction) is crucial to predicting genetic disease. Our work tackles the computational challenges faced by previous works in epistasis detection by modeling it as a one-step Markov Deci...
Chapter and Conference Paper
Recent advances in language representation using neural networks have made it viable to transfer the learned internal states of large pretrained language models (LMs) to downstream natural language processing ...