  1. No Access

    Chapter and Conference Paper

    BLUEX: A Benchmark Based on Brazilian Leading Universities Entrance eXams

    One common trend in recent studies of language models (LMs) is the use of standardized tests for evaluation. However, despite being the fifth most spoken language worldwide, few such evaluations have been con...

    Thales Sales Almeida, Thiago Laitz, Giovana K. Bonás in Intelligent Systems (2023)

  2. No Access

    Chapter and Conference Paper

    Visconde: Multi-document QA with GPT-3 and Neural Reranking

    This paper proposes a question-answering system that can answer questions whose supporting evidence is spread over multiple (potentially long) documents. The system, called Visconde, uses a three-step pipeline...

    Jayr Pereira, Robson Fidalgo, Roberto Lotufo in Advances in Information Retrieval (2023)

  3. No Access

    Chapter and Conference Paper

    Exploring Text Decoding Methods for Portuguese Legal Text Generation

    In recent years, there has been considerable growth in the volume of legal proceedings in Brazil. In this context, there is a lot of potential in using recent advances in Natural Language Processing to automat...

    Kenzo Sakiyama, Raphael Montanari, Roseval Malaquias Junior in Intelligent Systems (2023)

  4. No Access

    Chapter and Conference Paper

    Sabiá: Portuguese Large Language Models

    As the capabilities of language models continue to advance, it is conceivable that the “one-size-fits-all” model will remain as the main paradigm. For instance, given the vast number of languages worldwide, many o...

    Ramon Pires, Hugo Abonizio, Thales Sales Almeida, Rodrigo Nogueira in Intelligent Systems (2023)

  5. No Access

    Chapter and Conference Paper

    Sequence-to-Sequence Models for Extracting Information from Registration and Legal Documents

    A typical information extraction pipeline consists of token- or span-level classification models coupled with a series of pre- and post-processing scripts. In a production pipeline, requirements often change, ...

    Ramon Pires, Fábio C. de Souza, Guilherme Rosa in Document Analysis Systems (2022)

  6. No Access

    Chapter

    Setting the Stage

    This section begins by more formally characterizing the text ranking problem, explicitly enumerating our assumptions about characteristics of the input and output, and more precisely circumscribing the scope o...

    Jimmy Lin, Rodrigo Nogueira, Andrew Yates in Pretrained Transformers for Text Ranking (2022)

  7. No Access

    Chapter

    Refining Query and Document Representations

    The vocabulary mismatch problem [Furnas et al., 1987]—where searchers and the authors of the texts to be searched use different words to describe the same concepts—was introduced in Section 1.2.2 as a core pro...

    Jimmy Lin, Rodrigo Nogueira, Andrew Yates in Pretrained Transformers for Text Ranking (2022)

  8. No Access

    Chapter

    Future Directions and Conclusions

    It is quite remarkable that BERT debuted in October 2018, only around three years ago. Taking a step back and reflecting, the field has seen an incredible amount of progress in a short amount of time. As we ha...

    Jimmy Lin, Rodrigo Nogueira, Andrew Yates in Pretrained Transformers for Text Ranking (2022)

  9. No Access

    Chapter

    Introduction

    The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query for a particular task. The most common formulation of text ranking is search, where the search en...

    Jimmy Lin, Rodrigo Nogueira, Andrew Yates in Pretrained Transformers for Text Ranking (2022)

  10. No Access

    Chapter

    Multi-Stage Architectures for Reranking

    The simplest and most straightforward formulation of text ranking is to convert the task into a text classification problem, and then sort the texts to be ranked based on the probability that each item belongs...

    Jimmy Lin, Rodrigo Nogueira, Andrew Yates in Pretrained Transformers for Text Ranking (2022)

  11. No Access

    Chapter

    Learned Dense Representations for Ranking

    Arguably, the single biggest benefit brought about by modern deep learning techniques to text ranking is the move away from sparse signals, mostly limited to exact matches, to continuous dense representations ...

    Jimmy Lin, Rodrigo Nogueira, Andrew Yates in Pretrained Transformers for Text Ranking (2022)

  12. No Access

    Book

  13. No Access

    Article

    Navigation-based candidate expansion and pretrained language models for citation recommendation

    Citation recommendation systems for the scientific literature, to help authors find papers that should be cited, have the potential to speed up discoveries and uncover new routes for scientific exploration. We...

    Rodrigo Nogueira, Zhiying Jiang, Kyunghyun Cho, Jimmy Lin in Scientometrics (2020)

  14. No Access

    Chapter

    EpiRL: A Reinforcement Learning Agent to Facilitate Epistasis Detection

    Epistasis (gene-gene interaction) is crucial to predicting genetic disease. Our work tackles the computational challenges faced by previous works in epistasis detection by modeling it as a one-step Markov Deci...

    Kexin Huang, Rodrigo Nogueira in Precision Health and Medicine (2020)

  15. No Access

    Chapter and Conference Paper

    BERTimbau: Pretrained BERT Models for Brazilian Portuguese

    Recent advances in language representation using neural networks have made it viable to transfer the learned internal states of large pretrained language models (LMs) to downstream natural language processing ...

    Fábio Souza, Rodrigo Nogueira, Roberto Lotufo in Intelligent Systems (2020)