Shallow Cross-Encoders for Low-Latency Retrieval

  • Conference paper
Advances in Information Retrieval (ECIR 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14610)

Abstract

Transformer-based Cross-Encoders achieve state-of-the-art effectiveness in text retrieval. However, Cross-Encoders based on large transformer models (such as BERT or T5) are computationally expensive and allow for scoring only a small number of documents within a reasonably small latency window. At the same time, keeping search latencies low is important for user satisfaction and energy usage. In this paper, we show that weaker shallow transformer models (i.e., transformers with a limited number of layers) actually perform better than full-scale models when constrained to these practical low-latency settings, since they can estimate the relevance of more documents in the same time budget. We further show that shallow transformers may benefit from the generalised Binary Cross-Entropy (gBCE) training scheme, which has recently demonstrated success for recommendation tasks. Our experiments with TREC Deep Learning passage ranking querysets demonstrate significant improvements of shallow models over full-scale models in low-latency scenarios. For example, when the latency limit is 25 ms per query, MonoBERT-Large (a Cross-Encoder based on a full-scale BERT model) is only able to achieve NDCG@10 of 0.431 on TREC DL 2019, while TinyBERT-gBCE (a Cross-Encoder based on TinyBERT trained with gBCE) reaches NDCG@10 of 0.652, a +51% gain over MonoBERT-Large. We also show that shallow Cross-Encoders are effective even when used without a GPU (e.g., with CPU inference, NDCG@10 decreases only by 3% compared to GPU inference with 50 ms latency), which makes Cross-Encoders practical to run even without specialised hardware acceleration.
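
The latency argument in the abstract can be made concrete with a back-of-the-envelope calculation: a re-ranker can only score as many candidates as fit into the time budget, so a faster (shallower) model sees a deeper candidate pool. The sketch below is purely illustrative; the per-batch timings and batch size are hypothetical placeholders, not measurements reported in the paper.

```python
def candidates_within_budget(budget_ms: float, per_batch_ms: float,
                             batch_size: int = 16) -> int:
    """How many candidate documents a Cross-Encoder can re-rank within a
    latency budget, if scoring one batch of `batch_size` documents takes
    `per_batch_ms` milliseconds (illustrative sketch only)."""
    return int(budget_ms // per_batch_ms) * batch_size

# Hypothetical timings (not from the paper): a shallow model at 1 ms/batch
# vs. a full-scale model at 12 ms/batch, under a 25 ms per-query budget.
shallow_pool = candidates_within_budget(25, per_batch_ms=1.0)   # -> 400 candidates
full_pool = candidates_within_budget(25, per_batch_ms=12.0)     # -> 32 candidates
```

With a deeper candidate pool to score, the shallow model can reach relevant documents that the slower model never gets to within the budget, which is the effect the NDCG@10 comparison above quantifies.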


Notes

  1. In [6], Google researchers argued that for a smooth user experience, total search latency should be kept under 100 ms. This includes time for network round-trips, page rendering, and other overheads. Therefore, this paper uses a 50 ms cutoff for defining low-latency retrieval, leaving the remainder of the time to these other overheads.

  2. https://www.sbert.net/docs/pretrained-models/ce-msmarco.html.

  3. https://blog.vespa.ai/pretrained-transformer-language-models-for-search-part-4/.

  4. https://towardsdatascience.com/tinybert-for-search-10x-faster-and-20x-smaller-than-bert-74cd1b6b5aec.

  5. For simplicity, we omit batching in this reasoning. With slight tweaking, it remains valid for batching as well (e.g. inference time should be divided by the batch size).

  6. Source code for this paper can be found at https://github.com/asash/shallow-cross-encoders.

  7. We do not use the standard MS MARCO triplets file because it only contains one negative per query, and for the gBCE training scheme we need up to 128 negatives (a sketch of one way to assemble such training examples follows these notes).

  8. https://huggingface.co/prajjwal1/bert-tiny.

  9. https://huggingface.co/castorini/monot5-base-msmarco-10k.

  10. https://huggingface.co/castorini/monobert-large-msmarco-finetune-only.
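
As mentioned in note 7, the gBCE training scheme requires up to 128 negatives per query. The snippet below is a minimal sketch of one way to assemble such training examples, e.g. by sampling negatives from a first-stage (BM25) candidate list; the function and field names are illustrative assumptions, not necessarily the exact procedure used in the paper.

```python
import random

def build_training_example(query: str, positive_pid: str, candidate_pids: list,
                           corpus: dict, num_negatives: int = 128) -> dict:
    """Pair a query with its judged relevant passage and `num_negatives`
    sampled negatives drawn from a candidate list with the positive removed.
    Illustrative sketch only."""
    pool = [pid for pid in candidate_pids if pid != positive_pid]
    negatives = random.sample(pool, k=min(num_negatives, len(pool)))
    return {
        "query": query,
        "positive": corpus[positive_pid],
        "negatives": [corpus[pid] for pid in negatives],
    }
```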

References

  1. Bajaj, P., et al.: MS MARCO: a human generated machine reading comprehension dataset. In: Proceedings of NeurIPS (2018)

  2. Bhargava, P., Drozd, A., Rogers, A.: Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics (2021). http://arxiv.org/abs/2110.01518

  3. Bruch, S., Lucchese, C., Nardini, F.M.: Efficient and effective tree-based and neural learning to rank. Found. Trends® Inf. Retrieval 17(1), 1–123 (2023)

  4. Craswell, N., Mitra, B., Yilmaz, E., Campos, D.: Overview of the TREC 2020 deep learning track. In: Proceedings of TREC (2020)

  5. Craswell, N., Mitra, B., Yilmaz, E., Campos, D., Voorhees, E.M.: Overview of the TREC 2019 deep learning track. In: Proceedings of TREC (2019)

  6. Dean, J., Barroso, L.A.: The tail at scale. Commun. ACM 56(2), 74–80 (2013)

  7. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, pp. 4171–4186 (2019)

  8. Ding, S., Suel, T.: Faster top-k document retrieval using block-max indexes. In: Proceedings of SIGIR, pp. 993–1002 (2011)

  9. Formal, T., Piwowarski, B., Clinchant, S.: SPLADE: sparse lexical and expansion model for first stage ranking. In: Proceedings of SIGIR, pp. 2288–2292 (2021)

  10. Goodfellow, I., Bengio, Y., Courville, A., Bach, F.: Deep Learning. MIT Press, Cambridge (2017)

  11. Hofstätter, S., Althammer, S., Schröder, M., Sertkan, M., Hanbury, A.: Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation (2021). http://arxiv.org/abs/2010.02666

  12. Hofstätter, S., Lin, S.C., Yang, J.H., Lin, J., Hanbury, A.: Efficiently teaching an effective dense retriever with balanced topic aware sampling. In: Proceedings of SIGIR, pp. 113–122 (2021)

  13. Hofstätter, S., Zlabinger, M., Hanbury, A.: Interpretable & time-budget-constrained contextualization for re-ranking. In: Proceedings of ECAI (2020)

  14. Humeau, S., Shuster, K., Lachaux, M.A., Weston, J.: Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring (2020). http://arxiv.org/abs/1905.01969

  15. Khattab, O., Zaharia, M.: ColBERT: efficient and effective passage search via contextualized late interaction over BERT. In: Proceedings of SIGIR, pp. 39–48 (2020)

  16. Kohavi, R., Tang, D., Xu, Y.: Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press, Cambridge (2020)

  17. Kulkarni, H., MacAvaney, S., Goharian, N., Frieder, O.: Lexically-accelerated dense retrieval. In: Proceedings of SIGIR, pp. 152–162 (2023)

  18. Lin, S.C., Yang, J.H., Lin, J.: In-batch negatives for knowledge distillation with tightly-coupled teachers for dense retrieval. In: Proceedings of RepL4NLP, pp. 163–173 (2021)

  19. Lu, W., Jiao, J., Zhang, R.: TwinBERT: distilling knowledge to twin-structured compressed BERT models for large-scale retrieval. In: Proceedings of CIKM, pp. 2645–2652 (2020)

  20. MacAvaney, S., Macdonald, C.: A python interface to PISA! In: Proceedings of SIGIR, pp. 3339–3344 (2022)

  21. MacAvaney, S., Macdonald, C., Ounis, I.: Streamlining evaluation with IR-measures. In: Proceedings of ECIR, pp. 305–310 (2022)

  22. MacAvaney, S., Nardini, F.M., Perego, R., Tonellotto, N., Goharian, N., Frieder, O.: Expansion via prediction of importance with contextualization. In: Proceedings of SIGIR, pp. 1573–1576 (2020)

  23. MacAvaney, S., Tonellotto, N., Macdonald, C.: Adaptive re-ranking with a corpus graph. In: Proceedings of SIGIR, pp. 1491–1500 (2022)

  24. MacAvaney, S., Yates, A., Cohan, A., Goharian, N.: CEDR: contextualized embeddings for document ranking. In: Proceedings of SIGIR, pp. 1101–1104 (2019)

  25. Macdonald, C., Tonellotto, N., MacAvaney, S., Ounis, I.: PyTerrier: declarative experimentation in python from BM25 to dense retrieval. In: Proceedings of CIKM, pp. 4526–4533 (2021)

  26. Mallia, A., Siedlaczek, M., Mackenzie, J.M., Suel, T.: PISA: performant indexes and search for academia. In: Proceedings of OSIRRC@SIGIR 2019, vol. 2409, pp. 50–56 (2019)

  27. Nogueira, R., Cho, K.: Passage Re-ranking with BERT (2020). http://arxiv.org/abs/1901.04085

  28. Nogueira, R., Jiang, Z., Lin, J.: Document Ranking with a Pretrained Sequence-to-Sequence Model (2020). http://arxiv.org/abs/2003.06713

  29. Petrov, A.V., Macdonald, C.: gSASRec: reducing overconfidence in sequential recommendation trained with negative sampling. In: Proceedings of RecSys, pp. 116–128 (2023)

  30. Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using siamese BERT-networks. In: Proceedings of EMNLP (2019)

  31. Robertson, S., Walker, S., Jones, S., Hancock-Beaulieu, M., Gatford, M.: Okapi at TREC 3. In: Proceedings of TREC (1994)

  32. Scells, H., Zhuang, S., Zuccon, G.: Reduce, reuse, recycle: green information retrieval research. In: Proceedings of SIGIR, pp. 2825–2837 (2022)

  33. Thakur, N., Reimers, N., Rücklé, A., Srivastava, A., Gurevych, I.: BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models (2021). http://arxiv.org/abs/2104.08663

  34. Turc, I., Chang, M.W., Lee, K., Toutanova, K.: Well-Read Students Learn Better: On the Importance of Pre-training Compact Models (2019). http://arxiv.org/abs/1908.08962

  35. Vaswani, A., et al.: Attention is all you need. In: Proceedings of NeurIPS (2017)

  36. Wallat, J., Beringer, F., Anand, A., Anand, A.: Probing BERT for ranking abilities. In: Proceedings of ECIR, pp. 255–273 (2023)

  37. Wang, X., MacAvaney, S., Macdonald, C., Ounis, I.: An inspection of the reproducibility and replicability of TCT-ColBERT. In: Proceedings of SIGIR, pp. 2790–2800 (2022)

  38. Wang, X., Macdonald, C., Tonellotto, N., Ounis, I.: Reproducibility, replicability, and insights into dense multi-representation retrieval models: from ColBERT to Col*. In: Proceedings of SIGIR, pp. 2552–2561 (2023)

  39. Wolf, T., et al.: HuggingFace’s Transformers: State-of-the-art Natural Language Processing (2020). http://arxiv.org/abs/1910.03771

  40. Xiong, L., et al.: Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval (2020)

  41. Zhuang, H., et al.: RankT5: fine-tuning T5 for text ranking with ranking losses. In: Proceedings of SIGIR, pp. 2308–2313 (2023)

Author information

Correspondence to Aleksandr V. Petrov.

A Effect of gBCE Training Scheme on Tiny BERT-Based Cross-Encoder on the MS MARCO Dev Set

Table 3 reports the effectiveness of a Tiny BERT model on a 6,980-query subset of the MS MARCO dev set (dataset irds:msmarco-passage/dev/small in PyTerrier). The evaluation follows the scheme described in Sect. 4.3, with the exception of using MRR@10, the official metric for this queryset, instead of NDCG@10. As the table shows, the overall trends follow the observations in Sect. 4.3. In particular, increasing the number of negatives matters more than the choice of loss function; gBCE loss improves results with a small number of negatives but has only a moderate effect when the number of negatives increases. Nevertheless, gBCE is better than BCE loss in 5 out of 6 cases in this experiment, and with 1 negative the improvement over BCE loss is statistically significant. Overall, the combination of gBCE loss and 128 negatives provides a significant improvement in MRR@10, from 0.2942 to 0.3200 (+8.76%), compared to the “standard” training scheme with 1 negative and BCE loss. Note that this result is lower than that of larger models – e.g. Nogueira et al. [27] achieved MRR@10 of 0.36 on this queryset with a BERT-Large model. Lower effectiveness compared to full-scale models is expected, as we do not control for latency in this experiment. When latency is limited, shallow Cross-Encoders are more effective (see Fig. 1).

Table 3. Effect of the loss function and the number of training negatives on Tiny BERT-based Cross-Encoder MRR@10 on the MS MARCO dev set. Bold indicates the best result, and * indicates a statistically significant difference (\(p < 0.05\)) compared to the baseline (BCE loss, one negative).
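
For reference, the gBCE loss evaluated in Table 3 follows the formulation of Petrov and Macdonald [29], which raises the sigmoid of the positive score to a power β derived from the negative sampling rate. The PyTorch sketch below is an illustrative adaptation to the Cross-Encoder setting; the calibration parameter t and the candidate-pool size are hypothetical defaults, not the paper's values.

```python
import torch
import torch.nn.functional as F

def gbce_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor,
              t: float = 0.75, pool_size: int = 1000) -> torch.Tensor:
    """Generalised Binary Cross-Entropy, following [29] (illustrative sketch).

    pos_scores: (batch,) logits of the relevant passage for each query
    neg_scores: (batch, k) logits of k sampled negative passages
    t:          calibration parameter in [0, 1]; t = 0 recovers plain BCE
    pool_size:  size of the pool negatives are sampled from (assumed value)
    """
    k = neg_scores.size(1)
    alpha = k / (pool_size - 1)                       # negative sampling rate
    beta = alpha * (t * (1 - 1 / alpha) + 1 / alpha)  # power applied to the positive

    pos_term = beta * F.logsigmoid(pos_scores)        # log(sigmoid(s+)^beta)
    neg_term = F.logsigmoid(-neg_scores).sum(dim=1)   # sum of log(1 - sigmoid(s-))
    return -(pos_term + neg_term).mean()
```

With t = 0 the positive term reduces to the standard BCE term, which is why the BCE rows in Table 3 can be seen as the special case of this loss.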

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Petrov, A.V., MacAvaney, S., Macdonald, C. (2024). Shallow Cross-Encoders for Low-Latency Retrieval. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14610. Springer, Cham. https://doi.org/10.1007/978-3-031-56063-7_10

  • DOI: https://doi.org/10.1007/978-3-031-56063-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56062-0

  • Online ISBN: 978-3-031-56063-7
