Shallow Cross-Encoders for Low-Latency Retrieval

  • Conference paper
Advances in Information Retrieval (ECIR 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14610)

Abstract

Transformer-based Cross-Encoders achieve state-of-the-art effectiveness in text retrieval. However, Cross-Encoders based on large transformer models (such as BERT or T5) are computationally expensive and allow for scoring only a small number of documents within a reasonably small latency window. At the same time, keeping search latencies low is important for user satisfaction and energy usage. In this paper, we show that weaker shallow transformer models (i.e., transformers with a limited number of layers) actually perform better than full-scale models when constrained to these practical low-latency settings, since they can estimate the relevance of more documents in the same time budget. We further show that shallow transformers may benefit from the generalised Binary Cross-Entropy (gBCE) training scheme, which has recently demonstrated success for recommendation tasks. Our experiments with TREC Deep Learning passage ranking querysets demonstrate significant improvements of shallow models over full-scale models in low-latency scenarios. For example, when the latency limit is 25 ms per query, MonoBERT-Large (a Cross-Encoder based on a full-scale BERT model) is only able to achieve NDCG@10 of 0.431 on TREC DL 2019, while TinyBERT-gBCE (a Cross-Encoder based on TinyBERT trained with gBCE) reaches NDCG@10 of 0.652, a +51% gain over MonoBERT-Large. We also show that shallow Cross-Encoders are effective even when used without a GPU (e.g., with CPU inference, NDCG@10 decreases only by 3% compared to GPU inference with 50 ms latency), which makes Cross-Encoders practical to run even without specialised hardware acceleration.
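
The latency argument in the abstract can be made concrete with a back-of-the-envelope calculation: a re-ranker can only score as many candidates as fit into the time budget, so a faster (shallower) model sees a deeper candidate pool. The sketch below is purely illustrative; the per-batch timings and batch size are hypothetical placeholders, not measurements reported in the paper.

```python
def candidates_within_budget(budget_ms: float, per_batch_ms: float,
                             batch_size: int = 16) -> int:
    """How many candidate documents a Cross-Encoder can re-rank within a
    latency budget, if scoring one batch of `batch_size` documents takes
    `per_batch_ms` milliseconds (illustrative sketch only)."""
    return int(budget_ms // per_batch_ms) * batch_size

# Hypothetical timings (not from the paper): a shallow model at 1 ms/batch
# vs. a full-scale model at 12 ms/batch, under a 25 ms per-query budget.
shallow_pool = candidates_within_budget(25, per_batch_ms=1.0)   # -> 400 candidates
full_pool = candidates_within_budget(25, per_batch_ms=12.0)     # -> 32 candidates
```

With a deeper candidate pool to score, the shallow model can reach relevant documents that the slower model never gets to within the budget, which is the effect the NDCG@10 comparison above quantifies.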


Notes

  1. In [6], Google researchers argued that for a smooth user experience, total search latency should be kept under 100 ms. This includes time for network round-trips, page rendering, and other overheads. Therefore, this paper uses a 50 ms cutoff for defining low-latency retrieval, leaving the remainder of the time to these other overheads.

  2. https://www.sbert.net/docs/pretrained-models/ce-msmarco.html.

  3. https://blog.vespa.ai/pretrained-transformer-language-models-for-search-part-4/.

  4. https://towardsdatascience.com/tinybert-for-search-10x-faster-and-20x-smaller-than-bert-74cd1b6b5aec.

  5. For simplicity, we omit batching in this reasoning. With slight tweaking, it remains valid for batching as well (e.g. inference time should be divided by the batch size).

  6. Source code for this paper can be found at https://github.com/asash/shallow-cross-encoders.

  7. We do not use the standard MS MARCO triplets file because it only contains one negative per query, and for the gBCE training scheme we need up to 128 negatives (a sketch of one way to assemble such training examples follows these notes).

  8. https://huggingface.co/prajjwal1/bert-tiny.

  9. https://huggingface.co/castorini/monot5-base-msmarco-10k.

  10. https://huggingface.co/castorini/monobert-large-msmarco-finetune-only.
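
As mentioned in note 7, the gBCE training scheme requires up to 128 negatives per query. The snippet below is a minimal sketch of one way to assemble such training examples, e.g. by sampling negatives from a first-stage (BM25) candidate list; the function and field names are illustrative assumptions, not necessarily the exact procedure used in the paper.

```python
import random

def build_training_example(query: str, positive_pid: str, candidate_pids: list,
                           corpus: dict, num_negatives: int = 128) -> dict:
    """Pair a query with its judged relevant passage and `num_negatives`
    sampled negatives drawn from a candidate list with the positive removed.
    Illustrative sketch only."""
    pool = [pid for pid in candidate_pids if pid != positive_pid]
    negatives = random.sample(pool, k=min(num_negatives, len(pool)))
    return {
        "query": query,
        "positive": corpus[positive_pid],
        "negatives": [corpus[pid] for pid in negatives],
    }
```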

References

  1. Bajaj, P., et al.: MS MARCO: a human generated machine reading comprehension dataset. In: Proceedings of NeurIPS (2018)

  2. Bhargava, P., Drozd, A., Rogers, A.: Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics (2021). http://arxiv.org/abs/2110.01518

  3. Bruch, S., Lucchese, C., Nardini, F.M.: Efficient and effective tree-based and neural learning to rank. Found. Trends® Inf. Retrieval 17(1), 1–123 (2023)

  4. Craswell, N., Mitra, B., Yilmaz, E., Campos, D.: Overview of the TREC 2020 deep learning track. In: Proceedings of TREC (2020)

  5. Craswell, N., Mitra, B., Yilmaz, E., Campos, D., Voorhees, E.M.: Overview of the TREC 2019 deep learning track. In: Proceedings of TREC (2019)

  6. Dean, J., Barroso, L.A.: The tail at scale. Commun. ACM 56(2), 74–80 (2013)

  7. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, pp. 4171–4186 (2019)

  8. Ding, S., Suel, T.: Faster top-k document retrieval using block-max indexes. In: Proceedings of SIGIR, pp. 993–1002 (2011)

  9. Formal, T., Piwowarski, B., Clinchant, S.: SPLADE: sparse lexical and expansion model for first stage ranking. In: Proceedings of SIGIR, pp. 2288–2292 (2021)

  10. Goodfellow, I., Bengio, Y., Courville, A., Bach, F.: Deep Learning. MIT Press, Cambridge (2017)

  11. Hofstätter, S., Althammer, S., Schröder, M., Sertkan, M., Hanbury, A.: Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation (2021). http://arxiv.org/abs/2010.02666

  12. Hofstätter, S., Lin, S.C., Yang, J.H., Lin, J., Hanbury, A.: Efficiently teaching an effective dense retriever with balanced topic aware sampling. In: Proceedings of SIGIR, pp. 113–122 (2021)

  13. Hofstätter, S., Zlabinger, M., Hanbury, A.: Interpretable & time-budget-constrained contextualization for re-ranking. In: Proceedings of ECAI (2020)

  14. Humeau, S., Shuster, K., Lachaux, M.A., Weston, J.: Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring (2020). http://arxiv.org/abs/1905.01969

  15. Khattab, O., Zaharia, M.: ColBERT: efficient and effective passage search via contextualized late interaction over BERT. In: Proceedings of SIGIR, pp. 39–48 (2020)

  16. Kohavi, R., Tang, D., Xu, Y.: Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press, Cambridge (2020)

  17. Kulkarni, H., MacAvaney, S., Goharian, N., Frieder, O.: Lexically-accelerated dense retrieval. In: Proceedings of SIGIR, pp. 152–162 (2023)

  18. Lin, S.C., Yang, J.H., Lin, J.: In-batch negatives for knowledge distillation with tightly-coupled teachers for dense retrieval. In: Proceedings of RepL4NLP, pp. 163–173 (2021)

  19. Lu, W., Jiao, J., Zhang, R.: TwinBERT: distilling knowledge to twin-structured compressed BERT models for large-scale retrieval. In: Proceedings of CIKM, pp. 2645–2652 (2020)

  20. MacAvaney, S., Macdonald, C.: A python interface to PISA! In: Proceedings of SIGIR, pp. 3339–3344 (2022)

  21. MacAvaney, S., Macdonald, C., Ounis, I.: Streamlining evaluation with IR-measures. In: Proceedings of ECIR, pp. 305–310 (2022)

  22. MacAvaney, S., Nardini, F.M., Perego, R., Tonellotto, N., Goharian, N., Frieder, O.: Expansion via prediction of importance with contextualization. In: Proceedings of SIGIR, pp. 1573–1576 (2020)

  23. MacAvaney, S., Tonellotto, N., Macdonald, C.: Adaptive re-ranking with a corpus graph. In: Proceedings of SIGIR, pp. 1491–1500 (2022)

  24. MacAvaney, S., Yates, A., Cohan, A., Goharian, N.: CEDR: contextualized embeddings for document ranking. In: Proceedings of SIGIR, pp. 1101–1104 (2019)

  25. Macdonald, C., Tonellotto, N., MacAvaney, S., Ounis, I.: PyTerrier: declarative experimentation in python from BM25 to dense retrieval. In: Proceedings of CIKM, pp. 4526–4533 (2021)

  26. Mallia, A., Siedlaczek, M., Mackenzie, J.M., Suel, T.: PISA: performant indexes and search for academia. In: Proceedings of OSIRRC@SIGIR 2019, vol. 2409, pp. 50–56 (2019)

  27. Nogueira, R., Cho, K.: Passage Re-ranking with BERT (2020). http://arxiv.org/abs/1901.04085

  28. Nogueira, R., Jiang, Z., Lin, J.: Document Ranking with a Pretrained Sequence-to-Sequence Model (2020). http://arxiv.org/abs/2003.06713

  29. Petrov, A.V., Macdonald, C.: gSASRec: reducing overconfidence in sequential recommendation trained with negative sampling. In: Proceedings of RecSys, pp. 116–128 (2023)

  30. Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using siamese BERT-networks. In: Proceedings of EMNLP (2019)

  31. Robertson, S., Walker, S., Jones, S., Hancock-Beaulieu, M., Gatford, M.: Okapi at TREC 3. In: Proceedings of TREC (1994)

  32. Scells, H., Zhuang, S., Zuccon, G.: Reduce, reuse, recycle: green information retrieval research. In: Proceedings of SIGIR, pp. 2825–2837 (2022)

  33. Thakur, N., Reimers, N., Rücklé, A., Srivastava, A., Gurevych, I.: BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models (2021). http://arxiv.org/abs/2104.08663

  34. Turc, I., Chang, M.W., Lee, K., Toutanova, K.: Well-Read Students Learn Better: On the Importance of Pre-training Compact Models (2019). http://arxiv.org/abs/1908.08962

  35. Vaswani, A., et al.: Attention is all you need. In: Proceedings of NeurIPS (2017)

  36. Wallat, J., Beringer, F., Anand, A., Anand, A.: Probing BERT for ranking abilities. In: Proceedings of ECIR, pp. 255–273 (2023)

  37. Wang, X., MacAvaney, S., Macdonald, C., Ounis, I.: An inspection of the reproducibility and replicability of TCT-ColBERT. In: Proceedings of SIGIR, pp. 2790–2800 (2022)

  38. Wang, X., Macdonald, C., Tonellotto, N., Ounis, I.: Reproducibility, replicability, and insights into dense multi-representation retrieval models: from ColBERT to Col*. In: Proceedings of SIGIR, pp. 2552–2561 (2023)

  39. Wolf, T., et al.: HuggingFace’s Transformers: State-of-the-art Natural Language Processing (2020). http://arxiv.org/abs/1910.03771

  40. Xiong, L., et al.: Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval (2020)

  41. Zhuang, H., et al.: RankT5: fine-tuning T5 for text ranking with ranking losses. In: Proceedings of SIGIR, pp. 2308–2313 (2023)

Author information

Correspondence to Aleksandr V. Petrov.

A Effect of gBCE Training Scheme on Tiny BERT-Based Cross-Encoder on the MS MARCO Dev Set

Table 3 reports the effectiveness of a Tiny BERT model on a 6,980-query subset of the MS MARCO dev set (dataset irds:msmarco-passage/dev/small in PyTerrier). The evaluation follows the scheme described in Sect. 4.3, with the exception of using MRR@10, the official metric for this queryset, instead of NDCG@10. As the table shows, the overall trends follow the observations in Sect. 4.3. In particular, increasing the number of negatives matters more than the choice of loss function; gBCE loss improves results with a small number of negatives but has only a moderate effect when the number of negatives increases. Nevertheless, gBCE is better than BCE loss in 5 out of 6 cases in this experiment, and with 1 negative the improvement over BCE loss is statistically significant. Overall, the combination of gBCE loss and 128 negatives provides a significant improvement in MRR@10, from 0.2942 to 0.3200 (+8.76%), compared to the “standard” training scheme with 1 negative and BCE loss. Note that this result is lower than that of larger models – e.g. Nogueira et al. [27] achieved MRR@10 of 0.36 on this queryset with a BERT-Large model. Lower effectiveness compared to full-scale models is expected, as we do not control for latency in this experiment. When latency is limited, shallow Cross-Encoders are more effective (see Fig. 1).

Table 3. Effect of the loss function and the number of training negatives on Tiny BERT-based Cross-Encoder MRR@10 on the MS MARCO dev set. Bold indicates the best result, and * indicates a statistically significant difference (\(p < 0.05\)) compared to the baseline (BCE loss, one negative).
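
For reference, the gBCE loss evaluated in Table 3 follows the formulation of Petrov and Macdonald [29], which raises the sigmoid of the positive score to a power β derived from the negative sampling rate. The PyTorch sketch below is an illustrative adaptation to the Cross-Encoder setting; the calibration parameter t and the candidate-pool size are hypothetical defaults, not the paper's values.

```python
import torch
import torch.nn.functional as F

def gbce_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor,
              t: float = 0.75, pool_size: int = 1000) -> torch.Tensor:
    """Generalised Binary Cross-Entropy, following [29] (illustrative sketch).

    pos_scores: (batch,) logits of the relevant passage for each query
    neg_scores: (batch, k) logits of k sampled negative passages
    t:          calibration parameter in [0, 1]; t = 0 recovers plain BCE
    pool_size:  size of the pool negatives are sampled from (assumed value)
    """
    k = neg_scores.size(1)
    alpha = k / (pool_size - 1)                       # negative sampling rate
    beta = alpha * (t * (1 - 1 / alpha) + 1 / alpha)  # power applied to the positive

    pos_term = beta * F.logsigmoid(pos_scores)        # log(sigmoid(s+)^beta)
    neg_term = F.logsigmoid(-neg_scores).sum(dim=1)   # sum of log(1 - sigmoid(s-))
    return -(pos_term + neg_term).mean()
```

With t = 0 the positive term reduces to the standard BCE term, which is why the BCE rows in Table 3 can be seen as the special case of this loss.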

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Petrov, A.V., MacAvaney, S., Macdonald, C. (2024). Shallow Cross-Encoders for Low-Latency Retrieval. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14610. Springer, Cham. https://doi.org/10.1007/978-3-031-56063-7_10

  • DOI: https://doi.org/10.1007/978-3-031-56063-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56062-0

  • Online ISBN: 978-3-031-56063-7
