Abstract
Transformer-based Cross-Encoders achieve state-of-the-art effectiveness in text retrieval. However, Cross-Encoders based on large transformer models (such as BERT or T5) are computationally expensive and can score only a small number of documents within a reasonable latency window. Yet keeping search latencies low is important for user satisfaction and energy usage. In this paper, we show that weaker shallow transformer models (i.e., transformers with a limited number of layers) actually perform better than full-scale models when constrained to these practical low-latency settings, since they can estimate the relevance of more documents in the same time budget. We further show that shallow transformers may benefit from the generalised Binary Cross-Entropy (gBCE) training scheme, which has recently demonstrated success for recommendation tasks. Our experiments with the TREC Deep Learning passage ranking querysets demonstrate significant improvements of shallow models over full-scale models in low-latency scenarios. For example, when the latency limit is 25 ms per query, MonoBERT-Large (a Cross-Encoder based on a full-scale BERT model) is only able to achieve an NDCG@10 of 0.431 on TREC DL 2019, while TinyBERT-gBCE (a Cross-Encoder based on TinyBERT trained with gBCE) reaches an NDCG@10 of 0.652, a +51% gain over MonoBERT-Large. We also show that shallow Cross-Encoders are effective even when used without a GPU (e.g., with CPU inference, NDCG@10 decreases only by 3% compared to GPU inference with 50 ms latency), which makes Cross-Encoders practical to run even without specialised hardware acceleration.
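The core argument of the abstract can be illustrated with a back-of-envelope model: within a fixed latency budget, the number of candidates a Cross-Encoder can re-rank is the budget divided by the per-document scoring cost, so a cheaper (shallower) model can score more candidates. A minimal sketch; the per-document timings below are hypothetical illustrations, not measurements from the paper:

```python
def candidates_within_budget(budget_ms: float, per_doc_ms: float) -> int:
    """Number of documents a Cross-Encoder can score within budget_ms,
    assuming a fixed per-document inference cost (no batching)."""
    return int(budget_ms // per_doc_ms)

# Hypothetical per-document costs: a full-scale model vs. a shallow one,
# both under the 25 ms budget discussed in the abstract.
full_scale = candidates_within_budget(25.0, 5.0)    # 5 candidates
shallow = candidates_within_budget(25.0, 0.25)      # 100 candidates
```

Under this simple model, the shallow model's larger candidate pool is what lets it recover effectiveness despite being a weaker scorer per document.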
Notes
- 1. In [6], Google researchers argued that for a smooth user experience, total search latency should be kept under 100 ms. This includes time for network round-trips, page rendering, and other overheads. Therefore, this paper uses a 50 ms cutoff for defining low-latency retrieval, leaving the remainder of the time to these other overheads.
- 5. For simplicity, we omit batching in this reasoning. With slight tweaking, it remains valid for batched inference as well (e.g., inference time should be divided by the batch size).
- 6. Source code for this paper can be found at https://github.com/asash/shallow-cross-encoders.
- 7. We do not use the standard MSMARCO triplets file because it only contains one negative per query, and the gBCE training scheme needs up to 128 negatives.
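The batching remark in note 5 can be made concrete: with batched inference, the effective per-document cost is the batch inference time divided by the batch size, and the candidate budget scales accordingly. A minimal sketch; the timings are hypothetical:

```python
def docs_scorable(budget_ms: float, batch_ms: float, batch_size: int) -> int:
    """Documents scorable within a latency budget using batched inference,
    assuming each batch of batch_size documents takes batch_ms to score."""
    per_doc_ms = batch_ms / batch_size  # effective per-document cost
    return int(budget_ms // per_doc_ms)

# Hypothetical: batches of 32 documents taking 10 ms each, 50 ms budget.
docs_scorable(50.0, 10.0, 32)  # 160 documents
```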
References
Bajaj, P., et al.: MS MARCO: a human generated machine reading comprehension dataset. In: Proceedings of NeurIPS (2018)
Bhargava, P., Drozd, A., Rogers, A.: Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics (2021). http://arxiv.org/abs/2110.01518
Bruch, S., Lucchese, C., Nardini, F.M.: Efficient and effective tree-based and neural learning to rank. Found. Trends® Inf. Retrieval 17(1), 1–123 (2023)
Craswell, N., Mitra, B., Yilmaz, E., Campos, D.: Overview of the TREC 2020 deep learning track. In: Proceedings of TREC (2020)
Craswell, N., Mitra, B., Yilmaz, E., Campos, D., Voorhees, E.M.: Overview of the TREC 2019 deep learning track. In: Proceedings of TREC (2019)
Dean, J., Barroso, L.A.: The tail at scale. Commun. ACM 56(2), 74–80 (2013)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, pp. 4171–4186 (2019)
Ding, S., Suel, T.: Faster top-k document retrieval using block-max indexes. In: Proceedings of SIGIR, pp. 993–1002 (2011)
Formal, T., Piwowarski, B., Clinchant, S.: SPLADE: sparse lexical and expansion model for first stage ranking. In: Proceedings of SIGIR, pp. 2288–2292 (2021)
Goodfellow, I., Bengio, Y., Courville, A., Bach, F.: Deep Learning. MIT Press, Cambridge (2017)
Hofstätter, S., Althammer, S., Schröder, M., Sertkan, M., Hanbury, A.: Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation (2021). http://arxiv.org/abs/2010.02666
Hofstätter, S., Lin, S.C., Yang, J.H., Lin, J., Hanbury, A.: Efficiently teaching an effective dense retriever with balanced topic aware sampling. In: Proceedings of SIGIR, pp. 113–122 (2021)
Hofstätter, S., Zlabinger, M., Hanbury, A.: Interpretable & time-budget-constrained contextualization for re-ranking. In: Proceedings of ECAI (2020)
Humeau, S., Shuster, K., Lachaux, M.A., Weston, J.: Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring (2020). http://arxiv.org/abs/1905.01969
Khattab, O., Zaharia, M.: ColBERT: efficient and effective passage search via contextualized late interaction over BERT. In: Proceedings of SIGIR, pp. 39–48 (2020)
Kohavi, R., Tang, D., Xu, Y.: Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press, Cambridge (2020)
Kulkarni, H., MacAvaney, S., Goharian, N., Frieder, O.: Lexically-accelerated dense retrieval. In: Proceedings of SIGIR, pp. 152–162 (2023)
Lin, S.C., Yang, J.H., Lin, J.: In-batch negatives for knowledge distillation with tightly-coupled teachers for dense retrieval. In: Proceedings of RepL4NLP, pp. 163–173 (2021)
Lu, W., Jiao, J., Zhang, R.: TwinBERT: distilling knowledge to twin-structured compressed BERT models for large-scale retrieval. In: Proceedings of CIKM, pp. 2645–2652 (2020)
MacAvaney, S., Macdonald, C.: A python interface to PISA! In: Proceedings of SIGIR, pp. 3339–3344 (2022)
MacAvaney, S., Macdonald, C., Ounis, I.: Streamlining evaluation with IR-measures. In: Proceedings of ECIR, pp. 305–310 (2022)
MacAvaney, S., Nardini, F.M., Perego, R., Tonellotto, N., Goharian, N., Frieder, O.: Expansion via prediction of importance with contextualization. In: Proceedings of SIGIR, pp. 1573–1576 (2020)
MacAvaney, S., Tonellotto, N., Macdonald, C.: Adaptive re-ranking with a corpus graph. In: Proceedings of SIGIR, pp. 1491–1500 (2022)
MacAvaney, S., Yates, A., Cohan, A., Goharian, N.: CEDR: contextualized embeddings for document ranking. In: Proceedings of SIGIR, pp. 1101–1104 (2019)
Macdonald, C., Tonellotto, N., MacAvaney, S., Ounis, I.: PyTerrier: declarative experimentation in python from BM25 to dense retrieval. In: Proceedings of CIKM, pp. 4526–4533 (2021)
Mallia, A., Siedlaczek, M., Mackenzie, J.M., Suel, T.: PISA: performant indexes and search for academia. In: Proceedings of OSIRRC@SIGIR 2019, vol. 2409, pp. 50–56 (2019)
Nogueira, R., Cho, K.: Passage Re-ranking with BERT (2020). http://arxiv.org/abs/1901.04085
Nogueira, R., Jiang, Z., Lin, J.: Document Ranking with a Pretrained Sequence-to-Sequence Model (2020). http://arxiv.org/abs/2003.06713
Petrov, A.V., Macdonald, C.: gSASRec: reducing overconfidence in sequential recommendation trained with negative sampling. In: Proceedings of RecSys, pp. 116–128 (2023)
Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using siamese BERT-networks. In: Proceedings of EMNLP (2019)
Robertson, S., Walker, S., Jones, S., Hancock-Beaulieu, M., Gatford, M.: Okapi at TREC 3. In: Proceedings of TREC (1994)
Scells, H., Zhuang, S., Zuccon, G.: Reduce, reuse, recycle: green information retrieval research. In: Proceedings of SIGIR, pp. 2825–2837 (2022)
Thakur, N., Reimers, N., Rücklé, A., Srivastava, A., Gurevych, I.: BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models (2021). http://arxiv.org/abs/2104.08663
Turc, I., Chang, M.W., Lee, K., Toutanova, K.: Well-Read Students Learn Better: On the Importance of Pre-training Compact Models (2019). http://arxiv.org/abs/1908.08962
Vaswani, A., et al.: Attention is all you need. In: Proceedings of NeurIPS (2017)
Wallat, J., Beringer, F., Anand, A., Anand, A.: Probing BERT for ranking abilities. In: Proceedings of ECIR, pp. 255–273 (2023)
Wang, X., MacAvaney, S., Macdonald, C., Ounis, I.: An inspection of the reproducibility and replicability of TCT-ColBERT. In: Proceedings of SIGIR, pp. 2790–2800 (2022)
Wang, X., Macdonald, C., Tonellotto, N., Ounis, I.: Reproducibility, replicability, and insights into dense multi-representation retrieval models: from ColBERT to Col*. In: Proceedings of SIGIR, pp. 2552–2561 (2023)
Wolf, T., et al.: HuggingFace’s Transformers: State-of-the-art Natural Language Processing (2020). http://arxiv.org/abs/1910.03771
Xiong, L., et al.: Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval (2020)
Zhuang, H., et al.: RankT5: fine-tuning T5 for text ranking with ranking losses. In: Proceedings of SIGIR, pp. 2308–2313 (2023)
A Effect of gBCE Training Scheme on Tiny BERT-Based Cross-Encoder on the MS MARCO Dev Set
Table 3 reports the effectiveness of a TinyBERT model on a 6,980-query subset of the MS MARCO dev set (dataset irds:msmarco-passage/dev/small in PyTerrier). The evaluation follows the scheme described in Sect. 4.3, with the exception of using MRR@10, the official metric for this queryset, instead of NDCG@10. As the table shows, the overall trends follow the observations in Sect. 4.3. In particular, an increased number of negatives matters more than the choice of loss function; the gBCE loss improves results with a small number of negatives but has a moderate effect when the number of negatives increases. Nevertheless, gBCE is better than the BCE loss in 5 out of 6 cases in this experiment, and with 1 negative the improvement over BCE is statistically significant. Overall, the combination of the gBCE loss and 128 negatives provides a significant improvement in MRR@10, from 0.2942 to 0.3200 (+8.76%), compared to the "standard" training scheme with 1 negative and the BCE loss. Note that this result is lower than that of larger models; e.g., Nogueira et al. [27] achieved an MRR@10 of 0.36 on this queryset with a BERT-Large model. Lower effectiveness compared to full-scale models is expected, as we do not control for latency in this experiment. When latency is limited, shallow Cross-Encoders are more effective (see Fig. 1).
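As context for the comparison above, the generalised Binary Cross-Entropy loss of Petrov and Macdonald [50] replaces the positive-class probability in BCE with that probability raised to a power β derived from the negative sampling rate. The sketch below follows the gSASRec formulation; the exact adaptation to Cross-Encoder training in this paper may differ, and the parameter values (e.g. catalogue_size, the total number of candidate items) are illustrative:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def gbce_loss(pos_score: float, neg_scores: list, t: float, catalogue_size: int) -> float:
    """Generalised BCE, following the gSASRec formulation [50] (a sketch;
    this paper's adaptation to Cross-Encoders may differ in detail)."""
    k = len(neg_scores)
    alpha = k / (catalogue_size - 1)  # negative sampling rate
    beta = alpha * (t * (1.0 - 1.0 / alpha) + 1.0 / alpha)
    # Positive term uses sigmoid(s)^beta; t=0 gives beta=1, i.e. plain BCE.
    loss = -beta * math.log(sigmoid(pos_score))
    loss -= sum(math.log(1.0 - sigmoid(s)) for s in neg_scores)
    return loss
```

With calibration parameter t = 0 this reduces to standard BCE over the sampled negatives; larger t down-weights the positive term to counteract the overconfidence induced by negative sampling.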
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Petrov, A.V., MacAvaney, S., Macdonald, C. (2024). Shallow Cross-Encoders for Low-Latency Retrieval. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14610. Springer, Cham. https://doi.org/10.1007/978-3-031-56063-7_10
DOI: https://doi.org/10.1007/978-3-031-56063-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56062-0
Online ISBN: 978-3-031-56063-7