Abstract
System-oriented IR evaluations rely on rather abstract models of real user behavior. Simulating user interactions offers a cost-efficient way to ground system-oriented experiments in more realistic behavior when no interaction logs are available. While several user models exist for simulated clicks or result list interactions, few attempts have been made at query simulation, and it has not been investigated whether simulated queries can reproduce the properties of real queries. In this work, we validate simulated user query variants against real user queries that were made for the corresponding topics of TREC test collections. In addition, we introduce a simple yet effective method that reproduces real queries better than the established methods. Our evaluation framework validates the simulations with regard to retrieval performance, reproducibility of topic score distributions, shared task utility, effort and effect, and query term similarity in comparison with real user query variants. While the retrieval effectiveness, the statistical properties of the topic score distributions, and the economic aspects are close to those of real queries, it remains challenging to simulate exact term matches and later query reformulations.
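The query term similarity criterion mentioned in the abstract can be illustrated with a toy measure. The sketch below uses the Jaccard overlap of term sets, which is only an illustrative assumption and not necessarily the measure used in the paper; the example queries are invented. Note how a morphological variant ("implant" vs. "implants") fails to match exactly, reflecting the difficulty of simulating exact term matches.

```python
# Illustrative only: quantify term overlap between a simulated query and a
# real user query variant via the Jaccard coefficient of their term sets.
def jaccard(query_a: str, query_b: str) -> float:
    a, b = set(query_a.lower().split()), set(query_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

# "implant" and "implants" do not match exactly, so only 2 of 5 terms overlap.
print(jaccard("dental implant risks", "risks of dental implants"))  # 0.4
```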
Notes
- 1.
- 2.
- 3.
- 4. S1 and S3, as well as S2 and S\(3^\prime \), do not differ when averaging over the first queries.
- 5. Applying the Bonferroni correction adjusts the alpha level to \(\alpha =\frac{0.05}{64}\approx 0.0008\) (considering eight users and eight query simulators for an alpha level of 0.05).
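The adjustment in note 5 can be verified with a few lines of Python (a minimal sketch; the counts of eight users and eight query simulators are taken directly from the note):

```python
# Bonferroni correction: divide the nominal alpha level by the number of
# comparisons, here 8 users x 8 query simulators = 64.
n_comparisons = 8 * 8
alpha = 0.05
alpha_adjusted = alpha / n_comparisons
print(round(alpha_adjusted, 4))  # 0.0008
```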
A Appendix
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Breuer, T., Fuhr, N., Schaer, P. (2022). Validating Simulations of User Query Variants. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13185. Springer, Cham. https://doi.org/10.1007/978-3-030-99736-6_6
Print ISBN: 978-3-030-99735-9
Online ISBN: 978-3-030-99736-6
eBook Packages: Computer Science, Computer Science (R0)