

Estimating the Query Difficulty for Information Retrieval

Synthesis Lectures on Information Concepts, Retrieval, and Services

David Carmel
IBM Research, Israel
Elad Yom-Tov
IBM Research, Israel


Many information retrieval (IR) systems suffer from a radical variance in performance when responding to users' queries. Even for systems that succeed very well on average, the quality of the results returned for some queries is poor. Thus, it is desirable for IR systems to be able to identify "difficult" queries so that they can be handled properly. Understanding why some queries are inherently more difficult than others is essential for IR, and a good answer to this important question will help search engines reduce the variance in performance and hence better serve their customers' needs. Estimating query difficulty is an attempt to quantify the quality of the search results retrieved for a query from a given collection of documents. This book discusses the reasons search engines fail on some queries, and then reviews recent approaches for estimating query difficulty in the IR field. It then describes a common methodology for evaluating the prediction quality of those estimators, and reports experiments with some of the predictors, applied with various IR methods, over several TREC benchmarks. Finally, it discusses potential applications that can use query difficulty estimators to handle each query individually and selectively, based on its estimated difficulty.
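The abstract does not spell out the evaluation methodology. A convention widely used in the query performance prediction literature is to correlate each query's predicted difficulty score with its measured retrieval effectiveness, such as average precision (AP). The sketch below is illustrative only, assuming hypothetical per-query scores (not data from the book), and uses just the standard library to compute Pearson's linear correlation and Kendall's tau-a rank correlation.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of query pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # A pair is concordant when predictor and AP order the two queries the same way.
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical toy data: one predictor score and one measured AP value per query.
predicted = [0.9, 0.4, 0.7, 0.2, 0.5]
actual_ap = [0.62, 0.35, 0.41, 0.05, 0.33]

print(f"Pearson r   = {pearson(predicted, actual_ap):.3f}")
print(f"Kendall tau = {kendall_tau(predicted, actual_ap):.3f}")
```

With this toy data, 9 of the 10 query pairs are ordered the same way by the predictor and by AP, so tau-a comes out at 0.8. Rank correlations such as Kendall's tau are often preferred in this setting because they depend only on how the predictor orders queries, not on the scale of its scores.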

Table of Contents: Introduction - The Robustness Problem of Information Retrieval / Basic Concepts / Query Performance Prediction Methods / Pre-Retrieval Prediction Methods / Post-Retrieval Prediction Methods / Combining Predictors / A General Model for Query Difficulty / Applications of Query Difficulty Estimation / Summary and Conclusions


Cited by

Maryam Khodabakhsh, Ebrahim Bagheri. (2021) Semantics-enabled query performance prediction for ad hoc table retrieval. Information Processing & Management 58:1, 102399.
Online publication date: 1-Jan-2021.
Nicola Tonellotto, Craig Macdonald. (2020) Using an Inverted Index Synopsis for Query Latency and Performance Prediction. ACM Transactions on Information Systems 38:3, 1-33.
Online publication date: 13-May-2020.
Negar Arabzadeh, Fattane Zarrinkalam, Jelena Jovanovic, Ebrahim Bagheri. (2020) Neural Embedding-Based Metrics for Pre-retrieval Query Performance Prediction. Advances in Information Retrieval, 78-85.
Josiane Mothe, Lea Laporte, Adrian-Gabriel Chifu. (2019) Predicting Query Difficulty in IR: Impact of Difficulty Definition. 2019 11th International Conference on Knowledge and Systems Engineering (KSE), 1-6.
Daniel Valcarce, Javier Parapar, Álvaro Barreiro. (2019) Document-based and term-based linear methods for pseudo-relevance feedback. ACM SIGAPP Applied Computing Review 18:4, 5-17.
Online publication date: 15-Jan-2019.
Alejandro Bellogín, Alan Said. (2019) Information Retrieval and Recommender Systems. Data Science in Practice, 79-96.
Eduardo Vicente-López, Luis M. de Campos, Juan M. Fernández-Luna, Juan F. Huete. (2018) Predicting IR personalization performance using pre-retrieval query predictors. Journal of Intelligent Information Systems 51:3, 597-620.
Online publication date: 30-Jan-2018.
Kevin Roitero, Marco Passon, Giuseppe Serra, Stefano Mizzaro. (2018) Reproduce. Generalize. Extend. On Information Retrieval Evaluation without Relevance Judgments. Journal of Data and Information Quality 10:3, 1-32.
Online publication date: 13-Oct-2018.
Hamed Zamani, W. Bruce Croft, J. Shane Culpepper. (2018) Neural Query Performance Prediction using Weak Supervision from Multiple Signals. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval - SIGIR '18, 105-114.
Maram Hasanain, Tamer Elsayed. (2017) Query performance prediction for microblog search. Information Processing & Management 53:6, 1320-1341.
Online publication date: 1-Nov-2017.
Toine Bogers, Vivien Petras. (2017) Supporting Book Search: A Comprehensive Comparison of Tags vs. Controlled Vocabulary Metadata. Data and Information Management 1:1, 17-34.
Online publication date: 29-Sep-2017.
Mohammad Alsulmi, Ben Carterette. (2016) Learning to predict the performance of clinical queries using an integrated approach. 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 930-937.
(2016) Dynamic Information Retrieval Modeling. Synthesis Lectures on Information Concepts, Retrieval, and Services 8:3, 1-144.
Online publication date: 16-Jun-2016.
Huan Gui, Haishan Liu, Xiangrui Meng, Anmol Bhasin, Jiawei Han. (2016) Downside management in recommender systems. 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 394-401.
Luca Soldaini, Andrew Yates, Elad Yom-Tov, Ophir Frieder, Nazli Goharian. (2016) Enhancing web search in the medical domain via query clarification. Information Retrieval Journal 19:1-2, 149-173.
Online publication date: 16-Jul-2015.
Elad Kravi, Ido Guy, Avihai Mejer, David Carmel, Yoelle Maarek, Dan Pelleg, Gilad Tsur. (2016) One Query, Many Clicks. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management - CIKM '16, 1423-1432.
Romain Deveaud, Josiane Mothe, Jian-Yun Nie. (2016) Learning to Rank System Configurations. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management - CIKM '16, 2001-2004.
Maram Hasanain, Tamer Elsayed, Walid Magdy. (2015) Improving Tweet Timeline Generation by Predicting Optimal Retrieval Depth. Information Retrieval Technology, 135-146.
Victor Makarenkov, Bracha Shapira, Lior Rokach. (2015) Theoretical Categorization of Query Performance Predictors. Proceedings of the 2015 International Conference on Theory of Information Retrieval - ICTIR '15, 369-372.
Dipasree Pal, Mandar Mitra, Samar Bhattacharya. (2014) Using Multiple Query Expansion Algorithms to Predict Query Performance. 2014 Fourth International Conference of Emerging Applications of Information Technology, 361-364.
Surendra Sarnikar, Zhu Zhang, J. Leon Zhao. (2014) Query-performance prediction for effective query routing in domain-specific repositories. Journal of the Association for Information Science and Technology 65:8, 1597-1614.
Online publication date: 11-Apr-2014.
Xiang Wang, Tiejian Luo, Yu Huang, Wenjie Wang. (2014) Evaluating Query Performance Predictors Based on Brownian Distance Correlation. Pervasive Computing and the Networked World, 643-654.
Nut Limsopatham, Craig Macdonald, Iadh Ounis. (2014) Modelling Relevance towards Multiple Inclusion Criteria when Ranking Patients. Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management - CIKM '14, 1639-1648.
Lior Meister, Oren Kurland, Inna Gelfer Kalmanovich. (2011) Re-ranking search results using an additional retrieved list. Information Retrieval 14:4, 413-437.
Online publication date: 17-Nov-2010.


Keywords: information retrieval; retrieval robustness; query difficulty estimation; performance prediction