Predicting web search hit counts

Tian Tian, James Geller, Soon Ae Chun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations


Keyword-based search engines often return an unexpected number of results. Zero hits are naturally undesirable, while too many hits are likely to be overwhelming and of low precision. We present an approach for predicting the number of hits for a given set of query terms. Using word frequencies derived from a large corpus, we construct random samples of combinations of these words as search terms. We then derive a correlation function between the computed probabilities of these search terms and their observed hit counts. This regression function is used to predict the hit counts for a user's new searches, with the goal of avoiding information overload. We report the results of experiments with Google, Yahoo! and Bing that validate our methodology. We further investigate the monotonicity of the results returned by these three search engines for queries containing negative search terms.
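The abstract does not spell out the probability model or the form of the regression function, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that query terms occur independently (so a query's probability is the product of its words' corpus frequencies) and that hit counts follow a straight line in log-log space, fitted by ordinary least squares. All function names are hypothetical, and the observed hit counts would have to come from actually querying a search engine, which this sketch does not do.

```python
import math
import random
from collections import Counter


def word_probabilities(corpus_tokens):
    """Relative frequency of each word in a tokenized corpus."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def query_probability(terms, probs):
    """Probability of a multi-term query.

    ASSUMPTION: terms occur independently, so probabilities multiply.
    """
    p = 1.0
    for term in terms:
        p *= probs[term]
    return p


def sample_queries(vocab, n_queries, max_terms, rng):
    """Random combinations of distinct vocabulary words to use as sample queries."""
    return [rng.sample(vocab, rng.randint(1, max_terms)) for _ in range(n_queries)]


def fit_log_log(probabilities, hit_counts):
    """Least-squares fit of log10(hits) = a + b * log10(p).

    ASSUMPTION: this log-log linear form is illustrative, not the paper's model.
    Zero-hit observations are skipped because log10(0) is undefined.
    """
    pairs = [(math.log10(p), math.log10(h))
             for p, h in zip(probabilities, hit_counts) if h > 0]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_hits(p, a, b):
    """Predicted hit count for a new query with probability p."""
    return 10 ** (a + b * math.log10(p))


if __name__ == "__main__":
    # Synthetic (made-up) observations: hits exactly proportional to probability.
    a, b = fit_log_log([1e-3, 1e-4, 1e-5], [1e6, 1e5, 1e4])
    print(round(a, 6), round(b, 6))             # fitted intercept and slope
    print(round(predict_hits(1e-6, a, b)))      # predicted hits for a rarer query
```

Fitting in log-log space is a natural choice here because both query probabilities and hit counts span many orders of magnitude; a different functional form could be substituted in `fit_log_log` without changing the rest of the pipeline.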

Original language: English (US)
Title of host publication: 2010 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2010
Number of pages: 5
State: Published - Dec 13 2010
Event: 2010 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2010 - Toronto, ON, Canada
Duration: Aug 31 2010 to Sep 3 2010


Other: 2010 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2010
City: Toronto, ON

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Software
