A prediction model for web search hit counts using word frequencies

Tian Tian, Soon Ae Chun, James Geller

Research output: Contribution to journal › Article › peer-review

3 Scopus citations


A search engine user with a well-defined information need is not interested in getting thousands of hits, but in a few hits that are all highly relevant to their search. Often search words need to be refined and augmented to narrow results to more relevant pages. However, an overly specific query may lead to no hits at all, while most typical queries lead to thousands or even millions of them; both outcomes are undesirable. This paper suggests a query rewriting method for generating alternative query strings and proposes a hit count prediction model for predicting the number of search engine hits for each alternative query string, based on the English language frequencies of the words in the search terms. Using the hit count prediction model, different types of search strategies, such as a lowest hit count query preference, can be utilized to improve users' search experience. We present an evaluation experiment of the hit count prediction model for three major search engines. We also discuss and quantify how far the Google, Yahoo! and Bing search engines diverge from monotonic behaviour, considering negative and positive search terms separately.
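The abstract does not spell out the prediction model itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a naive independence model (predicted hits equal the index size times the product of per-word frequencies) and then applies a lowest hit count preference among alternative query strings. The frequency table, the index size, and the function names `predict_hits` and `lowest_hit_query` are all hypothetical.

```python
from math import prod

# Hypothetical values: assumed fraction of indexed pages containing each word.
WORD_FREQ = {
    "jaguar": 0.002,
    "car": 0.05,
    "speed": 0.02,
    "habitat": 0.004,
}

INDEX_SIZE = 1_000_000_000  # assumed number of indexed pages


def predict_hits(query, word_freq=WORD_FREQ, index_size=INDEX_SIZE):
    """Naive estimate: index size times the product of per-word frequencies.

    Assumes words occur independently, which real text does not satisfy.
    A word missing from the table makes the estimate 0.
    """
    words = query.lower().split()
    return index_size * prod(word_freq.get(w, 0.0) for w in words)


def lowest_hit_query(candidates, min_hits=1.0):
    """Return the candidate with the smallest predicted count that is
    still at least min_hits, or None if no candidate qualifies."""
    scored = [(predict_hits(q), q) for q in candidates]
    viable = [(hits, q) for hits, q in scored if hits >= min_hits]
    return min(viable)[1] if viable else None
```

Under these assumed numbers, `lowest_hit_query(["jaguar", "jaguar car", "jaguar car speed"])` prefers the most specific query, because its predicted count is the smallest one that is still non-zero. A query containing a word outside the table is predicted to return nothing and is skipped.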

Original language: English (US)
Pages (from-to): 462-475
Number of pages: 14
Journal: Journal of Information Science
Issue number: 5
State: Published - Oct 2011

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Library and Information Sciences


Keywords

  • hit count estimations
  • negative search terms
  • positive search terms
  • prediction of hit counts
  • semantic search methods

