TY - GEN
T1 - Using internet glossaries to determine interests from home pages
AU - Portscher, Edwin
AU - Geller, James
AU - Scherl, Richard
N1 - Publisher Copyright: © Springer-Verlag Berlin Heidelberg 2003.
PY - 2003
Y1 - 2003
N2 - There are millions of home pages on the web. Each page contains valuable data about the page’s owner that can be used for marketing purposes. These pages have to be classified according to interests. The traditional Information Retrieval approach requires large training sets that are classified by human experts. Knowledge-based methods, which use handcrafted rules, require a significant investment to develop the rule base. Both these approaches are very time consuming. We are using glossaries, which are freely available on the Internet, to determine interests from home pages. Processing of these glossaries can be automated and requires little human effort and time, compared to the other two approaches. Once the terms have been extracted from these glossaries, they can be used to infer interests from the home pages of web users. This paper describes the system we have developed for classifying home pages by interests. On an experiment of 400 pages, we found that the glossary with the highest number of word matches is the correct interest in 44.75% of the pages. The correct interest is in the top three highest returned interests in 72.25% of the pages, and the correct interest is in the top five returned interest matches in 84.5% of the pages.
AB - There are millions of home pages on the web. Each page contains valuable data about the page’s owner that can be used for marketing purposes. These pages have to be classified according to interests. The traditional Information Retrieval approach requires large training sets that are classified by human experts. Knowledge-based methods, which use handcrafted rules, require a significant investment to develop the rule base. Both these approaches are very time consuming. We are using glossaries, which are freely available on the Internet, to determine interests from home pages. Processing of these glossaries can be automated and requires little human effort and time, compared to the other two approaches. Once the terms have been extracted from these glossaries, they can be used to infer interests from the home pages of web users. This paper describes the system we have developed for classifying home pages by interests. On an experiment of 400 pages, we found that the glossary with the highest number of word matches is the correct interest in 44.75% of the pages. The correct interest is in the top three highest returned interests in 72.25% of the pages, and the correct interest is in the top five returned interest matches in 84.5% of the pages.
UR - http://www.scopus.com/inward/record.url?scp=84955567665&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84955567665&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84955567665
SN - 3540408088
SN - 9783540408086
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 248
EP - 258
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
A2 - Bauknecht, Kurt
A2 - Min Tjoa, A.
A2 - Quirchmayr, Gerald
PB - Springer Verlag
T2 - 4th International Conference on E-Commerce and Web Technology, EC-Web 2003
Y2 - 2 September 2003 through 5 September 2003
ER -