Research on topic discovery technology for Web news

Guixian Xu, Ziheng Yu, Changzhi Wang, Antai Wang

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

With the development of information technology, Web news has become a primary channel of information dissemination. Topic discovery for Web news helps users quickly find valuable information, and research on it continues to improve. Traditional topic discovery is based on the vector space model, which suffers from defects such as high dimensionality and data sparsity. Latent semantic analysis, by contrast, can map high-dimensional, sparse word vectors to a k-dimensional semantic space and, through the semantic correlation between words, increase the similarity of news articles on the same topic. This paper studies Web news topic discovery. First, the set of Web news texts is vectorized, and the weight of each feature is computed with an improved TFIDF. The original text vectors are then analysed by latent semantic analysis, which fully exploits the semantic relations between texts and words, and news topics are extracted by a clustering approach. For sub-topic extraction, word co-occurrence is used to represent the sub-topics; in essence, a sub-topic vector is built from these co-occurring words. Experimental results show that the proposed method effectively captures current hot topics of Web news and their related sub-topics, which is meaningful for information retrieval and data mining.
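The pipeline the abstract describes (TFIDF weighting, latent semantic analysis, clustering, co-occurrence-based sub-topics) can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: standard TF-IDF stands in for the paper's improved TFIDF, highest-weight cluster terms stand in for the paper's co-occurrence sub-topic vectors, and the toy corpus, the dimension k, and the cluster count are placeholder choices.

# Sketch: TF-IDF -> latent semantic analysis (truncated SVD) -> k-means
# clustering for topics -> top terms per cluster as sub-topic descriptors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import Normalizer
from sklearn.cluster import KMeans
import numpy as np

docs = [
    "stock market falls amid rate fears",
    "central bank raises interest rates again",
    "team wins championship after overtime thriller",
    "star striker transfers to rival club",
]

# 1. Vectorize the news texts; standard TF-IDF replaces the improved TFIDF here.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# 2. Latent semantic analysis: map sparse term vectors into a k-dimensional
#    semantic space (k=2 only because the toy corpus is tiny).
k = 2
lsa = TruncatedSVD(n_components=k, random_state=0)
X_lsa = Normalizer(copy=False).fit_transform(lsa.fit_transform(X))

# 3. Cluster in the semantic space; each cluster is a candidate news topic.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_lsa)

# 4. Describe each topic by its highest-weight terms, a simple stand-in for
#    the paper's co-occurrence-based sub-topic vectors.
terms = np.array(vectorizer.get_feature_names_out())
for c in range(km.n_clusters):
    cluster_weights = X[km.labels_ == c].toarray().sum(axis=0)
    top_terms = terms[cluster_weights.argsort()[::-1][:3]]
    print(f"topic {c}: {', '.join(top_terms)}")

Normalizing the LSA vectors before k-means makes Euclidean distance behave like cosine similarity, which is the usual similarity measure for text and matches the abstract's emphasis on improving the similarity of same-topic news.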

Original language: English (US)
Pages (from-to): 73-83
Number of pages: 11
Journal: Neural Computing and Applications
Volume: 32
Issue number: 1
DOIs
State: Published - Jan 1 2020

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence

Keywords

  • Latent semantic analysis
  • Similarity
  • Topic discovery
  • Weight computation
