Multi-modal dialog scene detection using hidden Markov models for content-based multimedia indexing

A. Aydin Alatan, Ali N. Akansu, Wayne Wolf

Research output: Contribution to journal › Article › peer-review

45 Scopus citations

Abstract

A class of audio-visual data (fiction entertainment: movies, TV series) is segmented into scenes, which contain dialogs, using a novel hidden Markov model (HMM)-based method. Each shot is classified using both the audio track (via classification of speech, silence and music) and the visual content (face and location information). The result of this shot-based classification is an audio-visual token that the HMM state diagram uses to achieve scene analysis. In simulations with circular and left-to-right HMM topologies, both are observed to perform very well with multi-modal inputs. Moreover, for the circular topology, comparisons between different training and observation sets show that audio and face information together give the most consistent results across observation sets.
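
To make the token-then-HMM idea concrete, the following is a minimal sketch of Viterbi decoding over per-shot audio-visual tokens. It is not the authors' model: the abstract does not give the states, topology details, or probabilities, so the two states (`dialog`, `non_dialog`), the fully connected transitions, the factorized emission model, and every number below are hypothetical values chosen for illustration only. Each shot is represented as an assumed tuple `(audio, face_present, location_changed)`.

```python
import math

# All states and probabilities below are hypothetical, for illustration only.
STATES = ("dialog", "non_dialog")
START = {"dialog": 0.5, "non_dialog": 0.5}
TRANS = {
    "dialog": {"dialog": 0.8, "non_dialog": 0.2},
    "non_dialog": {"dialog": 0.3, "non_dialog": 0.7},
}
# Assumed emission model: the three cues are treated as independent given the state.
P_AUDIO = {
    "dialog": {"speech": 0.8, "silence": 0.15, "music": 0.05},
    "non_dialog": {"speech": 0.2, "silence": 0.3, "music": 0.5},
}
P_FACE = {"dialog": 0.9, "non_dialog": 0.3}        # P(face present | state)
P_LOC_CHANGE = {"dialog": 0.1, "non_dialog": 0.6}  # P(location changed | state)


def log_emit(state, shot):
    """Log-probability of one shot token (audio, face, location_changed)."""
    audio, face, loc_changed = shot
    p = P_AUDIO[state][audio]
    p *= P_FACE[state] if face else 1.0 - P_FACE[state]
    p *= P_LOC_CHANGE[state] if loc_changed else 1.0 - P_LOC_CHANGE[state]
    return math.log(p)


def viterbi(shots):
    """Return the most likely state sequence for a list of shot tokens."""
    if not shots:
        return []
    score = {s: math.log(START[s]) + log_emit(s, shots[0]) for s in STATES}
    backpointers = []
    for shot in shots[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s at this shot.
            best = max(STATES, key=lambda r: prev[r] + math.log(TRANS[r][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + log_emit(s, shot)
            pointers[s] = best
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


shots = [
    ("music", False, True),
    ("speech", True, False),
    ("speech", True, False),
    ("music", False, True),
]
labels = viterbi(shots)
```

With these assumed numbers, runs of consecutive `dialog` labels would mark candidate dialog scenes. Restricting which entries of `TRANS` are non-zero is how a left-to-right or circular topology could be expressed in such a sketch.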

Original language: English (US)
Pages (from-to): 137-151
Number of pages: 15
Journal: Multimedia Tools and Applications
Volume: 14
Issue number: 2
State: Published - Jun 2001

All Science Journal Classification (ASJC) codes

  • Software
  • Media Technology
  • Hardware and Architecture
  • Computer Networks and Communications

Keywords

  • Content-based indexing
  • Dialog scene analysis
  • Hidden Markov models
  • Multi-modal analysis
