The exploratory labeling assistant: Mixed-initiative label curation with large document collections

Cristian Felix, Aritra Dasgupta, Enrico Bertini

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

26 Scopus citations

Abstract

In this paper, we define the concept of exploratory labeling: the use of computational and interactive methods to help analysts categorize groups of documents into a set of unknown and evolving labels. While many computational methods exist to analyze data and build models once the data is organized around a set of predefined categories or labels, few methods address the problem of reliably discovering and curating such labels in the first place. As a first step toward bridging this gap, we propose an interactive visual data analysis method that integrates human-driven label ideation, specification, and refinement with machine-driven recommendations. The proposed method enables the user to progressively discover and ideate labels in an exploratory fashion and to specify rules that can be used to automatically match sets of documents to labels. To support the ideation, specification, and evaluation of labels, we use unsupervised machine learning methods that provide suggestions and data summaries. We evaluate our method by applying it to a real-world labeling problem and through controlled user studies, which we use to identify and reflect on patterns of interaction emerging from exploratory labeling activities.
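The idea of rules that automatically match sets of documents to labels can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say what form the rules take, so the sketch assumes the simplest case, where a rule is a set of keywords and a document matches a label if it contains any of them. The names `match_labels` and `unlabeled`, and the sample documents and rules, are hypothetical.

```python
from typing import Dict, List, Set


def match_labels(documents: List[str], rules: Dict[str, Set[str]]) -> Dict[str, List[int]]:
    """For each label, return the indices of documents containing any of its keywords.

    Assumption: a rule is a plain keyword set; the paper's rules may be richer.
    """
    matches: Dict[str, List[int]] = {label: [] for label in rules}
    for index, text in enumerate(documents):
        tokens = set(text.lower().split())  # naive whitespace tokenization
        for label, keywords in rules.items():
            if tokens & keywords:
                matches[label].append(index)
    return matches


def unlabeled(documents: List[str], matches: Dict[str, List[int]]) -> List[int]:
    """Return indices of documents that no rule has matched yet."""
    covered = {i for indices in matches.values() for i in indices}
    return [i for i in range(len(documents)) if i not in covered]


# Hypothetical sample data for illustration only.
documents = [
    "The battery drains quickly",
    "Great screen and long battery life",
    "Shipping was slow",
    "Arrived with a cracked case",
]
rules = {
    "battery": {"battery"},
    "delivery": {"shipping", "delivery"},
}

matches = match_labels(documents, rules)
remaining = unlabeled(documents, matches)
```

In an exploratory workflow, the documents left in `remaining` are the ones an analyst would inspect next to ideate a new label or refine an existing rule.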

Original language: English (US)
Title of host publication: UIST 2018 - Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology
Publisher: Association for Computing Machinery, Inc
Pages: 153-164
Number of pages: 12
ISBN (Electronic): 9781450359481
State: Published - Oct 11 2018
Externally published: Yes
Event: 31st Annual ACM Symposium on User Interface Software and Technology, UIST 2018 - Berlin, Germany
Duration: Oct 14 2018 - Oct 17 2018

Publication series

Name: UIST 2018 - Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology

Conference

Conference: 31st Annual ACM Symposium on User Interface Software and Technology, UIST 2018
Country/Territory: Germany
City: Berlin
Period: 10/14/18 - 10/17/18

All Science Journal Classification (ASJC) codes

  • Human-Computer Interaction
  • Computer Graphics and Computer-Aided Design
  • Software

Keywords

  • Document labeling
  • Exploratory labeling
  • Text analysis
  • Visualization
