A probabilistic framework for estimating pairwise distances through crowdsourcing

Habibur Rahman, Senjuti Basu Roy, Gautam Das

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Estimating all pairwise distances among a set of objects has wide applicability in various computational problems in databases, machine learning, and statistics. This work presents a probabilistic framework for estimating all pairwise distances through crowdsourcing, in which human workers provide the distances between some object pairs. Since the workers are subject to error, their responses are given a probabilistic interpretation. In particular, the framework comprises three problems: (1) Given multiple pieces of feedback on an object pair, how do we combine and aggregate that feedback to create a probability distribution of the distance? (2) Since the number of possible pairs is quadratic in the number of objects, how do we estimate, from the known feedback for a small number of object pairs, the unknown distances among all other object pairs? For this problem, we leverage the metric property of distance, in particular the triangle inequality, in a probabilistic setting. (3) Finally, how do we improve our estimate by soliciting additional feedback from the crowd? For all three problems, we present principled modeling and solutions. We experimentally evaluate our proposed framework on multiple real-world and large-scale synthetic datasets, enlisting workers from a crowdsourcing platform.
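
As a rough illustration of problems (1) and (2), the sketch below turns several worker responses for one pair into a histogram distribution, and computes the range that the triangle inequality allows for an unknown distance. It is not the authors' method: the function names, the normalization of distances to [0, 1], and the equal-width buckets are all assumptions made for this example.

```python
from collections import Counter


def aggregate_feedback(responses, num_buckets=4):
    """Hypothetical helper: histogram of worker responses for one object pair.

    Assumes each response is a distance already normalized to [0, 1].
    Returns a list of probabilities, one per equal-width bucket.
    """
    if not responses:
        raise ValueError("need at least one response")
    # Map each response to a bucket index; a response of 1.0 goes in the last bucket.
    counts = Counter(
        min(int(r * num_buckets), num_buckets - 1) for r in responses
    )
    total = len(responses)
    return [counts.get(b, 0) / total for b in range(num_buckets)]


def triangle_bounds(d_ab, d_bc, max_distance=1.0):
    """Hypothetical helper: range of d(a, c) allowed by the triangle inequality.

    For a metric, |d(a,b) - d(b,c)| <= d(a,c) <= d(a,b) + d(b,c).
    The upper bound is capped at max_distance (assumed normalization).
    """
    lower = abs(d_ab - d_bc)
    upper = min(d_ab + d_bc, max_distance)
    return lower, upper


if __name__ == "__main__":
    print(aggregate_feedback([0.1, 0.2, 0.6, 0.9]))
    print(triangle_bounds(0.25, 0.5))
```

In a probabilistic setting the known distances are distributions rather than point values, so the bounds would have to be applied across bucket combinations; the point-value version here only shows the underlying constraint.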

Original language: English (US)
Title of host publication: Advances in Database Technology - EDBT 2017
Subtitle of host publication: 20th International Conference on Extending Database Technology, Proceedings
Editors: Bernhard Mitschang, Volker Markl, Sebastian Bress, Periklis Andritsos, Kai-Uwe Sattler, Salvatore Orlando
Publisher: OpenProceedings.org
Pages: 258-269
Number of pages: 12
ISBN (Electronic): 9783893180738
State: Published - 2017
Event: 20th International Conference on Extending Database Technology, EDBT 2017 - Venice, Italy
Duration: Mar 21 2017 → Mar 24 2017

Publication series

Name: Advances in Database Technology - EDBT
Volume: 2017-March
ISSN (Electronic): 2367-2005

Other

Other: 20th International Conference on Extending Database Technology, EDBT 2017
Country: Italy
City: Venice
Period: 3/21/17 → 3/24/17

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Software
  • Computer Science Applications
