TY - GEN

T1 - A probabilistic framework for estimating pairwise distances through crowdsourcing

AU - Rahman, Habibur

AU - Roy, Senjuti Basu

AU - Das, Gautam

N1 - Publisher Copyright:
© 2017, Copyright is with the authors.

PY - 2017

Y1 - 2017

N2 - Estimating all pairs of distances among a set of objects has wide applicability in various computational problems in databases, machine learning, and statistics. This work presents a probabilistic framework for estimating all-pairs distances through crowdsourcing, where human workers provide the distance between some object pairs. Since the workers are subject to error, their responses are considered with a probabilistic interpretation. In particular, the framework comprises three problems: (1) Given multiple pieces of feedback on an object pair, how do we combine and aggregate that feedback to create a probability distribution of the distance? (2) Since the number of possible pairs is quadratic in the number of objects, how do we estimate, from the known feedback for a small number of object pairs, the unknown distances among all other object pairs? For this problem, we leverage the metric property of distance, in particular, the triangle inequality property in a probabilistic setting. (3) Finally, how do we improve our estimate by soliciting additional feedback from the crowd? For all three problems, we present principled modeling and solutions. We experimentally evaluate our proposed framework on multiple real-world and large-scale synthetic datasets, enlisting workers from a crowdsourcing platform.

AB - Estimating all pairs of distances among a set of objects has wide applicability in various computational problems in databases, machine learning, and statistics. This work presents a probabilistic framework for estimating all-pairs distances through crowdsourcing, where human workers provide the distance between some object pairs. Since the workers are subject to error, their responses are considered with a probabilistic interpretation. In particular, the framework comprises three problems: (1) Given multiple pieces of feedback on an object pair, how do we combine and aggregate that feedback to create a probability distribution of the distance? (2) Since the number of possible pairs is quadratic in the number of objects, how do we estimate, from the known feedback for a small number of object pairs, the unknown distances among all other object pairs? For this problem, we leverage the metric property of distance, in particular, the triangle inequality property in a probabilistic setting. (3) Finally, how do we improve our estimate by soliciting additional feedback from the crowd? For all three problems, we present principled modeling and solutions. We experimentally evaluate our proposed framework on multiple real-world and large-scale synthetic datasets, enlisting workers from a crowdsourcing platform.

UR - http://www.scopus.com/inward/record.url?scp=85046421978&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85046421978&partnerID=8YFLogxK

U2 - 10.5441/002/edbt.2017.24

DO - 10.5441/002/edbt.2017.24

M3 - Conference contribution

AN - SCOPUS:85046421978

T3 - Advances in Database Technology - EDBT

SP - 258

EP - 269

BT - Advances in Database Technology - EDBT 2017

A2 - Mitschang, Bernhard

A2 - Markl, Volker

A2 - Bress, Sebastian

A2 - Andritsos, Periklis

A2 - Sattler, Kai-Uwe

A2 - Orlando, Salvatore

PB - OpenProceedings.org

T2 - 20th International Conference on Extending Database Technology, EDBT 2017

Y2 - 21 March 2017 through 24 March 2017

ER -