TY - GEN
T1 - A probabilistic framework for estimating pairwise distances through crowdsourcing
AU - Rahman, Habibur
AU - Roy, Senjuti Basu
AU - Das, Gautam
N1 - Publisher Copyright: © 2017, Copyright is with the authors.
PY - 2017
Y1 - 2017
N2 - Estimating all pairs of distances among a set of objects has wide applicability in various computational problems in databases, machine learning, and statistics. This work presents a probabilistic framework for estimating all-pairs distances through crowdsourcing, in which human workers provide the distance between some object pairs. Since the workers are subject to error, their responses are given a probabilistic interpretation. In particular, the framework comprises three problems: (1) Given multiple feedback responses on an object pair, how do we combine and aggregate that feedback to create a probability distribution of the distance? (2) Since the number of possible pairs is quadratic in the number of objects, how do we estimate, from the known feedback for a small number of object pairs, the unknown distances among all other object pairs? For this problem, we leverage the metric property of distance, in particular the triangle inequality, in a probabilistic setting. (3) Finally, how do we improve our estimate by soliciting additional feedback from the crowd? For all three problems, we present principled modeling and solutions. We experimentally evaluate our proposed framework on multiple real-world and large-scale synthetic datasets, enlisting workers from a crowdsourcing platform.
AB - Estimating all pairs of distances among a set of objects has wide applicability in various computational problems in databases, machine learning, and statistics. This work presents a probabilistic framework for estimating all-pairs distances through crowdsourcing, in which human workers provide the distance between some object pairs. Since the workers are subject to error, their responses are given a probabilistic interpretation. In particular, the framework comprises three problems: (1) Given multiple feedback responses on an object pair, how do we combine and aggregate that feedback to create a probability distribution of the distance? (2) Since the number of possible pairs is quadratic in the number of objects, how do we estimate, from the known feedback for a small number of object pairs, the unknown distances among all other object pairs? For this problem, we leverage the metric property of distance, in particular the triangle inequality, in a probabilistic setting. (3) Finally, how do we improve our estimate by soliciting additional feedback from the crowd? For all three problems, we present principled modeling and solutions. We experimentally evaluate our proposed framework on multiple real-world and large-scale synthetic datasets, enlisting workers from a crowdsourcing platform.
UR - http://www.scopus.com/inward/record.url?scp=85046421978&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85046421978&partnerID=8YFLogxK
U2 - 10.5441/002/edbt.2017.24
DO - 10.5441/002/edbt.2017.24
M3 - Conference contribution
AN - SCOPUS:85046421978
T3 - Advances in Database Technology - EDBT
SP - 258
EP - 269
BT - Advances in Database Technology - EDBT 2017
A2 - Mitschang, Bernhard
A2 - Markl, Volker
A2 - Bress, Sebastian
A2 - Andritsos, Periklis
A2 - Sattler, Kai-Uwe
A2 - Orlando, Salvatore
PB - OpenProceedings.org
T2 - 20th International Conference on Extending Database Technology, EDBT 2017
Y2 - 21 March 2017 through 24 March 2017
ER -