TY - GEN

T1 - A probabilistic framework for estimating pairwise distances through crowdsourcing

AU - Rahman, Habibur

AU - Roy, Senjuti Basu

AU - Das, Gautam

N1 - Funding Information:
Conclusion: We present a probabilistic distance estimation framework for crowdsourcing platforms that has wide applicability in different domains. One of the novel contributions of the work is to consider worker feedback with a probabilistic interpretation and to describe the overall framework with three key components. The effectiveness of our proposed solutions is validated empirically using both real and synthetic data. Acknowledgment: The work of Habibur Rahman and Gautam Das was supported in part by the Army Research Office under grant W911NF-15-1-0020, and a grant from Microsoft Research.
Publisher Copyright:
© 2017, Copyright is with the authors.

PY - 2017

Y1 - 2017

AB - Estimating all pairwise distances among a set of objects has wide applicability in computational problems in databases, machine learning, and statistics. This work presents a probabilistic framework for estimating all pairwise distances through crowdsourcing, where human workers provide distances between some object pairs. Since the workers are subject to error, their responses are given a probabilistic interpretation. In particular, the framework comprises three problems: (1) Given multiple feedback responses on an object pair, how do we aggregate them into a probability distribution of the distance? (2) Since the number of possible pairs is quadratic in the number of objects, how do we estimate, from the known feedback for a small number of object pairs, the unknown distances among all other object pairs? For this problem, we leverage the metric property of distance, in particular the triangle inequality, in a probabilistic setting. (3) Finally, how do we improve our estimate by soliciting additional feedback from the crowd? For all three problems, we present principled modeling and solutions. We experimentally evaluate the proposed framework on multiple real-world and large-scale synthetic datasets, enlisting workers from a crowdsourcing platform.

UR - http://www.scopus.com/inward/record.url?scp=85046421978&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85046421978&partnerID=8YFLogxK

U2 - 10.5441/002/edbt.2017.24

DO - 10.5441/002/edbt.2017.24

M3 - Conference contribution

AN - SCOPUS:85046421978

T3 - Advances in Database Technology - EDBT

SP - 258

EP - 269

BT - Advances in Database Technology - EDBT 2017

A2 - Mitschang, Bernhard

A2 - Markl, Volker

A2 - Bress, Sebastian

A2 - Andritsos, Periklis

A2 - Sattler, Kai-Uwe

A2 - Orlando, Salvatore

PB - OpenProceedings.org

T2 - 20th International Conference on Extending Database Technology, EDBT 2017

Y2 - 21 March 2017 through 24 March 2017

ER -