We consider the problem of 2-way interval join, where we want to find all pairs of overlapping intervals, i.e., intervals that share at least one point in common. We present lower and upper bounds on the replication rate for this problem when it is implemented in MapReduce. We study three cases, where intervals in the input are: (i) unit-length and equally-spaced, (ii) variable-length and equally-spaced, and (iii) equally-spaced with specific distribution of the various lengths. Our algorithms offer intuition as how to build algorithms for other cases, especially when we have some statistical knowledge about the distribution of the lengths of the intervals. E.g., if mostly large intervals interact with small intervals and not within themselves, then we believe our techniques can be extended to achieve better replication rate.
|Number of pages
|CEUR Workshop Proceedings
|Published - 2015
|Joint Workshops of the International Conference on Extending Database Technology and the International Conference on Database Theory, EDBT/ICDT-WS 2015 - Brussels, Belgium
Duration: Mar 27 2015 → …
All Science Journal Classification (ASJC) codes
- General Computer Science