Abstract
This paper presents a parallel varied density-based clustering algorithm with optimized data partition (PVDB). First, we improve the "partition with reduced boundary points" algorithm using shared nearest neighbour (SNN) methods and propose the "reachable partition with reduced boundary points" algorithm. Second, we introduce a layered grouping grid structure and propose an efficient k nearest neighbour (kNN) search method, which speeds up kNN searches and determines whether a point's kNNs lie in its own partition. Third, we propose a new merging strategy, based on the reachable point concept, for connecting clusters in different partitions. The strategy also avoids connecting clusters with varying densities by SNN, as occurs with SNN-based clustering methods. We implement our algorithm using the MapReduce paradigm and compare it with DBSCAN-MR and GriDBSCAN; it shows better varied density clustering capability and scalability. In addition, varied applications show our algorithm's capability of discerning spatial patterns and extending to many fields.
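To make the SNN idea mentioned in the abstract concrete, the following is a minimal sketch of one common formulation of shared nearest neighbour similarity, not the paper's implementation. It assumes that two points are compared only when each appears in the other's kNN list, and that their similarity is the number of neighbours they share. The function names `knn_indices` and `snn_similarity`, the brute-force search, and the sample points are all hypothetical choices made for illustration; the paper's layered grouping grid search is not reproduced here.

```python
from math import dist

def knn_indices(points, k):
    """Brute-force kNN: for each point, the set of indices of its k nearest
    other points. (Illustrative only; not the paper's grid-based search.)"""
    neighbours = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: dist(p, points[j]))
        neighbours.append(set(order[:k]))
    return neighbours

def snn_similarity(neighbours, i, j):
    """Assumed SNN definition: the number of shared kNNs, counted only when
    i and j are in each other's kNN lists; otherwise 0."""
    if j not in neighbours[i] or i not in neighbours[j]:
        return 0
    return len(neighbours[i] & neighbours[j])

# Two well-separated groups of four points each (made-up data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
neighbours = knn_indices(points, k=3)

same_group = snn_similarity(neighbours, 0, 1)    # points in the same group
cross_group = snn_similarity(neighbours, 0, 4)   # points in different groups
```

Under these assumptions, points in the same group share neighbours and receive a positive similarity, while points in different groups receive zero. Because the measure depends on neighbour ranks rather than absolute distances, it is less sensitive to differences in density between groups.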
Original language | English (US) |
---|---|
Pages (from-to) | 93-114 |
Number of pages | 22 |
Journal | Journal of Spatial Science |
Volume | 63 |
Issue number | 1 |
State | Published - Jan 2 2018 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Geography, Planning and Development
- General Energy
- Atmospheric Science
Keywords
- Data mining
- data partition
- layered grouping grid-based k nearest neighbour searching
- parallel clustering
- varied density-based clustering