A parallel varied density-based clustering algorithm with optimized data partition

Yuhua Gu, Xinyue Ye, Feng Zhang, Zhenhong Du, Renyi Liu, Lifeng Yu

Research output: Contribution to journal › Review article › peer-review

3 Scopus citations

Abstract

This paper presents a parallel varied density-based clustering algorithm with optimized data partition (PVDB). First, we improve the partition with reduced boundary points algorithm using shared nearest neighbour (SNN) methods and propose the reachable partition with reduced boundary points algorithm. Second, we introduce a layered grouping grid structure and propose an efficient k nearest neighbour (kNN) search method. This method enhances the efficiency of kNN searches and determines whether kNNs are in their own partitions. Third, we propose a new merging strategy for connecting clusters in different partitions, based on the reachable point concept. Meanwhile, the strategy avoids connecting clusters with varying densities by SNN as occurs with SNN-based clustering methods. Our algorithm is implemented and compared with DBSCAN-MR and GriDBSCAN using the MapReduce paradigm and shows better varied density clustering capability and scalability. In addition, varied applications show our algorithm’s capability of discerning spatial patterns and extending to many fields.
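The shared nearest neighbour (SNN) idea that the abstract relies on can be illustrated with a small sketch. This is not the PVDB implementation: it uses a brute-force kNN search instead of the layered grouping grid, runs on a single machine, and the function names (`knn_indices`, `snn_similarity`), the sample points, and the choice `k=3` are all assumptions made for illustration. It only shows how counting shared neighbours separates points in different dense regions.

```python
from math import dist  # Euclidean distance, Python 3.8+


def knn_indices(points, k):
    """Return, for each point, the set of indices of its k nearest
    neighbours, found by brute force (hypothetical helper)."""
    neighbours = []
    for i, p in enumerate(points):
        # Sort all other points by distance to p and keep the k closest.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbours.append(set(others[:k]))
    return neighbours


def snn_similarity(neighbours, i, j):
    """SNN similarity: how many kNN entries points i and j have in common.
    Simplified: it does not require i and j to be in each other's lists."""
    return len(neighbours[i] & neighbours[j])


# Two well-separated groups of four points each (made-up sample data).
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.5, 5.0), (5.0, 5.5), (5.5, 5.5),
]
neighbours = knn_indices(points, k=3)

same_group = snn_similarity(neighbours, 0, 1)   # points in the same group
cross_group = snn_similarity(neighbours, 0, 4)  # points in different groups
print(same_group, cross_group)
```

Points in the same group share neighbours, while points in different groups share none, even though the second group is sparser than the first. That density-independence is the usual motivation for SNN-based similarity in varied-density clustering.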

Original language: English (US)
Pages (from-to): 93-114
Number of pages: 22
Journal: Journal of Spatial Science
Volume: 63
Issue number: 1
DOIs
State: Published - Jan 2 2018
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Geography, Planning and Development
  • Energy(all)
  • Atmospheric Science

Keywords

  • Data mining
  • data partition
  • layered grouping grid-based k nearest neighbour searching
  • parallel clustering
  • varied density-based clustering
