Billion-scale Detection of Isomorphic Nodes

Luca Cappelletti, Tommaso Fontana, Justin Reese, David A. Bader

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Scopus citations

Abstract

This paper presents an algorithm for detecting attributed high-degree node isomorphism. High-degree isomorphic nodes seldom happen by chance and often represent duplicated entities or data processing errors. By definition, isomorphic nodes are topologically indistinguishable and can be problematic in graph ML tasks. The algorithm employs a parallel, 'degree-bounded' approach that fingerprints each node's local properties through a hash, which constrains the search to nodes within hash-defined buckets, thus minimising the number of comparisons. This method scales on graphs with billions of nodes and edges. Finally, we provide isomorphic node oddities identified in real-world data.

Original languageEnglish (US)
Title of host publication2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages230-233
Number of pages4
ISBN (Electronic)9798350311990
DOIs
StatePublished - 2023
Event2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023 - St. Petersburg, United States
Duration: May 15 2023May 19 2023

Publication series

Name2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023

Conference

Conference2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023
Country/TerritoryUnited States
CitySt. Petersburg
Period5/15/235/19/23

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture

Keywords

  • graph
  • node isomorphism
  • parallel algorithm

Cite this