Project Details
Description
Networks, also referred to as graphs, consist of nodes (vertices) connected by edges (links). Many types of information can be represented as networks. For example, in social networks, vertices can represent people and edges can represent friendships, while in biological networks the vertices can represent proteins and the edges can represent interactions between proteins. A basic problem in network analysis is to partition the vertices of a graph into non-overlapping sets so that each set represents a cohesive group. This problem, referred to as “community detection” or “graph clustering”, has widespread application in many domains, including biology, engineering, and the social sciences. In this project, new community detection methods will be developed that can run on large networks in the order of millions and billions of vertices and will be implemented in highly efficient software that can be used in high performance computing platforms. An educational component is included with advanced training for both undergraduate and graduate students. Community detection, otherwise known as graph clustering, is the problem of partitioning the vertices of a graph into disjoint sets, so that each set has desirable properties, such as being well-connected (i.e., not having a small edge cut), having high internal density, and being relatively separated from other clusters. Common approaches for graph clustering include optimizing under the modularity criterion or the Constant Potts Model. Because these are NP-hard optimization problems, heuristic searches are used that can be very computationally intensive on large datasets. Furthermore, recent research has revealed limitations to currently popular methods, including the tendency to produce very poorly connected clusters, i.e., clusters with small edge cuts. The Connectivity Modifier software was developed to address this problem: it modifies a given clustering by iteratively finding and removing small edge cuts from clusters and then reclustering, until all clusters are well-connected. This project will develop new performant implementations of the Connectivity Modifier, and expand the set of clustering methods that can be used within the framework. The project will also develop a modular suite of clustering tools that address other problems in community detection, such as finding center-periphery clusters and overlapping clusters, that will enable developers to explore algorithmic approaches to clustering on large networks and also enable exploratory data analysis for applications researchers. The current implementations of these codes have not been implemented for very large networks nor for high performance computing (HPC) platforms. This project will develop parallel codes for these methods that can be deployed on a wide range of compute platforms. The research in this project will be integrated into graduate level courses and undergraduate and graduate students will participate in advanced research under the mentorship of the project investigators. The expected benefits include new software capable of analyzing networks with billions of vertices and that can be deployed in a wide range of hardware, enabling researchers to make discovery in a wide range of application areas, including systems biology, scientometrics, and social network analysis.This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Status | Active |
---|---|
Effective start/end date | 7/15/24 → 6/30/27 |
Funding
- National Science Foundation: $250,000.00
Fingerprint
Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.