TY - GEN
T1 - Streaming graph sampling with size restrictions
AU - Zakrzewska, Anita
AU - Bader, David A.
N1 - Publisher Copyright:
© 2017 Copyright is held by the owner/author(s).
PY - 2017/7/31
Y1 - 2017/7/31
N2 - Many graph datasets originating from online social network, financial or biological sources are too large to store or analyze. The analysis of such networks may be made more tractable if they are reduced to smaller subgraphs via sampling. While most of the known graph sampling methods are designed with static graphs in mind, many real datasets are massive and rapidly growing, making streaming methods necessary. We present two new techniques, Randomly Induced Edge Sampling (RIES) and Weighted Edge Sampling (WES). Both methods sample a stream of edges in a single pass, without the need to know future properties of the stream. In contrast to previous work that focused on limiting only the number of vertices, our methods restrict the number of edges, thus truly limiting the size of the sampled subgraph. We compare the performance of RIES and WES against the previously known streaming Random Edge (RE) method on eight social network datasets. Using four structural graph properties, we find that both RIES and WES produce subgraphs that are more structurally similar to the original graph than are the subgraphs produced by streaming RE. We also examine the sensitivity of the two algorithms with respect to their parameters. The parameters of WES affect its performance in a more predictable manner and are easier to set. Both new algorithms represent an improvement in the available streaming graph analysis toolkit.
AB - Many graph datasets originating from online social network, financial or biological sources are too large to store or analyze. The analysis of such networks may be made more tractable if they are reduced to smaller subgraphs via sampling. While most of the known graph sampling methods are designed with static graphs in mind, many real datasets are massive and rapidly growing, making streaming methods necessary. We present two new techniques, Randomly Induced Edge Sampling (RIES) and Weighted Edge Sampling (WES). Both methods sample a stream of edges in a single pass, without the need to know future properties of the stream. In contrast to previous work that focused on limiting only the number of vertices, our methods restrict the number of edges, thus truly limiting the size of the sampled subgraph. We compare the performance of RIES and WES against the previously known streaming Random Edge (RE) method on eight social network datasets. Using four structural graph properties, we find that both RIES and WES produce subgraphs that are more structurally similar to the original graph than are the subgraphs produced by streaming RE. We also examine the sensitivity of the two algorithms with respect to their parameters. The parameters of WES affect its performance in a more predictable manner and are easier to set. Both new algorithms represent an improvement in the available streaming graph analysis toolkit.
UR - http://www.scopus.com/inward/record.url?scp=85040233185&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85040233185&partnerID=8YFLogxK
U2 - 10.1145/3110025.3110058
DO - 10.1145/3110025.3110058
M3 - Conference contribution
AN - SCOPUS:85040233185
T3 - Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2017
SP - 282
EP - 290
BT - Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2017
A2 - Diesner, Jana
A2 - Ferrari, Elena
A2 - Xu, Guandong
PB - Association for Computing Machinery, Inc
T2 - 9th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2017
Y2 - 31 July 2017 through 3 August 2017
ER -