TY - JOUR
T1 - Algorithms and Applications to Weighted Rank-one Binary Matrix Factorization
AU - Lu, Haibing
AU - Chen, Xi
AU - Shi, Junmin
AU - Vaidya, Jaideep
AU - Atluri, Vijayalakshmi
AU - Hong, Yuan
AU - Huang, Wei
N1 - Funding Information:
Research reported in this publication was supported by the State Grid Corporation of China [5418-201958524A-0-0-00] and the National Institutes of Health under awards R01GM118574 and R35GM134927. Professor Huang would like to acknowledge the partial support of following NSFC grants (71731009, 2018WZDXM020; 71433001). The content is solely the responsibility of the authors and does not necessarily represent the official views of the agencies funding the research. Authors’ addresses: H. Lu, Santa Clara Unviersity, 500 El Camino Real, Santa Clara, CA 95053, USA; email: hlu@scu.edu; X. Chen, GEIRI North America, 250 W. Tasman Dr. Ste 100, San Jose, CA, 95134, USA; email: xi.chen@geirina.net; J. Shi, New Jersey Institute of Technology, University Heights, Newark, NJ 07102 USA; email: jshi@njit.edu; J. Vaidya and V. Atluri, Rutgers University, 1 Washington Park | Newark, NJ 07102 USA; emails: jsvaidya@rbs.rutgers.edu, atluri@rutgers.edu; Y. Hong, Illinois Institute of Technology, 10 W 31st St., Chicago, IL 60616 USA; email: yuan.hong@iit.edu; W. Huang, Southern University of Science & Technology, 1088 Xue Yuan Street, Nanshan, Shenzhen, Guangdong, 518055 China; Xi’an Jiaotong University, China; email: whuang@mail.xjtu.edu.cn. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. © 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. 2158-656X/2020/05-ART7 $15.00 https://doi.org/10.1145/3386599
Publisher Copyright:
© 2020 ACM.
PY - 2020/7
Y1 - 2020/7
N2 - Many applications use data that are better represented in the binary matrix form, such as click-stream data, market basket data, document-term data, user-permission data in access control, and others. Matrix factorization methods have been widely used tools for the analysis of high-dimensional data, as they automatically extract sparse and meaningful features from data vectors. However, existing matrix factorization methods do not work well for the binary data. One crucial limitation is interpretability, as many matrix factorization methods decompose an input matrix into matrices with fractional or even negative components, which are hard to interpret in many real settings. Some matrix factorization methods, like binary matrix factorization, do limit decomposed matrices to binary values. However, these models are not flexible to accommodate some data analysis tasks, like trading off summary size with quality and discriminating different types of approximation errors. To address those issues, this article presents weighted rank-one binary matrix factorization, which is to approximate a binary matrix by the product of two binary vectors, with parameters controlling different types of approximation errors. By systematically running weighted rank-one binary matrix factorization, one can effectively perform various binary data analysis tasks, like compression, clustering, and pattern discovery. Theoretical properties on weighted rank-one binary matrix factorization are investigated and its connection to problems in other research domains are examined. As weighted rank-one binary matrix factorization in general is NP-hard, efficient and effective algorithms are presented. Extensive studies on applications of weighted rank-one binary matrix factorization are also conducted.
AB - Many applications use data that are better represented in the binary matrix form, such as click-stream data, market basket data, document-term data, user-permission data in access control, and others. Matrix factorization methods have been widely used tools for the analysis of high-dimensional data, as they automatically extract sparse and meaningful features from data vectors. However, existing matrix factorization methods do not work well for the binary data. One crucial limitation is interpretability, as many matrix factorization methods decompose an input matrix into matrices with fractional or even negative components, which are hard to interpret in many real settings. Some matrix factorization methods, like binary matrix factorization, do limit decomposed matrices to binary values. However, these models are not flexible to accommodate some data analysis tasks, like trading off summary size with quality and discriminating different types of approximation errors. To address those issues, this article presents weighted rank-one binary matrix factorization, which is to approximate a binary matrix by the product of two binary vectors, with parameters controlling different types of approximation errors. By systematically running weighted rank-one binary matrix factorization, one can effectively perform various binary data analysis tasks, like compression, clustering, and pattern discovery. Theoretical properties on weighted rank-one binary matrix factorization are investigated and its connection to problems in other research domains are examined. As weighted rank-one binary matrix factorization in general is NP-hard, efficient and effective algorithms are presented. Extensive studies on applications of weighted rank-one binary matrix factorization are also conducted.
KW - clustering
KW - compression
KW - Discrete data
KW - pattern discovery
UR - http://www.scopus.com/inward/record.url?scp=85090455930&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090455930&partnerID=8YFLogxK
U2 - 10.1145/3386599
DO - 10.1145/3386599
M3 - Article
AN - SCOPUS:85090455930
SN - 2158-656X
VL - 11
JO - ACM Transactions on Management Information Systems
JF - ACM Transactions on Management Information Systems
IS - 2
M1 - 3386599
ER -