TY - JOUR
T1 - Remote data integrity checking with server-side repair
AU - Chen, Bo
AU - Curtmola, Reza
N1 - Funding Information:
This research was supported by the National Science Foundation (NSF) under Grants No. CNS 1054754, CNS 1409523, and DGE 1565478, and by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) under Contract No. A8650-15-C-7521. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF, DARPA, and AFRL. The United States Government is authorized to reproduce and distribute reprints notwithstanding any copyright notice herein.
PY - 2017
Y1 - 2017
AB - Distributed storage systems store data redundantly at multiple servers that are geographically spread throughout the world. This basic approach would be sufficient for handling server failures due to natural faults, because when one server fails, data from healthy servers can be used to restore the desired redundancy level. However, in a setting where servers are untrusted and can behave maliciously, data redundancy must be used in tandem with Remote Data Checking (RDC) to ensure that the redundancy level of the storage system is maintained over time. All previous RDC schemes for distributed systems impose a heavy burden on the data owner (client) during data maintenance: to repair data at a faulty server, the data owner must first download a large amount of data, re-generate the data to be stored at a new server, and then upload this data to a new healthy server. We explore a new concept, server-side repair, in which the servers are responsible for repairing the corruption, whereas the client acts as a lightweight repair coordinator. We propose two novel RDC schemes for replication-based distributed storage systems, RDC-SR and ERDC-SR, which enable server-side repair, thus taking advantage of the premium connections available between the data centers of a cloud storage provider (CSP), and minimize the load on the client side. Although both schemes achieve a similar objective, RDC-SR assumes that the computational power of the CSP will not grow over time, whereas ERDC-SR relaxes this assumption and considers a CSP whose computational power can increase over time. Our guidelines on choosing the parameters of these schemes provide insights on their practical usage and also reveal that, whereas ERDC-SR can handle more powerful adversaries, it also imposes a minimum file size. Finally, we evaluate the performance of the two schemes. For RDC-SR, we build a prototype on Amazon AWS and provide experimental results that support its effectiveness and validate the practicality of this new approach. For ERDC-SR, our analytical evaluation shows that the scheme is an order of magnitude more efficient than a simple extension of RDC-SR that defends against the stronger adversarial model.
KW - Cloud storage
KW - butterfly encoding
KW - remote data integrity checking
KW - replicate-on-the-fly attack
KW - server-side repair
UR - http://www.scopus.com/inward/record.url?scp=85028517984&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85028517984&partnerID=8YFLogxK
U2 - 10.3233/JCS-16868
DO - 10.3233/JCS-16868
M3 - Article
AN - SCOPUS:85028517984
SN - 0926-227X
VL - 25
SP - 537
EP - 584
JO - Journal of Computer Security
JF - Journal of Computer Security
IS - 6
ER -