Data redundancy elimination (DRE), also known as data de-duplication, reduces the amount of data to be transferred or stored by identifying duplicated data elements, both intra-object and inter-object, and replacing them with a reference or pointer to the unique data copy. Large-scale trace-driven studies have shown that packet-level DRE techniques can achieve 15-60% bandwidth savings when deployed at the access links of service providers, up to almost 50% bandwidth savings in Wi-Fi networks, and as much as 60% mobile data volume reduction in cellular networks. In this paper, we survey state-of-the-art protocol-independent redundancy elimination techniques. We give an overview of the system architecture and main processing steps of protocol-independent DRE techniques, followed by a discussion of the major mechanisms involved in protocol-independent DRE, including the fingerprinting mechanism, cache management mechanism, chunk matching mechanism, and decoding error recovery mechanism. We also present several redundancy elimination systems deployed in wireline, wireless, and cellular networks. Several other techniques for enhancing DRE performance are further discussed, such as DRE bypass techniques, non-uniform sampling, and chunk overlap.
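As a rough illustration of the core idea in the abstract (replacing a repeated data element with a reference to a cached copy), the following minimal Python sketch may help. It is not the method of any system covered by the survey: it assumes fixed-size chunks and SHA-1 fingerprints purely for simplicity, and it omits cache eviction and decoding error recovery. The names `CHUNK_SIZE`, `fingerprint`, `encode`, and `decode` are hypothetical.

```python
import hashlib

# Hypothetical fixed chunk size in bytes, chosen only for illustration.
CHUNK_SIZE = 8


def fingerprint(chunk: bytes) -> bytes:
    """Return a fingerprint for a chunk (SHA-1 is an assumption of this sketch)."""
    return hashlib.sha1(chunk).digest()


def encode(data: bytes, cache: dict) -> list:
    """Sender side: emit raw chunks the first time, references afterwards."""
    tokens = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = fingerprint(chunk)
        if fp in cache:
            # Chunk already seen: send only a reference to the cached copy.
            tokens.append(("ref", fp))
        else:
            # New chunk: remember it and send it in full.
            cache[fp] = chunk
            tokens.append(("raw", chunk))
    return tokens


def decode(tokens: list, cache: dict) -> bytes:
    """Receiver side: rebuild the data, mirroring the sender's cache updates."""
    out = bytearray()
    for kind, value in tokens:
        if kind == "ref":
            # Look the chunk up by fingerprint in the receiver's cache.
            out += cache[value]
        else:
            cache[fingerprint(value)] = value
            out += value
    return bytes(out)
```

In this sketch the sender and receiver each keep their own cache and stay synchronized only because both update it in the same order. Keeping the two caches consistent when packets are lost or reordered is the job of the cache management and decoding error recovery mechanisms that the paper surveys.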
All Science Journal Classification (ASJC) codes
- Electrical and Electronic Engineering
Keywords
- Data redundancy elimination (DRE)
- content delivery acceleration
- data de-duplication
- protocol-independent DRE
- wide area network (WAN) optimization