DRAM cells leak charge over time, causing stored data to be lost; periodic refresh operations are therefore required to ensure data integrity. Modern DRAM typically refreshes cells at rank granularity, making an entire rank unavailable during a refresh period. As DRAM density keeps increasing, more rows must be refreshed during a single refresh operation, which raises refresh latency and significantly degrades overall memory system performance. To mitigate this refresh overhead, we propose a caching scheme, called Rank-level Piggyback Caching (RPC), that exploits the fact that ranks in the same channel are refreshed in a staggered manner. The key idea is to cache the to-be-read data of a rank (e.g., Rank 1) in its adjacent rank (e.g., Rank 2) before Rank 1 is locked for refresh. Each rank reserves, or over-provisions, a very small area, denoted as a cache region, to store the cached data. The cache regions of all ranks are organized in a rotated fashion; that is, the cached data for the last rank is stored in the first rank. When a read request arrives at a rank undergoing refresh, the memory controller first checks the cache region in the next rank of the same channel. If the requested data is cached, the memory controller services the request from the cache without waiting for the refresh operation to complete, which reduces memory access latency and improves system performance. Our experimental results show that RPC outperforms the existing Fine Granularity Refresh modes. In a single-core, four-rank system, RPC improves system performance by 8.7% and 10.8% on average for the PARSEC 2.1 and SPLASH-2 benchmark suites, respectively; in a four-core, four-rank system, the improvements for these two benchmark suites are 8.6% and 12.2%, respectively.
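The rotated-cache-region lookup described above can be sketched in a few lines. This is an illustrative model only, not the paper's implementation: the rank/cache structures, function names, and the dictionary-based backing store are all assumptions made for clarity, and the stall on a cache miss is not modeled.

```python
# Illustrative sketch of the RPC read path (assumed names, not the paper's code).

NUM_RANKS = 4  # four-rank channel, as in the evaluated configuration

class Rank:
    def __init__(self, rank_id):
        self.rank_id = rank_id
        self.refreshing = False   # True while the rank is locked for refresh
        self.cache_region = {}    # small reserved area: address -> data

def piggyback_cache(ranks, rank_id, hot_rows):
    """Before rank_id is locked for refresh, copy its likely-to-be-read
    data into the next rank's cache region (rotated: last rank -> first)."""
    nxt = ranks[(rank_id + 1) % NUM_RANKS]
    nxt.cache_region.update(hot_rows)

def read(ranks, rank_id, addr, backing):
    """Memory-controller read path with the RPC check."""
    if ranks[rank_id].refreshing:
        nxt = ranks[(rank_id + 1) % NUM_RANKS]
        if addr in nxt.cache_region:
            return nxt.cache_region[addr]  # hit: serve without waiting
        # miss: the request would stall until refresh completes (not modeled)
    return backing[(rank_id, addr)]        # normal access to the rank

# Usage: Rank 1's hot data is cached in Rank 2 before Rank 1 refreshes.
ranks = [Rank(i) for i in range(NUM_RANKS)]
backing = {(1, 0x40): "row-data"}
piggyback_cache(ranks, 1, {0x40: "row-data"})
ranks[1].refreshing = True
assert read(ranks, 1, 0x40, backing) == "row-data"  # served from Rank 2's cache
```

The rotation is captured by the `(rank_id + 1) % NUM_RANKS` index: each rank's cache region holds data for its predecessor, so the last rank's data wraps around to the first rank.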