Consider a single-hop wireless sensor network in which a central node (or fusion center, FC) collects data from a set of M energy harvesting (EH)-capable sensors (or nodes). In each time-slot, only a subset of K ≤ M nodes can be scheduled by the FC for transmission over K orthogonal communication resources (e.g., frequencies). The scheduling problem is tackled by assuming that the FC has no direct access to the instantaneous states of the nodes' batteries; it only knows the outcomes of previous transmission attempts and the statistical properties of the energy harvesting/discharging processes. Based on a simple Markovian model of the EH and battery leakage processes, the FC's scheduling problem is formulated as a partially observable Markov decision process (POMDP) and then cast into a restless multi-armed bandit (RMAB) framework. It is shown that in some special cases a myopic (or greedy) scheduling policy is optimal, and that such a policy coincides with the so-called Whittle index policy.
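The scheduling loop described above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes a two-state (charged/empty) battery per node, a hypothetical harvest probability p (empty to charged) and retention probability q (charged stays charged), and that a scheduled transmission reveals and drains the battery, so the FC's belief for a scheduled node resets to p while unscheduled beliefs evolve under the Markov chain.

```python
def myopic_schedule(beliefs, K):
    """Myopic (greedy) policy: schedule the K nodes whose belief of
    having a charged battery is highest."""
    return sorted(range(len(beliefs)), key=lambda i: beliefs[i],
                  reverse=True)[:K]

def update_beliefs(beliefs, scheduled, p, q):
    """One-slot belief update for a two-state Markov battery model.

    p: P(empty -> charged), i.e., energy harvested in a slot (assumed)
    q: P(charged -> charged), i.e., no leakage/discharge (assumed)
    """
    new = []
    for i, w in enumerate(beliefs):
        if i in scheduled:
            # A transmission attempt (success or failure) reveals the
            # battery state and drains it, so the next-slot belief is
            # just the harvest probability.
            new.append(p)
        else:
            # Unobserved node: propagate the belief through the chain.
            new.append(w * q + (1 - w) * p)
    return new
```

For example, with beliefs [0.2, 0.9, 0.5] and K = 1 the myopic policy schedules node 1; its belief then resets to p while the others drift toward the chain's stationary probability of being charged.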