Accelerated structure-aware reinforcement learning for delay-sensitive energy harvesting wireless sensors

Nikhilesh Sharma, Nicholas Mastronarde, Jacob Chakareski

Research output: Contribution to journal › Article › peer-review

14 Scopus citations


We consider a time-slotted energy-harvesting wireless sensor transmitting delay-sensitive data over a fading channel. The sensor injects captured data packets into its transmission queue and relies on ambient energy harvested from the environment to transmit them. We aim to find the optimal scheduling policy that decides how many packets to transmit in each time slot to minimize the expected queuing delay. No prior knowledge of the stochastic processes that govern the channel, captured data, and harvested energy dynamics is assumed, which necessitates online learning to optimize the scheduling policy. We formulate this problem as a Markov decision process (MDP) with a state space spanning the sensor's buffer, battery, and channel states, and show that its optimal value function is non-decreasing and has increasing differences in the buffer state, and that it is non-increasing and has increasing differences in the battery state. We exploit this knowledge of the value function's structure to formulate a novel accelerated reinforcement learning (RL) algorithm based on value function approximation that can solve the scheduling problem online with controlled approximation error, while incurring limited computational and memory complexity. We rigorously capture the trade-off between approximation accuracy and the computational/memory complexity savings associated with our approach. Our simulations demonstrate that the proposed algorithm closely approximates the optimal offline solution, which requires complete knowledge of the system state dynamics. At the same time, our approach achieves competitive performance relative to a state-of-the-art RL algorithm, at orders of magnitude lower complexity. Moreover, considerable performance gains are demonstrated over the widely used Q-learning RL technique.
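To make the MDP formulation above more concrete, the following is a minimal sketch of a toy scheduling problem over buffer, battery, and channel states, solved offline by plain value iteration. It is not the accelerated RL algorithm described in the abstract: it assumes the dynamics are known, and every parameter (buffer and battery sizes, two-state i.i.d. channel, Bernoulli arrivals and harvesting, per-packet energy costs, discount factor, backlog cost) is a hypothetical choice made only for illustration.

```python
import itertools

# Toy model parameters (all hypothetical, chosen only for illustration).
B, E = 4, 3                 # max buffer (packets) and battery (energy units)
CHANNELS = (0, 1)           # 0 = bad channel, 1 = good channel
P_CHANNEL = (0.5, 0.5)      # i.i.d. channel distribution (assumption)
ENERGY_PER_PACKET = (2, 1)  # energy units per packet in bad / good channel
P_ARRIVAL = 0.4             # probability that one packet arrives in a slot
P_HARVEST = 0.5             # probability that one energy unit is harvested
GAMMA = 0.9                 # discount factor


def feasible_actions(b, e, h):
    """Packet counts x that fit within both the buffer and the battery."""
    return range(min(b, e // ENERGY_PER_PACKET[h]) + 1)


def q_value(V, b, e, h, x):
    """Backlog left after transmitting x, plus discounted expected cost-to-go."""
    b_post = b - x
    e_post = e - ENERGY_PER_PACKET[h] * x
    expected = 0.0
    for a, pa in ((0, 1 - P_ARRIVAL), (1, P_ARRIVAL)):
        for g, pg in ((0, 1 - P_HARVEST), (1, P_HARVEST)):
            for h2, ph in zip(CHANNELS, P_CHANNEL):
                b2 = min(b_post + a, B)  # arrivals beyond B are dropped
                e2 = min(e_post + g, E)  # harvest beyond E is wasted
                expected += pa * pg * ph * V[(b2, e2, h2)]
    return b_post + GAMMA * expected


def value_iteration(tol=1e-9, max_iters=10_000):
    """Iterate the Bellman operator from V = 0 until the update is below tol."""
    states = list(itertools.product(range(B + 1), range(E + 1), CHANNELS))
    V = {s: 0.0 for s in states}
    for _ in range(max_iters):
        V_new = {
            (b, e, h): min(q_value(V, b, e, h, x)
                           for x in feasible_actions(b, e, h))
            for (b, e, h) in states
        }
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new
        V = V_new
    return V


V = value_iteration()
```

In this toy model the computed cost-to-go can be inspected directly, for example by comparing `V[(b + 1, e, h)]` with `V[(b, e, h)]` across buffer levels, or `V[(b, e + 1, h)]` with `V[(b, e, h)]` across battery levels, to see the monotone structure the abstract refers to. An online method would have to estimate these values from observed transitions instead of from the known probabilities used here.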

Original language: English (US)
Article number: 8998306
Pages (from-to): 1409-1424
Number of pages: 16
Journal: IEEE Transactions on Signal Processing
State: Published - 2020
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Electrical and Electronic Engineering


Keywords

  • Energy harvesting
  • accelerated reinforcement learning
  • delay-sensitive remote sensing
  • post-decision state learning
  • structural properties
  • transmission scheduling
  • value function
  • virtual experience learning

