TY - GEN
T1 - Low-Complexity Physics-Informed Reinforcement Learning Using Post-Decision States with Stochastic Sampling
AU - Corra, Andrew
AU - Mastronarde, Nicholas
AU - Chakareski, Jacob
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Delay-sensitive Internet of Things (IoT) applications continue to grow in prevalence as new wireless technologies are adopted. Since these applications often operate in unknown dynamic environments, reinforcement learning (RL) has emerged as an effective method for learning optimal decision policies that improve their overall performance. However, the typical data-driven RL techniques that have been adopted to solve these problems do not exploit available knowledge of the system dynamics. Consequently, they must 'learn' information about the system that may already be known to the system's designer. Post-decision state (PDS) learning, on the other hand, leverages known system information (i.e., it is 'physics-informed') to simplify the learning task and improve learning performance. However, this comes at the cost of increased computational complexity, making it impractical to implement on resource-constrained devices. This work introduces stochastic PDS learning, a novel RL algorithm that combines traditional PDS learning with stochastic sampling to produce a physics-informed RL agent that can leverage known system information even with limited computational resources. The performance of stochastic PDS learning is compared against that of numerous traditional RL algorithms in the context of a delay-sensitive, energy-efficient scheduling problem simulated as an environment in Gymnasium.
AB - Delay-sensitive Internet of Things (IoT) applications continue to grow in prevalence as new wireless technologies are adopted. Since these applications often operate in unknown dynamic environments, reinforcement learning (RL) has emerged as an effective method for learning optimal decision policies that improve their overall performance. However, the typical data-driven RL techniques that have been adopted to solve these problems do not exploit available knowledge of the system dynamics. Consequently, they must 'learn' information about the system that may already be known to the system's designer. Post-decision state (PDS) learning, on the other hand, leverages known system information (i.e., it is 'physics-informed') to simplify the learning task and improve learning performance. However, this comes at the cost of increased computational complexity, making it impractical to implement on resource-constrained devices. This work introduces stochastic PDS learning, a novel RL algorithm that combines traditional PDS learning with stochastic sampling to produce a physics-informed RL agent that can leverage known system information even with limited computational resources. The performance of stochastic PDS learning is compared against that of numerous traditional RL algorithms in the context of a delay-sensitive, energy-efficient scheduling problem simulated as an environment in Gymnasium.
UR - https://www.scopus.com/pages/publications/105018473758
UR - https://www.scopus.com/inward/citedby.url?scp=105018473758&partnerID=8YFLogxK
U2 - 10.1109/ICC52391.2025.11161698
DO - 10.1109/ICC52391.2025.11161698
M3 - Conference contribution
AN - SCOPUS:105018473758
T3 - IEEE International Conference on Communications
SP - 4559
EP - 4564
BT - ICC 2025 - IEEE International Conference on Communications
A2 - Valenti, Matthew
A2 - Reed, David
A2 - Torres, Melissa
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE International Conference on Communications, ICC 2025
Y2 - 8 June 2025 through 12 June 2025
ER -