TY - JOUR
T1 - A deep reinforcement learning framework for solving two-stage stochastic programs
AU - Yilmaz, Dogacan
AU - Büyüktahtakın, I. Esra
N1 - Funding Information:
We gratefully acknowledge the support of the National Science Foundation CAREER Award co-funded by the CBET/ENG Environmental Sustainability program and the Division of Mathematical Sciences in MPS/NSF under Grant No. CBET-1554018. We also acknowledge the support of the New Jersey Institute of Technology Academic & Research Computing Systems for the High Performance Computing resources. We also thank two reviewers and the editor for their feedback, which helped improve the clarity of our exposition.
Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2023
Y1 - 2023
N2 - In this study, we present a deep reinforcement learning framework for solving scenario-based two-stage stochastic programming problems. Stochastic programs have numerous real-time applications, such as scheduling, disaster management, and route planning, yet they are computationally challenging to solve and require specially designed solution strategies such as hand-crafted heuristics. To the best of our knowledge, this is the first study to decompose two-stage stochastic programs with a multi-agent structure within a deep reinforcement learning algorithmic framework in order to solve them faster. Specifically, we propose a general two-stage deep reinforcement learning framework that can generate high-quality solutions within a fraction of a second, in which two different learning agents sequentially learn to solve each stage of the problem. Because the decisions are interconnected across the stages, the first-stage agent is trained with feedback from the second-stage agent using a new policy gradient formulation. We demonstrate our framework on a general multi-dimensional stochastic knapsack problem. The results show that solution times can be reduced by up to five orders of magnitude while maintaining sufficiently good optimality gaps of around 7%. Moreover, a decision-making agent trained on only a few scenarios can solve problems with many scenarios, achieving a significant reduction in solution times. Considering the vast state and action spaces of the problem of interest, the results indicate a promising direction for generating fast solutions to stochastic online optimization problems without expert knowledge.
AB - In this study, we present a deep reinforcement learning framework for solving scenario-based two-stage stochastic programming problems. Stochastic programs have numerous real-time applications, such as scheduling, disaster management, and route planning, yet they are computationally challenging to solve and require specially designed solution strategies such as hand-crafted heuristics. To the best of our knowledge, this is the first study to decompose two-stage stochastic programs with a multi-agent structure within a deep reinforcement learning algorithmic framework in order to solve them faster. Specifically, we propose a general two-stage deep reinforcement learning framework that can generate high-quality solutions within a fraction of a second, in which two different learning agents sequentially learn to solve each stage of the problem. Because the decisions are interconnected across the stages, the first-stage agent is trained with feedback from the second-stage agent using a new policy gradient formulation. We demonstrate our framework on a general multi-dimensional stochastic knapsack problem. The results show that solution times can be reduced by up to five orders of magnitude while maintaining sufficiently good optimality gaps of around 7%. Moreover, a decision-making agent trained on only a few scenarios can solve problems with many scenarios, achieving a significant reduction in solution times. Considering the vast state and action spaces of the problem of interest, the results indicate a promising direction for generating fast solutions to stochastic online optimization problems without expert knowledge.
KW - Deep learning
KW - Discrete optimization
KW - Online optimization
KW - Real-time problems
KW - Reinforcement learning
KW - Two-stage stochastic programming
UR - http://www.scopus.com/inward/record.url?scp=85160721399&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85160721399&partnerID=8YFLogxK
U2 - 10.1007/s11590-023-02009-5
DO - 10.1007/s11590-023-02009-5
M3 - Article
AN - SCOPUS:85160721399
SN - 1862-4472
JO - Optimization Letters
JF - Optimization Letters
ER -