Solving Dynamic Traveling Salesman Problems With Deep Reinforcement Learning

Zizhen Zhang, Hong Liu, Meng Chu Zhou, Jiahai Wang

Research output: Contribution to journalArticlepeer-review

Abstract

A traveling salesman problem (TSP) is a well-known NP-complete problem. Traditional TSP presumes that the locations of customers and the traveling time among customers are fixed and constant. In real-life cases, however, the traffic conditions and customer requests may change over time. To find the most economic route, the decisions can be made constantly upon the time-point when the salesman completes his service of each customer. This brings in a dynamic version of the traveling salesman problem (DTSP), which takes into account the information of real-time traffic and customer requests. DTSP can be extended to a dynamic pickup and delivery problem (DPDP). In this article, we ameliorate the attention model to make it possible to perceive environmental changes. A deep reinforcement learning algorithm is proposed to solve DTSP and DPDP instances with a size of up to 40 customers in 100 locations. Experiments show that our method can capture the dynamic changes and produce a highly satisfactory solution within a very short time. Compared with other baseline approaches, more than 5% improvements can be observed in many cases.

Original languageEnglish (US)
JournalIEEE Transactions on Neural Networks and Learning Systems
DOIs
StateAccepted/In press - 2021

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence

Keywords

  • Attention model
  • Computational modeling
  • Decision making
  • deep reinforcement learning (DRL)
  • dynamic traveling salesman problem (DTSP)
  • machine learning
  • Planning
  • policy gradient.
  • Real-time systems
  • Reinforcement learning
  • Routing
  • Traveling salesman problems

Fingerprint

Dive into the research topics of 'Solving Dynamic Traveling Salesman Problems With Deep Reinforcement Learning'. Together they form a unique fingerprint.

Cite this