Last-position elimination-based learning automata

Junqi Zhang, Cheng Wang, Meng Chu Zhou

Research output: Contribution to journalArticlepeer-review

49 Scopus citations

Abstract

An update scheme of the state probability vector of actions is critical for learning automata (LA). The most popular is the pursuit scheme that pursues the estimated optimal action and penalizes others. This paper proposes a reverse philosophy that leads to last-position elimination-based learning automata (LELA). The action graded last in terms of the estimated performance is penalized by decreasing its state probability and is eliminated when its state probability becomes zero. All active actions, that is, actions with nonzero state probability, equally share the penalized state probability from the last-position action at each iteration. The proposed LELA is characterized by the relaxed convergence condition for the optimal action, the accelerated step size of the state probability update scheme for the estimated optimal action, and the enriched sampling for the estimated nonoptimal actions. The proof of the ε-optimal property for the proposed algorithm is presented. Last-position elimination is a widespread philosophy in the real world and has proved to be also helpful for the update scheme of the learning automaton via the simulations of well-known benchmark environments. In the simulations, two versions of the LELA, using different selection strategies of the last action, are compared with the classical pursuit algorithms Discretized Pursuit Reward-Inaction (DPRI) and Discretized Generalized Pursuit Algorithm (DGPA). Simulation results show that the proposed schemes achieve significantly faster convergence and higher accuracy than the classical ones. Specifically, the proposed schemes reduce the interval to find the best parameter for a specific environment in the classical pursuit algorithms. Thus, they can have their parameter tuning easier to perform and can save much more time when applied to a practical case. Furthermore, the convergence curves and the corresponding variance coefficient curves of the contenders are illustrated to characterize their essential differences and verify the analysis results of the proposed algorithms.

Original languageEnglish (US)
Article number6782439
Pages (from-to)2484-2492
Number of pages9
JournalIEEE Transactions on Cybernetics
Volume44
Issue number12
DOIs
StatePublished - Dec 1 2014

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Information Systems
  • Human-Computer Interaction
  • Computer Science Applications
  • Electrical and Electronic Engineering

Keywords

  • Last-position elimination
  • Learning automata (LA)
  • Stationary environments
  • Update scheme

Fingerprint Dive into the research topics of 'Last-position elimination-based learning automata'. Together they form a unique fingerprint.

Cite this