An Actor-critic Algorithm Using Cross Evaluation of Value Functions

Hui Wang; Peng Zhang; Quan Liu

doi:10.11591/ijra.v7i1.pp39-47

Abstract

To overcome the difficulty of learning a globally optimal policy caused by maximization bias in continuous spaces, an actor-critic algorithm based on cross evaluation of two value functions is proposed. Two independent value functions bring the critic's estimate closer to the true value function, and the actor is guided by a crossover function in choosing its optimal actions. Cross evaluation of the value functions avoids the policy jitter exhibited by greedy optimization methods in continuous spaces. The algorithm is more robust than the CACLA learning algorithm, and the experimental results show that it learns more smoothly and that policy stability is clearly improved while the computational cost remains almost unchanged.
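
The abstract does not give the update rules, so the following is only an illustrative sketch of how cross evaluation of two value functions might be combined with a CACLA-style actor. It assumes linear function approximation, a scalar state and action, and a rule borrowed from double estimators: one critic is chosen at random to learn, and the other critic supplies the bootstrap value in its TD target. The names `features`, `cross_eval_step`, `w_a`, `w_b`, and `theta`, as well as the step sizes, are hypothetical and are not taken from the paper.

```python
import random


def features(state):
    # Hypothetical feature map: a bias term plus the raw scalar state.
    return [1.0, state]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def cross_eval_step(w_a, w_b, theta, s, a, r, s_next, rng,
                    alpha=0.1, beta=0.1, gamma=0.9):
    """One transition update (assumed form, not the paper's exact rule).

    w_a, w_b : weight lists of the two critics, updated in place
    theta    : weight list of the actor, updated in place
    Returns the TD error used for the update.
    """
    x, x_next = features(s), features(s_next)

    # Choose which critic learns; the other one evaluates the next state.
    if rng.random() < 0.5:
        learner, evaluator = w_a, w_b
    else:
        learner, evaluator = w_b, w_a

    # Cross-evaluated TD error: bootstrap from the critic that is NOT updated.
    delta = r + gamma * dot(evaluator, x_next) - dot(learner, x)
    for i, xi in enumerate(x):
        learner[i] += alpha * delta * xi

    # CACLA-style actor step: move toward the explored action only when
    # the transition turned out better than expected (delta > 0).
    if delta > 0:
        action_error = a - dot(theta, x)
        for i, xi in enumerate(x):
            theta[i] += beta * action_error * xi

    return delta


if __name__ == "__main__":
    rng = random.Random(0)
    w_a, w_b, theta = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    delta = cross_eval_step(w_a, w_b, theta, s=1.0, a=2.0, r=1.0,
                            s_next=0.5, rng=rng)
    print(delta, w_a, w_b, theta)
```

In this sketch only one critic changes per transition, so the per-step cost stays close to that of a single-critic update. Because the actor moves only on a positive TD error, an overestimate in one critic is not directly fed back into the policy through its own target.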

Keywords

Actor-critic; Continuous spaces; Cross evaluation; Reinforcement learning

DOI: http://doi.org/10.11591/ijra.v7i1.pp39-47

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IAES International Journal of Robotics and Automation (IJRA)
ISSN 2089-4856, e-ISSN 2722-2586

This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
