A Symbiotic Framework for Hierarchical Policy Search

Authors: 

Peter Lichodzijewski
John A. Doucette
Malcolm I. Heywood

Author Addresses: 

Peter Lichodzijewski
Faculty of Computer Science, Dalhousie University, NS, Canada
{piotr}@cs.dal.ca

John A. Doucette
David R. Cheriton School of Computer Science, University of Waterloo, ON, Canada
{jaedoucette}@gmail.com

Malcolm I. Heywood
Faculty of Computer Science, Dalhousie University, NS, Canada
{mheywood}@cs.dal.ca

Abstract: 

Hierarchical reinforcement learning (HRL) traditionally represents a framework in which a machine learning algorithm is applied to build solutions to temporal sequence style problems under the guidance of a priori identified subtasks. Once learning relative to one set of subtasks is complete, these can then be reused to build more complex behaviours. The principal caveat is that appropriate subtasks must first be identified, preferably without requiring a priori knowledge. This work proposes a generic architecture for evolving hierarchical policies through symbiosis. Specifically, symbionts define an action and an evolved context, whereas each host identifies a subset of symbionts. Symbionts effectively coevolve within a host. Natural selection operates on the hosts, with symbiont existence a function of host performance, that is, a form of group selection. It is now possible to support hierarchical policies as a symbiotic process by letting hosts evolved in an earlier population become the symbiont actions at the next. Two benchmarking studies are performed to illustrate the approach. An initial tutorial is conducted using a truck reversal domain in which the benefits of evolving a hierarchical solution over non-hierarchical solutions are clearly demonstrated. A second benchmarking study is then performed using the Acrobot handstand task. Solutions to date from reinforcement learning have not been able to approach those established 13 years ago using an A* search and a priori knowledge regarding the Acrobot energy equations. The proposed symbiotic approach is able to match and, for the first time, better these results. Moreover, unlike previous works, solutions are tested under a broad range of Acrobot initial conditions, with hierarchical solutions providing significantly better generalization performance.
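
The host/symbiont structure described in the abstract can be illustrated with a minimal Python sketch. It is not the report's implementation. It assumes that a symbiont's context can be stood in for by a scalar-valued function of the state, and that a host defers to whichever of its symbionts returns the largest value; the abstract specifies neither. The names Symbiont, Host, act, steer, hold and top are hypothetical, and the contexts are hand-written where the report would evolve them.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Union

State = Sequence[float]


@dataclass
class Symbiont:
    # Hand-written stand-in for the evolved context (assumption: a scalar score).
    context: Callable[[State], float]
    # Either an atomic action or a host taken from an earlier population.
    action: Union[int, "Host"]


@dataclass
class Host:
    # A host identifies a subset of symbionts.
    symbionts: List[Symbiont]

    def act(self, state: State) -> int:
        # Assumed decision rule: the symbiont with the largest context value wins.
        winner = max(self.symbionts, key=lambda s: s.context(state))
        if isinstance(winner.action, Host):
            # Hierarchical case: defer to the lower-level host.
            return winner.action.act(state)
        return winner.action


# Lower level: hosts whose symbionts carry atomic actions (0, 1, 2).
steer = Host([
    Symbiont(lambda s: -s[0], 0),
    Symbiont(lambda s: s[0], 1),
])
hold = Host([Symbiont(lambda s: 1.0, 2)])

# Next level: symbiont actions are the hosts built at the lower level.
top = Host([
    Symbiont(lambda s: abs(s[0]), steer),
    Symbiont(lambda s: 0.5, hold),
])
```

Calling top.act(state) resolves through the hierarchy to a single atomic action, mirroring how hosts from an earlier population become the symbiont actions at the next. Selection over hosts and the evolution of the contexts are deliberately omitted.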

Tech Report Number: 
CS-2011-06
Report Date: 
October 25, 2011
Attachment: 
CS-2011-06.pdf (7.02 MB)