Robot Behaviors

Exploring the T-Maze: Evolving Learning-Like Robot Behaviors using CTRNNs


Authors: Jesper Blynel and Dario Floreano
Autonomous Systems Lab
Institute of Systems Engineering
Swiss Federal Institute of Technology (EPFL)
CH-1015, Lausanne, Switzerland

Abstract.
This paper explores the capabilities of continuous-time recurrent neural networks (CTRNNs) to display reinforcement learning-like abilities on a set of T-Maze and double T-Maze navigation tasks, where the robot has to locate and “remember” the position of a reward-zone. The “learning” comes about without modification of synapse strengths, arising solely from internal network dynamics, as proposed by [12]. Neural controllers are evolved in simulation and, in the simple case, evaluated on a real robot. The evolved controllers are analyzed and the results are discussed.

1 Introduction

Learning in neural networks is normally thought of as the modification of synaptic strengths by, for example, back-propagation or Hebbian learning. This view was challenged in 1994 by Yamauchi and Beer in [12], where the authors described the ability of fixed-synapse continuous-time recurrent neural networks (CTRNNs) to display reinforcement learning-like properties by exploiting internal network dynamics. The task studied was the generation and learning of short bit sequences. In [11] this work was extended to an artificial agent task in which the relationship between the positions of a goal and a landmark in an environment had to be learned. However, the movement of the agent was restricted and it was equipped with artificial high-level goal- and landmark-detection sensors. These restrictions were loosened in the recent work by [10], where an extended version of the same landmark navigation task was studied. In the present work we apply a similar approach in which a simulated Khepera robot has to navigate first a simple and then a double T-Maze. The task for the robot is to locate and “remember” the location of a reward-zone in the environment it happens to be evaluated in. In contrast to the above-mentioned work, the evolved behaviors are verified by testing them on a real robot in a real environment. Previous work on T-Maze navigation in evolutionary robotics includes delayed response tasks where the robots had to perform one or several turns in a maze on the basis of light source cues given to the robot [5][13]. In contrast to these works, our focus is on how to retain information over successive trials in the same environment. This becomes possible by equipping the robot with a sensor to detect the position of the reward-zone used for fitness evaluation.

A different line of research has studied how agents can learn internal models of the environment in a self-organized way [9]. The authors successfully trained a hierarchy of recurrent neural networks to predict increasingly complex information about the environment. The high-level information that emerged was which of two rooms the agent was currently navigating. The authors argued that the learned model could later be used to generate action plans for goal-seeking behaviors as in [8]. In the present work no explicit model of the environment exists; instead, whatever the network retains about the environment is tightly coupled with both the learning of behaviors and the generation of motor actions. This corresponds with our belief that, as pointed out by [12], a direct distinction between mechanisms responsible for behavior and mechanisms responsible for learning is hard to defend biologically.

2 Neural Architecture and Genetic Encoding

Continuous-time recurrent neural networks (CTRNNs) are utilized for the experiments in this paper. The state of each neuron can be described by the following differential equation:
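For reference, the CTRNN state equation in its commonly used form can be written as below. This is an assumed form, chosen to be consistent with the per-neuron time constant, bias threshold, and synaptic strengths w_ij listed in the genetic encoding. The symbols γ_i (neuron state), A_j (activation), I_k (receptor input), and the logistic function σ are notation introduced here for illustration and may differ from the authors' own; N = 6 neurons and S = 5 sensory receptors follow the architecture described in this section.

```latex
\frac{d\gamma_i}{dt} = \frac{1}{\tau_i}\left(-\gamma_i
    + \sum_{j=1}^{N} w_{ij} A_j
    + \sum_{k=1}^{S} w_{ik} I_k\right),
\qquad
A_j = \sigma(\gamma_j - \theta_j),
\qquad
\sigma(x) = \frac{1}{1 + e^{-x}}
```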

The network architecture is shown in figure 1(b). The network consists of 6 fully interconnected neurons (4 hidden + 2 motor outputs) and 5 sensory receptors. Every neuron has synaptic connections from all neurons and all sensory receptors. The receptors are configured as follows: 4 inputs from the infrared proximity sensors, paired two-by-two, and 1 additional input from a floor sensor pointing downwards that measures the surface brightness. The 4 proximity values are scaled between 0 and 1. The floor sensor input is set to 1 if the robot is inside a black reward-zone and 0 otherwise. The activations of the 2 output neurons, linearly scaled between -10 and 10, are used to set the wheel speeds of the robot.
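A minimal sketch of how a controller with this layout could be stepped forward in time is given below. It assumes the standard CTRNN dynamics (each state decays toward the weighted sum of logistic activations and receptor inputs), forward-Euler integration with a step size `dt`, and that the last two neurons are the motor neurons. None of these choices, nor the function names, are confirmed by the text; they are illustrative only.

```python
import math

N_NEURONS = 6     # 4 hidden + 2 motor neurons, as in the text
N_SENSORS = 5     # 4 paired proximity inputs + 1 floor sensor
MAX_SPEED = 10.0  # motor activations are scaled linearly to [-10, 10]


def sigmoid(x):
    """Standard logistic function (assumed activation function)."""
    return 1.0 / (1.0 + math.exp(-x))


def ctrnn_step(state, inputs, weights, taus, biases, dt=0.1):
    """Advance all neuron states by one forward-Euler step (dt is assumed).

    weights[i] holds the 11 incoming strengths of neuron i: the first 6
    come from the neurons, the last 5 from the sensory receptors.
    """
    activations = [sigmoid(s - b) for s, b in zip(state, biases)]
    new_state = []
    for i in range(N_NEURONS):
        w = weights[i]
        drive = sum(w[j] * activations[j] for j in range(N_NEURONS))
        drive += sum(w[N_NEURONS + k] * inputs[k] for k in range(N_SENSORS))
        new_state.append(state[i] + dt * (-state[i] + drive) / taus[i])
    return new_state


def wheel_speeds(state, biases):
    """Map the activations of the last two neurons from [0, 1] to [-10, 10].

    Treating the last two neurons as the motor neurons is an assumption.
    """
    return [MAX_SPEED * (2.0 * sigmoid(state[i] - biases[i]) - 1.0)
            for i in (N_NEURONS - 2, N_NEURONS - 1)]
```

In use, `inputs` would hold the four proximity values in [0, 1] followed by the floor sensor value (1 inside the reward-zone, 0 otherwise), and `ctrnn_step` would be called once per control cycle before reading `wheel_speeds`.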

The network parameters are encoded in a bitstring genotype. Each neuron has 13 encoded parameters: a time constant (τ_i), a bias threshold (θ_i), and 11 synaptic strengths (w_ij), one from each of the 6 neurons and 5 sensory receptors. Each of the 78 network parameters is encoded linearly within its range using 5 bits, resulting in a total genotype length of 390 bits.
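The encoding arithmetic (6 neurons × 13 parameters × 5 bits = 390 bits) can be illustrated with a small decoding sketch. The parameter ranges, the plain-binary interpretation of each 5-bit group, and the ordering of parameters within the genotype are all assumptions made for illustration; the text above does not specify them.

```python
N_NEURONS = 6
PARAMS_PER_NEURON = 13  # 1 time constant + 1 bias + 11 synaptic strengths
BITS_PER_PARAM = 5
GENOTYPE_LENGTH = N_NEURONS * PARAMS_PER_NEURON * BITS_PER_PARAM  # 390

# Hypothetical parameter ranges; the text does not state them.
TAU_RANGE = (0.1, 10.0)
BIAS_RANGE = (-1.0, 1.0)
WEIGHT_RANGE = (-5.0, 5.0)


def decode_param(bits, lo, hi):
    """Map a bit group (read as plain binary, assumed) linearly onto [lo, hi]."""
    value = int("".join(str(b) for b in bits), 2)
    return lo + (hi - lo) * value / (2 ** len(bits) - 1)


def decode_genotype(genotype):
    """Split a 390-bit list into one (tau, bias, weights) tuple per neuron."""
    if len(genotype) != GENOTYPE_LENGTH:
        raise ValueError("expected %d bits" % GENOTYPE_LENGTH)
    # Cut the bitstring into consecutive 5-bit groups, one per parameter.
    groups = [genotype[i:i + BITS_PER_PARAM]
              for i in range(0, GENOTYPE_LENGTH, BITS_PER_PARAM)]
    neurons = []
    for n in range(N_NEURONS):
        g = groups[n * PARAMS_PER_NEURON:(n + 1) * PARAMS_PER_NEURON]
        tau = decode_param(g[0], *TAU_RANGE)
        bias = decode_param(g[1], *BIAS_RANGE)
        weights = [decode_param(b, *WEIGHT_RANGE) for b in g[2:]]
        neurons.append((tau, bias, weights))
    return neurons
```

With 5 bits, each parameter takes one of 32 evenly spaced values within its range, so the resolution of the search space is fixed by the range chosen for each parameter type.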
