Just as Reinforcement Learning requires careful design of the state space to ensure that it is compact, it also requires careful design of the action set to ensure that it is small but also sufficient for the robot to achieve its goals.
Physically, the robot is able to simultaneously perform two types of actions: moving actions and looking actions. Moving actions make the robot move in a given direction. Looking actions employ the camera to identify or track landmarks in the environment in specified sectors. The Vision system can either search for new landmarks or re-acquire already-detected landmarks, but it is not able to do both things at the same time, because different image processing routines are required for each. In either case, however, the Vision system returns the heading and distance to the landmarks it detects.
An additional constraint on the design of actions is that the Vision system is most effective when the robot is moving in certain directions relative to the landmarks being observed.
Given these constraints, we have designed a set of five actions for the Learning Agent: MB, MLL, MVL, MVT and MOT.
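For concreteness, the sketch below encodes this action set as a Python enumeration; the encoding is only an illustration, and the comments paraphrase the effects of each action as described in the rest of this section.

```python
from enum import Enum

class Action(Enum):
    """The Learning Agent's action set (illustrative sketch only)."""
    MB  = "MB"   # moves toward the target without using the camera; all imprecisions grow
    MLL = "MLL"  # moves while searching for new landmarks
    MVL = "MVL"  # moves while re-acquiring detected landmarks; reduces overall imprecision
    MVT = "MVT"  # reduces the imprecision of the target's location
    MOT = "MOT"  # also reduces target imprecision, at the cost of extra motion;
                 # the only action that does not decrease the distance to the target
```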
These actions should affect the state variables as follows. All
actions except MOT make the distance to the target decrease. MB
makes all imprecisions grow. MLL should increase the number of
detected landmarks. MOT should reduce the imprecision about the
target's location, while MVL should reduce the overall
imprecision. MVT also reduces the imprecision of the target's
location, but not as much as MOT. All actions require that the
heading to the target is known (at least approximately). The heading
is chosen as the center of the fuzzy interval for the target's heading. If the
heading is completely unknown, this interval covers the whole circle and its
center is π. This causes the robot to "pace" back and forth, turning 180
degrees (π radians) each time an action is executed.
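As a rough illustration of this heading choice, suppose the fuzzy interval for the target's heading is summarized by a lower and an upper bound in radians (the actual fuzzy representation is not detailed here):

```python
import math

def commanded_heading(lower, upper):
    """Heading to move toward: the center of the fuzzy interval
    [lower, upper] (in radians) for the target's heading."""
    return 0.5 * (lower + upper)

# A completely unknown heading corresponds to the full circle [0, 2*pi],
# whose center is pi: the robot turns 180 degrees at every step ("pacing").
assert math.isclose(commanded_heading(0.0, 2 * math.pi), math.pi)
```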
We have assigned an immediate reward to each action to reflect the
load it places on the Vision system and the motion system. The rewards are
negative, because they are costs. MB is the cheapest action, since it
does not use the camera. MVL and MVT have a higher cost, since they make
moderate demands on the Vision system. MOT is more expensive still, because
it requires more motion in addition to the same image processing as MVL and
MVT. Finally, MLL is the most expensive action, because it must do extensive
image processing to search for new landmarks and verify that they are robust
to changes in viewpoint.
The system receives a reward of 0 when it reaches the target location. The Reinforcement Learning objective is to maximize the total reward. In this case, this is equivalent to minimizing the total cost of the actions taken to reach the target.
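The following sketch captures this reward scheme; the numeric costs are purely illustrative placeholders chosen to respect the ordering described above (MB cheapest, then MVL and MVT, then MOT, then MLL), not the values actually used.

```python
# Illustrative (assumed) per-action costs; only the ordering follows the text.
ACTION_COST = {"MB": 1.0, "MVL": 2.0, "MVT": 2.0, "MOT": 3.0, "MLL": 4.0}

def immediate_reward(action, at_target):
    """Rewards are negative costs; reaching the target location yields 0."""
    return 0.0 if at_target else -ACTION_COST[action]

# Maximizing the sum of these rewards over an episode is the same as
# minimizing the total cost of the actions taken to reach the target.
```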