Just as Reinforcement Learning requires careful design of the state space to ensure that it is compact, it also requires careful design of the action set to ensure that it is small but also sufficient for the robot to achieve its goals.
Physically, the robot is able to simultaneously perform two types of actions: moving actions and looking actions. Moving actions make the robot move in a given direction. Looking actions employ the camera to identify or track landmarks in the environment in specified sectors. The Vision system can either search for new landmarks or re-acquire already-detected landmarks, but it is not able to do both things at the same time, because different image processing routines are required for each. In either case, however, the Vision system returns the heading and distance to the landmarks it detects.
An additional constraint on the design of actions is that the Vision system is most effective when the robot is moving in certain directions relative to the landmarks being observed.
Given these constraints, we have designed a set of five actions for the Learning Agent: MB, MLL, MVL, MVT and MOT.
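For concreteness, the sketch below encodes this action set as a Python enumeration; the encoding is only an illustration, and the comments paraphrase the effects of each action as described in the rest of this section.

```python
from enum import Enum

class Action(Enum):
    """The Learning Agent's action set (illustrative sketch only)."""
    MB  = "MB"   # moves toward the target without using the camera; all imprecisions grow
    MLL = "MLL"  # moves while searching for new landmarks
    MVL = "MVL"  # moves while re-acquiring detected landmarks; reduces overall imprecision
    MVT = "MVT"  # reduces the imprecision of the target's location
    MOT = "MOT"  # also reduces target imprecision, at the cost of extra motion;
                 # the only action that does not decrease the distance to the target
```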
These actions should affect the state variables as follows. All
actions except MOT make the distance to the target decrease. MB
makes all imprecisions grow. MLL should increase the number of
detected landmarks. MOT should reduce the imprecision about the
target's location, while MVL should reduce the overall
imprecision. MVT also reduces the imprecision of the target's
location, but not as much as MOT. All actions require that the
heading to the target is known (at least approximately). The heading
is chosen as the center of the fuzzy interval for the target's heading. If the
heading is completely unknown, this interval covers the whole circle and its
center is π. This causes the robot to "pace" back and forth, turning 180
degrees (π radians) each time an action is executed.
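As a rough illustration of this heading choice, suppose the fuzzy interval for the target's heading is summarized by a lower and an upper bound in radians (the actual fuzzy representation is not detailed here):

```python
import math

def commanded_heading(lower, upper):
    """Heading to move toward: the center of the fuzzy interval
    [lower, upper] (in radians) for the target's heading."""
    return 0.5 * (lower + upper)

# A completely unknown heading corresponds to the full circle [0, 2*pi],
# whose center is pi: the robot turns 180 degrees at every step ("pacing").
assert math.isclose(commanded_heading(0.0, 2 * math.pi), math.pi)
```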
We have assigned an immediate reward to each action to reflect the
load it places on the Vision system and the motion system. The rewards are
negative, because they are costs. MB is the cheapest action, since it
does not use the camera. MVL and MVT have a higher cost, since they make
moderate demands on the Vision system. MOT is more expensive still, because
it requires more motion in addition to the same image processing as MVL and
MVT. Finally, MLL is the most expensive action, because it must do extensive
image processing to search for new landmarks and verify that they are robust
to changes in viewpoint.
The system receives a reward of 0 when it reaches the target location. The Reinforcement Learning objective is to maximize the total reward. In this case, this is equivalent to minimizing the total cost of the actions taken to reach the target.
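The following sketch captures this reward scheme; the numeric costs are purely illustrative placeholders chosen to respect the ordering described above (MB cheapest, then MVL and MVT, then MOT, then MLL), not the values actually used.

```python
# Illustrative (assumed) per-action costs; only the ordering follows the text.
ACTION_COST = {"MB": 1.0, "MVL": 2.0, "MVT": 2.0, "MOT": 3.0, "MLL": 4.0}

def immediate_reward(action, at_target):
    """Rewards are negative costs; reaching the target location yields 0."""
    return 0.0 if at_target else -ACTION_COST[action]

# Maximizing the sum of these rewards over an episode is the same as
# minimizing the total cost of the actions taken to reach the target.
```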