Rosenblatt [56], in CMU's Distributed Architecture for Mobile Navigation (DAMN) project, proposed an architecture that is similar to our approach. In this architecture, a set of modules (behaviors) cooperate to control the robot's path by voting for various possible actions (steering angle and speed), and an arbiter decides which action is performed: the action that receives the most votes is the one actually executed. However, the set of actions is pre-defined, whereas in our system each agent can bid for any action it wants to perform. Moreover, in the experiments carried out with DAMN, the navigation system used a grid-based map and did not use landmark-based navigation at all.
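To make the voting-and-arbitration scheme concrete, the following sketch (in Python) shows how a DAMN-style arbiter could tally weighted votes over a small, pre-defined set of steering angles. The two behaviors, their weights and the action set are hypothetical simplifications for illustration only, not Rosenblatt's actual implementation.

```python
# Illustrative sketch of a DAMN-style voting arbiter.
# Behaviors, weights and the action set are made-up simplifications.

STEERING_ACTIONS = [-30, -15, 0, 15, 30]   # pre-defined steering angles (degrees)

def avoid_obstacle_votes(clear_direction):
    """Hypothetical behavior: vote for angles close to the obstacle-free direction."""
    return {angle: -abs(angle - clear_direction) / 30.0 for angle in STEERING_ACTIONS}

def goal_seeking_votes(goal_direction):
    """Hypothetical behavior: vote for angles pointing towards the goal."""
    return {angle: -abs(angle - goal_direction) / 30.0 for angle in STEERING_ACTIONS}

def arbiter(vote_sets, weights):
    """Sum the weighted votes per action and pick the action with the highest total."""
    totals = {angle: 0.0 for angle in STEERING_ACTIONS}
    for votes, w in zip(vote_sets, weights):
        for angle, v in votes.items():
            totals[angle] += w * v
    return max(totals, key=totals.get)

# Example with made-up sensor and goal data.
votes = [avoid_obstacle_votes(clear_direction=15), goal_seeking_votes(goal_direction=-15)]
print(arbiter(votes, weights=[1.0, 0.8]))   # steering angle with most (weighted) votes
```

Note that, as in DAMN, the behaviors can only distribute votes over the fixed action set; they cannot propose new actions, which is the point of contrast with our bidding mechanism.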
Saffiotti et al. [58,57] developed the Saphira architecture, which uses fuzzy logic to implement the behaviors. Each behavior consists of several fuzzy rules that have fuzzy variables as antecedents (extracted from sensory and world-model information) and generate as output a control set (i.e. a fuzzy set over the control variable). This control set is computed from the values of the fuzzy variables and represents the desirability of executing each control action, which is similar to the activation level of the action selection architecture. Each behavior also has a fixed priority factor that is used to coordinate all the behaviors. This coordination is very similar to the cooperative mechanism used in Motor schemas. However, instead of combining vectors, it combines control sets and then defuzzifies the resulting set in order to obtain a single control value.
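The combine-then-defuzzify step can be illustrated with the rough Python sketch below, which assumes control sets are represented as desirability values over a discretized steering range, behaviors are blended by priority-weighted combination, and a crisp command is obtained by centroid defuzzification; Saphira's actual fuzzy machinery is more elaborate, and the two example behaviors are invented.

```python
import numpy as np

# Rough sketch: priority-weighted blending of fuzzy control sets followed by
# centroid defuzzification. The representation and the behaviors are assumptions,
# not Saphira's actual implementation.

steering = np.linspace(-30, 30, 61)   # candidate steering angles (degrees)

def triangular(center, width):
    """Control set: desirability peaks at `center` and decays linearly with distance."""
    return np.clip(1.0 - np.abs(steering - center) / width, 0.0, None)

# Each behavior outputs a control set plus a fixed priority factor.
behaviors = [
    {"control_set": triangular(center=10.0, width=20.0),  "priority": 0.9},  # go to goal
    {"control_set": triangular(center=-20.0, width=15.0), "priority": 0.6},  # keep off wall
]

# Priority-weighted combination of the control sets.
combined = sum(b["priority"] * b["control_set"] for b in behaviors)

# Centroid defuzzification yields a single crisp steering command.
crisp_steering = np.sum(steering * combined) / np.sum(combined)
print(round(crisp_steering, 2))
```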
Humphrys [31] presents several action selection mechanisms that use a coordination mechanism similar to ours. Each agent suggests the action it wants the robot to perform with a given strength or weight (equivalent to our bid), and the action with the highest weight is the one executed. These weights are computed (and learned through Reinforcement Learning) from the one-step reward of executing an action, which each agent is able to predict for the actions it suggests. This is an important difference from our problem, since we cannot assign a one-step reward to an action; the only reward the robot may receive comes when it reaches the target, and it is very difficult to split this reward into smaller rewards for each action taken during the navigation to the target.
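A schematic Python version of this highest-weight selection rule is given below; the agents, their suggested actions and the weight values are made-up placeholders, and the reinforcement-learning update of the weights from predicted one-step rewards is omitted.

```python
# Schematic highest-weight action selection (Humphrys-style), illustrative only.
# Each agent suggests one action together with a weight reflecting the one-step
# reward it predicts; agents, actions and values below are placeholders.

suggestions = [
    {"agent": "avoid_obstacle", "action": "turn_left",   "weight": 0.7},
    {"agent": "go_to_target",   "action": "go_straight", "weight": 0.4},
    {"agent": "recharge",       "action": "turn_right",  "weight": 0.1},
]

def select_action(suggestions):
    """Execute the action suggested with the highest weight (the winning bid)."""
    winner = max(suggestions, key=lambda s: s["weight"])
    return winner["action"], winner["agent"]

action, winning_agent = select_action(suggestions)
print(f"{winning_agent} wins: execute {action}")
```

The selection step itself is thus analogous to our bidding mechanism; the difference lies in how the weights are obtained, since in our setting no per-action reward is available to learn them from.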