We want the Learning Agent to learn a general policy that works in any environment, independently of the locations of the landmarks and targets. Hence, our state representation must not directly employ the locations of the landmarks. Moreover, the robot cannot directly observe the complete state of the environment, which would include the locations of the robot, all obstacles, and all landmarks. Instead, the task of the robot is to learn, under conditions of incomplete knowledge, about the locations of obstacles, landmarks, and targets.
State spaces that encode incomplete knowledge are known as ``belief state spaces'' [15]. The purpose of a belief state representation is to capture the current state of knowledge of the agent, rather than the current state of the external world. In our case, the Learning Agent is trying to move from a starting belief state in which it knows nothing to a goal belief state in which it is confident that it is located at the target location. Along the way, it seeks to avoid getting lost (which is a belief state in which it does not know its location relative to the target position).
To explain our state representation, we begin by defining a set of belief state variables. Then we explain how these are discretized to provide a small set of features, each taking on a small set of values, so that the agent's value function and policy can be represented with small tables.
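As an illustration of this idea, the sketch below shows how per-sector beliefs could be collapsed into a small tuple of discrete feature values that indexes a table. The feature names and the table layout are hypothetical (the thesis defines its own variables); this is only a minimal example of the tabular representation the discretization makes possible.

```python
# Sketch: discretized belief features keyed into a small value table.
# Feature names are hypothetical, not the thesis's own definitions.
from collections import defaultdict

NUM_SECTORS = 6

def discretize(counts, imprecision_levels):
    """Map per-sector beliefs to a small tuple of discrete feature values."""
    return tuple(
        (min(counts[s], 4),         # number of landmarks in sector, capped
         imprecision_levels[s])     # e.g. 0 = low, 1 = medium, 2 = high
        for s in range(NUM_SECTORS)
    )

# One entry per (state, action) pair; entries are created lazily,
# so the table stays small in practice.
q_table = defaultdict(float)
```

Because each feature takes only a handful of values, the resulting state space is small enough for table-based learning.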
At any given point in time, the headings to all objects (landmarks and the target position) are divided into six sectors. Since the field of view of the robot is 60 degrees, the robot can observe one sector at a time; see Figure 5.5. For each sector, we represent information about the number of landmarks believed to be in that sector and the precision of our beliefs about their headings and distances. This information is gathered from an initial version of the Visual Memory, which constantly updates the locations of the seen landmarks and to which the Learning Agent has access.
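Mapping a heading to its sector is a simple discretization; a minimal sketch (assuming headings measured in degrees and six equal sectors, as described above) could look like this:

```python
NUM_SECTORS = 6
SECTOR_WIDTH = 360 / NUM_SECTORS  # 60 degrees, matching the field of view

def sector_of(heading_deg):
    """Return the index (0..5) of the sector containing a heading in degrees."""
    # The modulo wraps negative or >360 headings into [0, 360).
    return int((heading_deg % 360) // SECTOR_WIDTH)
```

With this convention, a single camera sweep covers exactly one sector of the belief representation.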
Given these sectors, the following state variables can be defined:
The imprecision of a landmark is computed using Equation 3.3, already given in Section 3.2.2.
We summarize the agent's knowledge of the landmarks in each sector by averaging the imprecision of the four most-precisely-known landmarks. The function best4 selects a subset, L', of a group of landmarks, L, such that L' contains the (at most four) landmarks of L with the lowest imprecision.
Having 4 landmarks in one sector is already very good, since only 3 landmarks are needed to use the beta-coefficient system. Furthermore, we do not want these measures to be affected by bad landmarks when we have some that are good enough. That is why we use best4 when computing the average imprecision of a sector.
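The selection-and-averaging step described above can be sketched as follows. The function and field names (`best_k`, `imprecision`) are illustrative choices, not the thesis's own identifiers; the logic assumes lower imprecision means a better-known landmark, per the surrounding text.

```python
def best_k(landmarks, k=4):
    """Select the k most-precisely-known landmarks (lowest imprecision)."""
    return sorted(landmarks, key=lambda lm: lm["imprecision"])[:k]

def sector_imprecision(landmarks):
    """Average the imprecision of the up-to-4 best landmarks in a sector."""
    best = best_k(landmarks, 4)
    if not best:
        # No landmarks believed to be in this sector: maximally imprecise.
        return float("inf")
    return sum(lm["imprecision"] for lm in best) / len(best)
```

Restricting the average to the best four keeps a sector's score high when it contains enough good landmarks, even if it also contains poorly-known ones.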
© 2003 Dídac Busquets