Although the results obtained show that the Learning Agent has learned
to select actions that resolve the complex camera tradeoff, it must
still be integrated into the overall multi-agent system (as depicted in
Figure 5.4) to verify whether the performance of the whole system
also improves.
Even though the Learning Agent knows which actions it
has to bid for (following the learned policy), it is not clear what form its
bidding function should take (e.g. a constant bid, or a bid depending on the learned $Q$ values).
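
As a starting point for this design question, the following sketch (in Python) shows two candidate bidding functions we might compare: a constant bid, and a bid proportional to the learned $Q$ values. The function names, the $Q$-table layout and the normalisation scheme are illustrative assumptions, not part of the implemented system.

\begin{verbatim}
# Two candidate bidding functions for the Learning Agent.
# Names, Q-table layout and normalisation are illustrative.

def constant_bid(bid_value=0.5):
    """Always bid a fixed amount for the action chosen
    by the learned policy."""
    return bid_value

def q_proportional_bid(q_values, state, action, actions):
    """Bid proportionally to the learned Q-value of
    (state, action), normalised over the available actions."""
    positive = {a: max(q_values.get((state, a), 0.0), 0.0)
                for a in actions}
    total = sum(positive.values())
    return positive[action] / total if total > 0 else 0.0
\end{verbatim}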
Some more further work will be focused on the design of the state
and feature representation and the set of available actions.
Asada et al. [5] proposed a solution for coping with
the ``state-action deviation problem'', in which actions operate at a finer
grain than the features can represent, so that most
actions appear to leave the state unchanged and learning becomes impossible.
We plan to evaluate the suitability of this approach in our experiments.
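
One common reading of that solution is to let the chosen action persist until the coarse feature state actually changes, and only then apply a single learning update. The sketch below illustrates this idea under our own assumptions; \texttt{env.execute} and \texttt{q\_update} are hypothetical placeholders for the system's actuation and learning interfaces, and the cap on repetitions is illustrative.

\begin{verbatim}
def step_until_state_change(env, q_update, state, action,
                            max_repeats=50):
    """Repeat the chosen action until the coarse feature
    state changes, then apply a single Q-learning update.
    env.execute and q_update are hypothetical interfaces."""
    total_reward = 0.0
    next_state = state
    for _ in range(max_repeats):
        next_state, reward = env.execute(action)
        total_reward += reward
        if next_state != state:
            break
    q_update(state, action, total_reward, next_state)
    return next_state
\end{verbatim}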
Regarding the action set design, we found that the set of available
actions was probably too small, and that additional actions may be needed.
We are working on an ``action refinement''
method [20] that exploits prior knowledge about the similarity
of actions to speed up the learning process.
In this approach, the set of available actions is larger but, so as
not to slow down the learning, the actions are grouped into
subsets of similar actions. Early in the learning process, the
Reinforcement Learning algorithm treats each subset of similar actions
as a single ``abstract'' action, estimating $Q(s,a)$ not only from
the execution of action $a$ itself, but also from the execution of the
actions similar to it. This action abstraction is later stopped, and
each action is then treated on its own, thus refining the $Q$ values
learned with abstraction.
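
To make the mechanics of this abstraction concrete, the following sketch shows a $Q$-learning update in which, before a switch point, the update for an action is shared by its whole similarity subset, and afterwards each action is refined individually. The class name, the full-sharing scheme with a single learning rate, and the fixed switch step are our own illustrative assumptions; they do not reproduce the exact method of [20].

\begin{verbatim}
from collections import defaultdict

class ActionRefinementQLearner:
    """Q-learning with a simple form of action abstraction:
    before switch_step updates, an update for an action is
    shared by its whole similarity subset; afterwards each
    action is updated on its own (refinement)."""

    def __init__(self, subsets, alpha=0.1, gamma=0.9,
                 switch_step=1000):
        self.q = defaultdict(float)
        self.subset_of = {a: tuple(s) for s in subsets for a in s}
        self.alpha, self.gamma = alpha, gamma
        self.switch_step = switch_step
        self.steps = 0

    def update(self, state, action, reward, next_state,
               next_actions):
        target = reward + self.gamma * max(
            (self.q[(next_state, a)] for a in next_actions),
            default=0.0)
        if self.steps < self.switch_step:
            updated = self.subset_of[action]   # abstract action
        else:
            updated = (action,)                # refined action
        for a in updated:
            self.q[(state, a)] += self.alpha * (
                target - self.q[(state, a)])
        self.steps += 1
\end{verbatim}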