Although the Reinforcement Learning results showed that the system learned
to select actions that solve the complex camera tradeoff, we still need to
integrate it into the overall multi-agent system to verify whether the
performance of the system as a whole also improves.
Even though the Learning Agent knows which actions it
has to bid for (following the learned policy), it is not clear what its
bidding function should be: it could bid a constant value, or a
value that depends on the learned action values.
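As a purely illustrative sketch of these two alternatives (the function names, action labels, and values below are our own assumptions, not part of the implemented system), the candidate bidding functions could look like this:

```python
# Hypothetical sketch of two candidate bidding functions for the
# Learning Agent; all names and numbers are illustrative assumptions.

CONSTANT_BID = 0.5

def constant_bid(action):
    """Bid the same fixed value for every action the policy selects."""
    return CONSTANT_BID

def value_based_bid(action, action_values, max_bid=1.0):
    """Scale the bid by the action's learned value relative to the best
    one, so actions the policy values more receive higher bids."""
    best = max(action_values.values())
    if best <= 0:
        return 0.0
    return max_bid * action_values[action] / best

# Example learned values for three hypothetical camera actions:
values = {"zoom_in": 0.8, "zoom_out": 0.2, "pan_left": 0.5}
bid = value_based_bid("zoom_in", values)  # the best-valued action bids max_bid
```

A value-based bid of this kind would let the agent's confidence in an action, acquired during learning, influence how strongly it competes for that action in the multi-agent bidding.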
Further work will focus on the design of the state
and feature representation and on the set of available actions.
Asada et al. [5] proposed a solution for coping with
the ``state-action deviation problem'', in which actions operate at a finer
grain than the features can represent, so that most
actions appear to leave the state unchanged and learning becomes impossible.
We plan to evaluate the suitability of this approach in our experiments.
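A toy illustration of the deviation problem (our own example, not Asada et al.'s formulation): if the camera pans in one-degree steps but the state representation only records which 30-degree sector the target lies in, then most individual pan actions leave the observed state unchanged:

```python
# Toy illustration of the state-action deviation problem: fine-grained
# actions under a coarse state representation. All numbers are assumed.

def sector(angle_deg, width=30):
    """Coarse state feature: which sector the target direction falls in."""
    return angle_deg // width

angle = 47                      # target direction in degrees
state_before = sector(angle)    # sector 1 (covers 30..59 degrees)
angle += 1                      # one fine-grained pan action
state_after = sector(angle)     # still sector 1: the action seems useless
unchanged = state_before == state_after
```

Under such a representation, the learner rarely observes a state transition after a single action, which is exactly the condition that makes credit assignment, and hence learning, break down.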
Regarding the action set design, we found that the set of available
actions may have been too small, and additional actions may be needed.
We are working on an ``action refinement''
method [20] that exploits prior knowledge about the similarity
of actions to speed up the learning process.
In this approach, the set of available actions is larger, but in
order not to slow down the learning process, the actions are grouped into
subsets of similar actions. Early in the learning process, the
Reinforcement Learning algorithm treats each subset of similar actions
as a single ``abstract'' action, estimating the value of an action not only
from its own execution, but also from the executions of its
similar actions. This action abstraction is later stopped, and
each action is then treated on its own, thus refining the values
learned with abstraction.
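The two phases of this refinement scheme can be sketched with simple bandit-style value updates; the subsets, learning rate, and switch point below are our own illustrative assumptions rather than the details of the method in [20]:

```python
# Minimal sketch of the action-refinement idea: shared "abstract" values
# early on, per-action values after refinement. All details are assumed.
alpha = 0.1  # learning rate (illustrative)

subsets = {"pan":  ["pan_small", "pan_large"],
           "zoom": ["zoom_in", "zoom_out"]}

# Phase 1: one shared value per subset of similar actions.
q_abstract = {group: 0.0 for group in subsets}

def update_abstract(group, reward):
    """Experience from any action in the subset updates the shared value."""
    q_abstract[group] += alpha * (reward - q_abstract[group])

update_abstract("pan", 1.0)   # e.g. pan_small was executed
update_abstract("pan", 1.0)   # e.g. pan_large was executed

# Phase 2: stop the abstraction; each action inherits the abstract value
# and is refined on its own from then on.
q_refined = {a: q_abstract[g] for g, acts in subsets.items() for a in acts}

def update_refined(action, reward):
    """After the switch, only the executed action's estimate changes."""
    q_refined[action] += alpha * (reward - q_refined[action])

update_refined("pan_small", 1.0)  # pan_large's estimate is left untouched
```

The appeal of the scheme is visible even in this sketch: during the abstract phase, every execution of a pan action improves the estimate for all pan actions at once, so the larger action set does not proportionally slow down learning.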
© 2003 Dídac Busquets