Future Work

Although the obtained results show that the Learning Agent has learned to select actions that resolve the complex camera tradeoff, we still need to integrate it into the overall multi-agent system (as depicted in Figure 5.4) to see whether the performance of the whole system also improves. Even though the Learning Agent knows which actions it has to bid for (following the learned policy), it is not clear what its bidding function should be (e.g. constant, or depending on the values of $V(s)$).
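
To make the two alternatives concrete, the sketch below (in Python, purely illustrative; the function names, the normalization constant and the dictionary representation of $V$ are assumptions and not part of the system) contrasts a constant bid with a bid that depends on the expected gain in $V(s)$:

\begin{verbatim}
def constant_bid(bid_value=0.5):
    """Always bid the same amount for the action chosen by the learned policy."""
    return bid_value

def value_based_bid(V, s, s_expected, v_max):
    """Bid proportionally to the expected gain in value, clipped to [0, 1].

    V          -- learned state-value function, V[state] -> float
    s          -- current state
    s_expected -- state expected after executing the chosen action
    v_max      -- normalization constant (must be > 0), e.g. the maximum of V
    """
    gain = max(V[s_expected] - V[s], 0.0)
    return min(gain / v_max, 1.0)

# Example: V = {"far": 0.2, "near": 0.8}
#          value_based_bid(V, "far", "near", 1.0)  ->  approx. 0.6
\end{verbatim}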

Further work will focus on the design of the state and feature representation and of the set of available actions. Asada et al. [5] proposed a solution for coping with the ``state-action deviation problem'', in which actions operate at a finer grain than the features can represent, so that most actions appear to leave the state unchanged and learning becomes impossible. We plan to evaluate the suitability of this approach in our experiments. Regarding the design of the action set, we found that the set of available actions was perhaps too small, and some more actions may be needed. We are working on an ``action refinement'' method [20] that exploits prior knowledge about the similarity of actions to speed up the learning process. In this approach, the set of available actions is larger but, in order not to slow down the learning, the actions are grouped into subsets of similar actions. Early in the learning process, the Reinforcement Learning algorithm treats each subset of similar actions as a single ``abstract'' action, estimating $P(s'\vert s,a)$ not only from the execution of action $a$, but also from the execution of its similar actions. This action abstraction is later stopped, and each action is then treated on its own, thus refining the values of $P(s'\vert s,a)$ learned with abstraction.
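
A minimal sketch of how such shared estimation of $P(s'\vert s,a)$ could be implemented is given below (in Python; the class and method names are hypothetical and do not come from [20]), assuming discrete states, explicit similarity groups, and a count-based transition model:

\begin{verbatim}
from collections import defaultdict

class RefinedTransitionModel:
    """Illustrative sketch of the action-refinement idea.

    While abstraction is active, an observed transition (s, a, s') updates
    the counts of every action in the same similarity group as a, so that
    P(s'|s,a) is estimated from the experience of all similar actions.
    After refine() is called, each action accumulates only its own counts.
    """

    def __init__(self, action_groups):
        # action_groups: list of lists of similar actions,
        # e.g. [["turn_5", "turn_10"], ["fwd_short", "fwd_long"]]
        self.group_of = {a: tuple(g) for g in action_groups for a in g}
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': n}
        self.abstract = True

    def update(self, s, a, s_next):
        actions = self.group_of[a] if self.abstract else (a,)
        for act in actions:
            self.counts[(s, act)][s_next] += 1

    def refine(self):
        """Stop the abstraction; each action is treated on its own from now on."""
        self.abstract = False

    def prob(self, s, a, s_next):
        outcomes = self.counts[(s, a)]
        total = sum(outcomes.values())
        return outcomes[s_next] / total if total else 0.0
\end{verbatim}

In this scheme, calling \verb|refine()| once enough experience has been gathered switches from the shared (abstract) estimates to per-action estimates, mirroring the two phases described above.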
