Although the Reinforcement Learning results showed that the system learned
to select actions that solve the complex camera tradeoff, we still need to
integrate it into the overall multi-agent system to verify whether the
performance of the system as a whole also improves.
Even though the Learning Agent knows which actions it
has to bid for (following the learned policy), it is not clear what its
bidding function should be: it could bid a constant value, or a
value that depends on the learned action values.
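As a purely illustrative sketch of these two alternatives (the function names, action labels, and values below are our own assumptions, not part of the implemented system), the candidate bidding functions could look like this:

```python
# Hypothetical sketch of two candidate bidding functions for the
# Learning Agent; all names and numbers are illustrative assumptions.

CONSTANT_BID = 0.5

def constant_bid(action):
    """Bid the same fixed value for every action the policy selects."""
    return CONSTANT_BID

def value_based_bid(action, action_values, max_bid=1.0):
    """Scale the bid by the action's learned value relative to the best
    one, so actions the policy values more receive higher bids."""
    best = max(action_values.values())
    if best <= 0:
        return 0.0
    return max_bid * action_values[action] / best

# Example learned values for three hypothetical camera actions:
values = {"zoom_in": 0.8, "zoom_out": 0.2, "pan_left": 0.5}
bid = value_based_bid("zoom_in", values)  # the best-valued action bids max_bid
```

A value-based bid of this kind would let the agent's confidence in an action, acquired during learning, influence how strongly it competes for that action in the multi-agent bidding.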
Further work will focus on the design of the state
and feature representation and on the set of available actions.
Asada et al. [5] proposed a solution for coping with
the ``state-action deviation problem'', in which actions operate at a finer
grain than the features can represent, so that most
actions appear to leave the state unchanged and learning becomes impossible.
We plan to evaluate the suitability of this approach in our experiments.
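A toy illustration of the deviation problem (our own example, not Asada et al.'s formulation): if the camera pans in one-degree steps but the state representation only records which 30-degree sector the target lies in, then most individual pan actions leave the observed state unchanged:

```python
# Toy illustration of the state-action deviation problem: fine-grained
# actions under a coarse state representation. All numbers are assumed.

def sector(angle_deg, width=30):
    """Coarse state feature: which sector the target direction falls in."""
    return angle_deg // width

angle = 47                      # target direction in degrees
state_before = sector(angle)    # sector 1 (covers 30..59 degrees)
angle += 1                      # one fine-grained pan action
state_after = sector(angle)     # still sector 1: the action seems useless
unchanged = state_before == state_after
```

Under such a representation, the learner rarely observes a state transition after a single action, which is exactly the condition that makes credit assignment, and hence learning, break down.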
Regarding the action set design, we found that the set of available
actions may have been too small, and additional actions may be needed.
We are working on an ``action refinement''
method [20] that exploits prior knowledge about the similarity
of actions to speed up the learning process.
In this approach, the set of available actions is larger, but in
order not to slow down the learning process, the actions are grouped into
subsets of similar actions. Early in the learning process, the
Reinforcement Learning algorithm treats each subset of similar actions
as a single ``abstract'' action, estimating the value of an action not only
from its own execution, but also from the executions of its
similar actions. This action abstraction is later stopped, and
each action is then treated on its own, thus refining the values
learned with abstraction.
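The two phases of this refinement scheme can be sketched with simple bandit-style value updates; the subsets, learning rate, and switch point below are our own illustrative assumptions rather than the details of the method in [20]:

```python
# Minimal sketch of the action-refinement idea: shared "abstract" values
# early on, per-action values after refinement. All details are assumed.
alpha = 0.1  # learning rate (illustrative)

subsets = {"pan":  ["pan_small", "pan_large"],
           "zoom": ["zoom_in", "zoom_out"]}

# Phase 1: one shared value per subset of similar actions.
q_abstract = {group: 0.0 for group in subsets}

def update_abstract(group, reward):
    """Experience from any action in the subset updates the shared value."""
    q_abstract[group] += alpha * (reward - q_abstract[group])

update_abstract("pan", 1.0)   # e.g. pan_small was executed
update_abstract("pan", 1.0)   # e.g. pan_large was executed

# Phase 2: stop the abstraction; each action inherits the abstract value
# and is refined on its own from then on.
q_refined = {a: q_abstract[g] for g, acts in subsets.items() for a in acts}

def update_refined(action, reward):
    """After the switch, only the executed action's estimate changes."""
    q_refined[action] += alpha * (reward - q_refined[action])

update_refined("pan_small", 1.0)  # pan_large's estimate is left untouched
```

The appeal of the scheme is visible even in this sketch: during the abstract phase, every execution of a pan action improves the estimate for all pan actions at once, so the larger action set does not proportionally slow down learning.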
© 2003 Dídac Busquets