TR03-0001 (February)

Title
Cluster BP and cluster CCCP: Two simple methods for computing Kikuchi approximations

Author(s)
Taisuke Sato

Contact person
Taisuke Sato

Abstract
Loopy BP (belief propagation) came out of the UAI community and is expected to be an efficient approximation method for computing marginal distributions, not only for Bayesian networks but also for distributions specified by a product of potentials in general.
In this paper, we first look back on the derivation of loopy BP from the Bethe approximation to the variational free energy, and on its generalization to the Kikuchi approximation. We then propose two new algorithms, cluster BP and cluster CCCP, which run on cluster graphs and are guaranteed to compute the Kikuchi approximation when the graph satisfies the tree condition. The latter is additionally guaranteed to converge to a local minimum of the Kikuchi approximation.
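As a rough illustration of the kind of fixed-point iteration that loopy BP performs (the report's cluster BP and cluster CCCP algorithms are not reproduced here), the following Python sketch runs sum-product message passing on a small three-variable loop. The graph, the pairwise potentials, and all numbers are illustrative assumptions, not taken from the report.

# Minimal sketch of loopy BP (sum-product) on a pairwise MRF with a single loop.
# The 3-node graph and the potentials are hypothetical, for illustration only.
import numpy as np

# Pairwise potentials psi[(i, j)] over binary variables on a 3-node loop.
psi = {(0, 1): np.array([[1.0, 0.5], [0.5, 2.0]]),
       (1, 2): np.array([[1.5, 1.0], [1.0, 0.5]]),
       (2, 0): np.array([[1.0, 2.0], [0.5, 1.0]])}
edges = list(psi.keys())
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

# m[(i, j)]: message from variable i to variable j, initialized uniform.
m = {(i, j): np.ones(2) / 2
     for (a, b) in edges for (i, j) in [(a, b), (b, a)]}

def potential(i, j):
    # Return psi oriented so rows index variable i and columns index variable j.
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T

for _ in range(50):                          # fixed-point iteration
    new_m = {}
    for (i, j) in m:
        # Product of incoming messages to i from all neighbors except j.
        incoming = np.ones(2)
        for k in neighbors[i]:
            if k != j:
                incoming *= m[(k, i)]
        msg = potential(i, j).T @ incoming   # sum over x_i
        new_m[(i, j)] = msg / msg.sum()      # normalize for numerical stability
    m = new_m

# Approximate marginals (beliefs) after the iteration.
for i in range(3):
    b = np.ones(2)
    for k in neighbors[i]:
        b *= m[(k, i)]
    print("belief(%d) ~ %s" % (i, b / b.sum()))

On a tree this iteration computes exact marginals; on the loopy graph above it computes the Bethe approximation that the abstract refers to, which motivates the cluster-graph generalizations proposed in the report.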


TR03-0002 (June)

Title
Generating Self-Evaluations to Learn Appropriate Actions in Various Games

Author(s)
Koichi Moriyama and Masayuki Numao

Contact person
Koichi Moriyama (moriyama@mori.cs.titech.ac.jp)

Abstract
In game theory, the combination of the players' actions converges to a Nash equilibrium. However, this combination is not optimal in some games, such as the prisoner's dilemma (PD). Nevertheless, almost all multi-agent reinforcement learning algorithms aim to converge to a Nash equilibrium. There are, on the other hand, several methods that aim to escape bad Nash equilibria and proceed to a better combination by handling rewards in PD-like games, but since they use fixed handling methods, they perform worse in non-PD-like games. In this paper, we construct an agent that learns appropriate actions in both PD-like and non-PD-like games through self-evaluations. The agent has two conditions for judging whether a game is PD-like and two methods that generate self-evaluations according to the judgement. We conducted experiments on two kinds of games played by the agents.
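To make the abstract's opening point concrete, the following Python sketch checks which joint actions of the prisoner's dilemma are Nash equilibria, using a standard textbook payoff matrix; the payoff values are an assumption for illustration and are not taken from the report. Mutual defection is the only equilibrium, even though mutual cooperation gives both players a strictly higher payoff.

# Illustration (hypothetical payoffs): the PD's Nash equilibrium is not the
# best joint outcome, which is the situation the agents in the report face.
import itertools

C, D = "cooperate", "defect"
# payoff[(a1, a2)] = (reward to player 1, reward to player 2)
payoff = {(C, C): (3, 3), (C, D): (0, 5),
          (D, C): (5, 0), (D, D): (1, 1)}

def best_response(opponent_action, player):
    # Action maximizing the player's payoff against a fixed opponent action.
    if player == 1:
        return max((C, D), key=lambda a: payoff[(a, opponent_action)][0])
    return max((C, D), key=lambda a: payoff[(opponent_action, a)][1])

# A profile is a Nash equilibrium if each action is a best response to the other.
for a1, a2 in itertools.product((C, D), repeat=2):
    nash = best_response(a2, 1) == a1 and best_response(a1, 2) == a2
    print("(%s, %s) payoffs=%s Nash=%s" % (a1, a2, payoff[(a1, a2)], nash))
# Only (defect, defect) is a Nash equilibrium, yet (cooperate, cooperate)
# pays both players more.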