Publications

The Cross-Entropy Method for Policy Search in Decentralized POMDPs

Frans A. Oliehoek, Julian F. P. Kooij, and Nikos Vlassis. The Cross-Entropy Method for Policy Search in Decentralized POMDPs. Informatica, 32:341–357, 2008.

Abstract

Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagent planning under uncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for approximately solving Dec-POMDPs. This algorithm operates by sampling pure policies from an appropriately parametrized stochastic policy, and then evaluates these policies either exactly or approximately in order to define the next stochastic policy to sample from, and so on until convergence. Experimental results demonstrate that the CE method can search huge spaces efficiently, supporting our claim that combinatorial optimization methods can bring leverage to the approximate solution of Dec-POMDPs.
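The sample, evaluate, and update loop described in the abstract can be illustrated with a minimal sketch of a generic cross-entropy search. This is not the authors' implementation: it assumes a policy is just a tuple holding one action index per decision point, and it replaces Dec-POMDP policy evaluation with a toy scoring function. All names here (`cross_entropy_search`, `toy_value`, `TARGET`, `n_elite`, `alpha`) are hypothetical and chosen only for illustration.

```python
import random


def cross_entropy_search(n_points, n_actions, evaluate, n_samples=50,
                         n_elite=10, alpha=0.5, n_iters=30, seed=0):
    """Generic cross-entropy search over pure policies (illustrative only)."""
    rng = random.Random(seed)
    # Stochastic policy: one categorical distribution over actions per
    # decision point, initialised to uniform.
    probs = [[1.0 / n_actions] * n_actions for _ in range(n_points)]
    best_policy, best_value = None, float("-inf")

    for _ in range(n_iters):
        # 1. Sample pure (deterministic) policies from the stochastic policy.
        samples = [
            tuple(rng.choices(range(n_actions), weights=p)[0] for p in probs)
            for _ in range(n_samples)
        ]
        # 2. Evaluate every sample and keep the best-scoring ones (the elite).
        scored = sorted(((evaluate(pi), pi) for pi in samples), reverse=True)
        elite = [pi for _, pi in scored[:n_elite]]
        if scored[0][0] > best_value:
            best_value, best_policy = scored[0]
        # 3. Move each distribution toward the elite's empirical action
        #    frequencies; alpha is a smoothing factor (an assumption here).
        for i in range(n_points):
            counts = [0] * n_actions
            for pi in elite:
                counts[pi[i]] += 1
            probs[i] = [
                (1 - alpha) * p + alpha * c / len(elite)
                for p, c in zip(probs[i], counts)
            ]
    return best_policy, best_value


# Toy stand-in for policy evaluation: count matches with a hidden target.
TARGET = (0, 1, 2, 1, 0, 2, 1, 0)


def toy_value(policy):
    return sum(a == t for a, t in zip(policy, TARGET))


policy, value = cross_entropy_search(len(TARGET), 3, toy_value)
print(policy, value)
```

In the sketch, `evaluate` is called exactly on each sample; the abstract notes that evaluation may also be approximate, which in this setting would correspond to passing a cheaper, noisy scoring function.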

BibTeX Entry

@Article{Oliehoek08Informatica,
    author =        {Frans A. Oliehoek and Julian F. P. Kooij and Nikos
                    Vlassis},
    title =         {The Cross-Entropy Method for Policy Search in 
                    Decentralized {POMDPs}},
    journal =       {Informatica},
    year =          2008,
    volume =        {32},
    pages =         {341--357},
    url =           {http://www.informatica.si/index.php/informatica/article/view/208},
    abstract = {
    Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as
    models for multiagent planning under uncertainty, but solving a
    Dec-POMDP exactly is known to be an intractable combinatorial
    optimization problem. In this paper we apply the Cross-Entropy (CE)
    method, a recently introduced method for combinatorial optimization, to
    Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for
    approximately solving Dec-POMDPs. This algorithm operates by sampling
    pure policies from an appropriately parametrized stochastic policy, and
    then evaluates these policies either exactly or approximately in order
    to define the next stochastic policy to sample from, and so on until
    convergence.  Experimental results demonstrate that the CE method can
    search huge spaces efficiently, supporting our claim that combinatorial
    optimization methods can bring leverage to the approximate solution of
    Dec-POMDPs.
    }
}
