Publications

• Sorted by Date • Classified by Publication Type • Classified by Research Category •

Optimal and Approximate Q-value Functions for Decentralized POMDPs

Frans A. Oliehoek, Matthijs T. J. Spaan, and Nikos Vlassis. Optimal and Approximate Q-value Functions for Decentralized POMDPs. Journal of Artificial Intelligence Research, 32:289–353, 2008.

Download

pdf [613.0kB] ps.gz [403.8kB] ps HTML

Abstract

Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.

BibTeX Entry

@Article{Oliehoek08JAIR,
    author =        {Frans A. Oliehoek and Matthijs T. J. Spaan and Nikos
                    Vlassis},
    title =         {Optimal and Approximate {Q}-value Functions for
                    Decentralized {POMDPs}},
    journal =       JAIR,
    year =          2008,
    volume =        {32},
    pages =         {289--353},
    url =           {https://doi.org/10.1613/jair.2447},
    doi =           {10.1613/jair.2447},
    abstract = {
    Decision-theoretic planning is a popular approach to sequential
    decision making problems, because it treats uncertainty in sensing
    and acting in a principled way. In single-agent frameworks like
    MDPs and POMDPs, planning can be carried out by resorting to
    Q-value functions: an optimal Q-value function Q* is computed in a
    recursive manner by dynamic programming, and then an optimal
    policy is extracted from Q*.  In this paper we study whether
    similar Q-value functions can be defined for decentralized POMDP
    models (Dec-POMDPs), and how policies can be extracted from such
    value functions. We define two forms of the optimal Q-value
    function for Dec-POMDPs: one that gives a normative description as
    the Q-value function of an optimal pure joint policy and another
    one that is sequentially rational and thus gives a recipe for
    computation. This computation, however, is infeasible for all but
    the smallest problems. Therefore, we analyze various approximate
    Q-value functions that allow for efficient computation. We
    describe how they relate, and we prove that they all provide an
    upper bound to the optimal Q-value function Q*. Finally, unifying
    some previous approaches for solving Dec-POMDPs, we describe a
    family of algorithms for extracting policies from such Q-value
    functions, and perform an experimental evaluation on existing test
    problems, including a new firefighting benchmark problem.
    }
}

Generated by bib2html.pl (written by Patrick Riley) on Tue Jun 25, 2024 12:39:45 UTC