Publications

Solving Transition-Independent Multi-agent MDPs with Sparse Interactions

Joris Scharpff, Diederik M. Roijers, Frans A. Oliehoek, Matthijs T. J. Spaan, and Mathijs de Weerdt. Solving Transition-Independent Multi-agent MDPs with Sparse Interactions. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp. 3174–3180, February 2016.
Also see the extended version on arXiv.

Abstract

In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value. Typical algorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP (MMDP) setting such structure is not present. We propose a new optimal solver for transition-independent MMDPs, in which agents can only affect their own state but their reward depends on joint transitions. We represent these dependencies compactly in conditional return graphs (CRGs). Using CRGs the value of a joint policy and the bounds on partially specified joint policies can be efficiently computed. We propose CoRe, a novel branch-and-bound policy search algorithm building on CRGs. CoRe typically requires less runtime than the available alternatives and finds solutions to previously unsolvable problems.

BibTeX Entry

@inproceedings{Scharpff16AAAI,
    author =    {Joris Scharpff and Diederik M. Roijers and Frans A. Oliehoek 
                 and Matthijs T. J. Spaan and Mathijs de Weerdt},
    title =     {Solving Transition-Independent Multi-agent {MDPs} with Sparse Interactions},
    booktitle = AAAI16,
    year =      2016,
    month =     Feb,
    pages =     {3174--3180},
    wwwnote =   {Also see the <a href="b2hd-Scharpff16arxiv.html">extended version</a> on arXiv.},
    abstract = {
    In cooperative multi-agent sequential decision making under uncertainty, agents
    must coordinate to find an optimal joint policy that maximises joint value.
    Typical algorithms exploit additive structure in the value function, but in
    the fully-observable multi-agent MDP (MMDP) setting such structure is not
    present. We propose a new optimal solver for transition-independent MMDPs,
    in which agents can only affect their own state but their reward depends on
    joint transitions. We represent these dependencies compactly in conditional
    return graphs (CRGs). Using CRGs the value of a joint policy and the bounds
    on partially specified joint policies can be efficiently computed. We
    propose CoRe, a novel branch-and-bound policy search algorithm building on
    CRGs. CoRe typically requires less runtime than the available alternatives
    and finds solutions to previously unsolvable problems.
    }
}
