Publications

Difference Rewards Policy Gradients

Jacopo Castellini, Sam Devlin, Frans A. Oliehoek, and Rahul Savani. Difference Rewards Policy Gradients. Neural Computing and Applications, November 2022. Postproceedings of ALA'21 workshop, where the paper won the best paper award.

Abstract

Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent’s contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies when the reward function is known. By differencing the reward function directly, Dr.Reinforce avoids difficulties associated with learning the Q-function as done by counterfactual multi-agent policy gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr.Reinforce that learns an additional reward network that is used to estimate the difference rewards.
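
As a rough illustration of "differencing the reward function directly", the sketch below computes a difference reward for one agent when the reward function is known. It is a minimal sketch, not the paper's implementation: the abstract does not spell out the baseline, so the form used here (the expected reward when the agent's action is resampled from its own policy while the other agents' actions stay fixed) is an assumption, and the names `difference_reward`, `reward_fn`, and `toy_reward` are hypothetical.

```python
from typing import Callable, Hashable, Sequence, Tuple

JointAction = Tuple[int, ...]


def difference_reward(
    reward_fn: Callable[[Hashable, JointAction], float],
    state: Hashable,
    joint_action: Sequence[int],
    agent: int,
    policy_probs: Sequence[float],
) -> float:
    """Return R(s, a) minus a counterfactual baseline for one agent.

    Assumed baseline: sum over the agent's actions c of
    policy_probs[c] * R(s, a with the agent's action replaced by c).
    """
    baseline = 0.0
    for action, prob in enumerate(policy_probs):
        # Swap in the counterfactual action; keep the other agents fixed.
        counterfactual = list(joint_action)
        counterfactual[agent] = action
        baseline += prob * reward_fn(state, tuple(counterfactual))
    return reward_fn(state, tuple(joint_action)) - baseline


def toy_reward(state: Hashable, joint_action: JointAction) -> float:
    """Hypothetical team reward: 1 only if both agents choose action 1."""
    return 1.0 if joint_action == (1, 1) else 0.0


if __name__ == "__main__":
    uniform = [0.5, 0.5]
    # Agent 0 helped: its action 1 completed the rewarded joint action.
    print(difference_reward(toy_reward, None, (1, 1), 0, uniform))
    # Agent 0 could not have changed the outcome, so its signal is zero.
    print(difference_reward(toy_reward, None, (1, 0), 0, uniform))
```

In a policy gradient method, a per-agent signal of this kind would stand in for the shared team reward when weighting each agent's log-probability gradient, so that an agent is credited only for the part of the outcome its own action changed.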

BibTeX Entry

@article{Castellini22NCA,
    author =    {Castellini, Jacopo and 
                 Devlin, Sam and
                 Oliehoek, Frans A. and 
                 Savani, Rahul},
    title =     {Difference Rewards Policy Gradients},
    journal =   {Neural Computing and Applications},
    year =      2022,
    month =     nov,
    doi  =      {10.1007/s00521-022-07960-5},
    url =       {https://doi.org/10.1007/s00521-022-07960-5},
    note =      {Postproceedings of ALA'21 workshop, where the paper won the \textbf{best paper award}.},
    keywords =  {refereed},
    abstract = {
Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning.
A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an
agent’s contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm
called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning
decentralized policies when the reward function is known. By differencing the reward function directly, Dr.Reinforce
avoids difficulties associated with learning the Q-function as done by counterfactual multi-agent policy gradients (COMA),
a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the
effectiveness of a version of Dr.Reinforce that learns an additional reward network that is used to estimate the difference
rewards.        
    }
}
