Publications

Learning Complex Policy Distribution with CEM Guided Adversarial Hypernetwork

Shi Yuan Tang, Athirai A. Irissappane, Frans A. Oliehoek, and Jie Zhang. Learning Complex Policy Distribution with CEM Guided Adversarial Hypernetwork. In Proceedings of the Twentieth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 1308–1316, May 2021. Invited for JAAMAS fast track

Abstract

Cross-Entropy Method (CEM) is a gradient-free direct policy search method, which has greater stability and is insensitive to hyper-parameter tuning. CEM bears similarity to population-based evolutionary methods, but, rather than using a population it uses a distribution over candidate solutions (policies in our case). Usually, a natural exponential family distribution such as multivariate Gaussian is used to parameterize the policy distribution. Using a multivariate Gaussian limits the quality of CEM policies as the search becomes confined to a less representative subspace. We address this drawback by using an adversarially-trained hypernetwork, enabling a richer and complex representation of the policy distribution. To achieve better training stability and faster convergence, we use a multivariate Gaussian CEM policy to guide our adversarial training process. Experiments demonstrate that our approach outperforms state-of-the-art CEM-based methods by 15.8% in terms of rewards while achieving faster convergence. Results also show that our approach is less sensitive to hyper-parameters than other deep-RL methods such as REINFORCE, DDPG and DQN.
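
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a generic Gaussian CEM loop, not the paper's hypernetwork method. It assumes a diagonal covariance, and the names `gaussian_cem`, `toy_return`, and `target`, along with every hyper-parameter value, are illustrative choices that do not come from the paper. The quadratic `toy_return` is a stand-in for an episode return.

```python
import numpy as np

def gaussian_cem(score_fn, dim, iterations=50, population=64,
                 elite_frac=0.2, init_std=2.0, seed=0):
    """Diagonal-Gaussian CEM: sample candidates, keep the elites, refit mean/std."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.full(dim, init_std)
    n_elite = max(1, int(population * elite_frac))
    for _ in range(iterations):
        # Sample candidate parameter vectors from the current search distribution.
        candidates = mean + std * rng.standard_normal((population, dim))
        scores = np.array([score_fn(c) for c in candidates])
        # Keep the highest-scoring candidates (higher score is better).
        elite = candidates[np.argsort(scores)[-n_elite:]]
        # Refit the Gaussian to the elites; the small floor keeps std positive.
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    return mean, std

# Hypothetical toy objective standing in for a policy's return.
target = np.array([1.0, -2.0, 0.5])

def toy_return(theta):
    return -np.sum((theta - target) ** 2)

best_mean, best_std = gaussian_cem(toy_return, dim=3)
```

In this sketch the search distribution is a single Gaussian, which is the restriction the abstract says the adversarially trained hypernetwork is meant to relax.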

BibTeX Entry

@inproceedings{Tang21AAMAS,
    author= {Shi Yuan Tang and
            Athirai A. Irissappane and
            Frans A. Oliehoek and
            Jie Zhang},
    title =     {Learning Complex Policy Distribution with {CEM} Guided Adversarial Hypernetwork},
    booktitle = AAMAS21,
    year =      2021,
    month =     may,
    pages =     {1308--1316},
    keywords =   {refereed},
    note =      {\textbf{Invited for JAAMAS fast track}},
    abstract = {
Cross-Entropy Method (CEM) is a gradient-free direct policy search
method, which has greater stability and is insensitive to
hyper-parameter tuning. CEM bears similarity to population-based
evolutionary methods, but, rather than using a population it uses a
distribution over candidate solutions (policies in our case).
Usually, a natural exponential family distribution such as multivariate
Gaussian is used to parameterize the policy distribution. Using a
multivariate Gaussian limits the quality of CEM policies as the
search becomes confined to a less representative subspace. We
address this drawback by using an adversarially-trained hypernetwork,
enabling a richer and complex representation of the policy
distribution. To achieve better training stability and faster
convergence, we use a multivariate Gaussian CEM policy to guide our
adversarial training process. Experiments demonstrate that our
approach outperforms state-of-the-art CEM-based methods by 15.8\%
in terms of rewards while achieving faster convergence. Results
also show that our approach is less sensitive to hyper-parameters
than other deep-RL methods such as REINFORCE, DDPG and DQN.
    }
}
