Back to the Future: Solving Hidden Parameter MDPs with Hindsight

Canmanie T. Ponnambalam, Danial Kamran, Thiago Dias Simão, Frans A. Oliehoek, and Matthijs T. J. Spaan. Back to the Future: Solving Hidden Parameter MDPs with Hindsight. In Proceedings of the AAMAS Workshop on Adaptive Learning Agents (ALA), May 2022.

Download

Abstract

Reinforcement learning is limited by how the task is defined at the start of learning and is generally inflexible in accommodating new information during training. In contrast, humans are capable of learning from hindsight and can easily incorporate new information to gain insight into past experience. Humans also learn in a more modular fashion that facilitates transfer of knowledge across many different types of problems, resulting in flexible and sample-efficient learning. This ability is often missing in reinforcement learning, as agents generally must be trained from scratch even when there are minor disruptions or changes in the environment. We aim to empower reinforcement learning agents with a modular approach that allows learning from hindsight, giving them the ability to learn from their past experience after new information is revealed. We address partially observable problems that can be modeled as hidden parameter MDPs (HiP-MDPs), where crucial state information is not observable during action selection but is later revealed. Our work focuses on the benefits of separating the tasks of policy optimization and hidden parameter estimation. By decoupling the two, we enable more data-efficient learning that is flexible to changes in the environment and can readily make use of existing predictors or offline datasets. We demonstrate in discrete and continuous experiments that learning from hindsight offers scalable and sample-efficient performance in HiP-MDPs and enables transfer of knowledge between tasks.

BibTeX Entry

@inproceedings{Ponnambalam22ALA,
  author    = {Ponnambalam, Canmanie T. and Kamran, Danial and Simão, Thiago Dias and Oliehoek, Frans A. and Spaan, Matthijs T. J.},
  title     = {Back to the Future: Solving Hidden Parameter {MDPs} with Hindsight},
  booktitle = ALA22,
  year      = 2022,
  month     = may,
  keywords  = {refereed},
  abstract  = {Reinforcement learning is limited by how the task is defined at the start of learning and is generally inflexible in accommodating new information during training. In contrast, humans are capable of learning from hindsight and can easily incorporate new information to gain insight into past experience. Humans also learn in a more modular fashion that facilitates transfer of knowledge across many different types of problems, resulting in flexible and sample-efficient learning. This ability is often missing in reinforcement learning, as agents generally must be trained from scratch even when there are minor disruptions or changes in the environment. We aim to empower reinforcement learning agents with a modular approach that allows learning from hindsight, giving them the ability to learn from their past experience after new information is revealed. We address partially observable problems that can be modeled as hidden parameter MDPs (HiP-MDPs), where crucial state information is not observable during action selection but is later revealed. Our work focuses on the benefits of separating the tasks of policy optimization and hidden parameter estimation. By decoupling the two, we enable more data-efficient learning that is flexible to changes in the environment and can readily make use of existing predictors or offline datasets. We demonstrate in discrete and continuous experiments that learning from hindsight offers scalable and sample-efficient performance in HiP-MDPs and enables transfer of knowledge between tasks.}
}
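To make the decoupling described in the abstract concrete, the Python sketch below separates the two roles in a toy two-step HiP-MDP: a hidden-parameter estimator is trained only in hindsight, after the true parameter is revealed at the end of each episode, while a tabular Q-function is updated on trajectories relabelled with that revealed parameter; during action selection the agent relies on the estimator's prediction. This is a minimal, hypothetical sketch, not the paper's implementation: the environment, the class and function names (ToyHiPMDP, HindsightEstimator, train), and the Q-learning update are illustrative assumptions.

# Hypothetical sketch (not the paper's code) of hindsight learning in a HiP-MDP:
# the hidden-parameter estimator and the policy are trained separately.
import random
from collections import defaultdict

class ToyHiPMDP:
    """Two-step toy HiP-MDP: a hidden parameter theta in {0, 1} controls an
    intermediate observation and which final action is rewarded."""
    def reset(self):
        self.theta = random.choice([0, 1])   # hidden during the episode
        self.t = 0
        return 0                             # start state

    def step(self, action):
        self.t += 1
        if self.t == 1:
            # Intermediate state correlated with the hidden parameter (90% accurate).
            obs = self.theta if random.random() < 0.9 else 1 - self.theta
            return 1 + obs, 0.0, False       # move to state 1 or 2, no reward yet
        reward = 1.0 if action == self.theta else 0.0
        return 3, reward, True               # terminal state

    def reveal(self):
        return self.theta                    # hindsight: parameter disclosed

class HindsightEstimator:
    """Predicts the hidden parameter from the intermediate state; trained only
    after the true parameter is revealed at the end of each episode."""
    def __init__(self):
        self.counts = defaultdict(lambda: [1, 1])   # Laplace-smoothed counts

    def update(self, mid_state, revealed_theta):
        self.counts[mid_state][revealed_theta] += 1

    def predict(self, mid_state):
        c = self.counts[mid_state]
        return 0 if c[0] >= c[1] else 1

def train(episodes=2000, eps=0.1, lr=0.5):
    env, est = ToyHiPMDP(), HindsightEstimator()
    # Policy optimization is decoupled: Q is conditioned on the parameter and
    # updated with the revealed parameter, not the online estimate.
    Q = defaultdict(float)                   # key: (theta, state, action)
    for _ in range(episodes):
        s = env.reset()
        traj, theta_hat, done = [], 0, False
        while not done:
            if random.random() < eps:
                a = random.choice([0, 1])
            else:
                a = max((0, 1), key=lambda a_: Q[(theta_hat, s, a_)])
            s_next, r, done = env.step(a)
            traj.append((s, a, r, s_next))
            if s == 0:                       # after the informative transition,
                theta_hat = est.predict(s_next)  # act on the estimator's belief
            s = s_next
        theta = env.reveal()                 # hindsight information
        est.update(traj[0][3], theta)        # train the estimator in hindsight
        for (s0, a0, r0, s1) in traj:        # relabel and train the policy
            target = r0 + max(Q[(theta, s1, b)] for b in (0, 1))
            Q[(theta, s0, a0)] += lr * (target - Q[(theta, s0, a0)])
    return Q, est

if __name__ == "__main__":
    Q, est = train()
    print("Greedy action in state 1 under theta=0:",
          max((0, 1), key=lambda a: Q[(0, 1, a)]))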
Generated by bib2html.pl (written by Patrick Riley) on Tue Nov 05, 2024 16:13:37 UTC