Publications

Leveraging diverse offline data in POMDPs with unobserved confounders

Oussama Azizi, Philip Boeken, Onno Zoeter, Frans A. Oliehoek, and Matthijs T. J. Spaan. Leveraging diverse offline data in POMDPs with unobserved confounders. In Seventeenth European Workshop on Reinforcement Learning (EWRL), October 2024.

Abstract
In many Reinforcement Learning (RL) applications, offline data is readily available before an algorithm is deployed. Often, however, data-collection policies have had access to information that is not recorded in the dataset, requiring the RL agent to take unobserved confounders into account. We focus on the setting where the confounders are i.i.d. and, without additional assumptions on the strength of the confounding, we derive tight bounds for the causal effects of the actions on the observations and reward. In particular, we show that these bounds are tight when we leverage multiple datasets collected from diverse behavioral policies. We incorporate these bounds into Posterior Sampling for Reinforcement Learning (PSRL) and demonstrate their efficacy experimentally.

BibTeX Entry
@inproceedings{Azizi24EWRL,
  title     = {Leveraging diverse offline data in {POMDP}s with unobserved confounders},
  author    = {Oussama Azizi and Philip Boeken and Onno Zoeter and Frans A. Oliehoek and Matthijs T. J. Spaan},
  booktitle = {Seventeenth European Workshop on Reinforcement Learning (EWRL)},
  year      = 2024,
  month     = oct,
  OPTurl    = {https://openreview.net/forum?id=uSheVlIgzc},
  keywords  = {refereed},
  abstract  = {In many Reinforcement Learning (RL) applications, offline data is readily available before an algorithm is deployed. Often, however, data-collection policies have had access to information that is not recorded in the dataset, requiring the RL agent to take unobserved confounders into account. We focus on the setting where the confounders are i.i.d. and, without additional assumptions on the strength of the confounding, we derive tight bounds for the causal effects of the actions on the observations and reward. In particular, we show that these bounds are tight when we leverage multiple datasets collected from diverse behavioral policies. We incorporate these bounds into Posterior Sampling for Reinforcement Learning (PSRL) and demonstrate their efficacy experimentally.}
}