Publications

Quality Assessment of MORL Algorithms: A Utility-Based Approach

Luisa M. Zintgraf, Timon V. Kanters, Diederik M. Roijers, Frans A. Oliehoek, and Philipp Beau. Quality Assessment of MORL Algorithms: A Utility-Based Approach. In Benelearn 2015: Proceedings of the 24th Annual Machine Learning Conference of Belgium and the Netherlands, 2015.

Abstract

Sequential decision-making problems with multiple objectives occur often in practice. In such settings, the utility of a policy depends on how the user values different trade-offs between the objectives. Such valuations can be expressed by a so-called scalarisation function. However, the exact scalarisation function can be unknown when the agents should learn or plan. Therefore, instead of a single solution, the agents aim to produce a solution set that contains an optimal solution for all possible scalarisations. Because it is often not possible to produce an exact solution set, many algorithms have been proposed that produce approximate solution sets instead. We argue that when comparing these algorithms we should do so on the basis of user utility, and on a wide range of problems. In practice, however, comparison of the quality of these algorithms has typically been done with only a few limited benchmarks and metrics that do not directly express the utility for the user. In this paper, we propose two metrics that express either the expected utility, or the maximal utility loss with respect to the optimal solution set. Furthermore, we propose a generalised benchmark in order to compare algorithms more reliably.

BibTeX Entry

@inproceedings{Zintgraf15Benelearn,
    author =    {Luisa M. Zintgraf and
                 Timon V. Kanters and
                 Diederik M. Roijers and
                 Frans A. Oliehoek and
                 Philipp Beau},
    title =     {Quality Assessment of {MORL} Algorithms: A Utility-Based Approach},
    booktitle = Benelearn15,
    year =      2015,
    abstract = {
    Sequential decision-making problems with multiple objectives occur
    often in practice. In such settings, the utility of a policy depends on
    how the user values different trade-offs between the objectives. Such
    valuations can be expressed by a so-called scalarisation function.
    However, the exact scalarisation function can be unknown when the
    agents should learn or plan. Therefore, instead of a single solution,
    the agents aim to produce a solution set that contains an optimal
    solution for all possible scalarisations.  Because it is often not
    possible to produce an exact solution set, many algorithms have been
    proposed that produce approximate solution sets instead. We argue that
    when comparing these algorithms we should do so on the basis of user
    utility, and on a wide range of problems. In practice, however,
    comparison of the quality of these algorithms has typically been done
    with only a few limited benchmarks and metrics that do not directly
    express the utility for the user.  In this paper, we propose two
    metrics that express either the expected utility, or the maximal
    utility loss with respect to the optimal solution set. Furthermore, we
    propose a generalised benchmark in order to compare algorithms more
    reliably.
    }
}
