MultiAgentDecisionProcess
MultiAgentDecisionProcess (MADP) is a toolbox for scientific research in decision-theoretic planning and learning in multiagent systems. It is designed to be rather general, but most effort has been put into planning algorithms for discrete Dec-POMDPs.
The PDF doc/MADPToolbox.pdf provides more general background about MADP models, and documents general design principles as well as details about indices and history representations.
Authors: Frans Oliehoek, Matthijs Spaan, Philipp Robbel, João Messias
The framework consists of several parts, grouped in different libraries. The base library (libMADPBase) contains the core data types shared by the other libraries, such as representations of states, (joint) actions, and observations.
The parser library (libMADPParser) only requires the base library, and contains a parser for dpomdp problem specifications, a file format for discrete Dec-POMDPs. The dpomdp syntax is documented in example.dpomdp, and a set of benchmark problem files can be found in the problems/ directory. The format is based on Tony Cassandra's POMDP file format, and its formal specification is found in dpomdp.spirit. The parser uses the Boost Spirit library; see MADPParser.
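To give a flavor of the format, here is an abridged, indicative fragment in the style of the DecTiger problem; example.dpomdp remains the authoritative syntax reference, and the details below are illustrative only:

    # Indicative fragment only; see example.dpomdp for the real syntax.
    agents: 2
    discount: 1.0
    values: reward
    states: tiger-left tiger-right
    start:
    uniform
    actions:
    listen open-left open-right
    listen open-left open-right
    observations:
    hear-left hear-right
    hear-left hear-right
    # ... followed by T:, O: and R: entries that specify the
    # transition, observation and reward models.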
The support library (libMADPSupport) contains basic data types and support functionality useful for planning.
Finally, the planning library (libMADPplanning) contains functionality for planning algorithms, as well as some solution methods.
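A typical entry point is to load a problem with the parser library and hand it to a planner from the planning library. The following is a minimal sketch of the loading step (assuming the DecPOMDPDiscrete constructor and MADPParser behave as described above; the file path is only an example):

    #include "DecPOMDPDiscrete.h"
    #include "MADPParser.h"

    int main()
    {
        // Construct an empty problem object; the third argument
        // is the path to a .dpomdp file (example path).
        DecPOMDPDiscrete decpomdp("", "", "problems/dectiger.dpomdp");

        // The parser fills in the model upon construction.
        MADPParser parser(&decpomdp);

        // decpomdp can now be handed to a planner, as shown
        // further below.
        return 0;
    }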
The src/examples/ and src/utils/ directories contain a number of programs that use the MADP libraries. Running any binary with the --help argument will display a short summary of its usage.
JESP runs the JESPDynamicProgrammingPlanner on a dpomdp problem specification. For instance, JESP -h 3 <PATH_TO>/dectiger.dpomdp or JESP -h 3 DT runs JESP for horizon 3 on the DecTiger problem: the first parses the dectiger.dpomdp file, while the second uses the ProblemDecTiger class. Many more problem files are provided in the problems/ directory.
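The second invocation corresponds roughly to the following C++ sketch, in the spirit of the programs in src/examples/ (the exhaustive JESP variant is used here; the dynamic-programming variant has an analogous constructor):

    #include <iostream>
    #include "ProblemDecTiger.h"
    #include "JESPExhaustivePlanner.h"

    int main()
    {
        // Use the built-in DecTiger problem class instead of
        // parsing a .dpomdp file.
        ProblemDecTiger dectiger;

        // Exhaustive JESP for horizon 3.
        JESPExhaustivePlanner jesp(3, &dectiger);
        jesp.Plan();

        // Print the expected value and the computed joint policy.
        std::cout << jesp.GetExpectedReward() << std::endl;
        std::cout << jesp.GetJointPolicy()->SoftPrint() << std::endl;
        return 0;
    }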
Other programs include:

- BFS runs the BruteForceSearchPlanner.
- JESP runs the JESPExhaustivePlanner or JESPDynamicProgrammingPlanner.
- DICEPS runs the DICEPSPlanner.
- GMAA runs the GMAA variations (MAAstar or Forward Sweep Policy Computation).
- Perseus runs the Perseus POMDP or BG planner.
- printProblem loads a dpomdp problem description and prints it to standard output.
- printJointPolicyPureVector prints out a particular joint policy given its index.
- evaluateJointPolicyPureVector simulates a particular joint policy for a problem.
- evaluateRandomPolicy uses a policy that chooses actions uniformly at random.
- analyzeRewardResults and getAvgReward print information about the expected reward of simulation runs, saved using SimulationResult::Save().
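The following invocations illustrate the general pattern; the assumption that the problem file is passed as the final argument is based on the JESP invocation above, and each binary's --help output lists its exact options:

    # Print a parsed problem to standard output (assumes the problem
    # file is the final argument, as in the JESP invocation above).
    printProblem <PATH_TO>/dectiger.dpomdp

    # Every binary documents its options:
    printProblem --help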
=Main Authors=

Frans Oliehoek <fao@liverpool.ac.uk>
Department of Computer Science, University of Liverpool, Liverpool, United Kingdom
AMLab, University of Amsterdam, Amsterdam, The Netherlands

Matthijs Spaan <m.t.j.spaan@tudelft.nl>
Delft University of Technology, Delft, The Netherlands

Bas Terwijn <bterwijn@uva.nl>
AMLab, University of Amsterdam, Amsterdam, The Netherlands

João Messias <jmessias@isr.ist.utl.pt>
Institute for Systems and Robotics (ISR), Instituto Superior Técnico (IST), Lisbon, Portugal

Philipp Robbel <robbel@mit.edu>
Media Lab, Massachusetts Institute of Technology, Cambridge, MA, USA

F.O. is funded by NWO Innovational Research Incentives Scheme Veni #639.021.336. Previous development efforts were funded (in part) by AFOSR MURI project #FA9550-09-1-0538, and by the Interactive Collaborative Information Systems (ICIS) project, supported by the Dutch Ministry of Economic Affairs, grant nr. BSIK03024. M.S. was funded by FP7 Marie Curie Actions Individual Fellowship #275217 (FP7-PEOPLE-2010-IEF) and was previously supported by Fundação para a Ciência e a Tecnologia (ISR/IST pluriannual funding) through the POS_Conhecimento Program, which includes FEDER funds, and through grant PTDC/EEA-ACR/73266/2006.

=Other Contributors=

Philipp Beau, Abdeslam Boularias, Timon Kanters, Francisco Melo, Julian Kooij, Tiago Veiga, Erwin Walraven, Xuanjie Liu