MADP toolbox Examples

Here we demonstrate the use of the MADP toolbox with some simple examples.

Note: many more simple example programs are included in the MADP release; you can find these under src/examples.

Example of .dpomdp files: dectiger.dpomdp

A typical way to use the MADP toolbox is via so-called .(d)pomdp files. Some example problems are found in the 'problems' directory. One of them is 'dectiger.dpomdp', in which two agents are standing in a hallway with two doors. Behind one of the doors is a tiger, behind the other a treasure. At each time step, each agent independently hears a sound coming from the left or the right door and can choose to listen for the tiger or to open one of the doors. For a complete description, see "Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings" by Nair, R., Tambe, M., Yokoo, M., Pynadath, D. V., & Marsella, S.
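
To give an impression of the file format: a .dpomdp file starts by declaring the number of agents, the discount factor, the reward type, the states, and the per-agent actions and observations, followed by entries for the transition (T), observation (O) and reward (R) model. The excerpt below is illustrative and abbreviated rather than a verbatim copy of the shipped file; see problems/dectiger.dpomdp in the distribution for the actual specification.

  agents: 2
  discount: 1.0
  values: reward
  states: tiger-left tiger-right
  start:
  uniform
  actions:
  listen open-left open-right
  listen open-left open-right
  observations:
  hear-left hear-right
  hear-left hear-right
  # ...followed by T:, O: and R: entries specifying the
  # transition, observation and reward model.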

To use the GMAA planner to find the best policy for 3 time steps, we can execute one of the following commands:

  ./src/solvers/GMAA-ICE  -h 3 ./problems/dectiger.dpomdp
or
  ./src/solvers/GMAA -v  -h 3 ./problems/dectiger.dpomdp
The former uses clustering and incremental expansion, while the latter is the plain form of GMAA. Both result in the policy shown in Figure 1, which gives the best action for each possible observation sequence (in this problem the policy is the same for both agents). The expected value of the total reward for this policy is 5.19081.
...
value=5.19081
policy:
JointPolicyPureVectorForClusteredBG, depth 3
() -> listen
(hear-left) -> listen
(hear-left,hear-left) -> open-right
(hear-left,hear-right) -> listen
(hear-right) -> listen
(hear-right,hear-left) -> listen
(hear-right,hear-right) -> open-left
() -> listen
(hear-left) -> listen
(hear-left,hear-left) -> open-right
(hear-left,hear-right) -> listen
(hear-right) -> listen
(hear-right,hear-left) -> listen
(hear-right,hear-right) -> open-left
...
Figure 1: Output of the above command

An example using the Dec-Tiger C++ Class

Here we give an example of how to use the MADP toolbox. Figure 2 provides the full source code listing of a simple program.


1   #include <iostream>
2   #include "ProblemDecTiger.h"
3   #include "JESPExhaustivePlanner.h"
4   using namespace std;
5   int main()
6   {
7       ProblemDecTiger dectiger;
8       JESPExhaustivePlanner jesp(3,&dectiger);
9       jesp.Plan();
10      cout << jesp.GetExpectedReward() << endl;
11      cout << jesp.GetJointPolicy()->SoftPrint() << endl;
12      return(0);
13  }
Figure 2: A small example program that runs JESP on the DecTiger problem.

It uses exhaustive JESP to plan for 3 time steps for the DecTiger problem, and prints out the computed value as well as the policy. Line 7 constructs an instance of the DecTiger problem directly, without the need to parse dectiger.dpomdp. Line 8 instantiates the planner, taking as arguments the planning horizon and a pointer to the problem it should consider. Line 9 invokes the actual planning, and lines 10 and 11 print out the results.

This is a simple but complete program. The distribution provides more elaborate examples (in src/examples) which, for instance, demonstrate the command-line parsing functionality and the use of the .dpomdp parser; a sketch of the latter is given below Figure 3.


src/examples>  ./decTigerJESP 
Value computed for DecTiger horizon 3: 5.19081
Policy computed:
JointPolicyPureVector index 120340 depth 999999
Policy for agent 0 (index 55):
Oempty,  --> a00:Listen
Oempty, o00:HearLeft,  --> a00:Listen
Oempty, o01:HearRight,  --> a00:Listen
Oempty, o00:HearLeft, o00:HearLeft,  --> a02:OpenRight
Oempty, o00:HearLeft, o01:HearRight,  --> a00:Listen
Oempty, o01:HearRight, o00:HearLeft,  --> a00:Listen
Oempty, o01:HearRight, o01:HearRight,  --> a01:OpenLeft
Policy for agent 1 (index 55):
Oempty,  --> a10:Listen
Oempty, o10:HearLeft,  --> a10:Listen
Oempty, o11:HearRight,  --> a10:Listen
Oempty, o10:HearLeft, o10:HearLeft,  --> a12:OpenRight
Oempty, o10:HearLeft, o11:HearRight,  --> a10:Listen
Oempty, o11:HearRight, o10:HearLeft,  --> a10:Listen
Oempty, o11:HearRight, o11:HearRight,  --> a11:OpenLeft
Figure 3: Output of the above program.
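
As mentioned above, the more elaborate examples in src/examples also demonstrate the use of the .dpomdp parser. The following is a minimal sketch of that pattern; it assumes, as in the parser examples shipped with MADP, that a DecPOMDPDiscrete can be constructed from a (name, description, problem-file) triple and that constructing a MADPParser on it parses the file. The path is only an example and may need adjusting to your setup.

#include <iostream>
#include <string>
#include "DecPOMDPDiscrete.h"
#include "MADPParser.h"
#include "JESPExhaustivePlanner.h"
using namespace std;

int main()
{
    // Example path: adjust to where the problems directory lives.
    string dpomdpFile = "./problems/dectiger.dpomdp";

    // Create an (initially empty) discrete Dec-POMDP and let the parser
    // fill in its states, actions, observations and model from the file.
    DecPOMDPDiscrete decpomdp("", "", dpomdpFile);
    MADPParser parser(&decpomdp);

    // Plan for 3 time steps with exhaustive JESP, as in Figure 2.
    JESPExhaustivePlanner jesp(3, &decpomdp);
    jesp.Plan();
    cout << jesp.GetExpectedReward() << endl;
    cout << jesp.GetJointPolicy()->SoftPrint() << endl;
    return 0;
}

The command-line based examples go one step further and select the problem at run time via GetDecPOMDPDiscreteInterfaceFromArgs, as shown in the next section.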

Performing a Simulation with a Computed Policy

As a final example, we show how to set up a simulation of a computed policy, in this case a policy for the multiagent MDP (MMDP) version of the problem. In particular, we will cover the relevant parts of src/examples/example_MMDP_SolveAndSimulate.cpp. Let us start by looking at main:
1   int main(int argc, char **argv)
2   {
3     ArgumentHandlers::Arguments args;
4     argp_parse (&ArgumentHandlers::theArgpStruc, argc, 
5                                       argv, 0, 0, &args);
6     try
7     {
8     cout << "Instantiating the problem..." << endl;
9     DecPOMDPDiscreteInterface* decpomdp = 
                GetDecPOMDPDiscreteInterfaceFromArgs(args);
10    cout << "...done." << endl;
11    PlanningUnitDecPOMDPDiscrete *np = new 
                NullPlanner(args.horizon, decpomdp);
13    MDPValueIteration vi(*np);
14    vi.Plan();
15    QTable q = vi.GetQTable(0); 
16 
17    int nrRuns = args.nrRuns; //500;
18    int seed = args.randomSeed;
19    SimulationDecPOMDPDiscrete sim(*np, nrRuns, seed);
20    vector<double> avgRewards;
21
23    double r;
24    r = runOneSimulation(q, np, sim );
25    avgRewards.push_back(r);
26    cout << "Avg rewards: " << SoftPrintVector(avgRewards) 
                                                    << endl;
27
28    }
29    catch(E& e){ e.Print(); }
30    return(0);
31  }
Figure 4: main.

This program shows some more interesting functionality:
  • It shows that MADP uses the argp library for command line handling.
  • It also demonstrates the exception handling employed in MADP ("E" and derived objects).
  • It demonstrates the use of the "GetDecPOMDPDiscreteInterfaceFromArgs" function, which uses the parsed command-line options to load the specified multiagent decision problem (a DecPOMDPDiscreteInterface*).
  • It shows the use of a NullPlanner: a planner that does nothing itself, but provides the problem representation and bookkeeping that the MDPValueIteration class needs.
  • The planning is performed (on line 14) for an infinite horizon, which means that there is a stationary value function; it is represented with time index 0, which is why line 15 retrieves the value function of stage 0 (with "GetQTable(0)"). A sketch of how such a Q-table can be used is given directly after this list.
  • A simulator is instantiated on line 19. Besides a reference to the NullPlanner and a random seed, it takes nrRuns (how often the simulation is performed), which is read from the command-line arguments.
  • On line 24 an actual simulation is run by the function "runOneSimulation" defined in the same file.
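
To make the role of this Q-table concrete, the sketch below shows how a greedy joint action could be extracted from it for a given state index; conceptually, this is what the AgentMDP agents created later use the table for. The helper function is hypothetical (it is not part of example_MMDP_SolveAndSimulate.cpp), and it assumes that QTable can be indexed as q(stateIndex, jointActionIndex); if the accessor differs in your version of MADP, adapt accordingly.

// Hypothetical helper, intended to be dropped into the example file so that
// it can reuse the includes already present there.
// Returns the joint action with the highest Q-value for state sI.
Index GreedyJointAction(const QTable &q,
                        const PlanningUnitDecPOMDPDiscrete *np,
                        Index sI)
{
    Index best = 0;
    double bestQ = q(sI, 0);                 // assumed matrix-style accessor
    for(Index jaI = 1; jaI < np->GetNrJointActions(); jaI++)
    {
        if(q(sI, jaI) > bestQ)
        {
            bestQ = q(sI, jaI);
            best = jaI;
        }
    }
    return best;
}

In the example program itself no such helper is needed: the Q-table is simply handed to the AgentMDP objects (see runOneSimulation below), which use it to select their actions.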

Alright, so in order to make full sense of this, we will need to have a look at "runOneSimulation":


1   double runOneSimulation(const QTable & q, 
2         const PlanningUnitDecPOMDPDiscrete *np,
3         const SimulationDecPOMDPDiscrete &sim
4         )
5   {
6     AgentMDP agent(np, 0, q);
7     AgentFullyObservable *newAgent;
8     vector<AgentFullyObservable*> agents;
9                                                            
10    for(Index i=0; i < np->GetNrAgents(); i++)
11    {
13        newAgent=new AgentMDP(agent);
14        newAgent->SetIndex(i);
15        agents.push_back(newAgent);
16    }
17    SimulationResult result=sim.RunSimulations(agents);
18    double avgReward=result.GetAvgReward();
19                                                           
20    for(Index i=0; i < np->GetNrAgents(); i++)
21        delete agents[i];
23                                                           
24    return(avgReward);
25  }
Figure 5: runOneSimulation.

Hopefully, this code is relatively easy to read. Essentially, all the agents (of type AgentMDP) are created and put into a vector of agents (called "agents"). To facilitate this process, line 6 first creates a template agent, which is copied "GetNrAgents()" times on line 13.

Once all the agents are created, the "RunSimulations" method of the SimulationDecPOMDPDiscrete object is called with these agents as its argument, thus performing the simulations and returning the average reward over the nrRuns runs.

To run the program:


$ cd src/examples
$ ./example_MMDP_SolveAndSimulate DT -g 0.99 -h 10
Instantiating the problem...
...done.
Avg rewards: < 200 >
Figure 6: Output of example_MMDP_SolveAndSimulate.

As we see, the Dec-Tiger problem is (obviously) quite trivial when the partial observability and decentralization are stripped away: the agents simply open the correct door 10 times in a row, leading to a reward of 10*20=200.

This example also demonstrates that for Dec-Tiger we can use 'DT' to specify the C++ class implementation of the problem. Currently this is supported for 5 problems: DT, FF, FFF, FFG, and Aloha. See also the help provided for the command-line options:

$ ./example_MMDP_SolveAndSimulate --help