MultiAgentDecisionProcess
DICEPSPlanner Class Reference

DICEPSPlanner implements the Direct Cross-Entropy Policy Search method. More...

#include <DICEPSPlanner.h>

Public Member Functions

 DICEPSPlanner (size_t horizon, DecPOMDPDiscreteInterface *p, size_t nrRestarts, size_t nrIterations, size_t nrSamples, size_t nrSamplesForUpdate, bool use_hard_threshold, double CEalpha, size_t nrEvalRuns, const PlanningUnitMADPDiscreteParameters *params=0, bool convergenceStats=false, std::ofstream *convergenceStatsFile=0, int verbose=0)
 Constructor. More...
 
double GetExpectedReward (void) const
 Returns the expected reward of the best found joint policy. More...
 
boost::shared_ptr< JointPolicy > GetJointPolicy ()
 Returns the found joint policy. More...
 
boost::shared_ptr< JointPolicyDiscrete > GetJointPolicyDiscrete ()
 
JPPV_sharedPtr GetJointPolicyPureVector ()
 Returns the found joint policy. More...
 
void Plan ()
 The method that performs the planning according to the CE for Dec-POMDP algorithm. More...
 
- Public Member Functions inherited from PlanningUnitDecPOMDPDiscrete
void ExportDecPOMDPFile (const std::string &filename) const
 Exports the Dec-POMDP to file named filename. More...
 
double GetDiscount () const
 Returns the discount parameter. More...
 
DecPOMDPDiscreteInterface * GetDPOMDPD () const
 Returns the DecPOMDPDiscreteInterface pointer. More...
 
double GetReward (Index sI, Index jaI) const
 Return the reward for state, joint action indices. More...
 
 PlanningUnitDecPOMDPDiscrete (size_t horizon=3, DecPOMDPDiscreteInterface *p=0, const PlanningUnitMADPDiscreteParameters *params=0)
 the (default) Constructor. More...
 
void SetProblem (DecPOMDPDiscreteInterface *p)
 Tell which SetReferred to use by default. More...
 
- Public Member Functions inherited from PlanningUnitMADPDiscrete
virtual bool AreCachedJointToIndivIndices (PolicyGlobals::PolicyDomainCategory pdc) const
 Check whether certain index conversions are cached. More...
 
void ComputeHistoryArrays (Index hI, Index t, Index t_offset, Index Indices[], size_t indexDomainSize) const
 This function computes the indices of the sequence corr. More...
 
Index ComputeHistoryIndex (Index t, Index t_offset, const Index indices[], size_t indexDomainSize) const
 This function computes the index of a history. More...
 
void ExportDotGraph (const std::string &filename, const PolicyDiscretePure &policy, Index agentI, bool labelEdges=true) const
 Export a policy in dot format (from the GraphViz tools). More...
 
const Action * GetAction (Index agentI, Index a) const
 Returns a ref to the a-th action of agent agentI. More...
 
void GetActionHistoryArrays (Index agentI, Index ahI, Index t, Index aIs[]) const
 Computes the actions for action history ahI of agent agentI. More...
 
ActionHistoryTree * GetActionHistoryTree (Index agentI, Index ohI) const
 Returns a pointer to action history# ohI of agent# agentI. More...
 
void GetActionObservationHistoryArrays (Index agentI, Index aohI, Index t, Index aIs[], Index oIs[]) const
 Computes the joint actions and observations for aohI. More...
 
Index GetActionObservationHistoryIndex (Index agentI, Index t, const std::vector< Index > &actions, const std::vector< Index > &observations) const
 converts the vectors of actions and observations of length t to an (individual) ActionObservationHistory Index for agentI. More...
 
ActionObservationHistoryTree * GetActionObservationHistoryTree (Index agentI, Index aohI) const
 Returns a pointer to action-observation history# aohI of agent# agentI. More...
 
virtual PolicyGlobals::PolicyDomainCategory GetDefaultIndexDomCat () const
 Return the default PolicyDomainCategory for the problem. More...
 
LIndex GetFirstJointActionObservationHistoryIndex (Index ts) const
 Returns the index of the first joint action observation history of time step ts. More...
 
LIndex GetFirstJointObservationHistoryIndex (Index ts) const
 Returns the index of the first joint observation history of time step ts. More...
 
LIndex GetFirstObservationHistoryIndex (Index agI, Index ts) const
 Returns the index of the first ts observation history of agent agI. More...
 
double GetInitialStateProbability (Index sI) const
 Returns the probability of a state in the initial state distribution. More...
 
double GetJAOHProb (LIndex jaohI, Index p_jaohI=0, const JointBeliefInterface *p_jb=NULL, const JointPolicyDiscrete *jpol=NULL) const
 returns the probability of jaohI. More...
 
double GetJAOHProbGivenPred (LIndex jaohI) const
 Gives the conditional probability of the realization of the joint action-observation history jaohI (and thus of the joint belief corresponding to JointActionObservationHistory jaohI). More...
 
double GetJAOHProbs (JointBeliefInterface *jb, LIndex jaohI, LIndex p_jaohI=0, const JointBeliefInterface *p_jb=NULL, const JointPolicyDiscrete *jpol=NULL) const
 returns the probability of jaohI AND the corresponding joint belief given the predecessor p_jaohI (and its corresponding belief) More...
 
double GetJAOHProbsRecursively (JointBeliefInterface *jb, const Index jaIs[], const Index joIs[], Index t_p, Index t, LIndex p_jaohI=0, const JointPolicyDiscrete *jpol=NULL) const
 the function that performs most of the work, called by GetJAOHProbs. More...
 
const JointAction * GetJointAction (Index jaI) const
 Returns a ref to the i-th joint action. More...
 
Index GetJointActionHistoryIndex (Index t, const std::vector< Index > &jointActions) const
 converts the vector of joint actions of length t to a JointActionHistory Index. More...
 
Index GetJointActionHistoryIndex (Index t, const Index jointActions[]) const
 converts the array of joint actions of length t to a JointActionHistory Index. More...
 
Index GetJointActionHistoryIndex (JointActionHistoryTree *joh) const
 Returns the index of a JointActionHistoryTree pointer. More...
 
JointActionHistoryTree * GetJointActionHistoryTree (Index jahI) const
 Returns a pointer to joint action history#. More...
 
void GetJointActionObservationHistoryArrays (LIndex jaohI, Index t, Index jaIs[], Index joIs[]) const
 Computes the joint actions and observations for jaohI. More...
 
LIndex GetJointActionObservationHistoryIndex (JointActionObservationHistoryTree *jaoh) const
 Returns the index of a JointActionObservationHistoryTree pointer. More...
 
Index GetJointActionObservationHistoryIndex (Index t, const std::vector< Index > &Jactions, const std::vector< Index > &Jobservations) const
 converts the vectors of actions and observations of length t to a joint ActionObservationHistory Index. More...
 
Index GetJointActionObservationHistoryIndex (Index t, const std::vector< Index > &Jactions, const std::vector< Index > &Jobservations, const Scope &agentSope) const
 like GetJointActionObservationHistoryIndex, but works on a subset of agents. More...
 
JointActionObservationHistoryTree * GetJointActionObservationHistoryTree (LIndex jaohI) const
 Returns a pointer to JointActionObservation history#. More...
 
void GetJointActionObservationHistoryVectors (LIndex jaohI, std::vector< Index > &jaIs, std::vector< Index > &joIs) const
 returns the vectors with joint actions and observations for JointActionObservationHistory jaohi. More...
 
JointBeliefInterface * GetJointBeliefInterface (LIndex jaohI) const
 Returns a pointer to a new joint belief. More...
 
const JointObservation * GetJointObservation (Index joI) const
 Returns a ref to the joI-th joint observation. More...
 
void GetJointObservationHistoryArrays (Index johI, Index t, Index joIs[]) const
 Computes the joint observations for johI. More...
 
Index GetJointObservationHistoryIndex (JointObservationHistoryTree *joh) const
 Returns the index of a JointObservationHistoryTree pointer. More...
 
Index GetJointObservationHistoryIndex (Index t, const std::vector< Index > &jointObservations) const
 converts the vector of joint observations of length t to a JointObservationHistory Index. More...
 
Index GetJointObservationHistoryIndex (Index t, const Index jointObservations[]) const
 converts the array of joint observations of length t to a JointObservationHistory Index. More...
 
JointObservationHistoryTree * GetJointObservationHistoryTree (Index johI) const
 Returns a pointer to joint observation history#. More...
 
MultiAgentDecisionProcessDiscreteInterface * GetMADPDI ()
 
const MultiAgentDecisionProcessDiscreteInterface * GetMADPDI () const
 
JointBeliefInterface * GetNewJointBeliefFromISD () const
 Returns a new joint belief with the value of the initial state distribution. More...
 
virtual JointBeliefInterface * GetNewJointBeliefInterface () const
 a function that forces derived classes to specify which types of joint beliefs are used. More...
 
virtual JointBeliefInterface * GetNewJointBeliefInterface (size_t size) const
 
size_t GetNrActionHistories (Index agentI) const
 Returns the number of action histories for agentI. More...
 
size_t GetNrActionHistories (Index agentI, Index ts) const
 Returns the number of action histories for agentI for time step ts. More...
 
size_t GetNrActionObservationHistories (Index agentI) const
 Returns the number of action observation histories for agentI. More...
 
const std::vector< size_t > & GetNrActions () const
 Returns the number of actions vector. More...
 
size_t GetNrActions (Index agentI) const
 Returns the number of actions of agent agentI. More...
 
size_t GetNrAgents () const
 Gets the number of agents. More...
 
size_t GetNrJointActionHistories () const
 Returns the number of joint action histories. More...
 
size_t GetNrJointActionObservationHistories () const
 Returns the number of jointActionObservation histories. More...
 
size_t GetNrJointActions () const
 return the number of joint actions. More...
 
size_t GetNrJointObservationHistories () const
 Returns the number of joint observation histories. More...
 
size_t GetNrJointObservationHistories (Index ts) const
 Returns the number of joint observation histories for time step ts. More...
 
size_t GetNrJointObservations () const
 Returns the number of joint observations. More...
 
LIndex GetNrJointPolicies () const
 Returns the number of joint policies. More...
 
size_t GetNrObservationHistories (Index agentI) const
 Returns the number of observation histories for agentI. More...
 
size_t GetNrObservationHistories (Index agentI, Index ts) const
 Returns the number of observation histories for agentI for time step ts. More...
 
const std::vector< size_t > GetNrObservationHistoriesVector () const
 Returns a vector with the number of OHs for each agent. More...
 
const std::vector< size_t > GetNrObservationHistoriesVector (Index ts) const
 Returns a vector with the number of OHs in stage ts for each agent. More...
 
const std::vector< size_t > & GetNrObservations () const
 Returns the number of observations vector. More...
 
size_t GetNrObservations (Index agentI) const
 Returns the number of observations of agent agentI. More...
 
LIndex GetNrPolicies (Index agentI) const
 Returns the number of policies for agentI. More...
 
size_t GetNrPolicyDomainElements (Index agentI, PolicyGlobals::PolicyDomainCategory cat, size_t depth=MAXHORIZON) const
 Get the number of elements in the domain of an agent's policy. More...
 
size_t GetNrStates () const
 Returns the number of states. More...
 
const Observation * GetObservation (Index agentI, Index o) const
 Returns a ref to the o-th observation of agent agentI. More...
 
void GetObservationHistoryArrays (Index agentI, Index ohI, Index t, Index oIs[]) const
 Computes the observations for ohI. More...
 
Index GetObservationHistoryIndex (Index agentI, Index t, const std::vector< Index > &observations) const
 converts the vector observations of length t to a (individual) ObservationHistory Index for agentI. More...
 
ObservationHistoryTree * GetObservationHistoryTree (Index agentI, Index ohI) const
 Returns a pointer to observation history# ohI of agent# agentI. More...
 
const ObservationModelDiscrete * GetObservationModelDiscretePtr () const
 
double GetObservationProbability (Index jaI, Index sucSI, Index joI) const
 Returns P(joI | jaI, sucSI ). Arguments are time-ordered. More...
 
const PlanningUnitMADPDiscreteParameters & GetParams () const
 Get the parameters for this planning unit. More...
 
const MultiAgentDecisionProcessDiscreteInterface * GetProblem () const
 Returns a reference to the problem of the PlanningUnitMADPDiscrete. More...
 
MultiAgentDecisionProcessDiscreteInterface * GetProblem ()
 
const State * GetState (Index i) const
 Get a pointer to a State by index. More...
 
Index GetSuccessorAHI (Index agentI, Index ohI, Index oI) const
 
Index GetSuccessorAOHI (Index agI, Index aohI, Index aI, Index oI) const
 Returns the index of the successor of agent agI's action-observation history aohI via action aI and observation oI. More...
 
Index GetSuccessorJAHI (Index johI, Index joI) const
 
LIndex GetSuccessorJAOHI (LIndex jaohI, Index jaI, Index joI) const
 Returns the index of the successor of joint action-observation history jaohI via joint action jaI and joint observation joI. More...
 
Index GetSuccessorJOHI (Index johI, Index joI) const
 Returns the index of the successor of joint observation history johI via joint observation joI. More...
 
Index GetSuccessorOHI (Index agentI, Index ohI, Index oI) const
 Returns the index of the successor of observation history ohI of agentI via observation oI. More...
 
Index GetTimeStepForAHI (Index agentI, Index ohI) const
 Returns the time step of action history ohI of agent agentI. More...
 
Index GetTimeStepForAOHI (Index agentI, Index aohI) const
 Returns the time step of action-observation history aohI of agent agentI. More...
 
Index GetTimeStepForJAHI (Index johI) const
 Returns the time step of joint action history johI. More...
 
Index GetTimeStepForJAOHI (LIndex jaohI) const
 Returns the time step of joint action-observation history jaohI. More...
 
Index GetTimeStepForJOHI (Index johI) const
 Returns the time step of joint observation history johI. More...
 
Index GetTimeStepForOHI (Index agentI, Index ohI) const
 Returns the time step of observation history ohI. More...
 
const TransitionModelDiscrete * GetTransitionModelDiscretePtr () const
 
double GetTransitionProbability (Index sI, Index jaI, Index sucSI) const
 Returns the trans. prob for state, joint action, suc state indices. More...
 
Index IndividualToJointActionHistoryIndex (Index t, const std::vector< Index > &indivIs) const
 converts individual history indices to a joint index More...
 
Index IndividualToJointActionIndices (const Index *indivActionIndices) const
 Returns the joint action index that corresponds to the array of specified individual action indices. More...
 
Index IndividualToJointActionIndices (const std::vector< Index > &indivActionIndices) const
 Returns the joint action index that corresponds to the vector of specified individual action indices. More...
 
LIndex IndividualToJointActionObservationHistoryIndex (Index t, const std::vector< Index > &indivIs) const
 converts individual history indices to a joint index More...
 
Index IndividualToJointObservationHistoryIndex (Index t, const std::vector< Index > &indivIs) const
 converts individual history indices to a joint index More...
 
Index IndividualToJointObservationIndices (const std::vector< Index > &inObs) const
 Returns the joint observation index that corresponds to the vector of specified individual observation indices. More...
 
void JointAOHIndexToIndividualActionObservationVectors (LIndex jaohI, std::vector< std::vector< Index > > &indivO_vec, std::vector< std::vector< Index > > &indivA_vec) const
 computes the vectors of actions and obs. More...
 
void JointAOHIndexToIndividualActionObservationVectors (LIndex jaohI, std::vector< std::vector< Index > > &indivAO_vec) const
 computes the vector of action-observations corresponding to jaohI indivAO_vec[agentI][t] = aoI More...
 
std::vector< Index > JointToIndividualActionHistoryIndices (Index JAHistI) const
 Returns a vector containing the indices of the individual action histories corresponding to the JointActionHistory index JAHistI. More...
 
const std::vector< Index > & JointToIndividualActionHistoryIndicesRef (Index JAHistI) const
 Returns a reference to a cached vector containing the indices of the indiv. More...
 
std::vector< Index > JointToIndividualActionIndices (Index jaI) const
 Returns a vector containing the indices of the indiv. More...
 
std::vector< Index > JointToIndividualActionObservationHistoryIndices (LIndex jaohI) const
 Returns a vector containing the indices of the indiv. More...
 
const std::vector< Index > & JointToIndividualActionObservationHistoryIndicesRef (LIndex jaohI) const
 Returns a vector containing the indices of the indiv. More...
 
std::vector< Index > JointToIndividualObservationHistoryIndices (Index johI) const
 Returns a vector containing the indices of the indiv. More...
 
const std::vector< Index > & JointToIndividualObservationHistoryIndicesRef (Index johI) const
 Returns a vector containing the indices of the indiv. More...
 
std::vector< Index > JointToIndividualObservationIndices (Index joI) const
 Returns a vector containing the indices of the indiv. More...
 
std::vector< Index > JointToIndividualPolicyDomainIndices (Index jdI, PolicyGlobals::PolicyDomainCategory cat) const
 Converts joint indices to individual policy domain element indices. More...
 
const std::vector< Index > & JointToIndividualPolicyDomainIndicesRef (Index jdI, PolicyGlobals::PolicyDomainCategory cat) const
 Converts joint indices to individual policy domain element indices (returns a reference). More...
 
 PlanningUnitMADPDiscrete (size_t horizon=3, MultiAgentDecisionProcessDiscreteInterface *p=0, const PlanningUnitMADPDiscreteParameters *params=0)
 Constructor with specified parameters. More...
 
 PlanningUnitMADPDiscrete (size_t horizon=3, MultiAgentDecisionProcessDiscreteInterface *p=0)
 Constructor with default parameters. DEPRECATED?: More...
 
std::string PolicyToDotGraph (const PolicyDiscretePure &policy, Index agentI, bool labelEdges=true) const
 Convert a policy to dot format (from the GraphViz tools). More...
 
void Print ()
 Prints info regarding the planning unit. More...
 
void PrintActionHistories ()
 Prints the action histories for all agents. More...
 
void PrintActionObservationHistories ()
 Prints the actionObservation histories for all agents. More...
 
void PrintObservationHistories ()
 Prints the observation histories for all agents. More...
 
void RegisterJointActionObservationHistoryTree (JointActionObservationHistoryTree *jaoht)
 Register a new jaoht in the vector of indices. More...
 
void SetHorizon (size_t h)
 Sets the horizon for the planning problem. More...
 
void SetParams (const PlanningUnitMADPDiscreteParameters &params)
 Sets the parameters for this planning unit. More...
 
void SetProblem (MultiAgentDecisionProcessDiscreteInterface *madp)
 Sets the problem for which to plan, using a pointer. More...
 
std::string SoftPrintAction (Index agentI, Index actionI) const
 soft prints action actionI of agent agentI. More...
 
std::string SoftPrintObservationHistory (Index agentI, Index ohIndex) const
 soft prints ObservationHistory ohIndex of agent agentI. More...
 
std::string SoftPrintPolicyDomainElement (Index agentI, Index dI, PolicyGlobals::PolicyDomainCategory cat) const
 Virtual function that has to be implemented by derived class. More...
 
 ~PlanningUnitMADPDiscrete ()
 Destructor. More...
 
- Public Member Functions inherited from PlanningUnit
size_t GetHorizon () const
 Returns the planning horizon. More...
 
Index GetNextAgentIndex ()
 Maintains an agent index and returns the next one on each call. More...
 
size_t GetNrAgents () const
 Return the number of agents. More...
 
const MultiAgentDecisionProcessInterface * GetProblem () const
 Get the problem pointer. More...
 
int GetSeed () const
 Returns the random seed stored. More...
 
void InitSeed () const
 Initializes the random number generator (srand) to the stored seed. More...
 
 PlanningUnit (size_t horizon, MultiAgentDecisionProcessInterface *p)
 (default) Constructor More...
 
void SetProblem (MultiAgentDecisionProcessInterface *p)
 Updates the problem pointer. More...
 
void SetSeed (int s)
 Stores the random seed and calls InitSeed(). More...
 
virtual ~PlanningUnit ()
 Destructor. More...
 
- Public Member Functions inherited from Interface_ProblemToPolicyDiscretePure
LIndex GetNrJointPolicies (PolicyGlobals::PolicyDomainCategory cat, size_t depth=MAXHORIZON) const
 Get the number of joint policies, given the policy's domain. More...
 
LIndex GetNrPolicies (Index ag, PolicyGlobals::PolicyDomainCategory cat, size_t depth=MAXHORIZON) const
 Get the number of policies for an agent, given the policy's domain. More...
 
virtual ~Interface_ProblemToPolicyDiscretePure ()
 Destructor. More...
 
- Public Member Functions inherited from Interface_ProblemToPolicyDiscrete
size_t GetNrJointActions () const
 Get the number of joint actions. More...
 
 Interface_ProblemToPolicyDiscrete ()
 (default) Constructor More...
 
virtual ~Interface_ProblemToPolicyDiscrete ()
 Destructor. More...
 
- Public Member Functions inherited from TimedAlgorithm
void AddTimedEvent (const std::string &id, clock_t duration)
 Adds event of certain duration, e.g., an external program call. More...
 
std::vector< double > GetTimedEventDurations (const std::string &id)
 Returns all stored durations (in s) for a particular event. More...
 
void LoadTimers (const std::string &filename)
 Load timing info from file filename. More...
 
void PrintTimers () const
 Print stored timing info. More...
 
void PrintTimersSummary () const
 Sums data and prints out a summary. More...
 
void SaveTimers (const std::string &filename) const
 Save collected timing info to file filename. More...
 
void SaveTimers (std::ofstream &of) const
 Save collected timing info to ofstream of. More...
 
void StartTimer (const std::string &id) const
 Start to time an event identified by id. More...
 
void StopTimer (const std::string &id) const
 Stop to time an event identified by id. More...
 
 TimedAlgorithm ()
 (default) Constructor More...
 
virtual ~TimedAlgorithm ()
 Destructor. More...
 

Protected Member Functions

double ApproximateEvaluate (JointPolicyDiscrete &jpol, int nrRuns)
 
void UpdateCEProbDistribution (vector< vector< vector< double > > > &Xi, const list< JPPVValuePair * > &best_samples)
 
- Protected Member Functions inherited from PlanningUnitDecPOMDPDiscrete
bool SanityCheck () const
 Runs some consistency tests. More...
 

Static Protected Member Functions

static void OrderedInsertJPPVValuePair (JPPVValuePair *pv, list< JPPVValuePair * > &l)
 
static void PrintBestSamples (const list< JPPVValuePair * > &l)
 
static void SampleIndividualPolicy (PolicyPureVector &pol, const vector< vector< double > > &ohistActionProbs)
 

Private Attributes

double _m_alpha
 
double _m_expectedRewardFoundPolicy
 
JPPV_sharedPtr _m_foundPolicy
 
size_t _m_nrEvalRuns
 
size_t _m_nrIterations
 
size_t _m_nrJointPoliciesForUpdate
 
size_t _m_nrRestarts
 
size_t _m_nrSampledJointPolicies
 
std::ofstream * _m_outputConvergenceFile
 
bool _m_outputConvergenceStatistics
 
bool _m_use_gamma
 
int _m_verbose
 

Additional Inherited Members

- Static Public Member Functions inherited from PlanningUnitDecPOMDPDiscrete
static void ExportDecPOMDPFile (const std::string &filename, const DecPOMDPDiscreteInterface *decpomdp)
 Exports the Dec-POMDP represented by decpomdp to file named filename. More...
 
- Protected Attributes inherited from PlanningUnitMADPDiscrete
std::vector< ActionHistoryTree * > _m_actionHistoryTreeRootPointers
 A vector that stores pointers to the roots of the action history trees of each agent. More...
 
std::vector< std::vector< ActionHistoryTree * > > _m_actionHistoryTreeVectors
 A vector which, for each agent, stores a vector with all ActionHistoryTree pointers. More...
 
std::vector< ActionObservationHistoryTree * > _m_actionObservationHistoryTreeRootPointers
 A vector that stores pointers to the roots of the action-observation history trees of each agent. More...
 
std::vector< std::vector< ActionObservationHistoryTree * > > _m_actionObservationHistoryTreeVectors
 A vector which, for each agent, stores a vector with all ActionObservationHistoryTree pointers. More...
 
std::vector< std::vector< LIndex > > _m_firstAHIforT
 The _m_firstAHIforT[aI][t] contains the first action history for time-step t of agent aI. More...
 
std::vector< std::vector< LIndex > > _m_firstAOHIforT
 The _m_firstAOHIforT[aI][t] contains the first actionObservation history for time-step t of agent aI. More...
 
std::vector< LIndex > _m_firstJAHIforT
 The _m_firstJAHIforT[t] contains the first joint action history for time-step t. More...
 
std::vector< LIndex > _m_firstJAOHIforT
 _m_firstJAOHIforT[t] contains the first joint actionObservation history for time-step t. More...
 
std::vector< LIndex > _m_firstJOHIforT
 The _m_firstJOHIforT[t] contains the first joint observation history for time-step t. More...
 
std::vector< std::vector< LIndex > > _m_firstOHIforT
 The _m_firstOHIforT[aI][t] contains the first observation history for time-step t of agent aI. More...
 
std::vector< double > _m_jaohConditionalProbs
 Stores the conditional probability of this joint belief. More...
 
std::vector< double > _m_jaohProbs
 Caches the probabilities of JointActionObservationHistory's (assuming b^0 is as specified by the problem and that a pure joint policy consistent with the i-th JointActionObservationHistory is followed). More...
 
std::vector< const JointBeliefInterface * > _m_jBeliefCache
 _m_jBeliefCache[i] stores a pointer to the joint belief corresponding to the i-th JointActionObservationHistory (assuming b^0 is as specified by the problem and that a pure joint policy consistent with the i-th JointActionObservationHistory is followed) More...
 
JointActionHistoryTree * _m_jointActionHistoryTreeRoot
 The root node of the joint action histories tree. More...
 
std::vector< JointActionHistoryTree * > _m_jointActionHistoryTreeVector
 A vector which stores a JointActionHistoryTree pointer. More...
 
std::map< LIndex, JointActionObservationHistoryTree * > _m_jointActionObservationHistoryTreeMap
 A map which is used instead of _m_jointActionObservationHistoryTreeVector when we don't cache all JointActionObservationHistoryTree's. More...
 
JointActionObservationHistoryTree * _m_jointActionObservationHistoryTreeRoot
 The root node of the joint actionObservation histories tree. More...
 
std::vector< JointActionObservationHistoryTree * > _m_jointActionObservationHistoryTreeVector
 A vector which stores JointActionObservationHistoryTree pointer. More...
 
JointObservationHistoryTree * _m_jointObservationHistoryTreeRoot
 The root node of the joint observation histories tree. More...
 
std::vector< JointObservationHistoryTree * > _m_jointObservationHistoryTreeVector
 A vector which stores a JointObservationHistoryTree pointer. More...
 
std::vector< size_t > _m_nrActionHistories
 A vector that keeps track of the number of action histories per agent. More...
 
std::vector< std::vector< size_t > > _m_nrActionHistoriesT
 A vector that keeps track of the number of action histories per agent per time step. More...
 
std::vector< size_t > _m_nrActionObservationHistories
 A vector that keeps track of the number of action-obs. More...
 
std::vector< std::vector< size_t > > _m_nrActionObservationHistoriesT
 Keeps track of the number of action-obs. More...
 
size_t _m_nrJointActionHistories
 The number of joint action histories. More...
 
std::vector< size_t > _m_nrJointActionHistoriesT
 The number of joint action histories per time-step. More...
 
size_t _m_nrJointActionObservationHistories
 The number of joint actionObservation histories. More...
 
std::vector< size_t > _m_nrJointActionObservationHistoriesT
 The number of joint actionObservation histories per time-step. More...
 
size_t _m_nrJointObservationHistories
 The number of joint observation histories. More...
 
std::vector< size_t > _m_nrJointObservationHistoriesT
 The number of joint observation histories per time-step. More...
 
std::vector< size_t > _m_nrObservationHistories
 A vector that keeps track of the number of observation histories per agent. More...
 
std::vector< std::vector< size_t > > _m_nrObservationHistoriesT
 Keeps track of the number of observation histories per agent per time step. More...
 
std::vector< ObservationHistoryTree * > _m_observationHistoryTreeRootPointers
 A vector that stores pointers to the roots of the observation history trees of each agent. More...
 
std::vector< std::vector< ObservationHistoryTree * > > _m_observationHistoryTreeVectors
 A vector which, for each agent, stores a vector with all ObservationHistoryTree pointers. More...
 

Detailed Description

DICEPSPlanner implements the Direct Cross-Entropy Policy Search method.

The algorithm is described in refDICEPS (see DOC-References.h).
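
As a rough illustration of the general cross-entropy idea, the following self-contained sketch runs a sample, rank, and re-estimate loop on a toy problem. It is not the DICEPSPlanner implementation: the Candidate type, the Score objective and the CrossEntropySearch function are hypothetical, and the way nrIterations, nrSamples, nrSamplesForUpdate and alpha are used here is only an assumption based on the constructor's parameter names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical stand-in for a joint policy: one choice index per decision point.
typedef std::vector<std::size_t> Candidate;
typedef std::pair<double, Candidate> ScoredCandidate;

// Hypothetical toy objective: number of positions that match `target`.
static double Score(const Candidate& x, const Candidate& target)
{
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        if (x[i] == target[i]) s += 1.0;
    return s;
}

static bool HigherScoreFirst(const ScoredCandidate& a, const ScoredCandidate& b)
{
    return a.first > b.first;
}

struct CEResult {
    Candidate best;                           // best candidate seen so far
    double bestScore;                         // its score (-1 if none sampled)
    std::vector<std::vector<double> > probs;  // final sampling distribution
};

CEResult CrossEntropySearch(const Candidate& target, std::size_t nrChoices,
                            std::size_t nrIterations, std::size_t nrSamples,
                            std::size_t nrSamplesForUpdate, double alpha,
                            unsigned seed)
{
    assert(nrChoices > 0 && nrSamples > 0);
    assert(nrSamplesForUpdate > 0 && nrSamplesForUpdate <= nrSamples);
    std::mt19937 rng(seed);
    const std::size_t n = target.size();
    CEResult r;
    r.bestScore = -1.0;
    // Start from a uniform distribution over the choices at every position.
    r.probs.assign(n, std::vector<double>(nrChoices, 1.0 / nrChoices));

    for (std::size_t it = 0; it < nrIterations; ++it) {
        // 1. Sample candidates from the current distribution and score them.
        std::vector<ScoredCandidate> samples;
        for (std::size_t s = 0; s < nrSamples; ++s) {
            Candidate x(n);
            for (std::size_t i = 0; i < n; ++i) {
                std::discrete_distribution<int> d(r.probs[i].begin(),
                                                  r.probs[i].end());
                x[i] = static_cast<std::size_t>(d(rng));
            }
            samples.push_back(std::make_pair(Score(x, target), x));
        }
        // 2. Rank the samples; the first nrSamplesForUpdate are the elite set.
        std::sort(samples.begin(), samples.end(), HigherScoreFirst);
        if (samples.front().first > r.bestScore) {
            r.bestScore = samples.front().first;
            r.best = samples.front().second;
        }
        // 3. Re-estimate each row from elite frequencies, smoothed by alpha.
        for (std::size_t i = 0; i < n; ++i) {
            std::vector<double> freq(nrChoices, 0.0);
            for (std::size_t e = 0; e < nrSamplesForUpdate; ++e)
                freq[samples[e].second[i]] += 1.0 / nrSamplesForUpdate;
            for (std::size_t c = 0; c < nrChoices; ++c)
                r.probs[i][c] = alpha * freq[c] + (1.0 - alpha) * r.probs[i][c];
        }
    }
    return r;
}
```

With alpha below 1 every row remains a convex combination of normalized vectors, so no choice is ruled out permanently after a single unlucky iteration.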

Constructor & Destructor Documentation

DICEPSPlanner::DICEPSPlanner ( size_t  horizon,
DecPOMDPDiscreteInterface *  p,
size_t  nrRestarts,
size_t  nrIterations,
size_t  nrSamples,
size_t  nrSamplesForUpdate,
bool  use_hard_threshold,
double  CEalpha,
size_t  nrEvalRuns,
const PlanningUnitMADPDiscreteParameters *  params = 0,
bool  convergenceStats = false,
std::ofstream *  convergenceStatsFile = 0,
int  verbose = 0 
)

Member Function Documentation

double DICEPSPlanner::ApproximateEvaluate ( JointPolicyDiscrete &  jpol,
int  nrRuns 
)
protected

double DICEPSPlanner::GetExpectedReward ( void  ) const
inline virtual

Returns the expected reward of the best found joint policy.

Implements PlanningUnitDecPOMDPDiscrete.

boost::shared_ptr<JointPolicy> DICEPSPlanner::GetJointPolicy ( void  )
inline virtual

Returns the found joint policy.

Reimplemented from PlanningUnitDecPOMDPDiscrete.

boost::shared_ptr<JointPolicyDiscrete> DICEPSPlanner::GetJointPolicyDiscrete ( void  )
inline

JPPV_sharedPtr DICEPSPlanner::GetJointPolicyPureVector ( void  )
inline virtual

Returns the found joint policy.

Reimplemented from PlanningUnitDecPOMDPDiscrete.

void DICEPSPlanner::OrderedInsertJPPVValuePair ( JPPVValuePair *  pv,
list< JPPVValuePair * > &  l 
)
static protected

References JointPolicyValuePair::GetValue().

Referenced by Plan().
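
The signature suggests that this helper keeps a list of policy/value pairs sorted by value. A minimal sketch of such an ordered insert follows; the ToyPolicyValuePair struct and the OrderedInsert function are hypothetical stand-ins, and the descending order (highest value first) is an assumption, not something this page states.

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for JPPVValuePair: a policy identifier plus its value.
struct ToyPolicyValuePair {
    int policyId;
    double value;
};

// Inserts pv so that the list stays sorted by value, highest first.
// Elements with equal value keep their insertion order.
void OrderedInsert(ToyPolicyValuePair* pv, std::list<ToyPolicyValuePair*>& l)
{
    assert(pv != 0);
    std::list<ToyPolicyValuePair*>::iterator it = l.begin();
    while (it != l.end() && (*it)->value >= pv->value)
        ++it;
    l.insert(it, pv);  // inserts before `it`; appends when it == l.end()
}
```

Keeping the list ordered on insertion means the best samples can be read off the front without a separate sorting pass.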

void DICEPSPlanner::PrintBestSamples ( const list< JPPVValuePair * > &  l)
static protected

References JointPolicyValuePair::GetValue().

Referenced by Plan().

void DICEPSPlanner::SampleIndividualPolicy ( PolicyPureVector &  pol,
const vector< vector< double > > &  ohistActionProbs 
)
static protected

References PolicyPureVector::SetAction().

Referenced by Plan().
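
Judging by the parameter name ohistActionProbs and the reference to PolicyPureVector::SetAction(), this function presumably draws one action per observation history from a per-history action distribution. The sketch below illustrates that idea in isolation; ToyPurePolicy, SampleIndex and SamplePurePolicy are hypothetical names, and the inverse-CDF sampling is an assumption rather than the documented implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a pure policy: one action index per observation history.
typedef std::vector<std::size_t> ToyPurePolicy;

// Draws an index with probability proportional to probs[index].
// Entries with zero weight are never returned.
std::size_t SampleIndex(const std::vector<double>& probs, std::mt19937& rng)
{
    assert(!probs.empty());
    double total = 0.0;
    for (double p : probs) total += p;
    assert(total > 0.0);
    std::uniform_real_distribution<double> unif(0.0, total);
    const double u = unif(rng);
    double cumulative = 0.0;
    std::size_t lastPositive = 0;
    for (std::size_t a = 0; a < probs.size(); ++a) {
        if (probs[a] <= 0.0) continue;
        lastPositive = a;
        cumulative += probs[a];
        if (u < cumulative) return a;
    }
    return lastPositive;  // guards against rounding at the upper end
}

// ohistActionProbs[oh][a] is the probability of action a for history oh.
ToyPurePolicy SamplePurePolicy(
    const std::vector<std::vector<double> >& ohistActionProbs,
    std::mt19937& rng)
{
    ToyPurePolicy pol(ohistActionProbs.size());
    for (std::size_t oh = 0; oh < ohistActionProbs.size(); ++oh)
        pol[oh] = SampleIndex(ohistActionProbs[oh], rng);
    return pol;
}
```

Passing the random number generator in explicitly keeps the sketch reproducible for a fixed seed.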

void DICEPSPlanner::UpdateCEProbDistribution ( vector< vector< vector< double > > > &  Xi,
const list< JPPVValuePair * > &  best_samples 
)
protected
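
A common cross-entropy update re-estimates each distribution from the frequencies observed in the best samples and blends the result with the previous distribution using a learning rate. The sketch below shows that update for a three-level table indexed as Xi[agent][history][action]; this layout, the ToyJointPolicy type and the role given to alpha are assumptions made for illustration, not details confirmed by this page.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical joint policy: policy[agent][history] = chosen action index.
typedef std::vector<std::vector<std::size_t> > ToyJointPolicy;

// Xi[agent][history][action] is the probability of picking `action`.
// Each row is moved towards the action frequencies seen in bestSamples.
void UpdateDistribution(std::vector<std::vector<std::vector<double> > >& Xi,
                        const std::list<ToyJointPolicy>& bestSamples,
                        double alpha)
{
    if (bestSamples.empty()) return;  // nothing to learn from
    const double w = 1.0 / bestSamples.size();
    for (std::size_t ag = 0; ag < Xi.size(); ++ag) {
        for (std::size_t oh = 0; oh < Xi[ag].size(); ++oh) {
            std::vector<double>& row = Xi[ag][oh];
            // Empirical frequency of each action among the best samples.
            std::vector<double> freq(row.size(), 0.0);
            for (std::list<ToyJointPolicy>::const_iterator it =
                     bestSamples.begin(); it != bestSamples.end(); ++it) {
                const std::size_t a = (*it)[ag][oh];
                assert(a < row.size());
                freq[a] += w;
            }
            // Smoothed update: alpha = 1 replaces the row, alpha = 0 keeps it.
            for (std::size_t a = 0; a < row.size(); ++a)
                row[a] = alpha * freq[a] + (1.0 - alpha) * row[a];
        }
    }
}
```

For example, starting from a uniform row {0.5, 0.5}, two best samples that both pick action 1 and alpha = 0.5 yield {0.25, 0.75}.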

Member Data Documentation

double DICEPSPlanner::_m_alpha
private

double DICEPSPlanner::_m_expectedRewardFoundPolicy
private

Referenced by Plan().

JPPV_sharedPtr DICEPSPlanner::_m_foundPolicy
private

Referenced by Plan().

size_t DICEPSPlanner::_m_nrEvalRuns
private

Referenced by DICEPSPlanner(), and Plan().

size_t DICEPSPlanner::_m_nrIterations
private

Referenced by DICEPSPlanner(), and Plan().

size_t DICEPSPlanner::_m_nrJointPoliciesForUpdate
private

Referenced by DICEPSPlanner(), and Plan().

size_t DICEPSPlanner::_m_nrRestarts
private

Referenced by DICEPSPlanner(), and Plan().

size_t DICEPSPlanner::_m_nrSampledJointPolicies
private

Referenced by DICEPSPlanner(), and Plan().

std::ofstream* DICEPSPlanner::_m_outputConvergenceFile
private

Referenced by DICEPSPlanner().

bool DICEPSPlanner::_m_outputConvergenceStatistics
private

Referenced by DICEPSPlanner(), and Plan().

bool DICEPSPlanner::_m_use_gamma
private

Referenced by DICEPSPlanner(), and Plan().

int DICEPSPlanner::_m_verbose
private

Referenced by DICEPSPlanner(), and Plan().