MultiAgentDecisionProcess
|
DICEPSPlanner implements the Direct Cross-Entropy Policy Search method. More...
#include <DICEPSPlanner.h>
Public Member Functions | |
DICEPSPlanner (size_t horizon, DecPOMDPDiscreteInterface *p, size_t nrRestarts, size_t nrIterations, size_t nrSamples, size_t nrSamplesForUpdate, bool use_hard_threshold, double CEalpha, size_t nrEvalRuns, const PlanningUnitMADPDiscreteParameters *params=0, bool convergenceStats=false, std::ofstream *convergenceStatsFile=0, int verbose=0) | |
Constructor. More... | |
double | GetExpectedReward (void) const |
Returns the expected reward of the best found joint policy. More... | |
boost::shared_ptr< JointPolicy > | GetJointPolicy () |
Returns the found joint policy. More... | |
boost::shared_ptr < JointPolicyDiscrete > | GetJointPolicyDiscrete () |
JPPV_sharedPtr | GetJointPolicyPureVector () |
Returns the found joint policy. More... | |
void | Plan () |
The method that performs the planning according to the CE for Dec-POMDP algorithm. More... | |
Public Member Functions inherited from PlanningUnitDecPOMDPDiscrete | |
void | ExportDecPOMDPFile (const std::string &filename) const |
Exports the Dec-POMDP to file named filename. More... | |
double | GetDiscount () const |
Returns the discount parameter. More... | |
DecPOMDPDiscreteInterface * | GetDPOMDPD () const |
Returns the DecPOMDPDiscreteInterface pointer. More... | |
double | GetReward (Index sI, Index jaI) const |
Return the reward for state, joint action indices. More... | |
PlanningUnitDecPOMDPDiscrete (size_t horizon=3, DecPOMDPDiscreteInterface *p=0, const PlanningUnitMADPDiscreteParameters *params=0) | |
the (default) Constructor. More... | |
void | SetProblem (DecPOMDPDiscreteInterface *p) |
Tell which SetReferred to use by default. More... | |
Public Member Functions inherited from PlanningUnitMADPDiscrete | |
virtual bool | AreCachedJointToIndivIndices (PolicyGlobals::PolicyDomainCategory pdc) const |
Check whether certain index conversions are cached. More... | |
void | ComputeHistoryArrays (Index hI, Index t, Index t_offset, Index Indices[], size_t indexDomainSize) const |
This function computes the indices of the sequence corr. More... | |
Index | ComputeHistoryIndex (Index t, Index t_offset, const Index indices[], size_t indexDomainSize) const |
This function computes the index of a history. More... | |
void | ExportDotGraph (const std::string &filename, const PolicyDiscretePure &policy, Index agentI, bool labelEdges=true) const |
Export a policy in dot format (from the GraphViz tools). More... | |
const Action * | GetAction (Index agentI, Index a) const |
Returns a pointer to the a-th action of agent agentI. More... | |
void | GetActionHistoryArrays (Index agentI, Index ahI, Index t, Index aIs[]) const |
Computes the actions for action history ahI of agent agentI. More... | |
ActionHistoryTree * | GetActionHistoryTree (Index agentI, Index ohI) const |
Returns a pointer to action history# ohI of agent# agentI. More... | |
void | GetActionObservationHistoryArrays (Index agentI, Index aohI, Index t, Index aIs[], Index oIs[]) const |
Computes the actions and observations for action-observation history aohI of agent agentI. More... | |
Index | GetActionObservationHistoryIndex (Index agentI, Index t, const std::vector< Index > &actions, const std::vector< Index > &observations) const |
Converts the vectors of actions and observations of length t to an (individual) ActionObservationHistory index for agentI. More... | |
ActionObservationHistoryTree * | GetActionObservationHistoryTree (Index agentI, Index aohI) const |
Returns a pointer to action-observation history# aohI of agent# agentI. More... | |
virtual PolicyGlobals::PolicyDomainCategory | GetDefaultIndexDomCat () const |
Return the default PolicyDomainCategory for the problem. More... | |
LIndex | GetFirstJointActionObservationHistoryIndex (Index ts) const |
Returns the index of the first joint action observation history of time step ts. More... | |
LIndex | GetFirstJointObservationHistoryIndex (Index ts) const |
Returns the index of the first joint observation history of time step ts. More... | |
LIndex | GetFirstObservationHistoryIndex (Index agI, Index ts) const |
Returns the index of the first observation history of time step ts of agent agI. More... | |
double | GetInitialStateProbability (Index sI) const |
Returns the probability of a state in the initial state distribution. More... | |
double | GetJAOHProb (LIndex jaohI, Index p_jaohI=0, const JointBeliefInterface *p_jb=NULL, const JointPolicyDiscrete *jpol=NULL) const |
returns the probability of jaohI. More... | |
double | GetJAOHProbGivenPred (LIndex jaohI) const |
Gives the conditional probability of the realization of the joint action-observation history jaohI (and thus of the joint belief corresponding to JointActionObservationHistory jaohI). More... | |
double | GetJAOHProbs (JointBeliefInterface *jb, LIndex jaohI, LIndex p_jaohI=0, const JointBeliefInterface *p_jb=NULL, const JointPolicyDiscrete *jpol=NULL) const |
returns the probability of jaohI AND the corresponding joint belief given the predecessor p_jaohI (and its corresponding belief) More... | |
double | GetJAOHProbsRecursively (JointBeliefInterface *jb, const Index jaIs[], const Index joIs[], Index t_p, Index t, LIndex p_jaohI=0, const JointPolicyDiscrete *jpol=NULL) const |
The function that performs most of the work, called by GetJAOHProbs. More... | |
const JointAction * | GetJointAction (Index jaI) const |
Returns a pointer to the jaI-th joint action. More... | |
Index | GetJointActionHistoryIndex (Index t, const std::vector< Index > &jointActions) const |
Converts the vector of joint actions of length t to a JointActionHistory index. More... | |
Index | GetJointActionHistoryIndex (Index t, const Index jointActions[]) const |
Converts the array of joint actions of length t to a JointActionHistory index. More... | |
Index | GetJointActionHistoryIndex (JointActionHistoryTree *joh) const |
Returns the index of a JointActionHistoryTree pointer. More... | |
JointActionHistoryTree * | GetJointActionHistoryTree (Index jahI) const |
Returns a pointer to joint action history#. More... | |
void | GetJointActionObservationHistoryArrays (LIndex jaohI, Index t, Index jaIs[], Index joIs[]) const |
Computes the joint actions and observations for jaohI. More... | |
LIndex | GetJointActionObservationHistoryIndex (JointActionObservationHistoryTree *jaoh) const |
Returns the index of a JointActionObservationHistoryTree pointer. More... | |
Index | GetJointActionObservationHistoryIndex (Index t, const std::vector< Index > &Jactions, const std::vector< Index > &Jobservations) const |
converts the vectors of actions and observations of length t to a joint ActionObservationHistory Index. More... | |
Index | GetJointActionObservationHistoryIndex (Index t, const std::vector< Index > &Jactions, const std::vector< Index > &Jobservations, const Scope &agentSope) const |
like GetJointActionObservationHistoryIndex, but works on a subset of agents. More... | |
JointActionObservationHistoryTree * | GetJointActionObservationHistoryTree (LIndex jaohI) const |
Returns a pointer to JointActionObservation history#. More... | |
void | GetJointActionObservationHistoryVectors (LIndex jaohI, std::vector< Index > &jaIs, std::vector< Index > &joIs) const |
Returns the vectors with joint actions and observations for JointActionObservationHistory jaohI. More... | |
JointBeliefInterface * | GetJointBeliefInterface (LIndex jaohI) const |
Returns a pointer to a new joint belief. More... | |
const JointObservation * | GetJointObservation (Index joI) const |
Returns a pointer to the joI-th joint observation. More... | |
void | GetJointObservationHistoryArrays (Index johI, Index t, Index joIs[]) const |
Computes the joint observations for johI. More... | |
Index | GetJointObservationHistoryIndex (JointObservationHistoryTree *joh) const |
Returns the index of a JointObservationHistoryTree pointer. More... | |
Index | GetJointObservationHistoryIndex (Index t, const std::vector< Index > &jointObservations) const |
Converts the vector of joint observations of length t to a JointObservationHistory index. More... | |
Index | GetJointObservationHistoryIndex (Index t, const Index jointObservations[]) const |
Converts the array of joint observations of length t to a JointObservationHistory index. More... | |
JointObservationHistoryTree * | GetJointObservationHistoryTree (Index johI) const |
Returns a pointer to joint observation history#. More... | |
MultiAgentDecisionProcessDiscreteInterface * | GetMADPDI () |
const MultiAgentDecisionProcessDiscreteInterface * | GetMADPDI () const |
JointBeliefInterface * | GetNewJointBeliefFromISD () const |
Returns a new joint belief with the value of the initial state distribution. More... | |
virtual JointBeliefInterface * | GetNewJointBeliefInterface () const |
A function that forces derived classes to specify which types of joint beliefs are used. More... | |
virtual JointBeliefInterface * | GetNewJointBeliefInterface (size_t size) const |
size_t | GetNrActionHistories (Index agentI) const |
Returns the number of action histories for agentI. More... | |
size_t | GetNrActionHistories (Index agentI, Index ts) const |
Returns the number of action histories for agentI for time step ts. More... | |
size_t | GetNrActionObservationHistories (Index agentI) const |
Returns the number of action observation histories for agentI. More... | |
const std::vector< size_t > & | GetNrActions () const |
Returns the number of actions vector. More... | |
size_t | GetNrActions (Index agentI) const |
Returns the number of actions of agent agentI. More... | |
size_t | GetNrAgents () const |
Gets the number of agents. More... | |
size_t | GetNrJointActionHistories () const |
Returns the number of joint action histories. More... | |
size_t | GetNrJointActionObservationHistories () const |
Returns the number of jointActionObservation histories. More... | |
size_t | GetNrJointActions () const |
return the number of joint actions. More... | |
size_t | GetNrJointObservationHistories () const |
Returns the number of joint observation histories. More... | |
size_t | GetNrJointObservationHistories (Index ts) const |
Returns the number of joint observation histories for time step ts. More... | |
size_t | GetNrJointObservations () const |
Returns the number of joint observations. More... | |
LIndex | GetNrJointPolicies () const |
Returns the number of joint policies. More... | |
size_t | GetNrObservationHistories (Index agentI) const |
Returns the number of observation histories for agentI. More... | |
size_t | GetNrObservationHistories (Index agentI, Index ts) const |
Returns the number of observation histories for agentI for time step ts. More... | |
const std::vector< size_t > | GetNrObservationHistoriesVector () const |
Returns a vector with the number of OHs for each agent. More... | |
const std::vector< size_t > | GetNrObservationHistoriesVector (Index ts) const |
Returns a vector with the number of OHs in stage ts for each agent. More... | |
const std::vector< size_t > & | GetNrObservations () const |
Returns the number of observations vector. More... | |
size_t | GetNrObservations (Index agentI) const |
Returns the number of observations of agent agentI. More... | |
LIndex | GetNrPolicies (Index agentI) const |
Returns the number of policies for agentI. More... | |
size_t | GetNrPolicyDomainElements (Index agentI, PolicyGlobals::PolicyDomainCategory cat, size_t depth=MAXHORIZON) const |
Get the number of elements in the domain of an agent's policy. More... | |
size_t | GetNrStates () const |
Returns the number of states. More... | |
const Observation * | GetObservation (Index agentI, Index o) const |
Returns a pointer to the o-th observation of agent agentI. More... | |
void | GetObservationHistoryArrays (Index agentI, Index ohI, Index t, Index oIs[]) const |
Computes the observations for ohI. More... | |
Index | GetObservationHistoryIndex (Index agentI, Index t, const std::vector< Index > &observations) const |
converts the vector observations of length t to a (individual) ObservationHistory Index for agentI. More... | |
ObservationHistoryTree * | GetObservationHistoryTree (Index agentI, Index ohI) const |
Returns a pointer to observation history# ohI of agent# agentI. More... | |
const ObservationModelDiscrete * | GetObservationModelDiscretePtr () const |
double | GetObservationProbability (Index jaI, Index sucSI, Index joI) const |
Returns P(joI | jaI, sucSI ). Arguments are time-ordered. More... | |
const PlanningUnitMADPDiscreteParameters & | GetParams () const |
Get the parameters for this planning unit. More... | |
const MultiAgentDecisionProcessDiscreteInterface * | GetProblem () const |
Returns a pointer to the problem of the PlanningUnitMADPDiscrete. More... | |
MultiAgentDecisionProcessDiscreteInterface * | GetProblem () |
const State * | GetState (Index i) const |
Get a pointer to a State by index. More... | |
Index | GetSuccessorAHI (Index agentI, Index ohI, Index oI) const |
Index | GetSuccessorAOHI (Index agI, Index aohI, Index aI, Index oI) const |
Returns the index of the successor of agent agI's action-observation history aohI via action aI and observation oI. More... | |
Index | GetSuccessorJAHI (Index johI, Index joI) const |
LIndex | GetSuccessorJAOHI (LIndex jaohI, Index jaI, Index joI) const |
Returns the index of the successor of joint action-observation history jaohI via joint action jaI and joint observation joI. More... | |
Index | GetSuccessorJOHI (Index johI, Index joI) const |
Returns the index of the successor of joint observation history johI via joint observation joI. More... | |
Index | GetSuccessorOHI (Index agentI, Index ohI, Index oI) const |
Returns the index of the successor of observation history ohI of agentI via observation oI. More... | |
Index | GetTimeStepForAHI (Index agentI, Index ohI) const |
Returns the time step of action history ohI of agent agentI. More... | |
Index | GetTimeStepForAOHI (Index agentI, Index aohI) const |
Returns the time step of action-observation history aohI of agent agentI. More... | |
Index | GetTimeStepForJAHI (Index johI) const |
Returns the time step of joint action history johI. More... | |
Index | GetTimeStepForJAOHI (LIndex jaohI) const |
Returns the time step of joint action-observation history jaohI. More... | |
Index | GetTimeStepForJOHI (Index johI) const |
Returns the time step of joint observation history johI. More... | |
Index | GetTimeStepForOHI (Index agentI, Index ohI) const |
Returns the time step of observation history ohI. More... | |
const TransitionModelDiscrete * | GetTransitionModelDiscretePtr () const |
double | GetTransitionProbability (Index sI, Index jaI, Index sucSI) const |
Returns the transition probability for the given state, joint action, and successor state indices. More... | |
Index | IndividualToJointActionHistoryIndex (Index t, const std::vector< Index > &indivIs) const |
converts individual history indices to a joint index More... | |
Index | IndividualToJointActionIndices (const Index *indivActionIndices) const |
Returns the joint action index that corresponds to the array of specified individual action indices. More... | |
Index | IndividualToJointActionIndices (const std::vector< Index > &indivActionIndices) const |
Returns the joint action index that corresponds to the vector of specified individual action indices. More... | |
LIndex | IndividualToJointActionObservationHistoryIndex (Index t, const std::vector< Index > &indivIs) const |
converts individual history indices to a joint index More... | |
Index | IndividualToJointObservationHistoryIndex (Index t, const std::vector< Index > &indivIs) const |
converts individual history indices to a joint index More... | |
Index | IndividualToJointObservationIndices (const std::vector< Index > &inObs) const |
Returns the joint observation index that corresponds to the vector of specified individual observation indices. More... | |
void | JointAOHIndexToIndividualActionObservationVectors (LIndex jaohI, std::vector< std::vector< Index > > &indivO_vec, std::vector< std::vector< Index > > &indivA_vec) const |
computes the vectors of actions and obs. More... | |
void | JointAOHIndexToIndividualActionObservationVectors (LIndex jaohI, std::vector< std::vector< Index > > &indivAO_vec) const |
Computes the vector of action-observations corresponding to jaohI, such that indivAO_vec[agentI][t] = aoI. More... | |
std::vector< Index > | JointToIndividualActionHistoryIndices (Index JAHistI) const |
Returns a vector containing the indices of the individual ActionHistory's corresponding to the JointActionHistory index JAHistI. More... | |
const std::vector< Index > & | JointToIndividualActionHistoryIndicesRef (Index JAHistI) const |
Returns a reference to a cached vector containing the indices of the indiv. More... | |
std::vector< Index > | JointToIndividualActionIndices (Index jaI) const |
Returns a vector containing the indices of the indiv. More... | |
std::vector< Index > | JointToIndividualActionObservationHistoryIndices (LIndex jaohI) const |
Returns a vector containing the indices of the indiv. More... | |
const std::vector< Index > & | JointToIndividualActionObservationHistoryIndicesRef (LIndex jaohI) const |
Returns a vector containing the indices of the indiv. More... | |
std::vector< Index > | JointToIndividualObservationHistoryIndices (Index johI) const |
Returns a vector containing the indices of the indiv. More... | |
const std::vector< Index > & | JointToIndividualObservationHistoryIndicesRef (Index johI) const |
Returns a vector containing the indices of the indiv. More... | |
std::vector< Index > | JointToIndividualObservationIndices (Index joI) const |
Returns a vector containing the indices of the indiv. More... | |
std::vector< Index > | JointToIndividualPolicyDomainIndices (Index jdI, PolicyGlobals::PolicyDomainCategory cat) const |
Converts joint indices to individual policy domain element indices. More... | |
const std::vector< Index > & | JointToIndividualPolicyDomainIndicesRef (Index jdI, PolicyGlobals::PolicyDomainCategory cat) const |
Converts joint indices to individual policy domain element indices, returning a reference to the result. More... | |
PlanningUnitMADPDiscrete (size_t horizon=3, MultiAgentDecisionProcessDiscreteInterface *p=0, const PlanningUnitMADPDiscreteParameters *params=0) | |
Constructor with specified parameters. More... | |
PlanningUnitMADPDiscrete (size_t horizon=3, MultiAgentDecisionProcessDiscreteInterface *p=0) | |
Constructor with default parameters. DEPRECATED?: More... | |
std::string | PolicyToDotGraph (const PolicyDiscretePure &policy, Index agentI, bool labelEdges=true) const |
Convert a policy to dot format (from the GraphViz tools). More... | |
void | Print () |
Prints info regarding the planning unit. More... | |
void | PrintActionHistories () |
Prints the action histories for all agents. More... | |
void | PrintActionObservationHistories () |
Prints the actionObservation histories for all agents. More... | |
void | PrintObservationHistories () |
Prints the observation histories for all agents. More... | |
void | RegisterJointActionObservationHistoryTree (JointActionObservationHistoryTree *jaoht) |
Register a new jaoht in the vector of indices. More... | |
void | SetHorizon (size_t h) |
Sets the horizon for the planning problem. More... | |
void | SetParams (const PlanningUnitMADPDiscreteParameters &params) |
Sets the parameters for this planning unit. More... | |
void | SetProblem (MultiAgentDecisionProcessDiscreteInterface *madp) |
Sets the problem for which to plan, using a pointer. More... | |
std::string | SoftPrintAction (Index agentI, Index actionI) const |
soft prints action actionI of agent agentI. More... | |
std::string | SoftPrintObservationHistory (Index agentI, Index ohIndex) const |
soft prints ObservationHistory ohIndex of agent agentI. More... | |
std::string | SoftPrintPolicyDomainElement (Index agentI, Index dI, PolicyGlobals::PolicyDomainCategory cat) const |
Virtual function that has to be implemented by derived class. More... | |
~PlanningUnitMADPDiscrete () | |
Destructor. More... | |
Public Member Functions inherited from PlanningUnit | |
size_t | GetHorizon () const |
Returns the planning horizon. More... | |
Index | GetNextAgentIndex () |
Maintains an agent index and returns the next one on each call. More... | |
size_t | GetNrAgents () const |
Return the number of agents. More... | |
const MultiAgentDecisionProcessInterface * | GetProblem () const |
Get the problem pointer. More... | |
int | GetSeed () const |
Returns the random seed stored. More... | |
void | InitSeed () const |
Initializes the random number generator (srand) to the stored seed. More... | |
PlanningUnit (size_t horizon, MultiAgentDecisionProcessInterface *p) | |
(default) Constructor More... | |
void | SetProblem (MultiAgentDecisionProcessInterface *p) |
Updates the problem pointer. More... | |
void | SetSeed (int s) |
Stores the random seed and calls InitSeed(). More... | |
virtual | ~PlanningUnit () |
Destructor. More... | |
Public Member Functions inherited from Interface_ProblemToPolicyDiscretePure | |
LIndex | GetNrJointPolicies (PolicyGlobals::PolicyDomainCategory cat, size_t depth=MAXHORIZON) const |
Get the number of joint policies, given the policy's domain. More... | |
LIndex | GetNrPolicies (Index ag, PolicyGlobals::PolicyDomainCategory cat, size_t depth=MAXHORIZON) const |
Get the number of policies for an agent, given the policy's domain. More... | |
virtual | ~Interface_ProblemToPolicyDiscretePure () |
Destructor. More... | |
Public Member Functions inherited from Interface_ProblemToPolicyDiscrete | |
size_t | GetNrJointActions () const |
Get the number of joint actions. More... | |
Interface_ProblemToPolicyDiscrete () | |
(default) Constructor More... | |
virtual | ~Interface_ProblemToPolicyDiscrete () |
Destructor. More... | |
Public Member Functions inherited from TimedAlgorithm | |
void | AddTimedEvent (const std::string &id, clock_t duration) |
Adds event of certain duration, e.g., an external program call. More... | |
std::vector< double > | GetTimedEventDurations (const std::string &id) |
Returns all stored durations (in s) for a particular event. More... | |
void | LoadTimers (const std::string &filename) |
Load timing info from file filename. More... | |
void | PrintTimers () const |
Print stored timing info. More... | |
void | PrintTimersSummary () const |
Sums data and prints out a summary. More... | |
void | SaveTimers (const std::string &filename) const |
Save collected timing info to file filename. More... | |
void | SaveTimers (std::ofstream &of) const |
Save collected timing info to ofstream of. More... | |
void | StartTimer (const std::string &id) const |
Start to time an event identified by id. More... | |
void | StopTimer (const std::string &id) const |
Stop to time an event identified by id. More... | |
TimedAlgorithm () | |
(default) Constructor More... | |
virtual | ~TimedAlgorithm () |
Destructor. More... | |
Protected Member Functions | |
double | ApproximateEvaluate (JointPolicyDiscrete &jpol, int nrRuns) |
void | UpdateCEProbDistribution (vector< vector< vector< double > > > &Xi, const list< JPPVValuePair * > &best_samples) |
Protected Member Functions inherited from PlanningUnitDecPOMDPDiscrete | |
bool | SanityCheck () const |
Runs some consistency tests. More... | |
Static Protected Member Functions | |
static void | OrderedInsertJPPVValuePair (JPPVValuePair *pv, list< JPPVValuePair * > &l) |
static void | PrintBestSamples (const list< JPPVValuePair * > &l) |
static void | SampleIndividualPolicy (PolicyPureVector &pol, const vector< vector< double > > &ohistActionProbs) |
Private Attributes | |
double | _m_alpha |
double | _m_expectedRewardFoundPolicy |
JPPV_sharedPtr | _m_foundPolicy |
size_t | _m_nrEvalRuns |
size_t | _m_nrIterations |
size_t | _m_nrJointPoliciesForUpdate |
size_t | _m_nrRestarts |
size_t | _m_nrSampledJointPolicies |
std::ofstream * | _m_outputConvergenceFile |
bool | _m_outputConvergenceStatistics |
bool | _m_use_gamma |
int | _m_verbose |
Additional Inherited Members | |
Static Public Member Functions inherited from PlanningUnitDecPOMDPDiscrete | |
static void | ExportDecPOMDPFile (const std::string &filename, const DecPOMDPDiscreteInterface *decpomdp) |
Exports the Dec-POMDP represented by decpomdp to file named filename. More... | |
Protected Attributes inherited from PlanningUnitMADPDiscrete | |
std::vector< ActionHistoryTree * > | _m_actionHistoryTreeRootPointers |
A vector that stores pointers to the roots of the action history trees of each agent. More... | |
std::vector< std::vector < ActionHistoryTree * > > | _m_actionHistoryTreeVectors |
A vector which, for each agent, stores a vector with all ActionHistoryTree pointers. More... | |
std::vector < ActionObservationHistoryTree * > | _m_actionObservationHistoryTreeRootPointers |
A vector that stores pointers to the roots of the action-observation history trees of each agent. More... | |
std::vector< std::vector < ActionObservationHistoryTree * > > | _m_actionObservationHistoryTreeVectors |
A vector which, for each agent, stores a vector with all ActionObservationHistoryTree pointers. More... | |
std::vector< std::vector < LIndex > > | _m_firstAHIforT |
The _m_firstAHIforT[aI][t] contains the first action history for time-step t of agent aI. More... | |
std::vector< std::vector < LIndex > > | _m_firstAOHIforT |
The _m_firstAOHIforT[aI][t] contains the first actionObservation history for time-step t of agent aI. More... | |
std::vector< LIndex > | _m_firstJAHIforT |
The _m_firstJAHIforT[t] contains the first joint action history for time-step t. More... | |
std::vector< LIndex > | _m_firstJAOHIforT |
_m_firstJAOHIforT[t] contains the first joint actionObservation history for time-step t. More... | |
std::vector< LIndex > | _m_firstJOHIforT |
The _m_firstJOHIforT[t] contains the first joint observation history for time-step t. More... | |
std::vector< std::vector < LIndex > > | _m_firstOHIforT |
The _m_firstOHIforT[aI][t] contains the first observation history for time-step t of agent aI. More... | |
std::vector< double > | _m_jaohConditionalProbs |
Stores the conditional probability of this joint belief. More... | |
std::vector< double > | _m_jaohProbs |
Caches the probabilities of JointActionObservationHistory's (assuming b^0 is as specified by the problem and that a pure joint policy consistent with the i-th JointActionObservationHistory is followed). More... | |
std::vector< const JointBeliefInterface * > | _m_jBeliefCache |
_m_jBeliefCache[i] stores a pointer to the joint belief corresponding to the i-th JointActionObservationHistory (assuming b^0 is as specified by the problem and that a pure joint policy consistent with the i-th JointActionObservationHistory is followed) More... | |
JointActionHistoryTree * | _m_jointActionHistoryTreeRoot |
The root node of the joint action histories tree. More... | |
std::vector < JointActionHistoryTree * > | _m_jointActionHistoryTreeVector |
A vector which stores a JointActionHistoryTree pointer. More... | |
std::map< LIndex, JointActionObservationHistoryTree * > | _m_jointActionObservationHistoryTreeMap |
A map which is used instead of _m_jointActionObservationHistoryTreeVector when we don't cache all JointActionObservationHistoryTree's. More... | |
JointActionObservationHistoryTree * | _m_jointActionObservationHistoryTreeRoot |
The root node of the joint actionObservation histories tree. More... | |
std::vector < JointActionObservationHistoryTree * > | _m_jointActionObservationHistoryTreeVector |
A vector which stores JointActionObservationHistoryTree pointers. More... | |
JointObservationHistoryTree * | _m_jointObservationHistoryTreeRoot |
The root node of the joint observation histories tree. More... | |
std::vector < JointObservationHistoryTree * > | _m_jointObservationHistoryTreeVector |
A vector which stores a JointObservationHistoryTree pointer. More... | |
std::vector< size_t > | _m_nrActionHistories |
A vector that keeps track of the number of action histories per agent. More... | |
std::vector< std::vector < size_t > > | _m_nrActionHistoriesT |
A vector that keeps track of the number of action histories per agent per time step. More... | |
std::vector< size_t > | _m_nrActionObservationHistories |
A vector that keeps track of the number of action-obs. More... | |
std::vector< std::vector < size_t > > | _m_nrActionObservationHistoriesT |
Keeps track of the number of action-obs. More... | |
size_t | _m_nrJointActionHistories |
The number of joint action histories. More... | |
std::vector< size_t > | _m_nrJointActionHistoriesT |
The number of joint action histories per time-step. More... | |
size_t | _m_nrJointActionObservationHistories |
The number of joint actionObservation histories. More... | |
std::vector< size_t > | _m_nrJointActionObservationHistoriesT |
The number of joint actionObservation histories per time-step. More... | |
size_t | _m_nrJointObservationHistories |
The number of joint observation histories. More... | |
std::vector< size_t > | _m_nrJointObservationHistoriesT |
The number of joint observation histories per time-step. More... | |
std::vector< size_t > | _m_nrObservationHistories |
A vector that keeps track of the number of observation histories per agent. More... | |
std::vector< std::vector < size_t > > | _m_nrObservationHistoriesT |
Keeps track of the number of observation histories per agent per time step. More... | |
std::vector < ObservationHistoryTree * > | _m_observationHistoryTreeRootPointers |
A vector that stores pointers to the roots of the observation history trees of each agent. More... | |
std::vector< std::vector < ObservationHistoryTree * > > | _m_observationHistoryTreeVectors |
A vector which, for each agent, stores a vector with all ObservationHistoryTree pointers. More... | |
DICEPSPlanner implements the Direct Cross-Entropy Policy Search method.
The algorithm is described in refDICEPS (see DOC-References.h).
DICEPSPlanner::DICEPSPlanner (size_t horizon, DecPOMDPDiscreteInterface *p, size_t nrRestarts, size_t nrIterations, size_t nrSamples, size_t nrSamplesForUpdate, bool use_hard_threshold, double CEalpha, size_t nrEvalRuns, const PlanningUnitMADPDiscreteParameters *params=0, bool convergenceStats=false, std::ofstream *convergenceStatsFile=0, int verbose=0)
Constructor.
References _m_alpha, _m_nrEvalRuns, _m_nrIterations, _m_nrJointPoliciesForUpdate, _m_nrRestarts, _m_nrSampledJointPolicies, _m_outputConvergenceFile, _m_outputConvergenceStatistics, _m_use_gamma, and _m_verbose.
double DICEPSPlanner::ApproximateEvaluate (JointPolicyDiscrete &jpol, int nrRuns)
protected |
References SimulationResult::GetAvgReward(), SimulationDecPOMDPDiscrete::RunSimulations(), TimedAlgorithm::StartTimer(), and TimedAlgorithm::StopTimer().
Referenced by Plan().
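The references above suggest that ApproximateEvaluate() estimates the value of a joint policy by running simulations and taking the average reward. A minimal sketch of that idea follows. It does not use the MADP simulation classes: `ApproximateEvaluateSketch` and the `runEpisode` callback are hypothetical stand-ins for a single simulated episode that returns its accumulated reward.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>

// Hypothetical helper, not part of the MADP API: estimates the value of a
// policy as the mean return over nrRuns independently simulated episodes.
// runEpisode is assumed to simulate one episode and return its total reward.
double ApproximateEvaluateSketch(
    const std::function<double(std::mt19937&)>& runEpisode,
    std::size_t nrRuns, std::mt19937& rng)
{
    assert(nrRuns > 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < nrRuns; ++i)
        sum += runEpisode(rng);  // one Monte Carlo rollout
    return sum / static_cast<double>(nrRuns);
}
```

The estimate becomes less noisy as nrRuns grows, at the cost of proportionally more simulation time per evaluated sample.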
double DICEPSPlanner::GetExpectedReward (void) const
inlinevirtual |
Returns the expected reward of the best found joint policy.
Implements PlanningUnitDecPOMDPDiscrete.
boost::shared_ptr< JointPolicy > DICEPSPlanner::GetJointPolicy ()
inlinevirtual |
Returns the found joint policy.
Reimplemented from PlanningUnitDecPOMDPDiscrete.
boost::shared_ptr< JointPolicyDiscrete > DICEPSPlanner::GetJointPolicyDiscrete ()
inline |
JPPV_sharedPtr DICEPSPlanner::GetJointPolicyPureVector ()
inlinevirtual |
Returns the found joint policy.
Reimplemented from PlanningUnitDecPOMDPDiscrete.
void DICEPSPlanner::OrderedInsertJPPVValuePair (JPPVValuePair *pv, list< JPPVValuePair * > &l)
staticprotected |
References JointPolicyValuePair::GetValue().
Referenced by Plan().
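OrderedInsertJPPVValuePair() takes a policy-value pair and a list, and reads the pair's value through GetValue(). One plausible reading is a sorted insertion that keeps the list ordered by value. The sketch below illustrates that idea only. `ValuedSample` is a hypothetical stand-in for JPPVValuePair, and the descending order (best sample at the front) is an assumption, not something this page confirms.

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for a (joint policy, value) pair.
struct ValuedSample {
    int id;        // identifies the sampled policy
    double value;  // its (estimated) expected reward
};

// Inserts s so that the list stays sorted by value in descending order
// (assumed ordering: highest value at the front, lowest at the back).
void OrderedInsertSketch(const ValuedSample& s, std::list<ValuedSample>& l)
{
    std::list<ValuedSample>::iterator it = l.begin();
    while (it != l.end() && s.value < it->value)
        ++it;          // skip all strictly better samples
    l.insert(it, s);   // insert before the first sample that is not better
    assert(!l.empty());
}
```

With this ordering, keeping only the N best samples amounts to dropping elements from the back whenever the list grows beyond N.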
void DICEPSPlanner::Plan ()
virtual |
The method that performs the planning according to the CE for Dec-POMDP algorithm.
Implements PlanningUnit.
References _m_expectedRewardFoundPolicy, _m_foundPolicy, _m_nrEvalRuns, _m_nrIterations, _m_nrJointPoliciesForUpdate, _m_nrRestarts, _m_nrSampledJointPolicies, _m_outputConvergenceStatistics, _m_use_gamma, _m_verbose, ApproximateEvaluate(), ValueFunctionDecPOMDPDiscrete::CalculateV(), PlanningUnitDecPOMDPDiscrete::GetDPOMDPD(), MultiAgentDecisionProcessDiscreteInterface::GetNrActions(), MultiAgentDecisionProcessInterface::GetNrAgents(), PlanningUnitMADPDiscrete::GetNrJointObservationHistories(), PlanningUnitMADPDiscrete::GetNrObservationHistories(), PlanningUnitMADPDiscrete::GetNrStates(), JointPolicyValuePair::GetValue(), PolicyGlobals::OHIST_INDEX, OrderedInsertJPPVValuePair(), PrintBestSamples(), PrintTools::PrintVectorCout(), SampleIndividualPolicy(), TimedAlgorithm::StartTimer(), TimedAlgorithm::StopTimer(), and UpdateCEProbDistribution().
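The members referenced by Plan() (sampling, ordered insertion, evaluation, and the distribution update) match the usual shape of a cross-entropy search loop. The following is a self-contained sketch of such a loop for a single agent on a toy problem. It is not the DICEPSPlanner implementation: `CrossEntropySearchSketch`, `Policy`, and `Score` are hypothetical names, restarts and the hard-threshold option are not modelled, and the smoothed update rule is the generic cross-entropy form.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <utility>
#include <vector>

using Policy = std::vector<std::size_t>;               // action index per history
using Score  = std::function<double(const Policy&)>;  // value of a pure policy
using Sample = std::pair<double, Policy>;

// Hypothetical sketch of cross-entropy search over pure policies.
Policy CrossEntropySearchSketch(std::size_t nrHistories, std::size_t nrActions,
                                std::size_t nrIterations, std::size_t nrSamples,
                                std::size_t nrSamplesForUpdate, double alpha,
                                const Score& score, std::mt19937& rng)
{
    assert(nrActions > 0 && nrSamples > 0);
    assert(nrSamplesForUpdate > 0 && nrSamplesForUpdate <= nrSamples);
    // Start from a uniform action distribution for every history.
    std::vector<std::vector<double> > xi(
        nrHistories, std::vector<double>(nrActions, 1.0 / nrActions));
    Policy best(nrHistories, 0);
    double bestValue = score(best);
    for (std::size_t it = 0; it < nrIterations; ++it) {
        // 1. Sample and evaluate nrSamples policies.
        std::vector<Sample> samples;
        for (std::size_t n = 0; n < nrSamples; ++n) {
            Policy pol(nrHistories);
            for (std::size_t h = 0; h < nrHistories; ++h) {
                std::discrete_distribution<std::size_t> d(xi[h].begin(), xi[h].end());
                pol[h] = d(rng);
            }
            samples.push_back(Sample(score(pol), pol));
        }
        // 2. Order the samples so that the best ones come first.
        std::sort(samples.begin(), samples.end(),
                  [](const Sample& a, const Sample& b) { return a.first > b.first; });
        if (samples.front().first > bestValue) {
            bestValue = samples.front().first;
            best = samples.front().second;
        }
        // 3. Move xi toward the action frequencies of the best samples.
        for (std::size_t h = 0; h < nrHistories; ++h)
            for (std::size_t a = 0; a < nrActions; ++a) {
                double count = 0.0;
                for (std::size_t n = 0; n < nrSamplesForUpdate; ++n)
                    if (samples[n].second[h] == a) count += 1.0;
                xi[h][a] = (1.0 - alpha) * xi[h][a]
                         + alpha * count / nrSamplesForUpdate;
            }
    }
    return best;
}
```

In this sketch the parameters nrIterations, nrSamples, nrSamplesForUpdate, and alpha play the roles their constructor namesakes suggest. A restart would simply call the function again with a fresh uniform distribution and keep the better result.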
void DICEPSPlanner::PrintBestSamples (const list< JPPVValuePair * > &l)
staticprotected |
References JointPolicyValuePair::GetValue().
Referenced by Plan().
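void DICEPSPlanner::SampleIndividualPolicy (PolicyPureVector &pol, const vector< vector< double > > &ohistActionProbs)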
staticprotected |
References PolicyPureVector::SetAction().
Referenced by Plan().
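The signature of SampleIndividualPolicy() (a policy to fill plus a table of per-observation-history action probabilities) suggests that one action is drawn for each observation history. A minimal sketch under that assumption follows. The returned vector is a hypothetical stand-in for PolicyPureVector, and `SampleIndividualPolicySketch` is not a MADP function.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch: draws one action per observation history.
// ohistActionProbs[oh][a] is assumed to be the probability of choosing
// action a after observation history oh. The result maps oh -> action.
std::vector<std::size_t> SampleIndividualPolicySketch(
    const std::vector<std::vector<double> >& ohistActionProbs,
    std::mt19937& rng)
{
    std::vector<std::size_t> pol(ohistActionProbs.size());
    for (std::size_t oh = 0; oh < ohistActionProbs.size(); ++oh) {
        const std::vector<double>& probs = ohistActionProbs[oh];
        assert(!probs.empty());
        // discrete_distribution normalizes the weights it is given.
        std::discrete_distribution<std::size_t> dist(probs.begin(), probs.end());
        pol[oh] = dist(rng);  // corresponds to setting the action for oh
    }
    return pol;
}
```

Calling it once per agent, each with that agent's probability table, would yield one sampled joint policy.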
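void DICEPSPlanner::UpdateCEProbDistribution (vector< vector< vector< double > > > &Xi, const list< JPPVValuePair * > &best_samples)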
protected |
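UpdateCEProbDistribution() receives the distribution Xi and the list of best samples, and is one of the two members that use _m_alpha. A common cross-entropy update, shown here as an assumption rather than as the confirmed implementation, blends the old probabilities with the empirical action frequencies of the best samples using a learning rate alpha. The types `JointPolicySketch` and `Distribution` and the indexing Xi[agent][ohist][action] are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins: a joint policy maps [agent][ohist] -> action,
// and the distribution is indexed as Xi[agent][ohist][action].
using JointPolicySketch = std::vector<std::vector<std::size_t> >;
using Distribution = std::vector<std::vector<std::vector<double> > >;

// Smoothed cross-entropy update (assumed form):
//   Xi <- (1 - alpha) * Xi + alpha * (action frequencies in bestSamples)
void UpdateCEProbDistributionSketch(
    Distribution& Xi, const std::vector<JointPolicySketch>& bestSamples,
    double alpha)
{
    assert(!bestSamples.empty());
    assert(alpha >= 0.0 && alpha <= 1.0);
    const double n = static_cast<double>(bestSamples.size());
    for (std::size_t ag = 0; ag < Xi.size(); ++ag)
        for (std::size_t oh = 0; oh < Xi[ag].size(); ++oh) {
            // Empirical frequency of each action among the best samples.
            std::vector<double> freq(Xi[ag][oh].size(), 0.0);
            for (const JointPolicySketch& jp : bestSamples)
                freq[jp[ag][oh]] += 1.0 / n;
            for (std::size_t a = 0; a < freq.size(); ++a)
                Xi[ag][oh][a] = (1.0 - alpha) * Xi[ag][oh][a] + alpha * freq[a];
        }
}
```

With alpha = 1 the distribution jumps straight to the sample frequencies, while smaller values change it more gradually and reduce the risk of collapsing onto a poor policy too early.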
double DICEPSPlanner::_m_alpha
private |
Referenced by DICEPSPlanner(), and UpdateCEProbDistribution().
double DICEPSPlanner::_m_expectedRewardFoundPolicy
private |
Referenced by Plan().
JPPV_sharedPtr DICEPSPlanner::_m_foundPolicy
private |
Referenced by Plan().
size_t DICEPSPlanner::_m_nrEvalRuns
private |
Referenced by DICEPSPlanner(), and Plan().
size_t DICEPSPlanner::_m_nrIterations
private |
Referenced by DICEPSPlanner(), and Plan().
size_t DICEPSPlanner::_m_nrJointPoliciesForUpdate
private |
Referenced by DICEPSPlanner(), and Plan().
size_t DICEPSPlanner::_m_nrRestarts
private |
Referenced by DICEPSPlanner(), and Plan().
size_t DICEPSPlanner::_m_nrSampledJointPolicies
private |
Referenced by DICEPSPlanner(), and Plan().
std::ofstream* DICEPSPlanner::_m_outputConvergenceFile
private |
Referenced by DICEPSPlanner().
bool DICEPSPlanner::_m_outputConvergenceStatistics
private |
Referenced by DICEPSPlanner(), and Plan().
bool DICEPSPlanner::_m_use_gamma
private |
Referenced by DICEPSPlanner(), and Plan().
int DICEPSPlanner::_m_verbose
private |
Referenced by DICEPSPlanner(), and Plan().