- Tags:#mdp
- Pablo Samuel Castro et al. @ EWRL 2011
- Option framework
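- An option is a triple $o = \langle I, \pi, \beta \rangle$, where:
- $I \subseteq S$ is the set of states where the option is available
- $\pi$ is the option’s policy
- $\beta$ is the probability of the option terminating at each state.
- An option is started in a state $s \in I$; the policy $\pi$ is then followed until the option terminates, as dictated by $\beta$.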
- Bisimulation metrics
- Discretizing the continuous action space.
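The option framework notes above (initiation set, policy, termination probability) can be sketched as a small data structure plus an execution loop. This is only an illustrative sketch, not code from the paper: the names `Option`, `run_option`, `chain_step`, and `go_right` are hypothetical, the policy is assumed deterministic for simplicity, and the toy chain environment is made up for the example.

```python
import random
from dataclasses import dataclass
from typing import Callable, Hashable, List, Set

State = Hashable
Action = Hashable


@dataclass
class Option:
    """Hypothetical container for the three components of an option."""
    initiation_set: Set[State]             # states where the option is available
    policy: Callable[[State], Action]      # the option's policy (assumed deterministic here)
    termination: Callable[[State], float]  # probability of terminating at each state


def run_option(option: Option,
               state: State,
               step: Callable[[State, Action], State],
               rng: random.Random,
               max_steps: int = 1000) -> List[State]:
    """Start the option in `state` and follow its policy until it terminates."""
    if state not in option.initiation_set:
        raise ValueError(f"option is not available in state {state!r}")
    trajectory = [state]
    for _ in range(max_steps):
        action = option.policy(state)
        state = step(state, action)
        trajectory.append(state)
        # Terminate with the probability given for the state just reached.
        if rng.random() < option.termination(state):
            break
    return trajectory


# Toy environment (an assumption for illustration): a chain of states 0..5.
def chain_step(state: int, action: int) -> int:
    return max(0, min(5, state + action))


# "Go right until state 5": available in states 0..4, terminates only at 5.
go_right = Option(
    initiation_set={0, 1, 2, 3, 4},
    policy=lambda s: +1,
    termination=lambda s: 1.0 if s == 5 else 0.0,
)

print(run_option(go_right, 2, chain_step, random.Random(0)))
```

Because termination here is either 0 or 1, the run is deterministic regardless of the seed. A stochastic termination function would simply return intermediate probabilities.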
UTree algorithm (McCallum 1995)
- Finds state abstractions from sampled interactions with the environment, focusing directly on modeling the value function.
Controlled Markov Process (CMP) homomorphisms
- A CMP is an MDP without the reward function.
MDP homomorphism (Ravindran 2004)
Macro-actions: modeling MDPs at multiple time scales (older material)
- Theoretical results on reinforcement learning with temporally abstract behaviors (check this paper for macro-actions)
- TD models: modeling the world at a mixture of time scales.
- Finding structure in reinforcement learning
- Milos Hauskrecht et al. @ UAI 1998
- Focuses on how to construct macro-actions automatically
- A macro-action is a local policy defined for a particular region