Automatic construction of temporally extended actions for MDPs using bisimulation metrics

  • Tags:#mdp
  • Pablo Samuel Castro et al. @ EWRL 2011
  • Option framework
      • I ⊆ S is the set of states where the option is available (the initiation set)
      • π is the option’s policy
      • β : S → [0, 1] is the probability of the option terminating at each state.
      • An option is started in a state s ∈ I; the policy π is followed until the option is terminated, as dictated by β.
  • Bisimulation metrics
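The option framework above can be sketched in a few lines. This is a minimal, hypothetical Python illustration (the class and function names are mine, not from the paper): an option is the triple ⟨I, π, β⟩, and executing it means following π until β triggers termination.

```python
import random

class Option:
    """A temporally extended action <I, pi, beta> (hypothetical minimal sketch)."""
    def __init__(self, initiation_set, policy, termination_prob):
        self.initiation_set = initiation_set      # I: states where the option may start
        self.policy = policy                      # pi: state -> action
        self.termination_prob = termination_prob  # beta: state -> [0, 1]

def run_option(env_step, option, state, rng=None):
    """Follow the option's policy until beta says stop; return the visited states."""
    rng = rng or random.Random(0)
    assert state in option.initiation_set
    trajectory = [state]
    # Terminate with probability beta(state) at each step.
    while rng.random() >= option.termination_prob(state):
        state = env_step(state, option.policy(state))
        trajectory.append(state)
    return trajectory
```

For example, on a toy chain MDP where the action moves one state to the right, an option with β = 1 at states ≥ 3 runs until it reaches state 3.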

Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning


Binary action search for learning continuous-action control policies

  • Reduces continuous-action selection to a sequence of binary decisions, effectively discretizing the continuous action via binary search over its range.
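The binary-search idea can be sketched as follows. This is a hypothetical illustration (the `decide` callback stands in for a learned binary policy; in the actual method the decision would also condition on the state): each decision halves the remaining action interval.

```python
def binary_action_search(decide, lo, hi, depth=10):
    """Pick a continuous action in [lo, hi] via a sequence of binary decisions.

    decide(mid) -> bool is a stand-in for a learned binary policy:
    True means "move toward the upper half of the current interval".
    """
    for _ in range(depth):
        mid = (lo + hi) / 2.0
        if decide(mid):
            lo = mid   # refine toward the upper half
        else:
            hi = mid   # refine toward the lower half
    return (lo + hi) / 2.0
```

After `depth` decisions the action is resolved to within (hi − lo) / 2^depth, so a handful of binary choices gives fine-grained control.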

UTree algorithm (McCallum 1995)

  • Finds state abstractions from sampled interactions with the environment, focusing directly on modeling the value function

Controlled Markov Process (CMP) homomorphisms

  • A CMP is an MDP without the reward function.

MDP homomorphism (Ravindran 2004)

  • A mapping from the states and actions of one MDP to those of another that preserves the transition dynamics and reward structure
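The homomorphism conditions can be checked mechanically for a finite MDP. A minimal sketch, assuming tabular `P` and `R` and with the abstract MDP induced implicitly (all names are mine): every concrete state-action pair that maps to the same abstract pair must agree on its reward and on the probability mass it sends into each abstract state block.

```python
from collections import defaultdict

def check_homomorphism(states, actions, P, R, f, g):
    """P[(s, a)] is a dict next_state -> prob; R[(s, a)] is a scalar reward.
    f maps states to abstract states; g maps (s, a) to abstract actions.
    Returns True iff all pairs sharing an abstract image agree on reward
    and on the lifted (block-level) transition distribution."""
    groups = defaultdict(list)
    for s in states:
        for a in actions:
            # Lift the transition distribution onto abstract state blocks.
            lifted = defaultdict(float)
            for s2, p in P[(s, a)].items():
                lifted[f(s2)] += p
            groups[(f(s), g(s, a))].append((R[(s, a)], dict(lifted)))
    for members in groups.values():
        r0, t0 = members[0]
        for r, t in members[1:]:
            if abs(r - r0) > 1e-9 or t != t0:
                return False
    return True
```

For example, two symmetric states that deterministically swap with equal reward collapse to a single abstract state, and the check passes; perturbing one reward breaks it.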

Macro-actions, model MDPs at multiple time scales (old materials):

  • Theoretical results on reinforcement learning with temporally abstract behaviors (check this paper for macro-actions)
  • TD models: modeling the world at a mixture of time scales.
  • Finding structure in reinforcement learning

Hierarchical solution of Markov Decision Processes using macro-actions

  • Milos Hauskrecht et al. @ UAI 1998
  • Focuses on how to construct macro-actions automatically
  • A macro-action is a local policy defined for a particular region of the state space
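A region-based macro-action can be sketched as a policy whose domain is the region itself. This is a hypothetical illustration (names are mine): unlike an option's stochastic β termination, the macro-action here terminates deterministically the moment the state exits the region.

```python
def run_macro(env_step, region_policy, state):
    """Execute a macro-action: follow the local policy while inside its region.

    region_policy is a dict state -> action; its key set IS the region,
    so the macro terminates as soon as the state leaves that set.
    """
    trajectory = [state]
    while state in region_policy:
        state = env_step(state, region_policy[state])
        trajectory.append(state)
    return trajectory
```

On a chain MDP with region {0, 1, 2} and a move-right local policy, the macro runs until it steps out of the region at state 3.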