Reinforcement learning methods/theory
Reinforcement Learning for MDPs with Constraints
Peter Geibel, ECML 2006
- Terms:
  - S: a finite state set
  - A: a finite action set
  - P(s' | s, a): state transition probabilities
  - r: reward obtained
  - V^π: value of a policy
  - c: an additional second reward function
  - V_c^π: constrained value function
  - CMDP: constrained MDP
- Main contributions:
  - Considers MDPs with two criteria / two kinds of constraints:
    - CMDP (constrained Markov decision process): maximize the expected value of the infinite-horizon cumulative return, subject to a constraint on the second criterion function itself, i.e. on the expected value of its return
    - CPMDP (MDP with constrained probability of constraint violation): constrain the probability that the return, considered as a random variable, violates an inequality constraint; a maximum allowable violation probability is specified
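The CPMDP quantity can be estimated empirically. The sketch below (a hypothetical illustration, not from the paper; the toy cost model and all parameters are assumptions) estimates by Monte Carlo the probability that an episode's cost return violates an inequality constraint:

```python
import random

def rollout_cost(rng, horizon=50, step_cost_p=0.1):
    """Cumulative cost of one simulated episode: each step incurs a
    unit cost with probability step_cost_p (toy stochastic model)."""
    return sum(1 for _ in range(horizon) if rng.random() < step_cost_p)

def violation_probability(threshold, n_episodes=10_000, seed=0):
    """Estimate P(cost return > threshold) -- the quantity a CPMDP
    requires to stay below a maximum allowable probability."""
    rng = random.Random(seed)
    violations = sum(rollout_cost(rng) > threshold for _ in range(n_episodes))
    return violations / n_episodes
```

A policy is CPMDP-feasible when this estimated probability stays below the specified maximum.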
  - Reinforcement learning methods for solving this control problem:
    - LinMDP: linear programming
      - CMDP: extend the linear program of the unconstrained MDP with an additional constraint (i.e. require that the expected value achieved under the policy exceeds a given threshold)
      - CPMDP: transform the CPMDP into a CMDP via a mapping. Drawback: LinMDP might be suboptimal for solving CPMDP
        - Remedy: vary the
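The LP idea above can be sketched with occupancy measures. This is a minimal illustration under assumed notation, not the paper's formulation verbatim: the 2-state/2-action MDP, rewards, costs, and threshold are made up, and the second criterion is imposed as one extra linear inequality on the discounted state-action occupancy x[s, a].

```python
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
n_s, n_a = 2, 2
P = np.zeros((n_s, n_a, n_s))           # P[s, a, s'] transition probabilities
P[0, 0] = [0.9, 0.1]; P[0, 1] = [0.2, 0.8]
P[1, 0] = [0.8, 0.2]; P[1, 1] = [0.1, 0.9]
r = np.array([[1.0, 0.0], [0.0, 2.0]])  # primary reward r[s, a]
c = np.array([[1.0, 0.0], [1.0, 0.0]])  # second criterion c[s, a]
mu0 = np.array([0.5, 0.5])              # initial state distribution
threshold = 1.0                         # require expected c-return >= threshold

# Flow equalities: sum_a x[s',a] - gamma * sum_{s,a} P[s,a,s'] x[s,a] = mu0[s']
A_eq = np.zeros((n_s, n_s * n_a))
for sp in range(n_s):
    for s in range(n_s):
        for a in range(n_a):
            A_eq[sp, s * n_a + a] = (sp == s) - gamma * P[s, a, sp]

# Extra CMDP constraint on the expected c-return, rewritten as <=
A_ub = -c.reshape(1, -1)
b_ub = np.array([-threshold])

res = linprog(-r.ravel(), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=mu0,
              bounds=[(0, None)] * (n_s * n_a))
x = res.x.reshape(n_s, n_a)
policy = x / x.sum(axis=1, keepdims=True)  # pi(a|s) recovered from x
```

Without the extra `A_ub` row this is exactly the LP of the unconstrained MDP, which is the sense in which LinMDP only "adds one constraint".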
- WeiMDP: A weighted approach
- Expressed the probability of entering an undesirable state as an (undiscounted) second value function
      - Introduce a weight and a corresponding weighted reward function
      - The resulting MDP can be solved with Q-learning
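The weighted idea can be sketched as follows: fold the second criterion into a single scalar reward r_w = w·r + (1−w)·c and run ordinary tabular Q-learning on it. The tiny chain environment, its dynamics, and all hyperparameters below are illustrative assumptions, not the paper's setup.

```python
import random

def q_learning_weighted(w, episodes=2000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    n_s, n_a = 3, 2
    Q = [[0.0] * n_a for _ in range(n_s)]

    def step(s, a):
        # Toy dynamics: action 1 moves right, action 0 stays; being in
        # state 2 yields reward 1 but cost -1 (an "undesirable" criterion).
        s2 = min(s + 1, n_s - 1) if a == 1 else s
        r = 1.0 if s2 == n_s - 1 else 0.0
        c = -1.0 if s2 == n_s - 1 else 0.0
        return s2, w * r + (1.0 - w) * c   # weighted scalar reward

    for _ in range(episodes):
        s = 0
        for _ in range(20):
            if rng.random() < eps:
                a = rng.randrange(n_a)
            else:
                a = max(range(n_a), key=lambda a_: Q[s][a_])
            s2, rw = step(s, a)
            Q[s][a] += alpha * (rw + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

With w near 1 the agent chases the reward and moves toward state 2; with w near 0 the cost dominates and it learns to stay away.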
- AugMDP: State space extension
      - When the cost accumulated after entering a state falls below a threshold, an additional negative reward is applied; its high absolute value deters the agent from entering that state
      - Limitation: requires the problem's maximum cost to be bounded
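The state-space extension above can be sketched in a few lines. This is an assumed, illustrative shape (names, threshold, and penalty values are made up): the augmented state carries the accumulated cost, and crossing the threshold triggers a large negative reward.

```python
def augmented_step(state, acc_cost, reward, cost,
                   threshold=-3.0, penalty=-100.0):
    """One transition in the augmented MDP.

    The augmented state is (state, acc_cost); when the updated
    accumulated cost drops below `threshold`, `penalty` is added,
    and its high absolute value deters the agent. This bookkeeping
    only stays finite if costs are bounded (the AugMDP limitation)."""
    new_acc = acc_cost + cost
    shaped = reward + (penalty if new_acc < threshold else 0.0)
    return (state, new_acc), shaped
```

Any standard solver (e.g. Q-learning over the augmented states) can then be applied unchanged.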
- RecMDP: Recursive reformulation of the constraint
- Develop a new value function
Rainbow: Combining Improvements in Deep Reinforcement Learning
Matteo Hessel et al., AAAI 2018
- A collection of DQN extensions:
| Extension | Problem to solve | Method |
| --- | --- | --- |
| Double DQN (DDQN) | overestimation bias, due to the maximization step in Q-learning | decoupling: select the action separately from its evaluation (online network selects, target network evaluates) |
| A3C (multi-step) | slow propagation of newly observed rewards | learning from multi-step bootstrap targets |
| Distributional Q-learning | the expected return discards information about the return's distribution | learn a distribution over returns instead of the expectation |
| Noisy DQN | ε-greedy exploration ignores state | noisy linear layers providing learned, state-conditional exploration |
| Prioritized DDQN | uniform replay sampling ignores how much can be learned from each transition | sample transitions with probability related to their TD error |
| Dueling DDQN | in many states the choice of action barely matters | separate streams for state value and action advantages |
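The DDQN decoupling from the table can be shown concretely. A minimal sketch with tabular stand-ins for the two networks (the Q-values below are illustrative):

```python
def ddqn_target(q_online, q_target, s_next, reward, gamma=0.99, done=False):
    """Double-DQN bootstrap target for one transition:
    the online values select the action, the target values evaluate it."""
    if done:
        return reward
    a_star = max(range(len(q_online[s_next])), key=lambda a: q_online[s_next][a])
    return reward + gamma * q_target[s_next][a_star]
```

Vanilla DQN would instead use `reward + gamma * max(q_target[s_next])`; when the two maximizers disagree, the decoupled target is lower, which counters the overestimation bias.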
- Combines all six extensions into one agent; ablations measure the contribution of each component.