Control and reinforcement learning

1. Bellman equation, Hamilton-Jacobi-Bellman equation, and Lyapunov stability

Hamilton-Jacobi-Bellman equation: continuous-time dynamic programming
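Bellman equation for value iteration (Sutton’s book): $v_*(s) = \max_a \sum_{s', r} p(s', r \mid s, a)\,[r + \gamma v_*(s')]$
- For deterministic (non-probabilistic) state transitions $s' = f(s, a)$, this reduces to $v_*(s) = \max_a [r(s, a) + \gamma v_*(s')]$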
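- Replace the discrete action $a$ with a continuous control $u$, and the discrete reward $r$ with a continuous cost $l$ (let $l = -r$ with $\gamma = 1$: a higher reward is better, a lower cost is better; starting from time $t$, the cost between two adjacent states is $l(x, u)\,\Delta t$). Writing the cost-to-go as $G^*$, the equation above becomes (see LaValle, Planning Algorithms, p. 872): $G^*(x(t)) = \min_u \{ l(x, u)\,\Delta t + G^*(x(t + \Delta t)) \}$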
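- The continuous system $\dot{x} = f(x, u)$ corresponds to the discrete system $x(t + \Delta t) \approx x(t) + f(x, u)\,\Delta t$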
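- For sufficiently small $\Delta t$, apply a Taylor expansion and drop the terms beyond the first derivative: $G^*(x(t + \Delta t)) \approx G^*(x) + \sum_i \frac{\partial G^*}{\partial x_i} f_i(x, u)\,\Delta t$
  Result (subtract $G^*(x)$ from both sides and divide by $\Delta t$): $\min_u \{ l(x, u) + \sum_i \frac{\partial G^*}{\partial x_i} f_i(x, u) \} = 0$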
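- With probabilistic state transitions, the cost-to-go of the next state is replaced by its expectation: $G^*(x(t)) = \min_u \{ l(x, u)\,\Delta t + \mathbb{E}[G^*(x(t + \Delta t))] \}$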
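- When $G$ is time-dependent, we have $-\frac{\partial G^*}{\partial t} = \min_u \{ l(x, u, t) + \sum_i \frac{\partial G^*}{\partial x_i} f_i(x, u, t) \}$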
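As a sanity check on the stationary HJB equation $\min_u \{ l(x, u) + \sum_i \frac{\partial G^*}{\partial x_i} f_i(x, u) \} = 0$, the sketch below evaluates it on a scalar linear-quadratic problem. Everything here is an assumption chosen for illustration and is not taken from the notes: the system $\dot{x} = a x + b u$, the cost $l = q x^2 + r u^2$, the quadratic candidate $G^*(x) = p x^2$, and the function names `riccati_p`, `hjb_bracket`, and `optimal_control`. For this candidate, setting the bracket to zero gives the scalar Riccati equation $q + 2 a p - b^2 p^2 / r = 0$.

```python
import math


def riccati_p(a, b, q, r):
    """Positive root of q + 2*a*p - (b**2 / r) * p**2 = 0 (scalar Riccati equation)."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


def optimal_control(x, b, r, p):
    """Minimizer of the HJB bracket over u, from d/du [r*u**2 + 2*p*x*b*u] = 0."""
    return -b * p * x / r


def hjb_bracket(x, u, a, b, q, r, p):
    """l(x, u) + dG/dx * f(x, u) for the candidate cost-to-go G(x) = p * x**2."""
    cost = q * x * x + r * u * u   # running cost l(x, u)
    dG_dx = 2.0 * p * x            # gradient of the candidate cost-to-go
    return cost + dG_dx * (a * x + b * u)


if __name__ == "__main__":
    a, b, q, r = 1.0, 1.0, 1.0, 1.0    # illustrative values only
    p = riccati_p(a, b, q, r)          # equals 1 + sqrt(2) for these values
    for x in (-2.0, 0.5, 3.0):
        u_star = optimal_control(x, b, r, p)
        at_optimum = hjb_bracket(x, u_star, a, b, q, r, p)
        off_optimum = hjb_bracket(x, u_star + 0.3, a, b, q, r, p)
        print(f"x={x:+.1f}  bracket(u*)={at_optimum:.2e}  bracket(u*+0.3)={off_optimum:.3f}")
```

At the minimizing control the bracket vanishes up to floating-point error, while any other control gives a strictly positive value, which is what the $\min_u \{\cdot\} = 0$ form states.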
Relation to Lyapunov stability? Zhihu:
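- From the previous section: $\min_u \{ l(x, u) + \sum_i \frac{\partial G^*}{\partial x_i} f_i(x, u) \} = 0$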
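- Lyapunov stability requires a function $V(x)$ with $V(0) = 0$ at the equilibrium, $V(x) > 0$ for $x \neq 0$, and $\dot{V}(x) \le 0$ along trajectories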
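If we take $V$ to be $G^*$, then $\dot{V} = \sum_i \frac{\partial G^*}{\partial x_i} f_i(x, u)$, and there exists an optimal control $u^*$ such that $l(x, u^*) + \dot{V} = 0$. Since $l(x, u) \ge 0$, we get (for the system $\dot{x} = f(x, u^*)$) $\dot{V} = -l(x, u^*)$, i.e. $\dot{V} \le 0$. Moreover, as a cost-to-go, $G^*$ satisfies $G^*(0) = 0$ (no cost is needed at the equilibrium) and $G^*(x) > 0$ elsewhere (the cost is always positive), so all three conditions hold. The difference is that HJB finds the direction of fastest descent (the control minimizing $l + \dot{V}$), whereas Lyapunov only requires that $V$ decrease.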
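The claim that the optimal cost-to-go behaves like a Lyapunov function can be checked numerically. The sketch below is only an illustration under assumed choices that do not come from the notes: a scalar system $\dot{x} = a x + b u$ with cost $l = q x^2 + r u^2$, cost-to-go $V(x) = p x^2$ where $p$ solves $q + 2 a p - b^2 p^2 / r = 0$, forward-Euler integration, and the hypothetical function name `simulate_closed_loop`. It records $V$ along a closed-loop trajectory under $u^* = -b p x / r$.

```python
import math


def simulate_closed_loop(x0, a, b, q, r, dt=0.01, steps=500):
    """Return V(x_k) = p * x_k**2 along a forward-Euler closed-loop trajectory."""
    # Positive root of the scalar Riccati equation q + 2*a*p - (b**2 / r) * p**2 = 0.
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    x = x0
    values = [p * x * x]
    for _ in range(steps):
        u = -b * p * x / r               # control minimizing l + dV/dx * f
        x = x + (a * x + b * u) * dt     # one Euler step of x' = a*x + b*u
        values.append(p * x * x)
    return values


if __name__ == "__main__":
    values = simulate_closed_loop(2.0, a=1.0, b=1.0, q=1.0, r=1.0)
    decreasing = all(later < earlier for earlier, later in zip(values, values[1:]))
    print("V positive along trajectory:", all(v > 0.0 for v in values))
    print("V strictly decreasing:", decreasing)
    print("V at equilibrium:", simulate_closed_loop(0.0, 1.0, 1.0, 1.0, 1.0)[-1])
```

Note that the open-loop system with $a = 1$ is unstable, so the decrease of $V$ here comes entirely from the feedback $u^*$. A non-optimal stabilizing gain would also make some quadratic $V$ decrease, which matches the remark that Lyapunov only asks for descent.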