Z's learning note

Search

>

>

Machine_learning_general

>

>

reward_shaping

Dec 31, 2020, 3 min read

#rl
#rewardshaping

Reward shaping

tag:#rl #rewardshaping

1. [80%]Policy invariance under reward transformations: theory and application to reward shaping

Andrew Ng et al., ICML 1999

强化学习reward shaping推导和理解
强化学习之reward shaping有关论文简述
Reward shaping WILL NOT change the optimal policy! 不改变Policy
Reward shaping helps to converge to the optimal policy faster. 只是帮助更快的向optimal policy收敛
What if the results of with/without reward shaping different? There are multiple optimal policies in the question. Reward shaping helps to converge to one faster than others. The output policies are equivalent.

1.1. How to shape the shape the reward

, is difference of potentials
, is real-valued function
Equation (4) suggests that might be a good shaping potential

1.2. How to define ?

E.g.:
- In a grid world problem, Manhattan distance might be a good choice
- Understanding: human being tells the agent that moving in the directions that toward the goal may lead to the end state faster. By providing such a potential function, the agent has less chance to search directions that against human suggestion. Therefore, the agent reach the goal faster. 通过提供potential function，向agent提供一个大致的搜索方向，potential function指出的方向将会得到更高的搜索权重。

1.3. Code, howto

Use the equation , , and .
Define your own according to the question.
E.g., for value iteration, replace the reward by using above-mentioned equations while calculating the value table (compute the value function).
Question: do you need to apply the as well while extracting the policy from given value table?

2. Auto matic shaping and decomposition of reward functions

Reward shaping
1. [80%]Policy invariance under reward transformations: theory and application to reward shaping
1.1. How to shape the shape the reward
1.2. How to define V_M^∗(s)?
1.3. Code, howto
2. Auto matic shaping and decomposition of reward functions

Backlinks

No backlinks found

Created with Quartz v4.1.0, © 2024

GitHub
Discord Community