API
-
Requires the Boost library.
-
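Uses the reference-counted smart pointer boost::shared_ptr.
- std::shared_ptr and std::tr1::shared_ptr are similar to boost::shared_ptr.
std::shared_ptr and std::tr1::shared_ptr appeared after the Boost version, and the Boost implementation offers somewhat more functionality.
tr1 is the namespace of C++ Technical Report 1, which sits between the C++03 and C++11 standards and was later merged into C++11.
All of these smart pointers behave the same way and are reference-counted: copying a smart pointer increments the reference count, so destroying the original pointer does not release the resource while a copy is still in use.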
-
The namespace is Gym.
-
Space.sample()
-
class Environment
- action_space()
- observation_space()
- observation_parse()
- reset()
- step()
- monitor_start()
- monitor_stop()
#ifndef __GYM_H__
#define __GYM_H__
// Caffe uses boost::shared_ptr (as opposed to std::shared_ptr), so do we.
#include <boost/shared_ptr.hpp>
#include <string>
#include <vector>
namespace Gym {
struct Space {
enum SpaceType {
DISCRETE,
BOX,
} type;
std::vector<float> sample(); // Random vector that belongs to this space
std::vector<int> box_shape; // Similar to Caffe blob shape, for example { 64, 96, 3 } for 96x64 rgb image.
std::vector<float> box_high;
std::vector<float> box_low;
int discreet_n;
};
struct State {
std::vector<float> observation; // get observation_space() to make sense of this data
float reward;
bool done;
std::string info;
};
class Environment {
public:
virtual boost::shared_ptr<Space> action_space() =0;
virtual boost::shared_ptr<Space> observation_space() =0;
virtual void reset(State* save_initial_state_here) =0;
virtual void step(const std::vector<float>& action, bool render, State* save_state_here) =0;
virtual void monitor_start(const std::string& directory, bool force, bool resume) =0;
virtual void monitor_stop() =0;
};
class Client {
public:
virtual boost::shared_ptr<Environment> make(const std::string& name) =0;
};
extern boost::shared_ptr<Client> client_create(const std::string& addr, int port);
} // namespace
#endif // __GYM_H__
Cartpole
1. Inputs, outputs, and model
1.1. state, reward, done, {} = step(self, action)
-
Action:
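| Num | Action |
| --- | --- |
| 0 | Push cart to the left |
| 1 | Push cart to the right |
-
Observation (state):
| Num | Observation | Min | Max |
| --- | --- | --- | --- |
| 0 | Cart Position | -4.8 | 4.8 |
| 1 | Cart Velocity | -Inf | Inf |
| 2 | Pole Angle | -24 deg | 24 deg |
| 3 | Pole Velocity At Tip | -Inf | Inf |
-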
Reward: 1 for every step taken, including the termination step.
-
Done:
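Returns True when the cart position or the pole angle goes out of bounds. steps_beyond_done handles the reward of the termination step (None: the pole has not fallen, or the episode is at its termination step; >= 0: the episode has already ended).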
1.2. Model:
-
Variables
| Var | Value |
| --- | --- |
| Gravity | 9.8 |
| Cart mass | 1.0 |
| Pole mass | 0.1 |
| Pole's length | 0.5 |
| Seconds between state updates | 0.02 |
| Force | 10 (right) or -10 (left) |
-
Kinematic model
-
Kinematics integrator:
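The Cartpole model and its integrator can be sketched in a few lines of Python. This is a minimal sketch, assuming the frictionless cart-pole equations and the explicit Euler update used by Gym's classic-control CartPole. The function name `cartpole_step` and its parameter names are hypothetical, and the default constants (including a 0.02 s update interval) are assumptions rather than values confirmed by these notes.

```python
import math


def cartpole_step(state, action, gravity=9.8, masscart=1.0, masspole=0.1,
                  length=0.5, force_mag=10.0, tau=0.02):
    """Advance a cart-pole state by one explicit Euler step (illustrative).

    state  : (x, x_dot, theta, theta_dot), angles in radians
    action : 1 pushes the cart to the right, 0 pushes it to the left
    """
    x, x_dot, theta, theta_dot = state
    force = force_mag if action == 1 else -force_mag
    total_mass = masscart + masspole
    polemass_length = masspole * length

    # Accelerations from the (assumed) frictionless cart-pole equations.
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + polemass_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (gravity * sin_t - cos_t * temp) / (
        length * (4.0 / 3.0 - masspole * cos_t ** 2 / total_mass))
    x_acc = temp - polemass_length * theta_acc * cos_t / total_mass

    # Explicit Euler: positions are advanced with the old velocities.
    x = x + tau * x_dot
    x_dot = x_dot + tau * x_acc
    theta = theta + tau * theta_dot
    theta_dot = theta_dot + tau * theta_acc
    return (x, x_dot, theta, theta_dot)


# Push right once from rest with the pole upright.
next_state = cartpole_step((0.0, 0.0, 0.0, 0.0), action=1)
```

Starting from rest, a push to the right accelerates the cart to the right and tips the pole to the left, which is why the pole's angular velocity comes out negative in this example.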
2. Initialization
Each reset randomly initializes all state variables within a fixed interval.
Pendulum
1. Inputs, outputs, and model
1.1. state, reward, done, {} = step(self, action)
-
Action:
max_torque = 2.
| Num | Action |
| --- | --- |
| 0 | [-max_torque, max_torque] |
-
Observation (state):
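| Num | Observation | Min | Max |
| --- | --- | --- | --- |
| 0 | cos(theta) | -1 | 1 |
| 1 | sin(theta) | -1 | 1 |
| 2 | theta_dot (angular velocity) | -8 | 8 |
-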
Reward:
A function of the pendulum angle and the input torque; the vertical-up position corresponds to an angle of 0.
-
Done: Always False
1.2. Model:
-
Variables
| Var | Value |
| --- | --- |
| Gravity | 10.0 |
| Pendulum mass | 1 |
| Pendulum's length | 1 |
| Seconds between state updates | 0.05 |
| Input torque | Clamped to [-max_torque, max_torque] |
| Angular velocity | Clamped to [-8, 8] |
-
Kinematic model
-
Kinematics integrator:
-
Normalize
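The Pendulum model, integrator, and angle normalization can be sketched together. This is a minimal sketch, assuming the dynamics and the quadratic cost used by Gym's classic Pendulum environment. The function names, the cost weights (1, 0.1, 0.001), and the choice to clamp the angular velocity before integrating the angle are assumptions, not details confirmed by these notes.

```python
import math


def angle_normalize(theta):
    """Wrap an angle into the interval [-pi, pi)."""
    return ((theta + math.pi) % (2 * math.pi)) - math.pi


def pendulum_step(theta, theta_dot, torque, g=10.0, m=1.0, l=1.0,
                  dt=0.05, max_torque=2.0, max_speed=8.0):
    """One illustrative update; theta = 0 means the pendulum points up."""
    # Clamp the input torque to [-max_torque, max_torque].
    u = max(-max_torque, min(max_torque, torque))

    # Assumed cost: squared angle, angular velocity, and torque terms.
    reward = -(angle_normalize(theta) ** 2
               + 0.1 * theta_dot ** 2
               + 0.001 * u ** 2)

    # Angular acceleration from gravity and the applied torque.
    theta_acc = (-3.0 * g / (2.0 * l) * math.sin(theta + math.pi)
                 + 3.0 / (m * l ** 2) * u)

    # Update the velocity, clamp it, then advance the angle.
    new_theta_dot = theta_dot + theta_acc * dt
    new_theta_dot = max(-max_speed, min(max_speed, new_theta_dot))
    new_theta = theta + new_theta_dot * dt

    observation = (math.cos(new_theta), math.sin(new_theta), new_theta_dot)
    return new_theta, new_theta_dot, observation, reward
```

Because the torque is clamped first, a request far outside the allowed range behaves exactly like the maximum torque, and the returned observation always stays within the bounds of the observation table.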
2. Initialization
The state is randomly initialized on every reset.
Frozenlake
1. Two versions (sizes)
- 4x4 grid
env = gym.make('FrozenLake-v0').unwrapped
Start at S and reach G.
S(0) F(1) F(2) F(3)
F(4) H(5) F(6) H(7)
F(8) F(9) F(10) H(11)
H(12) F(13) F(14) G(15)
- 8x8 grid
env = gym.make('FrozenLake8x8-v0').unwrapped
2. S, A, R
-
Actions
| Num | Action |
| --- | --- |
| 0 | Move Left |
| 1 | Move Down |
| 2 | Move Right |
| 3 | Move Up |
-
Reward: 0 for each step, 0 for falling into a hole, 1 for reaching the final goal.
3. env.P[s][a]
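- env.P[s][a] is a list of (probability, next_state, reward, terminal) entries
- In a hole state (e.g. 11), the probability is always 1, the next state is unchanged (still 11), the reward is 0, and terminal is True
- In the goal state (15), the probability is always 1, the next state is unchanged (still 15), the reward is 0, and terminal is True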
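The shape of the transition table described above can be illustrated without Gym. The following is a minimal sketch, assuming a deterministic (non-slippery) 4x4 lake, so every entry has probability 1; the real environment may list several outcomes per action. The names `LAKE_MAP` and `build_transitions` are hypothetical.

```python
LAKE_MAP = ["SFFF",
            "FHFH",
            "FFFH",
            "HFFG"]
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3


def build_transitions(desc):
    """Build P[s][a] = [(probability, next_state, reward, terminal)]."""
    nrow, ncol = len(desc), len(desc[0])

    def move(row, col, action):
        # Moves that would leave the grid keep the agent in place.
        if action == LEFT:
            col = max(col - 1, 0)
        elif action == DOWN:
            row = min(row + 1, nrow - 1)
        elif action == RIGHT:
            col = min(col + 1, ncol - 1)
        elif action == UP:
            row = max(row - 1, 0)
        return row, col

    P = {}
    for row in range(nrow):
        for col in range(ncol):
            s = row * ncol + col
            P[s] = {}
            for a in (LEFT, DOWN, RIGHT, UP):
                if desc[row][col] in "HG":
                    # Holes and the goal are absorbing: stay put, reward 0.
                    P[s][a] = [(1.0, s, 0.0, True)]
                else:
                    nr, nc = move(row, col, a)
                    tile = desc[nr][nc]
                    reward = 1.0 if tile == "G" else 0.0
                    P[s][a] = [(1.0, nr * ncol + nc, reward, tile in "HG")]
    return P


P = build_transitions(LAKE_MAP)
hole_entry = P[11][LEFT]    # absorbing hole at state 11
goal_entry = P[15][RIGHT]   # absorbing goal at state 15
```

Only the step that enters the goal (for example moving right from state 14) carries reward 1; once the agent is in state 15, every further transition has reward 0.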
4. env.step
- General Gym return value: (next_state, reward, done, prob), i.e. the next state, the reward, the done flag, and detailed information
Development
Basic file layout for a new gym environment
- Structure diagram
Skeleton of the new gym environment
def __init__(self)
def reset(self)
def step(self, action)
def render(self, mode="human")
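The four methods above can be filled in for a toy task to show how they fit together. This is a minimal sketch written as a plain Python class so that it runs without Gym installed; a real environment would normally subclass gym.Env and also define action_space and observation_space. The class `CorridorEnv` and its one-dimensional corridor task are hypothetical.

```python
class CorridorEnv:
    """Hypothetical 1-D corridor: start at cell 0, goal at the last cell."""

    def __init__(self, length=4):
        self.length = length
        self.position = 0

    def reset(self):
        # Put the agent back at the start and return the first observation.
        self.position = 0
        return self.position

    def step(self, action):
        # Action 1 moves right, any other action moves left; walls clamp.
        delta = 1 if action == 1 else -1
        self.position = min(max(self.position + delta, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}

    def render(self, mode="human"):
        # Draw the corridor as text, marking the agent with "A".
        row = "".join("A" if i == self.position else "."
                      for i in range(self.length))
        if mode == "human":
            print(row)
        return row


env = CorridorEnv(length=4)
observation = env.reset()
done = False
while not done:
    observation, reward, done, info = env.step(1)  # always move right
```

The loop at the end mirrors the usual interaction pattern: reset once, then call step until done is True.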
Rendering in gym:
- Environments in classic control and box2d are rendered through classic_control/rendering.py, which uses the pyglet API
- class Viewer
- get_array: returns the current frame as an image?
Interacting with gym
-
Call utils.play: https://github.com/openai/gym/blob/master/gym/utils/play.py
import gym.utils.play
env = gym.make('Breakout-v0')
gym.utils.play.play(env, zoom=3)
-
Bind keys to actions
-
Restart is enter, and pause is space.
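Binding keys to actions can be sketched as building a lookup table from key codes to action ids. This is a minimal sketch, assuming that play accepts a keys_to_action dictionary whose keys are sorted tuples of key codes; check the play.py linked above for the version in use. The helper `make_keys_to_action`, the chosen keys, and the Breakout action ids are assumptions for illustration.

```python
def make_keys_to_action(bindings):
    """Convert {"a": 3, ("a", "w"): 1} style bindings to key-code tuples."""
    mapping = {}
    for keys, action in bindings.items():
        if isinstance(keys, str):
            keys = (keys,)  # a single key is treated as a one-key combination
        # Sort the codes so the lookup does not depend on press order.
        codes = tuple(sorted(ord(key) for key in keys))
        mapping[codes] = action
    return mapping


# Assumed Breakout action ids: 0 NOOP, 1 FIRE, 2 RIGHT, 3 LEFT.
keys_to_action = make_keys_to_action({"w": 1, "d": 2, "a": 3})

# Hypothetical usage (needs gym with Atari support and a display):
# gym.utils.play.play(env, zoom=3, keys_to_action=keys_to_action)
```

Keys that are not listed fall outside the table, so the sketch leaves it to the caller to decide which action, if any, an unbound key should trigger.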