openai

API

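Requires the Boost library.
Uses the reference-counted smart pointer boost::shared_ptr.
- std::shared_ptr and std::tr1::shared_ptr are similar to boost::shared_ptr. Both appeared after the Boost version, and the Boost implementation is somewhat richer. tr1 is short for the C++ Technical Report 1 namespace; it sits between the C++03 and C++11 standards and was later merged into C++11.
  
  All three are smart pointers with the same reference-counting behavior: copying a smart pointer increments the reference count, so the copy stays usable even after the original smart pointer is destroyed, because the resource is released only when the last owner goes away.
The namespace is Gym.
Space.sample()
class Environment
- action_space()
- observation_space()
- observation_parse()
- reset()
- step()
- monitor_start()
- monitor_stop()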

#ifndef __GYM_H__
#define __GYM_H__
// Caffe uses boost::shared_ptr (as opposed to std::shared_ptr), so do we.
#include <boost/shared_ptr.hpp>
#include <string>
#include <vector>

namespace Gym {

struct Space {
    enum SpaceType {
        DISCRETE,
        BOX,
    } type;

    std::vector<float> sample();  // Random vector that belongs to this space

    std::vector<int>   box_shape; // Similar to Caffe blob shape, for example { 64, 96, 3 } for 96x64 rgb image.
    std::vector<float> box_high;
    std::vector<float> box_low;

    int discreet_n;
};

struct State {
    std::vector<float> observation; // get observation_space() to make sense of this data
    float reward;
    bool done;
    std::string info;
};

class Environment {
public:
    virtual boost::shared_ptr<Space> action_space() =0;
    virtual boost::shared_ptr<Space> observation_space() =0;

    virtual void reset(State* save_initial_state_here) =0;

    virtual void step(const std::vector<float>& action, bool render, State* save_state_here) =0;

    virtual void monitor_start(const std::string& directory, bool force, bool resume) =0;
    virtual void monitor_stop() =0;
};

class Client {
public:
    virtual boost::shared_ptr<Environment> make(const std::string& name) =0;
};

extern boost::shared_ptr<Client> client_create(const std::string& addr, int port);

} // namespace

#endif // __GYM_H__

Cartpole

Source code @ Github

1. Inputs, outputs, and model

1.1. state, reward, done, {} = step(self, action)

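Action:

| Num | Action |
| --- | --- |
| 0 | Push cart to the left |
| 1 | Push cart to the right |

Observation (state):

| Num | Observation | Min | Max |
| --- | --- | --- | --- |
| 0 | Cart Position | -4.8 | 4.8 |
| 1 | Cart Velocity | -Inf | Inf |
| 2 | Pole Angle | -24 deg | 24 deg |
| 3 | Pole Velocity At Tip | -Inf | Inf |

Reward: 1 for every step taken, including the termination step.

Done: returns True when the cart position or the pole angle goes beyond its lower or upper bound (in the Gym source the termination thresholds are ±2.4 and ±12 deg, half of the observation limits above). steps_beyond_done handles the reward of the termination step (None: the pole has not fallen yet, or this is the termination step itself; >= 0: the episode has already terminated).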
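The termination rule and the role of steps_beyond_done can be sketched in a few lines. This is a minimal sketch, not the Gym implementation itself: the threshold values (2.4 for position, 12 degrees for angle) are taken from the published Gym CartPole source as an assumption, and the function names `is_done` and `reward_for` are hypothetical.

```python
import math

# Assumed thresholds, following the published Gym CartPole source.
X_THRESHOLD = 2.4                             # cart position limit
THETA_THRESHOLD = 12 * 2 * math.pi / 360      # pole angle limit, in radians


def is_done(x, theta):
    """True once the cart position or the pole angle leaves its allowed range."""
    return abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD


def reward_for(done, steps_beyond_done):
    """Return (reward, updated steps_beyond_done) for one step."""
    if not done:
        return 1.0, steps_beyond_done         # ordinary step
    if steps_beyond_done is None:
        return 1.0, 0                         # termination step still earns 1
    return 0.0, steps_beyond_done + 1         # stepping after the episode ended
```

Keeping the counter at None until the first terminating step is what lets the termination step itself still earn a reward of 1, while any later call to step earns 0.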

1.2. Model:

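Variables

| Var | Value |
| --- | --- |
| Gravity | 9.8 |
| Cart mass | 1.0 |
| Pole mass | 0.1 |
| Pole’s length | 0.5 (half the pole length in the Gym source) |
| Seconds between state updates | 0.02 |
| Force | 10 (right) or -10 (left) |

Kinematic model

Kinematics integrator: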
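As a rough illustration of the kinematic model and its integrator, the sketch below follows the equations used in the published Gym CartPole source with an explicit Euler update. Treat it as an assumption-laden sketch: the constant names and the function `cartpole_step` are hypothetical, the time step is the 0.02 s used in the Gym source, and 0.5 is half the pole length.

```python
import math

GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
LENGTH = 0.5                        # half the pole length
POLEMASS_LENGTH = MASS_POLE * LENGTH
FORCE_MAG = 10.0
TAU = 0.02                          # seconds between state updates


def cartpole_step(state, action):
    """Advance (x, x_dot, theta, theta_dot) by one explicit Euler step."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Accelerations of the pole angle and of the cart.
    temp = (force + POLEMASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS)
    )
    x_acc = temp - POLEMASS_LENGTH * theta_acc * cos_t / TOTAL_MASS

    # Explicit Euler: positions use the old velocities.
    x = x + TAU * x_dot
    x_dot = x_dot + TAU * x_acc
    theta = theta + TAU * theta_dot
    theta_dot = theta_dot + TAU * theta_acc
    return (x, x_dot, theta, theta_dot)
```

Starting from rest, pushing right (action 1) gives the cart a positive velocity and tips the pole backwards, so theta_dot becomes negative after one step.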

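2. Initialization

On every reset, all four state variables are drawn uniformly at random from a small interval ([-0.05, 0.05] in the Gym source).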

Pendulum

Source code @ Github

1. Inputs, outputs, and model

1.1. state, reward, done, {} = step(self, action)

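Action: max_torque = 2.

| Num | Action |
| --- | --- |
| 0 | Torque in [-max_torque, max_torque] |

Observation (state):

| Num | Observation | Min | Max |
| --- | --- | --- | --- |
| 0 | $\cos\theta$ | -1 | 1 |
| 1 | $\sin\theta$ | -1 | 1 |
| 2 | $\dot{\theta}$ | -8 | 8 |

Reward: $r = -(\theta^2 + 0.1\,\dot{\theta}^2 + 0.001\,u^2)$ as in the Gym source, where $u$ is the input torque and $\theta$ is the angle normalized to $[-\pi, \pi]$; vertical up is $\theta = 0$.

Done: Always False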
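The reward can be written out as a short function. This is a sketch that assumes the cost used in the published Gym Pendulum source (squared normalized angle, plus 0.1 times squared angular velocity, plus 0.001 times squared torque); the name `pendulum_reward` is hypothetical.

```python
import math


def angle_normalize(theta):
    """Wrap an angle into [-pi, pi)."""
    return ((theta + math.pi) % (2 * math.pi)) - math.pi


def pendulum_reward(theta, theta_dot, torque):
    """Negative cost; the best possible value is 0 (upright, at rest, no torque)."""
    cost = (angle_normalize(theta) ** 2
            + 0.1 * theta_dot ** 2
            + 0.001 * torque ** 2)
    return -cost
```

Because the reward is never positive and the episode never reports done, an agent is scored on how long it keeps the pendulum near upright with little effort.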

1.2. Model:

Variables

| Var | Value |
| --- | --- |
| Gravity | 10.0 |
| Pendulum mass | 1 |
| Pendulum’s length | 1 |
| Seconds between state updates | 0.05 |
| Input torque | Clamped to [-max_torque, max_torque] |
| Angular velocity | Clamped to [-8, 8] |

Kinematic model

Kinematics integrator:

Normalize
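The update can be sketched as follows. This is an assumption-based sketch modeled on one published version of the Gym Pendulum source (the order of the clamp and the angle update differs between releases); the names `pendulum_step`, `observe`, and `clip` are hypothetical. Angle normalization is only needed when computing the reward, so it is not repeated here.

```python
import math

G, M, L, DT = 10.0, 1.0, 1.0, 0.05
MAX_TORQUE, MAX_SPEED = 2.0, 8.0


def clip(value, low, high):
    """Clamp value into [low, high]."""
    return max(low, min(high, value))


def pendulum_step(theta, theta_dot, torque):
    """One integration step; returns the new (theta, theta_dot)."""
    u = clip(torque, -MAX_TORQUE, MAX_TORQUE)
    # theta = 0 is upright, hence the shift by pi inside the sine.
    theta_acc = (-3 * G / (2 * L) * math.sin(theta + math.pi)
                 + 3.0 / (M * L ** 2) * u)
    new_theta_dot = theta_dot + theta_acc * DT
    new_theta = theta + new_theta_dot * DT
    new_theta_dot = clip(new_theta_dot, -MAX_SPEED, MAX_SPEED)
    return new_theta, new_theta_dot


def observe(theta, theta_dot):
    """Observation vector: (cos(theta), sin(theta), theta_dot)."""
    return (math.cos(theta), math.sin(theta), theta_dot)
```

Reporting the angle as a cosine and sine pair keeps the observation continuous when the pendulum wraps around.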

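2. Initialization

Each reset draws the initial state at random; in the Gym source, $\theta$ is uniform in $[-\pi, \pi]$ and $\dot{\theta}$ is uniform in $[-1, 1]$.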

Frozenlake

Source code @ Github Wiki

1. Two versions (sizes)

4x4 map: env = gym.make('FrozenLake-v0').unwrapped. The agent starts at S and must reach G (S: start, F: frozen surface, H: hole, G: goal).
S(0) F(1) F(2) F(3)
F(4) H(5) F(6) H(7)
F(8) F(9) F(10) H(11)
H(12) F(13) F(14) G(15)
8x8 map: env = gym.make('FrozenLake8x8-v0').unwrapped

2. S, A, R

Actions

| Num | Action |
| --- | --- |
| 0 | Move Left |
| 1 | Move Down |
| 2 | Move Right |
| 3 | Move Up |

Reward: 0 for each step, 0 for falling into a hole, 1 for reaching the goal.

3. env.P[s][a]

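env.P[s][a] is a list of transitions, each of the form (probability, next_state, reward, terminal).
- In a hole state (e.g. 11), p is always 1, the next state stays the same (still 11), reward is 0, and terminal is True.
- In the goal state (15), p is always 1, the next state stays the same (still 15), reward is 0, and terminal is True.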
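To make the shape of the table concrete, here is a self-contained sketch that rebuilds a transition table of this form for the 4x4 map without importing gym. It is modeled on the published FrozenLake source, but the function name `build_transitions` is hypothetical, and the slippery rule (the intended direction or one of its two perpendicular neighbors, each with probability 1/3) is an assumption taken from that source.

```python
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]


def build_transitions(desc=MAP_4X4, is_slippery=True):
    """Return P with P[s][a] = [(probability, next_state, reward, terminal), ...]."""
    nrow, ncol = len(desc), len(desc[0])

    def move(row, col, a):
        # Moving into a wall leaves the agent where it is.
        if a == LEFT:
            col = max(col - 1, 0)
        elif a == DOWN:
            row = min(row + 1, nrow - 1)
        elif a == RIGHT:
            col = min(col + 1, ncol - 1)
        elif a == UP:
            row = max(row - 1, 0)
        return row, col

    P = {s: {a: [] for a in range(4)} for s in range(nrow * ncol)}
    for row in range(nrow):
        for col in range(ncol):
            s = row * ncol + col
            for a in range(4):
                if desc[row][col] in "GH":
                    # Holes and the goal are absorbing: self-loop with p = 1.
                    P[s][a].append((1.0, s, 0.0, True))
                    continue
                directions = [(a - 1) % 4, a, (a + 1) % 4] if is_slippery else [a]
                for b in directions:
                    r2, c2 = move(row, col, b)
                    letter = desc[r2][c2]
                    P[s][a].append((1.0 / len(directions), r2 * ncol + c2,
                                    float(letter == "G"), letter in "GH"))
    return P
```

For example, `build_transitions()[11][0]` is `[(1.0, 11, 0.0, True)]`, matching the hole behavior described above, while state 14 with action 2 (Move Right) includes a transition to the goal with reward 1.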

4. env.step

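Gym's general return value is (next_state, reward, done, info): the next state, the reward, the done flag, and a dictionary of details. For FrozenLake the details hold the probability of the sampled transition, {'prob': p}.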
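How a step relates to the transition table can be sketched as sampling one entry of P[s][a] by its probability. This is a simplified sketch, not the library code: `sample_step` is a hypothetical helper, and the table `P` is assumed to be a dictionary shaped like env.P.

```python
import random


def sample_step(P, state, action, rng=random):
    """Pick one transition from P[state][action] according to its probability."""
    transitions = P[state][action]
    weights = [t[0] for t in transitions]
    prob, next_state, reward, done = rng.choices(transitions, weights=weights, k=1)[0]
    return next_state, reward, done, {"prob": prob}


# Toy table with a single absorbing hole state, shaped like env.P.
toy_P = {11: {0: [(1.0, 11, 0.0, True)]}}
result = sample_step(toy_P, 11, 0)
```

With a single transition of probability 1 the outcome is deterministic, so `result` is the hole self-loop together with its probability in the info dictionary.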

Development

Basic file layout for creating a new gym environment

Structure diagram

Environment skeleton for a new gym environment

def __init__(self)
def reset(self)
def step(self, action)
def render(self, mode="human")
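A toy environment that fills in these four methods might look like the sketch below. Everything in it is hypothetical: `CorridorEnv` is an invented example, and it is written as a plain class so it runs without gym installed. A real environment would normally subclass gym.Env and also define action_space and observation_space.

```python
class CorridorEnv:
    """Hypothetical 1-D corridor: start in cell 0, finish in the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        """Put the agent back at the start and return the first observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Action 0 moves left, action 1 moves right; walls block movement."""
        move = 1 if action == 1 else -1
        self.position = min(self.length - 1, max(0, self.position + move))
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}

    def render(self, mode="human"):
        """Draw the corridor as text; 'A' marks the agent."""
        line = "".join("A" if i == self.position else "."
                       for i in range(self.length))
        if mode == "human":
            print(line)
        return line
```

The step method returns the same four-element tuple (observation, reward, done, info) used by the environments described earlier, which is what lets generic agent code drive any environment.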

gym rendering:

The classic control and box2d environments render through classic_control/rendering.py, which uses the pyglet API.
- class Viewer
  - get_array: returns the current frame as an image?

Interacting with gym

Call utils.play: https://github.com/openai/gym/blob/master/gym/utils/play.py
```
import gym
from gym.utils.play import play
env = gym.make('Breakout-v0')
play(env, zoom=3)
```
This binds keyboard keys to actions.
Enter restarts and Space pauses.

Z's learning note
