API

  • Requires the boost library

  • Uses the reference-counted smart pointer boost::shared_ptr

    • std::shared_ptr and std::tr1::shared_ptr are similar to boost::shared_ptr. Both appeared after boost, and the boost implementation is somewhat richer. tr1:: is the namespace of C++ Technical Report 1, which sits between the C++03 and C++11 standards and was later merged into C++11.

      These smart pointers all behave the same way and provide reference counting: copying a smart pointer increments the count, so destroying an earlier copy does not release the resource and invalidate the copies made from it.

  • The namespace is Gym

  • Space.sample()

  • class Environment

    • action_space()
    • observation_space()
    • observation_parse()
    • reset()
    • step()
    • monitor_start()
    • monitor_stop()
#ifndef __GYM_H__
#define __GYM_H__
// Caffe uses boost::shared_ptr (as opposed to std::shared_ptr), so do we.
#include <boost/shared_ptr.hpp>
#include <string>
#include <vector>

namespace Gym {

struct Space {
    enum SpaceType {
        DISCRETE,
        BOX,
    } type;

    std::vector<float> sample();  // Random vector that belongs to this space

    std::vector<int>   box_shape; // Similar to Caffe blob shape, for example { 64, 96, 3 } for a 96x64 RGB image.
    std::vector<float> box_high;
    std::vector<float> box_low;

    int discreet_n;               // Number of possible values for a DISCRETE space
};

struct State {
    std::vector<float> observation; // get observation_space() to make sense of this data
    float reward;
    bool done;
    std::string info;
};

class Environment {
public:
    virtual boost::shared_ptr<Space> action_space() =0;
    virtual boost::shared_ptr<Space> observation_space() =0;

    virtual void reset(State* save_initial_state_here) =0;

    virtual void step(const std::vector<float>& action, bool render, State* save_state_here) =0;

    virtual void monitor_start(const std::string& directory, bool force, bool resume) =0;
    virtual void monitor_stop() =0;
};

class Client {
public:
    virtual boost::shared_ptr<Environment> make(const std::string& name) =0;
};

extern boost::shared_ptr<Client> client_create(const std::string& addr, int port);

} // namespace Gym

#endif // __GYM_H__

Cartpole

Source code @ Github

1. Input, output, and model

1.1. state, reward, done, {} = step(self, action)

  • Action:

    Num | Action
    0   | Push cart to the left
    1   | Push cart to the right
  • Observation (state):

    Num | Observation          | Min     | Max
    0   | Cart Position        | -4.8    | 4.8
    1   | Cart Velocity        | -Inf    | Inf
    2   | Pole Angle           | -24 deg | 24 deg
    3   | Pole Velocity At Tip | -Inf    | Inf
  • Reward: 1 for every step taken, including the termination step.

  • Done:

    Done is True when the state goes beyond the upper or lower bounds. steps_beyond_done is used to handle the reward at the termination step (None: the pole has not fallen yet, or this is the termination step; >= 0: the episode has already terminated). See the sketch below.
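
A rough sketch of this bookkeeping, paraphrasing gym's cartpole.py (the function name is made up for illustration; note that termination uses ±2.4 for position and ±12 deg for angle, tighter than the observation bounds listed above):

import math

def cartpole_reward_done(x, theta, steps_beyond_done,
                         x_threshold=2.4, theta_threshold=12 * math.pi / 180):
    """Sketch of CartPole's termination/reward bookkeeping (paraphrased from gym's cartpole.py)."""
    done = bool(x < -x_threshold or x > x_threshold
                or theta < -theta_threshold or theta > theta_threshold)
    if not done:
        reward = 1.0
    elif steps_beyond_done is None:
        # The pole just fell: this is the termination step, still rewarded with 1.
        steps_beyond_done = 0
        reward = 1.0
    else:
        # step() was called after the episode had already ended: no more reward.
        steps_beyond_done += 1
        reward = 0.0
    return reward, done, steps_beyond_done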

1.2. Model:

  • Variables

    Var                           | Value
    Gravity                       | 9.8
    Cart mass                     | 1.0
    Pole mass                     | 0.1
    Pole's length                 | 0.5
    Seconds between state updates | 0.02
    Force                         | 10 (right) or -10 (left)
  • Kinematic model: the cart-pole dynamics (see the sketch after this list)

  • Kinematics integrator: explicit Euler integration
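
A minimal sketch of one update step, paraphrasing gym's cartpole.py (the parameter values mirror the table above; the function name is made up for illustration):

import math

def cartpole_step(x, x_dot, theta, theta_dot, force,
                  gravity=9.8, masscart=1.0, masspole=0.1, length=0.5, tau=0.02):
    """One Euler step of the cart-pole dynamics (paraphrased from gym's cartpole.py)."""
    total_mass = masspole + masscart
    polemass_length = masspole * length
    costheta, sintheta = math.cos(theta), math.sin(theta)

    temp = (force + polemass_length * theta_dot ** 2 * sintheta) / total_mass
    thetaacc = (gravity * sintheta - costheta * temp) / (
        length * (4.0 / 3.0 - masspole * costheta ** 2 / total_mass))
    xacc = temp - polemass_length * thetaacc * costheta / total_mass

    # Explicit Euler integration with step tau.
    x += tau * x_dot
    x_dot += tau * xacc
    theta += tau * theta_dot
    theta_dot += tau * thetaacc
    return x, x_dot, theta, theta_dot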

2. Initialization

All four state variables are randomly initialized at the start of each episode within a small interval (uniformly in [-0.05, 0.05] in the gym source).


Pendulum

Source code @ Github

1. Input, output, and model

1.1. state, reward, done, {} = step(self, action)

  • Action: max_torque = 2.

    Num | Action
    0   | [-max_torque, max_torque]
  • Observation (state):

    Num | Observation | Min | Max
    0   | cos(theta)  | -1  | 1
    1   | sin(theta)  | -1  | 1
    2   | theta_dot   | -8  | 8
  • Reward: -(theta^2 + 0.1 * theta_dot^2 + 0.001 * u^2), where u is the input torque and theta is the pendulum angle measured from vertical up (vertical up is 0), normalized to [-pi, pi].

  • Done: Always False

1.2. Model:

  • Variables

    Var                           | Value
    Gravity                       | 10.0
    Pendulum mass                 | 1.0
    Pendulum's length             | 1.0
    Seconds between state updates | 0.05
    Input torque                  | Clamped to [-max_torque, max_torque] = [-2, 2]
    Angular velocity              | Clamped to [-8, 8]
  • Kinematic model: the pendulum dynamics (see the sketch after this list)

  • Kinematics integrator: Euler integration

  • Normalize: the angle is wrapped into [-pi, pi]
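
A minimal sketch of one update step, paraphrasing gym's pendulum.py (the dynamics equation, clamping ranges, and function names are assumptions based on that source):

import numpy as np

def angle_normalize(x):
    """Wrap an angle into [-pi, pi]."""
    return ((x + np.pi) % (2 * np.pi)) - np.pi

def pendulum_step(th, thdot, u, g=10.0, m=1.0, l=1.0, dt=0.05,
                  max_torque=2.0, max_speed=8.0):
    """One Euler step of the pendulum dynamics (paraphrased from gym's pendulum.py)."""
    u = np.clip(u, -max_torque, max_torque)

    # Cost penalizes angle from upright, angular velocity, and torque; reward = -cost.
    cost = angle_normalize(th) ** 2 + 0.1 * thdot ** 2 + 0.001 * u ** 2

    newthdot = thdot + (-3 * g / (2 * l) * np.sin(th + np.pi)
                        + 3.0 / (m * l ** 2) * u) * dt
    newthdot = np.clip(newthdot, -max_speed, max_speed)
    newth = th + newthdot * dt

    obs = np.array([np.cos(newth), np.sin(newth), newthdot])
    return obs, -cost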

2. Initialization

The state is randomly initialized at the start of each episode (in the gym source, theta is uniform in [-pi, pi] and theta_dot is uniform in [-1, 1]).


Frozenlake

Source code @ Github Wiki

1. Two versions (sizes)

  • 4x4: env = gym.make('FrozenLake-v0').unwrapped. The agent starts at S and must reach G.
    S(0)  F(1)  F(2)  F(3)
    F(4)  H(5)  F(6)  H(7)
    F(8)  F(9)  F(10) H(11)
    H(12) F(13) F(14) G(15)
  • 8x8: env = gym.make('FrozenLake8x8-v0').unwrapped

2. S, A, R

  • Actions

    Num | Action
    0   | Move Left
    1   | Move Down
    2   | Move Right
    3   | Move Up
  • Reward: 0 for each step, 0 for falling into a hole, 1 for reaching the goal G

3. env.P[s][a]

  • env.P[s][a] is a list of (probability, next_state, reward, terminal) tuples (see the example below)
    • In a hole state (e.g. 11), probability is always 1, the next state stays the same (e.g. still 11), reward is 0, and terminal is True
    • In the goal state (15), probability is always 1, the next state stays 15, reward is 0, and terminal is True
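
A small example of inspecting the transition table (the concrete probabilities depend on whether the environment is slippery, so the printed values are only illustrative):

import gym

env = gym.make("FrozenLake-v0").unwrapped

# Each entry is a list of (probability, next_state, reward, terminal) tuples.
for prob, next_state, reward, terminal in env.P[14][2]:  # state 14, action 2 (Move Right)
    print(prob, next_state, reward, terminal)

print(env.P[11][0])  # hole state: stays in 11, reward 0, terminal True
print(env.P[15][0])  # goal state: stays in 15, reward 0, terminal True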

4. env.step

  • Gym's general return is (next_state, reward, done, info): the next state, the reward, whether the episode is done, and extra information (for FrozenLake the info dict holds the transition probability); see the example below.
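
For example, one episode with random actions (assuming the FrozenLake-v0 id used above):

import gym

env = gym.make("FrozenLake-v0")
state = env.reset()
done = False
while not done:
    action = env.action_space.sample()                    # 0..3: Left/Down/Right/Up
    next_state, reward, done, info = env.step(action)
    print(state, action, next_state, reward, done, info)  # info is e.g. {'prob': 0.333...}
    state = next_state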

Development

Basic file layout for creating a new gym environment

  • Structure diagram

Environment skeleton for a new gym (a minimal sketch follows the list below)

  • def __init__(self)
  • def reset(self)
  • def step(self, action)
  • def render(self, mode="human")
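A minimal, self-contained sketch of such a class (MyEnv, its spaces, and its dynamics are placeholders made up for illustration, not from any particular environment):

import gym
from gym import spaces
import numpy as np

class MyEnv(gym.Env):
    """Minimal skeleton for a custom gym environment."""
    metadata = {"render.modes": ["human"]}

    def __init__(self):
        # Placeholder spaces: 2 discrete actions, a 4-dim continuous observation.
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        self.state = None

    def reset(self):
        self.state = np.zeros(4, dtype=np.float32)
        return self.state

    def step(self, action):
        # Placeholder dynamics: nudge the first state component left or right.
        self.state[0] += 0.1 if action == 1 else -0.1
        reward = 1.0
        done = abs(self.state[0]) > 1.0
        return self.state, reward, done, {}

    def render(self, mode="human"):
        print(self.state)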

gym的渲染:

  • The classic control and box2d envs are rendered through classic_control/rendering.py, which uses the pyglet API
    • class Viewer
      • get_array: returns the current frame as an image array(?)
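
In user code the frame is usually obtained through the env rather than the Viewer directly, for example:

import gym

env = gym.make("CartPole-v0")
env.reset()
frame = env.render(mode="rgb_array")  # numpy array of shape (height, width, 3)
print(frame.shape)
env.close()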

Interacting with gym