Flappy Bird played by an MPC controller.

1. Ben and Philip’s code:

  • Ben’s blog and Philip’s blog
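  • The core code is mip.py
  • Runtime configuration: Python 3.7, cvxpy, Gurobi (apply for a free license, then run pysetup.bat in the Gurobi install directory to configure the Python API). Without Gurobi, remove the solver="GUROBI" argument from prob.solve.
  • In pygame the origin is at the top-left corner, the positive x axis points horizontally to the right, and the positive y axis points vertically downward.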
path = cvx.Variable((N, 2)) # initialize the y pos and y velocity
flap = cvx.Variable(N-1, boolean=True) # initialize the inputs, whether or not the bird should flap in each step

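path is the state that the controller actually influences: the vertical position and vertical velocity. flap is the control input: whether the bird flaps upward at each time step.

System model: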

PIPEGAPSIZE  = 100 # gap between upper and lower pipe
PIPEWIDTH = 52
BIRDWIDTH = 34
BIRDHEIGHT = 24
BIRDDIAMETER = np.sqrt(BIRDHEIGHT**2 + BIRDWIDTH**2) # the bird rotates in the game, so we use its maximum extent

The gap between the upper and lower pipes and the pipe width are both constants.

def solve(playery, playerVelY, lowerPipes):
    pipeVelX = -4 # speed in x
    playerAccY    =   1   # player's downward acceleration
    playerFlapAcc =  -14   # player's speed on flapping

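Because the bird and the pipes move relative to each other, pipeVelX is effectively the bird's forward speed (a constant).

lowerPipes: the top edge of each lower pipe (the red cross at the bottom). Since the gap between the upper and lower pipes is constant, the position of the upper pipe (the blue cross) is known as well.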

y = path[:,0]
vy = path[:,1]
 
c = [] # init constraint list
c += [y <= GROUND, y >= SKY] # constraints for sky and ground
c += [y[0] == playery, vy[0] == playerVelY] # initial conditions

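The general constraints are: the bird must not cross the upper or lower boundary, and the position and velocity at the initial time step are fixed to the given values.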

for t in range(N-1): # look ahead
    dt = t//15 + 1 # let time get coarser further in the look ahead
    x -= dt * pipeVelX # update x
    xs += [x] # add to list
    c += [vy[t + 1] ==  vy[t] + playerAccY * dt + playerFlapAcc * flap[t] ] # add y velocity constraint, f=ma
    c += [y[t + 1] ==  y[t] + vy[t + 1]*dt ] # add y constraint, dy/dt = vy
    pipe_c, dist = getPipeConstraintsDistance(x, y[t+1], lowerPipes) # add pipe constraints
    c += pipe_c
    obj += dist

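dt is small for the early steps and larger for the later ones. Sampling densely at the start keeps the prediction accurate, while lowering the sampling rate later extends the total length of the prediction horizon. The authors explain this in their blog: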

This technique works pretty well. It doesn’t quite run in real time with the lookahead set to a distance that allows it to succeed. We used a neat trick to improve the speed and look ahead distance. The model’s time step increases with look ahead time. In other words, the model is precise for its first few time steps, and gets less careful later in its prediction. The thinking is that this allows it to make approximate long term plans about jump timing without over-taxing the solver.
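To make the effect of the coarsening step concrete, here is a minimal sketch that reproduces only the dt and x bookkeeping from the loop above. The horizon length N = 24 and the starting value x0 = 0 are assumptions for illustration; this section does not state either of them.

```python
# Sketch of the coarsening time grid used in the look-ahead loop.
N = 24           # assumed horizon length (not stated in this section)
PIPE_VEL_X = -4  # pipe speed in x, taken from solve()

def lookahead_grid(n_steps, pipe_vel_x, x0=0):
    """Return (dts, xs): the step sizes and the x positions of the sample points."""
    dts, xs = [], []
    x = x0
    for t in range(n_steps - 1):
        dt = t // 15 + 1       # 1 for the first 15 steps, then 2, then 3, ...
        x -= dt * pipe_vel_x   # pipes move left, so the bird advances to the right
        dts.append(dt)
        xs.append(x)
    return dts, xs

dts, xs = lookahead_grid(N, PIPE_VEL_X)
print(dts)     # step size used at each look-ahead step
print(xs[-1])  # how far ahead (in pixels) the last sample point lies
```

Under these assumed values the last sample point lies 124 px ahead, compared with 92 px if every step used dt = 1, with the same number of decision variables.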

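Motion in x is at constant speed. vy is determined by the previous step's vy, gravity, and whether the bird flaps; y then follows from the updated vy. (I am not sure why playerFlapAcc is not multiplied by dt, but in testing the controller performed worse when it was. One possible reading is that a flap is modeled as an instantaneous change in velocity rather than a force applied over the whole step.) All of these relations are expressed as constraints, which the optimizer has to satisfy.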
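The same update rule can be run forward in plain Python for a fixed flap sequence, which makes it easy to see what the equality constraints force the solver to compute. This is only a sketch of the dynamics; the function name rollout and the sample inputs are made up for illustration.

```python
PLAYER_ACC_Y = 1       # downward acceleration (y grows downward in pygame)
PLAYER_FLAP_ACC = -14  # velocity change applied by one flap, not scaled by dt

def rollout(y0, vy0, flaps):
    """Simulate the vertical state for a flap sequence; returns a list of (y, vy)."""
    states = [(y0, vy0)]
    y, vy = y0, vy0
    for t, flap in enumerate(flaps):
        dt = t // 15 + 1                                   # same coarsening rule as the solver loop
        vy = vy + PLAYER_ACC_Y * dt + PLAYER_FLAP_ACC * flap
        y = y + vy * dt                                    # uses the updated velocity, as in the constraint
        states.append((y, vy))
    return states

# Hypothetical example: start at y=200 at rest, flap once on the third step.
for y, vy in rollout(y0=200, vy0=0, flaps=[0, 0, 1, 0]):
    print(y, vy)
```

The bird falls for two steps (y increases), then the flap flips vy from 2 to -11 in a single step, regardless of dt.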

def getPipeConstraintsDistance(x, y, lowerPipes):
    constraints = [] # init pipe constraint list
    pipe_dist = 0 # init dist from pipe center
    for pipe in lowerPipes:
        dist_from_front = pipe['x'] - x - BIRDDIAMETER
        dist_from_back = pipe['x'] - x + PIPEWIDTH
        if (dist_from_front < 0) and (dist_from_back > 0):
            constraints += [y <= (pipe['y'] - BIRDDIAMETER)] # y above lower pipe
            constraints += [y >= (pipe['y'] - PIPEGAPSIZE)] # y below upper pipe
            pipe_dist += cvx.abs(pipe['y'] - (PIPEGAPSIZE//2) - (BIRDDIAMETER//2) - y) # add distance from center
    return constraints, pipe_dist

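An additional constraint is that the bird must not hit the upper or lower pipe while it is between them. The function also computes the distance from the bird's center to the center of the pipe gap.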
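The overlap test and the resulting bounds can be checked with plain numbers. The sketch below mirrors the window logic of getPipeConstraintsDistance but returns numeric bounds instead of cvxpy constraints; y_bounds_at is a hypothetical helper, it returns on the first overlapping pipe as a simplification, and the pipe position in the example is invented.

```python
import math

PIPEGAPSIZE = 100
PIPEWIDTH = 52
BIRDDIAMETER = math.hypot(24, 34)  # same value as np.sqrt(BIRDHEIGHT**2 + BIRDWIDTH**2)

def y_bounds_at(x, lower_pipes):
    """Return (y_min, y_max) allowed at position x, or None if no pipe overlaps the bird."""
    for pipe in lower_pipes:
        dist_from_front = pipe['x'] - x - BIRDDIAMETER
        dist_from_back = pipe['x'] - x + PIPEWIDTH
        if dist_from_front < 0 and dist_from_back > 0:
            y_min = pipe['y'] - PIPEGAPSIZE   # stay below the upper pipe
            y_max = pipe['y'] - BIRDDIAMETER  # stay above the lower pipe
            return y_min, y_max
    return None

pipes = [{'x': 100, 'y': 300}]   # hypothetical pipe
print(y_bounds_at(50, pipes))    # bird has not reached the pipe yet
print(y_bounds_at(80, pipes))    # bird overlaps the pipe horizontally
print(y_bounds_at(160, pipes))   # bird has passed the pipe
```

Note that the usable window is PIPEGAPSIZE - BIRDDIAMETER, roughly 58 px, which is why precise vertical control matters while the bird is inside a pipe.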

objective = cvx.Minimize(cvx.sum(cvx.abs(vy)) + 100* obj)

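The objective is the sum of the absolute y velocities at all sample points plus the sum of the distances from the bird's center to the center of the pipe gap (the latter is weighted more heavily).

Optimizing this objective subject to the constraints above yields an action sequence. Applying only the action at the first sample point completes one MPC iteration.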
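The receding-horizon idea can be illustrated without cvxpy or Gurobi. The sketch below replaces the mixed-integer solve with an exhaustive search over all flap sequences of a short horizon, so it is a stand-in for the authors' method, not a reproduction of it. The SKY and GROUND values, the x0 parameter, the horizon of 8, and the function name plan_first_flap are all assumptions made for illustration.

```python
import itertools
import math

PLAYER_ACC_Y, PLAYER_FLAP_ACC, PIPE_VEL_X = 1, -14, -4
PIPEGAPSIZE, PIPEWIDTH = 100, 52
BIRDDIAMETER = math.hypot(24, 34)
SKY, GROUND = 0, 400  # assumed screen bounds (hypothetical values)

def plan_first_flap(y0, vy0, lower_pipes, horizon=8, x0=0):
    """Brute-force stand-in for the MIP solve.

    Returns the first flap (0 or 1) of the cheapest feasible sequence,
    or None if every sequence violates a constraint.
    """
    best_cost, best_first = math.inf, None
    for flaps in itertools.product((0, 1), repeat=horizon):
        y, vy, x = y0, vy0, x0
        cost, feasible = abs(vy0), True
        for t, flap in enumerate(flaps):
            dt = t // 15 + 1
            x -= dt * PIPE_VEL_X
            vy += PLAYER_ACC_Y * dt + PLAYER_FLAP_ACC * flap
            y += vy * dt
            if not (SKY <= y <= GROUND):       # sky and ground constraints
                feasible = False
                break
            cost += abs(vy)                    # velocity term of the objective
            for pipe in lower_pipes:
                front = pipe['x'] - x - BIRDDIAMETER
                back = pipe['x'] - x + PIPEWIDTH
                if front < 0 and back > 0:     # bird overlaps this pipe
                    if not (pipe['y'] - PIPEGAPSIZE <= y <= pipe['y'] - BIRDDIAMETER):
                        feasible = False
                        break
                    center = pipe['y'] - PIPEGAPSIZE // 2 - BIRDDIAMETER // 2
                    cost += 100 * abs(center - y)  # weighted distance from gap center
            if not feasible:
                break
        if feasible and cost < best_cost:
            best_cost, best_first = cost, flaps[0]
    return best_first

print(plan_first_flap(200, 0, []))  # open space, bird at rest
print(plan_first_flap(395, 5, []))  # about to hit the ground
```

In a game loop this function would be called once per frame: only the returned first action is applied, and the plan is recomputed from the new state on the next frame. Exhaustive search costs 2^horizon rollouts, which is why the original code hands the same problem to a mixed-integer solver.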