Model predictive control

tag:#mpc#rl


Types of MPC

| MPC type       | Detail                                                        |
| -------------- | ------------------------------------------------------------- |
| Linear MPC     |                                                               |
| Non-linear MPC |                                                               |
| Economic MPC   |                                                               |
| Tube MPC       | check the paper "Robust model predictive control using tubes" |
| MPC with RL    |                                                               |

Key components:

| Item          | Detail                                                                   |
| ------------- | ------------------------------------------------------------------------ |
| System model  | describes the relationship between the inputs and the states              |
| Cost function | a function of the inputs and states that yields a cost, consisting of a running cost (per step) and a terminal cost (final step) |
| Constraints   | the inputs and states must stay within given ranges                       |
| Optimizer     | tunes the inputs within all constraints to minimize (or maximize) the cost |

Process

  • The optimized input is a sequence that predicts into the future, but the system applies only the first input at the next time step, measures the new system state, and repeats the optimization to generate a new input sequence.
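The receding-horizon loop can be sketched in Python. This is a minimal sketch under assumed ingredients: a discrete-time double-integrator model, a quadratic tracking cost, and no constraints (so the optimizer reduces to ridge least squares); all names are illustrative.

```python
import numpy as np

# Assumed example plant: discrete-time double integrator (position, velocity)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
N = 20     # prediction horizon
lam = 0.1  # input penalty weight

def prediction_matrices(A, B, N):
    """Stack the predictions x_k = A^k x0 + sum_i A^(k-1-i) B u_i as X = S x0 + T U."""
    nx, nu = B.shape
    S = np.vstack([np.linalg.matrix_power(A, k) for k in range(1, N + 1)])
    T = np.zeros((N * nx, N * nu))
    for k in range(1, N + 1):
        for i in range(k):
            T[(k - 1) * nx:k * nx, i * nu:(i + 1) * nu] = \
                np.linalg.matrix_power(A, k - 1 - i) @ B
    return S, T

S, T = prediction_matrices(A, B, N)

def mpc_step(x, x_ref):
    """Solve the unconstrained MPC (ridge least squares) and return only u_0."""
    X_ref = np.tile(x_ref, N)
    H = T.T @ T + lam * np.eye(T.shape[1])
    g = T.T @ (X_ref - S @ x)
    U = np.linalg.solve(H, g)      # full optimal input sequence over the horizon
    return U[:B.shape[1]]          # receding horizon: keep only the first input

# Closed loop: apply u_0, measure the new state, re-optimize at every step.
x = np.array([0.0, 0.0])
target = np.array([1.0, 0.0])      # reach position 1 with zero velocity
for _ in range(200):
    u = mpc_step(x, target)
    x = A @ x + B @ u              # "plant" step (here just the model itself)
```

With constraints, the least-squares step would be replaced by a constrained QP solve, but the receding-horizon structure stays identical.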

Characteristics:

  • Main drawback: the open-loop model must be identified offline

Linear MPC (CDC’19 workshop, slides 1)

Linear prediction model:

    Eq. of LMPC

  • Relation to the linear control model: if the linear control model is written in that form, the next state follows by substituting it into the prediction model

Performance index (without constraint)

  • Performance index defined as:

  • Optimization:

    Derivation

    where the initial state is measured, the reference values are given (and hence fixed), and the remaining term is a constant. Optimizing the MPC then reduces to solving a QP:
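For reference, the condensed unconstrained formulation is usually written as follows (my reconstruction in standard notation; symbols may differ from the slides):

```latex
% Stacked prediction over the horizon: X = S x_0 + T U,
% with X = (x_1,...,x_N), U = (u_0,...,u_{N-1}).
\[
J(U) = \sum_{k=0}^{N-1}\bigl(x_k^\top Q x_k + u_k^\top R u_k\bigr) + x_N^\top P x_N
     = \tfrac{1}{2}\,U^\top H U + x_0^\top F\,U + \text{const},
\]
\[
\nabla_U J = H U + F^\top x_0 = 0
\quad\Longrightarrow\quad
U^\star = -H^{-1} F^\top x_0 .
\]
```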

Performance index with constraints

  • Add constraints to enforce:

    Derivation

    From the prediction equation of linear MPC, the state constraints can be expressed in terms of the input sequence; similarly to the above, the input constraints can be written in the same way.

    Both sets of conditions can therefore be written in linear form; stacking the two matrix inequalities vertically merges them into a single matrix inequality.

    Since both conditions are linear, the optimization problem to solve becomes:
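The constrained case then stacks the linear inequalities onto the same objective (again a reconstruction in standard notation):

```latex
% Input bounds and state bounds (via the x_k-blocks of S x_0 + T U) are
% both linear in U; stacking the two gives a single inequality:
\[
G\,U \le W + E\,x_0 ,
\]
\[
\min_U\; \tfrac{1}{2}\,U^\top H U + x_0^\top F\,U
\quad \text{s.t.} \quad G\,U \le W + E\,x_0 ,
\]
% a QP solved at every sampling instant.
```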

Tracking, disturbances, and delay

Tracking

Disturbances

  • Measured disturbance (enters the model like an input)

  • Get QP:
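A common way a measured disturbance enters the linear model (assumed standard notation):

```latex
\[
x_{k+1} = A\,x_k + B\,u_k + B_d\,d_k ,
\]
% with d_k measured (or forecast) over the horizon. In the stacked
% prediction X = S x_0 + T U + T_d D it is a known affine term, so the
% optimization remains a QP in U.
```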

Feasibility, convergence, and stability

Convergence

  • Use the value function (with the terminal constraint) as a Lyapunov-like function.
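The decrease condition typically shown is (standard form, reconstructed):

```latex
\[
V(x_{k+1}) - V(x_k) \le -\,\ell(x_k, u_k^\star) \le 0 ,
\]
% where V is the optimal MPC cost, \ell the (positive definite) running
% cost, and the terminal set/cost guarantee that the shifted input
% sequence remains feasible at time k+1.
```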

Linear time-varying MPC (CDC’19 slides No. 2a)

LPV (Linear parameter-varying) models

  • Linear model whose parameters change over time, with a disturbance term

  • Get QP:

LTV (linear time-varying) model

  • The model changes over time along the prediction horizon

  • The measured disturbance is embedded in the model (?)

  • Get QP:

Non-linear MPC

Model:

  • Model

  • Constraints:

Performance index:

  • Performance index

  • Need to check what this is

The reference is defined in course slides 2, p26.
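For reference, the generic NMPC optimal control problem in standard notation (my reconstruction; \ell is the stage cost, r the reference):

```latex
\[
\min_{u_0,\dots,u_{N-1}} \;
\sum_{k=0}^{N-1} \ell(x_k, u_k, r_k) \;+\; \ell_N(x_N, r_N)
\]
\[
\text{s.t.}\quad x_{k+1} = f(x_k, u_k), \qquad
h(x_k, u_k) \le 0, \qquad x_0 = x(t) .
\]
```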

Solver

Kalman filter fits here?!

Single shooting vs. multiple shooting (CDC’19 slides No. 2a, p14)

  • Single shooting: integrate the system over the whole horizon → continuous trajectory

  • Multiple shooting: integrate the system over each interval separately → discontinuous trajectory (continuity enforced as constraints at the nodes)

  • RTI: Real-time iteration

  • Comparison:

    |                                                | Multiple shooting       | Single shooting |
    | ---------------------------------------------- | ----------------------- | --------------- |
    | Unstable system                                | better                  |                 |
    | Initialization of states at intermediate nodes | better                  |                 |
    | QP/NLP size                                    | bigger (more opt. vars) | fewer opt. vars |
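The two parameterizations can be compared on a toy problem (a sketch with assumed scalar dynamics x_{k+1} = x_k + u_k and an assumed quadratic cost; both should reach the same optimum):

```python
import numpy as np
from scipy.optimize import minimize

N = 5                       # horizon; toy dynamics x_{k+1} = x_k + u_k
x0, x_goal = 0.0, 1.0

def rollout(u):
    """Integrate the toy system over the whole horizon (single shooting)."""
    x = x0
    for uk in u:
        x = x + uk
    return x

# Single shooting: decision variables are the N inputs only.
def cost_single(u):
    return np.sum(u**2) + (rollout(u) - x_goal) ** 2

res_ss = minimize(cost_single, np.zeros(N))

# Multiple shooting: decision variables are inputs AND intermediate states,
# with continuity enforced as equality constraints x_{k+1} - x_k - u_k = 0.
def cost_multi(z):
    u, x = z[:N], z[N:]                  # z = [u_0..u_{N-1}, x_1..x_N]
    return np.sum(u**2) + (x[-1] - x_goal) ** 2

def continuity(z):
    u, x = z[:N], z[N:]
    xs = np.concatenate(([x0], x))       # x_0, x_1, ..., x_N
    return xs[1:] - xs[:-1] - u          # must be zero at the optimum

res_ms = minimize(cost_multi, np.zeros(2 * N),
                  constraints={"type": "eq", "fun": continuity},
                  method="SLSQP")

print(res_ss.fun, res_ms.fun)  # both costs ≈ 1/6 (u_k = 1/6 for every k)
```

Multiple shooting carries 2N decision variables here instead of N, which is the "bigger QP/NLP" entry in the table, but each residual only couples neighboring variables, which is what makes it better conditioned on unstable systems.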

Economic MPC (CDC’19 slides No. 2b)

Hybrid and stochastic MPC (CDC’19 slides No. 3)

  • What if the system has multiple modes?
  • MIP

MPC with RL (CDC’19 slides No. 4)

| MPC    | RL             |
| ------ | -------------- |
| policy | optimal policy |
| value  |                |

LQR case:

  • value and reward
  • optimal value
  • Q-value / optimal action-value
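For the (undiscounted, discrete-time) LQR case, the standard closed forms are (my reconstruction; the stage cost plays the role of a negative reward):

```latex
\[
V^\star(x) = x^\top P x , \qquad
\ell(x,u) = x^\top Q x + u^\top R u \;(=\,-\,\text{reward}),
\]
\[
Q^\star(x,u) = x^\top Q x + u^\top R u + (Ax + Bu)^\top P\,(Ax + Bu) ,
\]
% where P solves the discrete-time algebraic Riccati equation:
\[
P = Q + A^\top P A - A^\top P B\,(R + B^\top P B)^{-1} B^\top P A .
\]
```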


Reinforcement learning

Policy evaluation (includes value evaluation/prediction and Q evaluation/control)

  • Monte Carlo

    • prediction
    • control
  • Temporal Difference

    • prediction: TD(0)
    • control: SARSA / Q-learning
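The TD control updates can be sketched in Python (tabular Q-learning on an assumed 5-state chain MDP; all hyperparameters are illustrative, and the SARSA variant is noted in a comment):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy MDP: a 5-state chain; action 1 moves right, action 0 moves left,
# reward 1 on reaching the rightmost state (terminal), 0 otherwise.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == GOAL), s2 == GOAL   # next state, reward, done

def eps_greedy(Q, s, eps):
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))                   # explore
    q = Q[s]
    return int(rng.choice(np.flatnonzero(q == q.max())))      # greedy, random ties

alpha, gamma, eps = 0.1, 0.95, 0.1
Q = np.zeros((N_STATES, N_ACTIONS))

for _ in range(1000):                          # Q-learning (off-policy TD control)
    s, done = 0, False
    while not done:
        a = eps_greedy(Q, s, eps)
        s2, r, done = step(s, a)
        # Q-learning bootstraps with the max over next actions; SARSA would
        # instead use Q[s2, a2] for the action a2 actually chosen at s2.
        target = r + (0.0 if done else gamma * Q[s2].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2

print(np.argmax(Q, axis=1))   # greedy policy: moves right in all non-terminal states
```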

Policy optimization

  • Greedy policy updates

    • Value evaluation: used for model-based optimization.
    • Q evaluation: used for model-free optimization.
  • ε-greedy

  • Exploration vs. exploitation

    • GLIE: greedy in the limit with infinite exploration, e.g., ε-greedy with ε → 0

Abstract/generalize

  • Curse of dimensionality
  • Function approximation (I think they combine this with the Q-learning part. Or maybe just use Q-learning as an example?)

Q-learning

MPC

MPC-based RL

  • Learn the true Q-function with MPC (Gros' paper @ TAC 2020)
  • Enforcing Safety
  • Safe RL
  • Safe Q-learning
  • Safe Actor-Critic RL
  • MPC sensitivities (differentiable MPC?)
  • Realtime NMPC and RL
  • Mixed-Integer problems

Adding safety constraints

  1. Add extra penalty terms to the quadratic cost function. The linear-quadratic cost is optimized to its minimum, so adding penalties for constraint violations pushes the minimizer away from unsafe regions.
  2. Mixed-integer programming solver.
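Approach 1 can be sketched as a soft (penalty) constraint on a toy scalar problem (a minimal sketch; the penalty weight `rho` and the limit are assumed tuning knobs, not values from the source):

```python
import numpy as np
from scipy.optimize import minimize

# Toy example: quadratic cost whose unconstrained minimum (x = 2) violates a
# safety limit x <= 1; a quadratic penalty on violations keeps the optimum safe.
rho = 100.0       # penalty weight (assumed; larger = closer to a hard limit)
x_limit = 1.0

def cost(x):
    base = (x[0] - 2.0) ** 2                  # nominal quadratic objective
    violation = max(0.0, x[0] - x_limit)      # amount the limit is exceeded
    return base + rho * violation ** 2        # soft-constrained cost

res = minimize(cost, np.array([0.0]), method="Nelder-Mead")
print(res.x)   # pulled to just above x = 1 instead of the unsafe x = 2
```

The penalty makes violations expensive rather than impossible; a hard guarantee would instead require the constraint inside the QP/MIP formulation itself.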

something to figure out

  1. Mixed-integer quadratic programming (MIQP)
  2. PWA piecewise affine approximation

Ref

**Model Predictive Control: from the Basics to Reinforcement Learning**

CDC’19 workshop on MPC

Sequential Linear Quadratic Optimal Control for Nonlinear Switched Systems

Fast nonlinear Model Predictive Control for unified trajectory optimization and tracking