1. [50%]Synthesis of deceptive strategies in reachability games with action misperception

Abhishek N. Kulkarni et al., IJCAI 2020

  • Main problem to solve: two player (P1, P2) plays a turn-based game, P2 knows part of P1’s action and try to prevent P1 from reaching the goal. P1和P2轮流进行游戏。 P1的目标是达到目的地,P2的目的是阻止P1达成目标。开始阶段P2只知道部分P1的action,P1方设法利用P2的信息不完善取得优势,P2则在交互中发现未知的P1的action。
  • hyper game [Bennett 1977]
  • DASW: Deceptive Almost-Sure Winning

1.1. Game on graph

  • Action sets of P1 and P2 be and . A turn-based game on graph is the tuple: , where
    • is the set of states partitioned into P1’s state and P2’s satates . P1 chooses an action when and P2 chooses an action when .
    • is set of action for P1 and P2.
    • is a deterministic transition function that maps a state and an action to a successor state.
    • is a set of final states.
  • is the -th state-action pair. is from step to step .
  • is a run (state-history) is the action-history
  • reachability objectives: 怎么理解里面的等式? 整个过程中曾经出现过的state,若 ,则P1赢(final states出现在过程中),否则P2赢。
  • Zielonka’s Algorithm
  • Concurrent reachability games
Example here
  • P1 state , P2 state
  • The objective for P1 is to reach the final state from the initial state
  • P1 action , P2 action

1.2. Action misperception and information asymmetry

  • P1 has complete information about and . (Misperceives) P1’s action set is . P1知道所有可能的action ,但是其只能执行的一部分。 P2 only konws P2只知道它自己可以执行的action ,通过观察逐步得知中的action。 Both P1 and P2 have complete information about the game state-space , transition function and the final states .
  • P1’s perceptual game is identical to the ground-truth game: P2’s peceptual game is a game under misperception:
  • P1 will compute a conservative strategy, because she assume P2 will react to all (P2 has extra strategy for ) P1认为P2针对以外但属于的action有额外的策略,因此会采取保守策略。
  • Other questions:
    • P2 observes P1 playing an action , which P2 did not believe to be in P1’s action set? P2观察到P1执行了其不知道的action,将其加入自己的perceived action set
    • P2 might be capable of complex inference by observing P1 can perform an action , infer that P1 must be capable of action and . P2可以进行推理,若action 两步构成,则推测出更多P1可以执行的action

2. [30%]Secure-by-synthesis network with active deception and temporal logic specifications

Jie Fu et al., Arxiv 2020

  • Main idea: a deterministic, turn-based game arena

  • The limitation of attack graphs: They do not take into account the possible defense mechanisms that can be used online by a security system during an active attack.

  • Check Stackelberg games, Nash games, Bayesian games.

  • Game graph of the individual games being played by the attacker and defender:

    1. A transition system, called arena, which captures all possible interactions over multiple stages of the game.
    2. A labeling function that related an outcome-a sequence of states in the game graph-to properties specified in logic.
    3. The temporal logic specifications as the players’ Boolean objectives.
  • Attack graph: Network conditions, (attacker and defender’s) own states, and transition function that captures the pre- and post-conditions of actions given states.

    • A pre-condition: a logical formula that needs to be satisfied for an action to be taken.
    • A post-condition: a logical formula describes the effect of an action in the network system.

2.1. Turn-based game arena tuple

    • (resp., ) is the set of actions of P1 (resp., P2)
    • is a dterministic transition function that maps a state-action pair to a next state
    • is the set of atomic propositions.
    • is the labeling function that maps each state to a set

2.2. Payoffs

Linear temporal logic

  • next

  • Until

  • Level-1 vs. level-2

    • Level-1: A two-player hypergame, a pair
    • Level-2: A two-player hypergame, a pair P1 perceives the interaction as a level-1 hypergame and P2 perceives the interaction as game . P1能观察并和整个游戏交互,P2只能观察/交互其中的一部分。

3. [50%]A receding-horizon MDP approach for performance evaluation of moving target defense in networks

Zhentian Qian et al., Arxiv 2020

项目解释
环境/游戏类型回合(题设)多回合
Follower/attacker知识(题设)每回合开始时更新MDP,会侦测到defender布置的IDS(以一定正确概率)
策略(题设)0. 侦测当前states
1. 对attacker用基于MDP的PAG建模
2. 在模型的基础上通过risk-sensitive finite-horizon planning和Iterative re-plan the attack strategy using a receding horizon framework选择此回合的策略
Defender/leader知识(题设)本文中没有关于attacker的相关知识
策略随机在MDP的状态中布置IDS,调整的参量为IDS的数量或IDS位置的重新分布频率
  • Main work: 先对attacker建模,将模型应用于设定场景中,再通过改变defender的变量,衡量场景中不同defender参数的效果。

    1. Understand how the attacker behaves given the uncertainty: Attacker将实时检测并重新规划抵达目标的路径以避免被发现(repeatedly perform reconnaissance to learn about the locations of intrusion detection systems and re-plan optimally to reach the target while avoiding detection) 在uncertainty下如何对attacker建模,分为两步:
      1. Model the network with dynamic defense as a time-varying probabilistic attack graph, which is an MDP with time-varying probabilistic transition function and reward function. 模型基于MDP,probabilistic transition function和reward function是随时间变化的。 用PAG(probabilistic attack graph)建立graphical security model。 此MDP中何为probabilistic:的几率成功转移到下一状态的几率转移失败留在现状态
      2. Solve the attack strategy using
        1. Risk-sensitive finite-horizon planning
        2. Iterative re-plan the attack strategy using a receding horizon framework.
    2. 衡量defender strategy的标准:probability of successful and stealthy attack。 变量: 1. IDS的数量。 2. IDS shuffle的频率。
  • 一般使用IDS (Intrusion Detection System)的策略:通常defender并不知道attacker是否存在,只是布置pre-defined security protocols。(固定位置/随机位置) 研究问题:如何评估proactive defense strategy在attacker攻击成功之前检测到attacker的效力?

  • Security model类型:

    • attack graphs
    • attack-defence trees
    • probabilistic attack graphs for Moving Target Defense (MTD)
Method
  • Problem formulation:

    • The attacker may success with , or fail with (PAG/Probabilistic Attack Graph).
    • The defender randomly select some nodes for IDS (with some distribution, also depends on time ).
    • The attacker knows PAG, does not know how defender arrange the IDS (actual deployment).
    • The attacker can scan for IDS (observation of the deployment).
  • Attacker’s behavior modeling:

    • Risk-sensitive planning:
      • 某个policy的期望价值的直观理解:,discount只对immediate reward应用一次(与RL有所区别)?
      • 最优policy应最大化
      • Primal Linear Program:对于以任意状态起始,最小化总的预期收益(每个起始状态概率乘以从此状态开始能获得的最大预期收益)?
      • Dual Linear Program:
    • Receding-horizon attack planning:
      • 何时attacker被IDS所捕获;捕获后结果:
        • Attacker应尽量避开IDS(当作障碍)。We treat these nodes (equipped with IDSs) as obstacles which the attacker is to avoid.
        • “sink”: an absorbing states with zero reward
        • 若状态存在可转移的下一状态),且布置了IDS(),那么attacker会陷入sink state。 When attacker exploits a vulnerability that has a positive probability to reach a node with IDS, then it will reach a sink state with probability one- that is, it is detected
      • Attacker如何应对:
        • 若attacker总是知道IDS的布置,可以doing nothing或者攻击没有IDS的host
        • 若IDS随机布置,应用本文算法1
          • 假设attacker knows exactly the sequence of locations for IDSs sampled over its planning horizon
        • 本文的attack planning algorithm不能learn and predict the changes in the network。
    • [待整理]MDP: , is the initial state distribution. Immedediate reward function . is a constant for finite horizon length The terminal reward sink state: for , , if and , then . 下一个状态为IDS node时,不管攻击是否成功,均会被检测到。与此对应的
    • [待整理] : expectation. 用对数的原因应该与的分布,softmax Bellman operator,以及entropy-regulated optimal policy有关?Softmax: CMU课件 用Dual linear program进行优化
    • [待整理]算法:
      1. 先进行netscan获取,初始化新的MDP的具体参数(reward)
      2. 对policy进行优化
      3. 根据优化后的policy执行一步action
      4. 系统判定是否被IDS侦测到
      5. 若未侦测到,则回到第一步,开始新的一个回合

ref to check:

  • attack graph
  • dynamic game models

4. [10%]Opportunicstic synthesis in reactive games under information asymmetry

Jie Fu, Allerton 2019 Abhishek N. Kulkarni et al., Arxiv 2019

  • Main idea
  • : the robot (a controlled agent), : its adversary (an uncontrolled environment agent)
  • Deterministic finite automaton (DFA):
    • : the set of states
    • : the set of input symbols, is the set of atomic propositions
    • : a deterministic transition function状态转移
    • is the initial state初始状态
    • : the set of accepting states状态转移后的后续状态
    • Complete DFA: for every state and for every input symbol the transition function is defined. 对于任意初始状态和action,总有后续状态 Incomplete DFA: can be made complete by introducing a sink state and directing all undefined transitions into the sink state.
  • Transition system (TS): , the interaction between the robot and its adversary, two-player turn based.
    • : set of states partitioned on the basis of the turn of and
    • : set of actions for and respectively.
    • : the deterministic transition function.
    • : a set of atomic propositions.
    • : the labeling function
  • Reactive game:
    • : transition function

5. [50%]Identifying stealthy attackers in a game theoretic framework using deception

Anjon Basak et al., AG 2019

5.1. 总结:

  • 在Security game中,attacker会试图避免被发现或者被识别(detection and identification,表征为特定轨迹),尝试混淆自己和其它attacker的特征。从defender的角度,如何通过带有特定目的的optimizing defensive deception actions(例如如何设置honeypot)去尽早的识别特定的attacker。
  • 相较于其它事后分析attacker的方法,通过attacker行动前对环境的设置以便于获取更多信息
  • 一套关于proactive deception的framework:
    • A multi-stage game theoretic model to strategically deploy honeypots to force the attacker to reveal his type early.
    • Show case study
    • Present a general model, a basic solution algorithm, and simulation results

5.2. 工作1:model (section4)

项目解释
环境/游戏类型回合单一回合
环境Node(state), value/cost, exploit(action), deterministic.(详情参见后续)
Follower/attacker知识知道整个网络拓朴,知道每个节点类型,但是不知道节点是否是由honeypot伪装而来
策略多个attacker类型中的某一种(在game中不会变化)
抵达指定目的(某个特定state,或特定类型的state),且cost最低,轨迹尽可能与其它类型的attacker重合
Defender/leader知识已知所有attacker类型,但是不知道具体是哪种
策略
  • Node , goal node (类比state)
  • Value and cost of a node, goal with high value(类比reward)
  • Exploit (edge between nodes)(类比action)。
  • node-exploit-node与MDP的state-action-state的区别:
    • Node 存在某种形式从的连接(如某种网络拓扑),一个attacker具有一些exploit的能力 ,在处可以执行某些exploit ,那么如果,则attacker可以通过执行任意,通过付出cost 到达并获得在此处的reward (如果存在)。 即首先需要存在连接,其次需要有可执行(attacker和所在node同时具备)的exploit,才可能执行。 (一般网络攻防中应该是下一个node存在相应的exploit,不过从模型的角度理解效果是等价的)
    • State-action-state的连接中,一个action对应一条edge(一般来说),一般一个action去往不同的state
  • N类attacker type(),唯一defender
    • Attacker:
      • Complete knowledge about the network(类比environment)
      • Single deterministic policy to reach the goal
      • Some exploits
      • 无法区分一个node是否是honeypot
    • Defender:
      • 通过在environment中放置honeypot以deploy deception
      • The configurations of the honeypots为:从real node T中随机选取(为什么随机选取?
  • Game process:
    • Start: attacker随机选取一个类型
    • 回合:defender有从开始到的所有历史作为参照。依据策略布置个honeypot。attacker随时获知网络的更新并规划新的optimal plan。 若多组plan的cost相同,则attacker会选取其中have the maximum commom overlapping length的。
    • End:attacker caught by the deception或者reaches to its goal node
  • APT: advanced persistent threat高级长期威胁****

5.3. 工作2:Three case studies

5.3.1. Attackers with same exploits but different goals

attacker attacker
exploits
goal
plan绿线红线
  • 添加了一个连接了的decoy,使得直接在第一步时认为自己已经到达终点,区分了
  • 否则将在第六步区分出

5.3.2. Attackers with shared exploits and different goals

attacker attacker
exploits
goal
plan绿线红线
  • 添加了一个连接了的decoy,使得认为从的路径的代价比原路径小,区分了
  • 否则将在第四步区分出

5.3.3. Attackers with shared exploits but same goals

attacker attacker
exploits
goal
plan绿线红线
  • 添加了一个连接了的decoy,使得认为经过的路径的代价比原路径小,区分了
  • 否则无法区分

5.4. Defender’s and attacker’s deceison making

  • 三组实验
  1. 不同 density of edges and shared vulnerablilities between nodes对defender何时能检测出attacker类型的影响
  2. Same edge density and shared vulnerabilities between nodes (fixed to 40%),attackers之间shared exploits不同
  3. 不同defender’s techniques:
    1. minimizing maximum overlap: min-max-overlap
    2. minimizing maximum expected overlap: min-max-exp-overlap
    3. minimizing entropy: min-entropy

6. [0%]A game-theoretic taxonomy and survey of defensive deception for cybersecurity and privacy

Jeffrey Pawlick et al., ACM computing surveys 2019