security

1. [50%]Synthesis of deceptive strategies in reachability games with action misperception

Abhishek N. Kulkarni et al., IJCAI 2020

Main problem to solve: two player (P1, P2) plays a turn-based game, P2 knows part of P1’s action and try to prevent P1 from reaching the goal. P1和P2轮流进行游戏。 P1的目标是达到目的地，P2的目的是阻止P1达成目标。开始阶段P2只知道部分P1的action，P1方设法利用P2的信息不完善取得优势，P2则在交互中发现未知的P1的action。
hyper game [Bennett 1977]
DASW: Deceptive Almost-Sure Winning

1.1. Game on graph

Action sets of P1 and P2 be and . A turn-based game on graph is the tuple: , where
- is the set of states partitioned into P1’s state and P2’s satates . P1 chooses an action when and P2 chooses an action when .
- is set of action for P1 and P2.
- is a deterministic transition function that maps a state and an action to a successor state.
- is a set of final states.
is the -th state-action pair. is from step to step .
is a run (state-history) is the action-history
reachability objectives: 怎么理解里面的等式? 整个过程中曾经出现过的state，若，则P1赢（final states出现在过程中），否则P2赢。
Zielonka’s Algorithm
Concurrent reachability games

Example here

P1 state , P2 state
The objective for P1 is to reach the final state from the initial state
P1 action , P2 action

1.2. Action misperception and information asymmetry

P1 has complete information about and . (Misperceives) P1’s action set is . P1知道所有可能的action ，但是其只能执行的一部分。 P2 only konws P2只知道它自己可以执行的action ，通过观察逐步得知中的action。 Both P1 and P2 have complete information about the game state-space , transition function and the final states .
P1’s perceptual game is identical to the ground-truth game: P2’s peceptual game is a game under misperception:
P1 will compute a conservative strategy, because she assume P2 will react to all (P2 has extra strategy for ) P1认为P2针对以外但属于的action有额外的策略，因此会采取保守策略。
Other questions:
- P2 observes P1 playing an action , which P2 did not believe to be in P1’s action set? P2观察到P1执行了其不知道的action，将其加入自己的perceived action set 。
- P2 might be capable of complex inference by observing P1 can perform an action , infer that P1 must be capable of action and . P2可以进行推理，若action 由和两步构成，则推测出更多P1可以执行的action 。

2. [30%]Secure-by-synthesis network with active deception and temporal logic specifications

Jie Fu et al., Arxiv 2020

Main idea: a deterministic, turn-based game arena
The limitation of attack graphs: They do not take into account the possible defense mechanisms that can be used online by a security system during an active attack.
Check ~~Stackelberg games~~, Nash games, Bayesian games.
Game graph of the individual games being played by the attacker and defender:
1. A transition system, called arena, which captures all possible interactions over multiple stages of the game.
2. A labeling function that related an outcome-a sequence of states in the game graph-to properties specified in logic.
3. The temporal logic specifications as the players’ Boolean objectives.
Attack graph: Network conditions, (attacker and defender’s) own states, and transition function that captures the pre- and post-conditions of actions given states.
- A pre-condition: a logical formula that needs to be satisfied for an action to be taken.
- A post-condition: a logical formula describes the effect of an action in the network system.

2.1. Turn-based game arena tuple

- (resp., ) is the set of actions of P1 (resp., P2)
- is a dterministic transition function that maps a state-action pair to a next state
- is the set of atomic propositions.
- is the labeling function that maps each state to a set

2.2. Payoffs

Linear temporal logic

next
Until
Level-1 vs. level-2
- Level-1: A two-player hypergame, a pair
- Level-2: A two-player hypergame, a pair P1 perceives the interaction as a level-1 hypergame and P2 perceives the interaction as game . P1能观察并和整个游戏交互，P2只能观察/交互其中的一部分。

3. [50%]A receding-horizon MDP approach for performance evaluation of moving target defense in networks

Zhentian Qian et al., Arxiv 2020

项目	解释
环境/游戏类型	回合（题设）	多回合
Follower/attacker	知识（题设）	每回合开始时更新MDP，会侦测到defender布置的IDS（以一定正确概率）
Follower/attacker	策略（题设）	0. 侦测当前states 1. 对attacker用基于MDP的PAG建模 2. 在模型的基础上通过risk-sensitive finite-horizon planning和Iterative re-plan the attack strategy using a receding horizon framework选择此回合的策略
Defender/leader	知识（题设）	本文中没有关于attacker的相关知识
Defender/leader	策略	随机在MDP的状态中布置IDS，调整的参量为IDS的数量或IDS位置的重新分布频率

Main work: 先对attacker建模，将模型应用于设定场景中，再通过改变defender的变量，衡量场景中不同defender参数的效果。
1. Understand how the attacker behaves given the uncertainty: Attacker将实时检测并重新规划抵达目标的路径以避免被发现（repeatedly perform reconnaissance to learn about the locations of intrusion detection systems and re-plan optimally to reach the target while avoiding detection）在uncertainty下如何对attacker建模，分为两步：
  1. Model the network with dynamic defense as a time-varying probabilistic attack graph, which is an MDP with time-varying probabilistic transition function and reward function. 模型基于MDP，probabilistic transition function和reward function是随时间变化的。用PAG（probabilistic attack graph）建立graphical security model。此MDP中何为probabilistic：的几率成功转移到下一状态，的几率转移失败留在现状态。
  2. Solve the attack strategy using
    1. Risk-sensitive finite-horizon planning
    2. Iterative re-plan the attack strategy using a receding horizon framework.
2. 衡量defender strategy的标准：probability of successful and stealthy attack。变量： 1. IDS的数量。 2. IDS shuffle的频率。
一般使用IDS (Intrusion Detection System)的策略：通常defender并不知道attacker是否存在，只是布置pre-defined security protocols。（固定位置/随机位置）研究问题：如何评估proactive defense strategy在attacker攻击成功之前检测到attacker的效力？
Security model类型:
- attack graphs
- attack-defence trees
- probabilistic attack graphs for Moving Target Defense (MTD)

Method

Problem formulation:
- The attacker may success with , or fail with (PAG/Probabilistic Attack Graph).
- The defender randomly select some nodes for IDS (with some distribution, also depends on time ).
- The attacker knows PAG, does not know how defender arrange the IDS (actual deployment).
- The attacker can scan for IDS (observation of the deployment).
Attacker’s behavior modeling:
- Risk-sensitive planning：
  - 某个policy的期望价值的直观理解：，discount只对immediate reward应用一次（与RL有所区别）？
  - 最优policy应最大化
  - Primal Linear Program：对于以任意状态起始，最小化总的预期收益（每个起始状态概率乘以从此状态开始能获得的最大预期收益）？
  - Dual Linear Program：
- Receding-horizon attack planning：
  - 何时attacker被IDS所捕获；捕获后结果：
    - Attacker应尽量避开IDS（当作障碍）。We treat these nodes (equipped with IDSs) as obstacles which the attacker is to avoid.
    - “sink”: an absorbing states with zero reward
    - 若状态存在可转移的下一状态（），且布置了IDS（），那么attacker会陷入sink state。 When attacker exploits a vulnerability that has a positive probability to reach a node with IDS, then it will reach a sink state with probability one- that is, it is detected
  - Attacker如何应对：
    - 若attacker总是知道IDS的布置，可以doing nothing或者攻击没有IDS的host
    - 若IDS随机布置，应用本文算法1
      - 假设attacker knows exactly the sequence of locations for IDSs sampled over its planning horizon
    - 本文的attack planning algorithm不能learn and predict the changes in the network。
- [待整理]MDP: , is the initial state distribution. Immedediate reward function . is a constant for finite horizon length The terminal reward sink state: for , , if and , then . 下一个状态为IDS node时，不管攻击是否成功，均会被检测到。与此对应的
- [待整理] : expectation. 用对数的原因应该与的分布，softmax Bellman operator，以及entropy-regulated optimal policy有关？Softmax: CMU课件用Dual linear program进行优化
- [待整理]算法:
  1. 先进行netscan获取，初始化新的MDP的具体参数(reward)
  2. 对policy进行优化
  3. 根据优化后的policy执行一步action
  4. 系统判定是否被IDS侦测到
  5. 若未侦测到，则回到第一步，开始新的一个回合

ref to check:

attack graph
dynamic game models

4. [10%]Opportunicstic synthesis in reactive games under information asymmetry

Jie Fu, Allerton 2019 Abhishek N. Kulkarni et al., Arxiv 2019

Main idea
: the robot (a controlled agent), : its adversary (an uncontrolled environment agent)
Deterministic finite automaton (DFA):
- : the set of states
- : the set of input symbols, is the set of atomic propositions
- : a deterministic transition function状态转移
- is the initial state初始状态
- : the set of accepting states状态转移后的后续状态
- Complete DFA: for every state and for every input symbol the transition function is defined. 对于任意初始状态和action，总有后续状态 Incomplete DFA: can be made complete by introducing a sink state and directing all undefined transitions into the sink state.
Transition system (TS): , the interaction between the robot and its adversary, two-player turn based.
- : set of states partitioned on the basis of the turn of and
- : set of actions for and respectively.
- : the deterministic transition function.
- : a set of atomic propositions.
- : the labeling function
Reactive game:
- : transition function

5. [50%]Identifying stealthy attackers in a game theoretic framework using deception

Anjon Basak et al., AG 2019

5.1. 总结：

在Security game中，attacker会试图避免被发现或者被识别（detection and identification，表征为特定轨迹），尝试混淆自己和其它attacker的特征。从defender的角度，如何通过带有特定目的的optimizing defensive deception actions（例如如何设置honeypot）去尽早的识别特定的attacker。
相较于其它事后分析attacker的方法，通过attacker行动前对环境的设置以便于获取更多信息
一套关于proactive deception的framework：
- A multi-stage game theoretic model to strategically deploy honeypots to force the attacker to reveal his type early.
- Show case study
- Present a general model, a basic solution algorithm, and simulation results

5.2. 工作1：model (section4)

项目	解释
环境/游戏类型	回合	单一回合
环境/游戏类型	环境	Node(state), value/cost, exploit(action), deterministic.（详情参见后续）
Follower/attacker	知识	知道整个网络拓朴，知道每个节点类型，但是不知道节点是否是由honeypot伪装而来
Follower/attacker	策略	多个attacker类型中的某一种（在game中不会变化）抵达指定目的（某个特定state，或特定类型的state），且cost最低，轨迹尽可能与其它类型的attacker重合
Defender/leader	知识	已知所有attacker类型，但是不知道具体是哪种
Defender/leader	策略

Node , goal node （类比state）
Value and cost of a node, goal with high value（类比reward）
Exploit (edge between nodes)（类比action）。
node-exploit-node与MDP的state-action-state的区别：
- Node 与存在某种形式从到的连接（如某种网络拓扑），一个attacker具有一些exploit的能力，在处可以执行某些exploit ，那么如果，则attacker可以通过执行任意，通过付出cost 到达并获得在此处的reward （如果存在）。即首先需要存在连接，其次需要有可执行（attacker和所在node同时具备）的exploit，才可能执行。（一般网络攻防中应该是下一个node存在相应的exploit，不过从模型的角度理解效果是等价的）
- State-action-state的连接中，一个action对应一条edge（一般来说），一般一个action去往不同的state
N类attacker type（），唯一defender
- Attacker：
  - Complete knowledge about the network（类比environment）
  - Single deterministic policy to reach the goal
  - Some exploits
  - 无法区分一个node是否是honeypot
- Defender:
  - 通过在environment中放置honeypot以deploy deception
  - The configurations of the honeypots为：从real node T中随机选取（为什么随机选取？）
Game process:
- Start: attacker随机选取一个类型
- 第回合：defender有从开始到的所有历史作为参照。依据策略布置个honeypot。attacker随时获知网络的更新并规划新的optimal plan。若多组plan的cost相同，则attacker会选取其中have the maximum commom overlapping length的。
- End：attacker caught by the deception或者reaches to its goal node
APT: advanced persistent threat高级长期威胁****

5.3. 工作2：Three case studies

5.3.1. Attackers with same exploits but different goals

	attacker	attacker
exploits
goal
plan	绿线	红线

添加了一个连接了和的decoy，使得直接在第一步时认为自己已经到达终点，区分了和。
否则将在第六步区分出和。

5.3.2. Attackers with shared exploits and different goals

	attacker	attacker
exploits
goal
plan	绿线	红线

添加了一个连接了和的decoy，使得认为从到到的路径的代价比原路径小，区分了和。
否则将在第四步区分出和。

5.3.3. Attackers with shared exploits but same goals

	attacker	attacker
exploits
goal
plan	绿线	红线

添加了一个连接了和的decoy，使得认为经过的路径的代价比原路径小，区分了和
否则无法区分和。

5.4. Defender’s and attacker’s deceison making

三组实验

不同 density of edges and shared vulnerablilities between nodes对defender何时能检测出attacker类型的影响
Same edge density and shared vulnerabilities between nodes (fixed to 40%)，attackers之间shared exploits不同
不同defender’s techniques:
1. minimizing maximum overlap: min-max-overlap
2. minimizing maximum expected overlap: min-max-exp-overlap
3. minimizing entropy: min-entropy

other possible reading probe design

6. [0%]A game-theoretic taxonomy and survey of defensive deception for cybersecurity and privacy

Jeffrey Pawlick et al., ACM computing surveys 2019

Z's learning note

Explorer

security_game

1. [50%]Synthesis of deceptive strategies in reachability games with action misperception

1.1. Game on graph

1.2. Action misperception and information asymmetry

2. [30%]Secure-by-synthesis network with active deception and temporal logic specifications

2.1. Turn-based game arena tuple

2.2. Payoffs

3. [50%]A receding-horizon MDP approach for performance evaluation of moving target defense in networks

4. [10%]Opportunicstic synthesis in reactive games under information asymmetry

5. [50%]Identifying stealthy attackers in a game theoretic framework using deception

5.1. 总结：

5.2. 工作1：model (section4)

5.3. 工作2：Three case studies

5.3.1. Attackers with same exploits but different goals

5.3.2. Attackers with shared exploits and different goals

5.3.3. Attackers with shared exploits but same goals

5.4. Defender’s and attacker’s deceison making

6. [0%]A game-theoretic taxonomy and survey of defensive deception for cybersecurity and privacy

Table of Contents

Backlinks

Z's learning note

Explorer

security_game

1. [50%]Synthesis of deceptive strategies in reachability games with action misperception §

1.1. Game on graph §

1.2. Action misperception and information asymmetry §

2. [30%]Secure-by-synthesis network with active deception and temporal logic specifications §

2.1. Turn-based game arena tuple §

2.2. Payoffs §

3. [50%]A receding-horizon MDP approach for performance evaluation of moving target defense in networks §

4. [10%]Opportunicstic synthesis in reactive games under information asymmetry §

5. [50%]Identifying stealthy attackers in a game theoretic framework using deception §

5.1. 总结： §

5.2. 工作1：model (section4) §

5.3. 工作2：Three case studies §

5.3.1. Attackers with same exploits but different goals §

5.3.2. Attackers with shared exploits and different goals §

5.3.3. Attackers with shared exploits but same goals §

5.4. Defender’s and attacker’s deceison making §

6. [0%]A game-theoretic taxonomy and survey of defensive deception for cybersecurity and privacy §

Table of Contents

Backlinks

1. [50%]Synthesis of deceptive strategies in reachability games with action misperception

1.1. Game on graph

1.2. Action misperception and information asymmetry

2. [30%]Secure-by-synthesis network with active deception and temporal logic specifications

2.1. Turn-based game arena tuple

2.2. Payoffs

3. [50%]A receding-horizon MDP approach for performance evaluation of moving target defense in networks

4. [10%]Opportunicstic synthesis in reactive games under information asymmetry

5. [50%]Identifying stealthy attackers in a game theoretic framework using deception

5.1. 总结：

5.2. 工作1：model (section4)

5.3. 工作2：Three case studies

5.3.1. Attackers with same exploits but different goals

5.3.2. Attackers with shared exploits and different goals

5.3.3. Attackers with shared exploits but same goals

5.4. Defender’s and attacker’s deceison making

6. [0%]A game-theoretic taxonomy and survey of defensive deception for cybersecurity and privacy