Uninformed search and informed search (Based on WPI RBE 3002)

搜索的关键点

  • 损失函数cost function
  • 停止条件
  • FIFO

  • following the general template for graph search, discards any new path to a state already in the frontier or explored set

  • 拓展过程中发现目标即结束算法 (Artificial Intelligence: A Modern Approach, Third Edition. Stuart Russell and Peter Norvig, p81):

    Note that as soon as a goal node is generated, we know it is the shallowest goal node because all shallower nodes must have been generated already and failed the goal test.

  • 有条件的全局最优: (Artificial Intelligence: A Modern Approach, Third Edition. Stuart Russell and Peter Norvig, p81, 82):

    Now, the shallowest goal node is not necessarily the optimal one; technically, breadth-first search is optimal if the path cost is a nondecreasing function of the depth of the node. The most common such scenario is that all actions have the same cost.

  • LIFO
  • 因为搜索中将环打开成为树,深度优先无法保证最优 (Artificial Intelligence: A Modern Approach, Third Edition. Stuart Russell and Peter Norvig, p86):

    depth-first search is not optimal

  • Problem: how can we find the global optimal path? Although we know distance between connected nodes, we have to try every path to find out the shortest path, which cost a lot of time. (If you find the best path without exploring the whole map, you are lucky. And how can you guarantee that?)

  • Solve: If we have some knowledge between nodes that are not directly connected, AND we are OK with a good path (not the best but not too bad). We can use the knowledge to trade for the searching time. (Time Vs. Optimal)

  • If we have the following road map (number on this map is the cost/distance of traveling from one city to the other):

  • And we have great-circle distance (theoretical distance) from any city to New Bedford (use it as heuristic if we need it).

    CityDistance(km)
    Fitchburg (FB)127
    Worcester (Woo)101
    Putnam (PN)87
    Lowell (LW)116
    Framinham (FH)83
    Providence (PD)45
    Boston (BOS)81
    Plymouth (PM)42
    New Bedford (NB)0
  • Imaging if we start from Fitchburg, to visit New Bedford.

  • Define our cost function as the overall length we traveled from Worcester to current city (): (function might stand for cost-to-go).

  • Define our heuristic function is given by the above (great-circle distance from current city to the target).

  • is the function to help us to decide where to go next (among all frontiers, explore at the node with lowest ). and is weight for the cost and the heuristic. They are provided by given algorithm.

  • Legend for the graph:

Uniform cost search (a variant of Dijikstra)

  • Equation: Only the cost is used.

  • 停止条件Stop condition: 当算法在目标点寻找其分支时,而不是在扩展分支时目标点作为分支出现。这个要求保证了搜索结果为全局最优 重复检验frontier中已有的节点,如果有更有路径时将其替换 (Artificial Intelligence: A Modern Approach, Third Edition. Stuart Russell and Peter Norvig, p84):

    The first is that the goal test is applied to a node when it is selected for expansion rather than when it is first generated. The reason is that the first goal node that is generated may be on a suboptimal path. The second difference is that a test is added in case a better path is found to a node currently on the frontier

  • Step 1: Worcester is the cloest one.

-Step 2: Now we add where we can go from Worcester to the graph, and realize that Lowell is the one with the lowest cost (explore next)

  • Step 3: We explore from Lowell, and realized that we can visit Framingham with a lower cost from Worcester. So we add Boston, ignore the path from Lowell to Framingham.

  • Step 4: We can only get to Boston from Framingham, but there is a shorter path. So we will no longer explore from Framingham. Now we start from Boston.

  • Step 5: Ignore path from Boston to Providence. Start from Providence.

  • Step 6: From Providence we can get to New Bedford with a cost of 160. Cost at Putnam is lower than current path to New Bedford, we need try to visit Putnam as well.

  • Step 7: From Putnam we can only get to Providence with a higher cost. Now we know the cost and best path (just go backward) to New Bedford

  • Equation: Only the heuristic is used.

  • Step 1: Worcester looks closer to New Bedford.

  • Step 2: Providence looks closer to New Bedford.

  • Step 3: We find New Bedford. (Note that from Worcester to Putnam is closer/less total cost than from Providence to Putnum, so we remove that path)

  • 在一定条件下可以实现全局最优 (Artificial Intelligence: A Modern Approach, Third Edition. Stuart Russell and Peter Norvig, p93):

    provided that the heuristic function h(n) satisfies certain conditions, A∗ search is both complete and optimal.

  • Equation: Both the cost and the heuristic are used.

  • 停止条件: 与uniform cost search类似,第一次扩展到目的地时不直接停止算法,要继续搜索直到目的地成为要扩展的节点时才结束 (Artificial Intelligence: A Modern Approach, Third Edition. Stuart Russell and Peter Norvig, p93):

    The algorithm is identical to UNIFORM-COST-SEARCH except that A∗ uses g + h instead of g.

  • Step 1: Worcester is always the best considering both cost and heuristic!

  • Step 2: Then we explore from Providence.

  • Step 3: From providence to Putnam cost higher than from Worcester to Putnam, so we ignore this path. We found New Bedford with the lowest . With that we could end our search.

Conclusion:

  • In the example:

    MethodFunctionStepFound path cost
    Uniformcost7160
    Greedy3160 (by luck)
    A*3160 (by luck)
  • In general

    Methodtimeoptimal path
    UniformcostSlowestGuaranteed
    GreedyUsually the fastestCannot guaranteed
    A*Fast, balancedConditionally guaranteed
  • What can we benefit from a heuristic function?

    • A heuristic function is an estimate of the system model. 启发函数的本质是对系统的估计,通过对系统的建模或某种猜测将多步运算进行简化(如查表,二次方程),从而引导更快的完成搜索。 例:将两个城市间经纬度用于启发函数的计算,将求两个城市之间的距离需要进行的连续的加法及求最小用查出两个城市的经纬度代入二次方程进行替代,从而完成近似。 E.g., if we know the latitude and altitude of two cities, we could calculate their great circle distance. Hopefully, the highway connects the two cities are pretty straight.
    • A good heuristic function provides a good estimation between any given nodes. So giving the starting node and any nodes that not directly connected to the starting node, we can use this knowledge to facilitate the search, assign higher searching priority to nodes with better potential. 启发函数的作用是辅助对frontier中的节点进行排序,将更有希望在最优路径中的节点提到搜索列表的前面
    • Applying heuristic function cannot guarantee to get the global optimal path due to the error between the estimation and the map (model vs. real system). 启发函数不能保证全局最优,但是可以通过搜索结果对启发函数进行优化,再进行搜索。通过数次迭代后得出全局最优解。
    • The global optimal path can be achieved by:
      1. Luck.
      2. Improve the heuristic function after each search. The current search result can be used to do so. After a finite number of iterations, the search will converge to the global optimal one.

路径规划演示

-Github - PathPlanning

思考:搜索与MDP的异同

  • 搜索有三个基本要素:node,edge,cost;以及一个额外的要素estimation method/data(heuristic)。 MDP的基本组成部分为

从MDP的角度看搜索问题:

  1. 搜索可以看作是一个discrete MDP。
  2. State即为node。如node Worcester,可以看作at the state “Worcester”。
  3. Action即为edge。如上例中任意一个城市最多与其它四个城市相连接,不妨将action set定义为:将与父节点相连接的城市按字母顺序排列,分别为移动至排列中的第一、二、三、四个城市。如与Worcester相连的城市为。若相连城市少于四个时,标记多出的action为不可执行。可以类比于Gridworld中向N/S/W/E移动。
  4. 。上面讨论的搜索问题中,从一个城市移动到另一个城市不会受概率因素影响(没有机率走错到其它城市),是deterministic MDP。
  5. Reward与cost等价,可认为reward为负cost,将求最小变为求最大。
  6. MDP tuple 一例为

Value iteration