Artificial Intelligence

Author Avatar
Cesare (MINGJU LI) Dec 20, 2018

General Information

These are notes for the AI course at POLIMI, academic year 2018-2019.

Chapter 1: Introduction. The whole chapter

Chapter 2: Intelligent agents. The whole chapter

Chapter 3: Solving problems by searching. Sections 3.1-3.6

Chapter 5: Adversarial search. Sections 5.1-5.5, 5.7-5.8

Chapter 6: Constraint propagation. Sections 6.1-6.3

Chapter 7: Logical agents. Sections 7.1, 7.3-7.5, 7.6.1

Chapter 10: Classical planning. Sections 10.1, 10.2, 10.4

Chapter I INTRODUCTION

1.1 What is AI

Four definitions:

  1. Thinking Humanly: Cognitive Science. First we need to know what "humanly" means, i.e., how humans actually think. Under this definition of AI, we hope to build a machine that thinks the way humans do, even though how humans think is itself still full of mysteries and open questions.
  2. Thinking Rationally: The "laws of thought" approach. This definition is tied to the Logic part of the course: knowledge is expressed in a formal language, so that a machine can reason over that knowledge and derive solutions from it.
  3. Acting Humanly: The Turing Test. If a machine can fool a human into believing that it is a real human, then we say the machine has reached genuine intelligence.
  4. Acting Rationally: The Rational Agent approach. Every agent has a set of actions, and we adopt the definition: "A rational agent is one that acts so as to achieve the best outcome or, when there is uncertainty, the best expected outcome." Under this definition, correct inferences are especially important to a rational agent, because they are what allows the agent to reach its intended goal.

1.2 The Foundations of AI

  1. Philosophy: Where does knowledge come from? How does a mind arise from a physical brain? …
  2. Mathematics: How can we derive valid new conclusions using formal rules? What can be computed? …
  3. Economics: How should we make decisions so as to maximize payoff? (decision theory) …
  4. Neuroscience: How does the brain process information? …
  5. Psychology: …
  6. Computer Engineering: …
  7. Control theory and cybernetics: …
  8. Linguistics: …

1.3 The History of AI:

Knowledge Bases Systems/Neural Network/…

1.4 The state of the art: Some Real Applications

Chapter II INTELLIGENT AGENTS

2.1 Agents and Environments

AGENT: An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.

AGENT FUNCTION: Mathematically speaking, we say that an agent’s behavior is described by the agent function that maps any given percept sequence to an action.

AGENT PROGRAM: The agent function for an artificial agent will be implemented by an agent program.

The agent function is an abstract mathematical description; the agent program is a concrete implementation, running within some physical system.

2.2 Good behavior: the concept of rationality

A rational agent is one that does the right thing, but what does it mean to do the right thing?

By considering the consequences of the agent’s behavior. When an agent is plunked down in an environment, it generates a sequence of actions according to the percepts it receives. This sequence of actions causes the environment to go through a sequence of states. If the sequence is desirable, then the agent has performed well. This notion of desirability is captured by a performance measure that evaluates any given sequence of environment states.

Obviously, there is not one fixed performance measure for all tasks and agents; typically, a designer will devise one appropriate to the circumstances.

Four PRECONDITIONS of rationality

  1. The performance measure that defines the criterion of success.
  2. The agent’s prior knowledge of the environment.
  3. The actions that the agent can perform.
  4. The agent’s percept sequence to date.

DEFINITION OF A RATIONAL AGENT: For each possible percept sequence, a rational agent should select an action that is expected to maximize its performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the agent has.

Rationality is not the same as perfection. Rationality maximizes expected performance, while perfection maximizes actual performance.

To the extent that an agent relies on the prior knowledge of its designer rather than on its own percepts, we say that the agent lacks autonomy. A rational agent should be autonomous—it should learn what it can to compensate for partial or incorrect prior knowledge.

2.3 The Nature of Environments

In our discussion of the rationality of the simple vacuum-cleaner agent, we had to specify the performance measure, the environment, and the agent’s actuators and sensors. We group all these under the heading of the task environment. For the acronymically minded, we call this the PEAS (Performance, Environment, Actuators, Sensors) description.

Properties of environments:

Fully observable vs partially observable: If an agent’s sensors give it access to the complete state of the environment at each point in time, then we say that the task environment is fully observable.

Single agent vs multiagent: For example, an agent solving a crossword puzzle by itself is clearly in a single-agent environment, whereas an agent playing chess is in a two-agent environment.

Deterministic vs stochastic: If the next state of the environment is completely determined by the current state and the action executed by the agent, then we say the environment is deterministic; otherwise, it is stochastic.

Episodic vs sequential: Many classification tasks are episodic. For example, an agent that has to spot damaged parts on an assembly line bases each decision on the current part, regardless of previous decisions; in sequential environments, on the other hand, the current decision could affect all future decisions.

Static vs dynamic: If the environment can change while an agent is deliberating, then we say the environment is dynamic for that agent; otherwise, it is static.

Discrete vs continuous: The discrete/continuous distinction applies to the state of the environment, to the way time is handled, and to the percepts and actions of the agent.

2.4 The Structure of Agents

The job of AI is to design an agent program that implements the agent function — the mapping from percepts to actions. We assume this program will run on some sort of computing device with physical sensors and actuators—we call this the architecture:

agent = architecture + program

  1. Simple reflex agents: These agents select actions on the basis of the current percept, ignoring the rest of the percept history. (e.g., a scripted conversation agent that replies "hello" whenever the input is "hello"; it keeps no count of how many greetings came before, so even after the ten-thousandth "hello" it still answers "hello" rather than "will you please stop".)
  2. Model-based reflex agents: The most effective way to handle partial observability is for the agent to keep track of the part of the world it can’t see now. That is, the agent should maintain some sort of internal state that depends on the percept history and thereby reflects at least some of the unobserved aspects of the current state. An agent that uses such a model is called a model-based agent. (e.g., continuing the conversation agent above: we add a model, say a finite-state automaton, so that after three consecutive "hello"s it answers "will you please stop", and after a fourth it answers nothing. A minimal sketch contrasting these two agent types is given below.)
  3. Goal-based agents: Knowing something about the current state of the environment is not always enough to decide what to do. The agent needs some sort of goal information that describes situations that are desirable. (e.g., an AI for solving Sokoban puzzles, whose goal is to push all the boxes to their designated positions according to the rules.)
  4. Utility-based agents: Goals alone are not enough to generate high-quality behavior in most environments. For example, many action sequences will get the taxi to its destination (thereby achieving the goal) but some are quicker, safer, more reliable, or cheaper than others. Goals just provide a crude binary distinction between “happy” and “unhappy” states. An agent’s utility function is essentially an internalization of the performance measure.
  5. Learning agents: improving with experience.

All agents can improve their performance through learning.
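
To make the difference between the first two agent types concrete, here is a minimal Python sketch of the greeting bot described above; the class and method names are illustrative, not from the textbook.

```python
class SimpleReflexAgent:
    """Selects an action from the current percept only; history is ignored."""
    def act(self, percept):
        if percept == "hello":
            return "hello"              # same reply no matter how many times it is asked
        return None


class ModelBasedReflexAgent:
    """Keeps internal state (a counter) that summarises the percept history."""
    def __init__(self):
        self.hello_count = 0            # internal model of what has happened so far

    def act(self, percept):
        if percept == "hello":
            self.hello_count += 1
            if self.hello_count > 3:
                return "will you please stop"
            return "hello"
        return None
```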

Chapter III SOLVING PROBLEMS BY SEARCHING

How an agent can find a sequence of actions that achieves its goals when no single action will do.

3.1. Problem-solving agents

Intelligent agents are supposed to maximize their performance measure. Achieving this is sometimes simplified if the agent can adopt a goal and aim at satisfying it.

A problem is defined by five components:

  1. The initial state that the agent starts in.
  2. A description of the possible actions available to the agent. Given a particular state s, ACTIONS(s) returns the set of actions that can be executed in s.
  3. A description of what each action does; the formal name for this is the transition model, specified by a function RESULT(s,a) that returns the state that results from doing action a in state s.
  4. The goal test, which determines whether a given state is a goal state.
  5. A path cost function that assigns a numeric cost to each path.

A solution to a problem is an action sequence that leads from the initial state to a goal state. Solution quality is measured by the path cost function, and an optimal solution has the lowest path cost among all solutions.
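
The five components map naturally onto a small interface. Below is a hedged Python sketch of such a problem object; later snippets in these notes assume this interface, but the names are my own, not the book's.

```python
class Problem:
    """A search problem defined by the five components of Section 3.1."""
    def __init__(self, initial_state, goal_state=None):
        self.initial_state = initial_state
        self.goal_state = goal_state

    def actions(self, s):
        """ACTIONS(s): the set of actions executable in state s."""
        raise NotImplementedError

    def result(self, s, a):
        """RESULT(s, a): the transition model."""
        raise NotImplementedError

    def goal_test(self, s):
        """True if s is a goal state."""
        return s == self.goal_state

    def step_cost(self, s, a, s2):
        """Cost of going from s to s2 via a; the path cost is the sum of step costs."""
        return 1
```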

3.2 Examples

3.3 Searching for Solutions

Having formulated some problems, we now need to solve them. A solution is an action sequence, so search algorithms work by considering various possible action sequences. The possible action sequences starting at the initial state form a search tree with the initial state at the root; the branches are actions and the nodes correspond to states in the state space of the problem.

Then we need to consider taking various actions. We do this by expanding the current state; that is, applying each legal action to the current state, thereby generating a new set of states.

Search algorithms all share this basic structure; they vary primarily according to how they choose which state to expand next, which is called the search strategy.

Graph Search vs Tree Search: in tree search we may repeatedly fall into a redundant loop, whereas in graph search we keep a record of the nodes we have already visited, ensuring that we do not waste computing power on paths that are pointless or more expensive.

Measuring problem-solving performance: four criteria

  1. Completeness: Is the algorithm guaranteed to find a solution when there is one?
  2. Optimality: Does the strategy find the optimal solution?
  3. Time complexity: How long does it take to find a solution?
  4. Space complexity: How much memory is needed to perform the search?

3.4 Uninformed search strategies

This section covers several search strategies that come under the heading of uninformed search (also called blind search). The term means that the strategies have no additional information about states beyond that provided in the problem definition. All they can do is generate successors and distinguish a goal state from a non-goal state. (Imagine being in a maze: you know the maze has an exit, but you have no idea where it is; all you can do is blindly explore forward, back, left and right until you find it.)

Breadth-first search is a simple strategy in which the root node is expanded first, then all the successors of the root node are expanded next, then their successors, and so on. In general, all the nodes are expanded at a given depth in the search tree before any nodes at the next level are expanded.

Breadth-first search is optimal if the path cost is a nondecreasing function of the depth of the node. The most common such scenario is that all actions have the same cost. And of course it is complete.

The time complexity and space complexity are both O(b^d), where b is the branching factor of each node and d is the depth of the shallowest solution. The memory requirements are a bigger problem for breadth-first search than the execution time. One might wait 13 days for the solution to an important problem with search depth 12, but no personal computer has the petabyte of memory it would take. Fortunately, other strategies require less memory.
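
A minimal graph-search version of breadth-first search, written against the Problem sketch above (illustrative, not the book's pseudocode verbatim):

```python
from collections import deque

def breadth_first_search(problem):
    """Returns a list of actions from the initial state to a goal, or None."""
    if problem.goal_test(problem.initial_state):
        return []
    frontier = deque([(problem.initial_state, [])])   # FIFO queue of (state, action path)
    explored = set()
    while frontier:
        state, path = frontier.popleft()
        explored.add(state)
        for a in problem.actions(state):
            child = problem.result(state, a)
            if child not in explored and all(child != s for s, _ in frontier):
                if problem.goal_test(child):          # goal test when the node is generated
                    return path + [a]
                frontier.append((child, path + [a]))
    return None
```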

When all step costs are equal, breadth-first search is optimal because it always expands the shallowest unexpanded node. By a simple extension, we can find an algorithm that is optimal with any step-cost function. Instead of expanding the shallowest node, uniform-cost search expands the node n with the lowest path cost g(n). This is done by storing the frontier as a priority queue ordered by g.

According to this description, it is not difficult to conclude that uniform-cost search is optimal and complete.

Uniform-cost search does not care about the number of steps a path has, but only about their total cost. Therefore, it will get stuck in an infinite loop if there is a path with an infinite sequence of zero-cost actions.
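
A sketch of uniform-cost search with the frontier stored as a priority queue ordered by g(n); again this assumes the hypothetical Problem interface above.

```python
import heapq
import itertools

def uniform_cost_search(problem):
    """Expands the frontier node with the lowest path cost g(n)."""
    counter = itertools.count()                        # tie-breaker so the heap never compares states
    frontier = [(0, next(counter), problem.initial_state, [])]
    best_g = {problem.initial_state: 0}
    while frontier:
        g, _, state, path = heapq.heappop(frontier)
        if problem.goal_test(state):                   # goal test on expansion, not on generation
            return path
        if g > best_g.get(state, float("inf")):
            continue                                   # stale queue entry
        for a in problem.actions(state):
            child = problem.result(state, a)
            g2 = g + problem.step_cost(state, a, child)
            if g2 < best_g.get(child, float("inf")):
                best_g[child] = g2
                heapq.heappush(frontier, (g2, next(counter), child, path + [a]))
    return None
```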

Depth-first search always expands the deepest node in the current frontier of the search tree. The search proceeds immediately to the deepest level of the search tree, where the nodes have no successors. As those nodes are expanded, they are dropped from the frontier, so then the search “backs up” to the next deepest node that still has unexplored successors.

Note that breadth-first search uses a FIFO queue for the frontier, whereas depth-first search uses a LIFO queue (a stack).

The properties of depth-first search depend strongly on whether the graph-search or tree-search version is used. The graph-search version, which avoids repeated states and redundant paths, is complete in finite state spaces because it will eventually expand every node. The tree-search version, on the other hand, is not complete, because it may fall into an infinite redundant loop.

The advantage of DFS is the space complexity. For a graph search, there is no advantage, but a depth-first tree search needs to store only a single path from the root to a leaf node, along with the remaining unexpanded sibling nodes for each node on the path.

So depth-first graph search is complete in finite state spaces, but neither version comes with any guarantee of optimality.

The embarrassing failure of depth-first search in infinite state spaces can be alleviated by supplying depth-first search with a predetermined depth limit l. That is, nodes at depth l are treated as if they have no successors. This approach is called depth-limited search. The depth limit solves the infinite-path problem.

Iterative deepening search (or iterative deepening depth-first search) is a general strategy, often used in combination with depth-first tree search, that finds the best depth limit. It does this by gradually increasing the limit—first 0, then 1, then 2, and so on—until a goal is found. This will occur when the depth limit reaches d, the depth of the shallowest goal node.

Like depth-first search, its memory requirements are modest: O(bd) to be precise. Like breadth-first search, it is complete when the branching factor is finite and optimal when the path cost is a nondecreasing function of the depth of the node.

In general, iterative deepening is the preferred uninformed search method when the search space is large and the depth of the solution is not known.
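
A compact sketch of depth-limited search and the iterative-deepening driver around it, assuming the same Problem interface; the string "cutoff" marks a depth cutoff as opposed to genuine failure.

```python
def depth_limited_search(problem, state, limit, path=()):
    """Depth-first search that treats nodes at depth `limit` as if they had no successors."""
    if problem.goal_test(state):
        return list(path)
    if limit == 0:
        return "cutoff"
    cutoff_occurred = False
    for a in problem.actions(state):
        result = depth_limited_search(problem, problem.result(state, a), limit - 1, path + (a,))
        if result == "cutoff":
            cutoff_occurred = True
        elif result is not None:
            return result
    return "cutoff" if cutoff_occurred else None

def iterative_deepening_search(problem, max_depth=50):
    """Runs depth-limited search with limits 0, 1, 2, ... until a goal is found."""
    for limit in range(max_depth):
        result = depth_limited_search(problem, problem.initial_state, limit)
        if result != "cutoff":
            return result
    return None
```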

The idea behind bidirectional search is to run two simultaneous searches—one forward from the initial state and the other backward from the goal—hoping that the two searches meet in the middle.

3.5 Informed (Heuristic) Search Strategies

This section shows how an informed search strategy—one that uses problem-specific knowledge beyond the definition of the problem itself—can find solutions more efficiently than can an uninformed strategy. (We are still in the maze, but now we know the exit lies to the northeast, so intuitively the paths heading northeast seem more likely to lead us out.)

The evaluation function f(n) is construed as a cost estimate, so the node with the lowest evaluation is expanded first. In general, f(n) estimates the cost of a solution through n, that is, the cost from the initial state to a goal state, even though much of that cost is only an estimate.

The heuristic function h(n) is the estimated cost of the cheapest path from the state at node n to a goal state.

Greedy best-first search tries to expand the node that is closest to the goal, on the grounds that this is likely to lead to a solution quickly.

And in this search f(n) = h(n). (This is the biggest difference between greedy best-first search and A*: we only consider which next state looks best, regardless of the path cost already paid to reach it.)

A* search: Minimizing the total estimated solution cost

The most widely used form of best-first search is called A* search. It evaluates nodes by combining g(n), the cost to reach the node, and h(n), the estimated cost to get from the node to the goal: f(n) = g(n) + h(n)

And here f(n) = estimated cost of the cheapest solution through n. Thus if we are trying to find the cheapest solution, a reasonable thing is to find the lowest f(n).
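
A sketch of A* over the same hypothetical Problem interface; this graph-search version assumes h is consistent. Greedy best-first search is the same loop with the priority taken to be h(child) alone instead of g + h.

```python
import heapq
import itertools

def astar_search(problem, h):
    """Best-first search ordered by f(n) = g(n) + h(n), where h is the heuristic function."""
    counter = itertools.count()
    start = problem.initial_state
    frontier = [(h(start), next(counter), 0, start, [])]    # entries are (f, tie, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, _, g, state, path = heapq.heappop(frontier)
        if problem.goal_test(state):
            return path
        for a in problem.actions(state):
            child = problem.result(state, a)
            g2 = g + problem.step_cost(state, a, child)
            if g2 < best_g.get(child, float("inf")):
                best_g[child] = g2
                heapq.heappush(frontier, (g2 + h(child), next(counter), g2, child, path + [a]))
    return None
```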

Conditions for optimality: Admissibility and consistency

The first condition we require for optimality is that h(n) be an admissible heuristic. An admissible heuristic is one that never overestimates the cost to reach the goal. Because g(n) is the actual cost to reach n along the current path, and f (n) = g(n) + h(n), we have as an immediate consequence that f(n) never overestimates the true cost of a solution along the current path through n.

A second, slightly stronger condition called consistency (or sometimes monotonicity) is required only for applications of A* to graph search. A heuristic h(n) is consistent if, for every node n1 and every successor n2 of n1 generated by any action a, the estimated cost of reaching the goal from n1 is no greater than the step cost of getting to n2 plus the estimated cost of reaching the goal from n2: h(n1) ≤ c(n1, a, n2) + h(n2)

A* has the following properties: the tree-search version of A* is optimal if h(n) is admissible, while the graph-search version is optimal if h(n) is consistent.

That A* search is complete, optimal, and optimally efficient among all such algorithms is rather satisfying. Unfortunately, it does not mean that A* is the answer to all our searching needs. The catch is that, for most problems, the number of states within the goal contour of the search space is still exponential in the length of the solution. A* also requires a great deal of memory, which may be its biggest disadvantage.

The simplest way to reduce memory requirements for A* is to adapt the idea of iterative deepening to the heuristic search context, resulting in the iterative-deepening A* (IDA*) algorithm.

The main difference between IDA* and standard iterative deepening is that the cutoff used is the f-cost (g + h) rather than the depth; at each iteration, the cutoff value is the smallest f-cost of any node that exceeded the cutoff on the previous iteration. RBFS is another memory-bounded algorithm in the same spirit.

RBFS is similar to a recursive depth-first search, but rather than continuing indefinitely down the current path, it uses the f_limit variable to keep track of the f-value of the best alternative path available from any ancestor of the current node. If the current node exceeds this limit, the recursion unwinds back to the alternative path. As the recursion unwinds, RBFS replaces the f-value of each node along the path with a backed-up value, the best f-value of its children. In this way, RBFS remembers the f-value of the best leaf in the forgotten subtree and can therefore decide whether it’s worth reexpanding the subtree at some later time. RBFS is somewhat more efficient than IDA*, but still suffers from excessive node re-generation. (The point of RBFS is this: A* is structurally somewhat similar to breadth-first search, and a deep exploration of the tree can require storing O(b^d) nodes, which usually exceeds the space complexity we can tolerate. RBFS chooses the next node to explore by the same principle as A*, but as soon as it finds that the cost of the current path is no longer the best, it unwinds, moving back towards the root and then down again to the currently best node. This may look like it gains nothing, but quite the opposite: while unwinding, RBFS updates the backed-up f-values of the nodes along the path, because those values had misled it into believing the current path was the best. RBFS also does not store which nodes have already been expanded; if a node is revisited later it is simply recomputed. In this way RBFS trades time for space.)

Learning to search better

3.6 Heuristic Functions

In this part, we are going to discuss how to find a heuristic function and the importance of a good heuristic function.

3.7 SUMMARY

Before an agent can start searching for solutions, a goal must be identified and a well-defined problem must be formulated.

• A problem consists of five parts: the initial state, a set of actions, a transition model describing the results of those actions, a goal test function, and a path cost function. The environment of the problem is represented by a state space. A path through the state space from the initial state to a goal state is a solution.

• Search algorithms treat states and actions as atomic: they do not consider any internal structure they might possess.

• A general TREE-SEARCH algorithm considers all possible paths to find a solution, whereas a GRAPH-SEARCH algorithm avoids consideration of redundant paths.

• Search algorithms are judged on the basis of completeness, optimality, time complexity, and space complexity. Complexity depends on b, the branching factor in the state space, and d, the depth of the shallowest solution.

• Uninformed search methods have access only to the problem definition. The basic algorithms are as follows:

  • Breadth-first search expands the shallowest nodes first; it is complete, optimal for unit step costs, but has exponential space complexity.

  • Uniform-cost search expands the node with lowest path cost, g(n), and is optimal for general step costs.

  • Depth-first search expands the deepest unexpanded node first. It is neither complete nor optimal, but has linear space complexity. Depth-limited search adds a depth bound.

  • Iterative deepening search calls depth-first search with increasing depth limits until a goal is found. It is complete, optimal for unit step costs, has time complexity comparable to breadth-first search, and has linear space complexity.

  • Bidirectional search can enormously reduce time complexity, but it is not always applicable and may require too much space.

• Informed search methods may have access to a heuristic function h(n) that estimates the cost of a solution from n.

  • The generic best-first search algorithm selects a node for expansion according to an evaluation function.

  • Greedy best-first search expands nodes with minimal h(n). It is not optimal but is often efficient.

  • A∗ search expands nodes with minimal f (n) = g(n) + h(n). A∗ is complete and optimal, provided that h(n) is admissible (for TREE-SEARCH) or consistent (for GRAPH-SEARCH). The space complexity of A∗ is still prohibitive.

  • RBFS (recursive best-first search) and SMA∗ (simplified memory-bounded A∗) are robust, optimal search algorithms that use limited amounts of memory; given enough time, they can solve problems that A∗ cannot solve because it runs out of memory. [https://www.slideshare.net/navidasadi1/an-example-of-using-rbfs-recursive-best-first-search-algorithm]

• The performance of heuristic search algorithms depends on the quality of the heuristic function. One can sometimes construct good heuristics by relaxing the problem definition, by storing precomputed solution costs for subproblems in a pattern database, or by learning from experience with the problem class.

Chapter V: ADVERSARIAL SEARCH

We try to plan ahead in a world where other agents are planning against us.

5.1 Games

A game is a problem in which each agent needs to consider the actions of the other agents and how those actions affect its own welfare. Because of the huge search space, the search algorithms introduced in Chapter III may not be directly applicable.

We begin with a definition of the optimal move and an algorithm for finding it. We then look at techniques for choosing a good move when time is limited. Pruning allows us to ignore portions of the search tree that make no difference to the final choice, and heuristic evaluation functions allow us to approximate the true utility of a state without doing a complete search. Section 5.5 discusses games such as backgammon that include an element of chance; we also discuss bridge, which includes elements of imperfect information because not all cards are visible to each player. Finally, we look at how state-of-the-art game-playing programs fare against human opposition and at directions for future developments.

We first consider games with two players, whom we call MAX and MIN. MAX moves first, and then they take turns moving until the game is over.

In formal language, a game can be described by:

S0: The initial state, which specifies how the game is set up at the start.
PLAYER(s): Defines which player has the move in state s.
ACTIONS(s): Returns the set of legal moves in state s.
RESULT(s,a): The transition model, which defines the result of a move, which is another state.
TERMINAL-TEST(s): A terminal test, which is true when the game is over and false otherwise. States where the game has ended are called terminal states.
UTILITY(s, p): A utility function (also called an objective function or payoff function), which defines the final numeric value for a game that ends in terminal state s for a player p. In chess, the outcome is a win, loss, or draw, with values +1, 0, or 1/2. Some games have a wider variety of possible outcomes. Chess is zero-sum. “Constant-sum” would have been a better term, but zero-sum is traditional and makes sense if you imagine each player is charged an entry fee of 1/2, so that the two final payoffs sum to zero.

The formal definition above lets us build a game tree, but for most games this tree would be far too large to build in full. In fact, the main idea is that we do not need the complete tree; we only need to examine enough nodes to make a smart move.

5.2 Optimal decisions in games

Given a game tree, the optimal strategy can be determined from the minimax value of each node n, which we write as MINIMAX(n). The minimax value of a terminal state is just its utility. Furthermore, given a choice, MAX prefers to move to a state of maximum value, while MIN prefers a state of minimum value.


The minimax algorithm

The minimax algorithm computes MINIMAX(s) for every state s, starting from the terminal states at the bottom: it first computes the utility of each terminal state, then propagates values upward, updating each parent with the min or max of its children, until the entire tree has been evaluated. Although this is infeasible in practice, essentially all the methods in this chapter build on the minimax algorithm.
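
A minimal recursive sketch of minimax decision-making. It assumes a hypothetical `game` object with actions, result, terminal_test, and utility methods (matching the formal definition in 5.1), with utilities given from MAX's point of view and MAX to move at the root.

```python
def minimax_decision(state, game):
    """Returns the action for MAX with the highest minimax value."""
    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a), game))

def max_value(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    return max(min_value(game.result(state, a), game) for a in game.actions(state))

def min_value(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    return min(max_value(game.result(state, a), game) for a in game.actions(state))
```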

Multiplayer games (two or more players)

This is essentially the same as the minimax algorithm, except that each node is assigned a vector of utilities whose length equals the number of players. At each level, the player to move chooses the max/min among the successors according to the value in its own component of the vector.

5.3 Alpha-Beta Pruning

The problem with minimax is that the number of game states is simply enormous (exponential), but we can shrink the tree considerably through pruning; the standard technique in adversarial search is alpha-beta pruning. For a standard minimax tree, we can discard subtrees that can never influence the final decision. As the minimax example above shows, the minimax tree is a complete search tree that needs the utility of every terminal state before it can be built. The core of alpha-beta pruning is this: suppose my (MAX's) best option so far is worth 3, and after some action a3 the opponent has a reply that could lead to 2; then there is no need to consider a3 any further, because the opponent is no fool and will certainly play the reply a' that leaves us with 2, so we prune a3 away (i.e., we stop considering it).
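
A sketch of minimax with alpha-beta pruning over the same hypothetical game interface: alpha is the best value MAX can guarantee so far, beta the best value MIN can guarantee, and a branch is abandoned as soon as it cannot change the final decision.

```python
def alphabeta_search(state, game):
    """Returns the best action for MAX, pruning branches that cannot affect the result."""
    def max_value(s, alpha, beta):
        if game.terminal_test(s):
            return game.utility(s)
        v = float("-inf")
        for a in game.actions(s):
            v = max(v, min_value(game.result(s, a), alpha, beta))
            if v >= beta:
                return v                  # prune: MIN will never allow this branch
            alpha = max(alpha, v)
        return v

    def min_value(s, alpha, beta):
        if game.terminal_test(s):
            return game.utility(s)
        v = float("inf")
        for a in game.actions(s):
            v = min(v, max_value(game.result(s, a), alpha, beta))
            if v <= alpha:
                return v                  # prune: MAX already has a better option elsewhere
            beta = min(beta, v)
        return v

    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a), float("-inf"), float("inf")))
```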

link1 for Alpha-Beta Pruning

link2 for Alpha-Beta Pruning

5.4 Imperfect Real-Time Decisions

In the minimax algorithm we need to build a complete tree, and even though alpha-beta pruning helps us shrink it, the tree is usually still far too deep. So in this section we use a heuristic evaluation function to turn non-terminal nodes into terminal leaves, cutting off the search early.

Evaluation function: for a given game position (state), the evaluation function returns an estimate of the expected utility.

5.5 Stochastic Games

In practice we often face stochastic situations governed by chance, so we cannot build the tree as in the previous cases; what we need to build is a tree containing chance nodes.

To make the optimal decision, we compute an expectiminimax value for each node. In other words, the strategy can be summed up as: do your best and leave the rest to chance. (A sketch of the computation is given below.)
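
A sketch of the expectiminimax value computation. It assumes the game object additionally exposes player(s) returning "MAX", "MIN", or "CHANCE", and probability(s, a) giving the chance of each outcome at a chance node; these names are assumptions for illustration.

```python
def expectiminimax(state, game):
    """Minimax extended with chance nodes, whose value is the probability-weighted
    average of their children's values."""
    if game.terminal_test(state):
        return game.utility(state)
    player = game.player(state)
    if player == "MAX":
        return max(expectiminimax(game.result(state, a), game) for a in game.actions(state))
    if player == "MIN":
        return min(expectiminimax(game.result(state, a), game) for a in game.actions(state))
    # CHANCE node: weight each possible outcome by its probability
    return sum(game.probability(state, a) * expectiminimax(game.result(state, a), game)
               for a in game.actions(state))
```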

5.7 State Of The Art Game Programs

5.8 Alternative Approaches

Computing the exact optimal decision is intractable in most situations; in practice, every algorithm makes some degree of approximation and simplifying assumption.

5.9 Summary

• A game can be defined by the initial state (how the board is set up), the legal actions in each state, the result of each action, a terminal test (which says when the game is over), and a utility function that applies to terminal states.

• In two-player zero-sum games with perfect information, the minimax algorithm can select optimal moves by a depth-first enumeration of the game tree.

• The alpha–beta search algorithm computes the same optimal move as minimax, but achieves much greater efficiency by eliminating subtrees that are provably irrelevant.

• Usually, it is not feasible to consider the whole game tree (even with alpha–beta), so we need to cut the search off at some point and apply a heuristic evaluation function that estimates the utility of a state.

• Many game programs precompute tables of best moves in the opening and endgame so that they can look up a move rather than search.

• Games of chance can be handled by an extension to the minimax algorithm that evaluates a chance node by taking the average utility of all its children, weighted by the probability of each child.

Chapter VI: CONSTRAINT SATISFACTION PROBLEMS

This chapter describes a way to solve certain problems more efficiently. We use a factored representation for each state: a set of variables, each of which has a value. A problem is solved when every variable has a value that satisfies all the constraints on it. A problem described this way is called a constraint satisfaction problem, or CSP for short.

CSP search algorithms take advantage of the structure of states and use general-purpose rather than problem-specific heuristics to enable the solution of complex problems. (We do not need to design a heuristic function for every individual problem; we can simply search with general-purpose strategies such as minimum remaining values.) The main idea is to eliminate large portions of the search space all at once by identifying variable/value combinations that violate the constraints.

6.1 Defining CSP

A constraint satisfaction problem consists of three components, X, D, and C :

  1. X is a set of variables, {X1,…,Xn}.
  2. D is a set of domains, {D1, . . . , Dn}, one for each variable.
  3. C is a set of constraints that specify allowable combinations of values.
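
As a concrete and purely illustrative example, a small fragment of the Australia map-coloring CSP can be written down directly as X, D, and C:

```python
# Three mutually adjacent regions of the map-coloring problem.
variables = ["WA", "NT", "SA"]                              # X
domains = {v: {"red", "green", "blue"} for v in variables}  # D

def different(a, b):
    return a != b

constraints = [                     # C: binary constraints as (scope, predicate) pairs
    (("WA", "NT"), different),
    (("WA", "SA"), different),
    (("NT", "SA"), different),
]

def consistent(assignment, constraints):
    """True if no constraint whose variables are all assigned is violated."""
    for (x, y), pred in constraints:
        if x in assignment and y in assignment and not pred(assignment[x], assignment[y]):
            return False
    return True
```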

6.2 Constraint propagation: inference in CSPs

In regular state-space search, an algorithm can do only one thing: search. In CSPs there is a choice: an algorithm can search (choose a new variable assignment from several possibilities) or do a specific type of inference called constraint propagation: using the constraints to reduce the number of legal values for a variable, which in turn can reduce the legal values for other variables.

The key idea is local consistency. If we treat each variable as a node in a graph and each binary constraint as an arc, then the process of enforcing local consistency in each part of the graph causes inconsistent values to be eliminated throughout the graph.

Node Consistency

A single variable (corresponding to a node in the CSP network) is node-consistent if all the values in the variable’s domain satisfy the variable’s unary constraints. (e.g., in the map-coloring problem we color the regions red, yellow, or blue. Suppose region A has been colored red; for a neighboring region B, whose domain was originally {red, yellow, blue}, the assignment to A effectively imposes the unary constraint “B is not red”. To make B node-consistent, i.e., to make every value in B’s domain satisfy that constraint, we shrink B’s domain to {yellow, blue}. If every node in the graph is node-consistent, we say the graph is node-consistent.)

Arc Consistency

A variable in a CSP is arc-consistent if every value in its domain satisfies the variable’s binary constraints. More formally, Xi is arc-consistent with respect to another variable Xj if for every value in the current domain Di there is some value in the domain Dj that satisfies the binary constraint on the arc (Xi, Xj). (e.g., for the constraint Y = X² between variables X and Y, we can make X arc-consistent with respect to Y by setting X’s domain to {0, 1, 2, 3} and Y’s domain to {0, 1, 4, 9}.)

The most popular algorithm for arc consistency is called AC-3. To make every variable arc-consistent, the AC-3 algorithm maintains a queue of arcs to consider.[https://stackoverflow.com/questions/28257422/ac-1-ac-2-and-ac-3-algorithms-arc-consistency]

AC-3: Initially, the queue contains all the arcs in the CSP. AC-3 then pops off an arbitrary arc (Xi, Xj) from the queue and makes Xi arc-consistent with respect to Xj. If this leaves Di unchanged, the algorithm just moves on to the next arc. But if this revises Di (makes the domain smaller), then we add to the queue all arcs (Xk, Xi) where Xk is a neighbor of Xi. We need to do that because the change in Di might enable further reductions in the domains of Dk, even if we have previously considered Xk. If Di is revised down to nothing, then we know the whole CSP has no consistent solution, and AC-3 can immediately return failure. Otherwise, we keep checking, trying to remove values from the domains of variables until no more arcs are in the queue. At that point, we are left with a CSP that is equivalent to the original CSP—they both have the same solutions—but the arc-consistent CSP will in most cases be faster to search because its variables have smaller domains. (AC-3 can be described as follows: in a graph we have many nodes and arcs to consider, and each node has a domain. We build a queue containing all the arcs, pick an arc (Xi, Xj) from the queue, and check whether it is consistent; if not, we revise the domain. If Xi’s domain Di is revised, we add every arc (Xk, Xi), where Xk is a neighbor of Xi, back into the queue. If Di becomes empty, the CSP has no consistent solution and AC-3 returns failure; otherwise we keep going until the queue contains no more arcs. The resulting graph has exactly the same topology as before, but each node’s domain Di has been greatly reduced, which makes the subsequent search much easier. A sketch is given below.)
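
A hedged Python sketch of AC-3. Domains are sets and are revised in place; `satisfies(Xi, vi, Xj, vj)` is an assumed helper that checks the binary constraint on the arc (Xi, Xj).

```python
from collections import deque

def ac3(variables, domains, neighbors, satisfies):
    """Returns False iff some domain is emptied (no consistent solution exists)."""
    queue = deque((Xi, Xj) for Xi in variables for Xj in neighbors[Xi])
    while queue:
        Xi, Xj = queue.popleft()
        if revise(domains, Xi, Xj, satisfies):
            if not domains[Xi]:
                return False                     # Di is empty: the CSP is inconsistent
            for Xk in neighbors[Xi]:
                if Xk != Xj:
                    queue.append((Xk, Xi))       # Di changed, so re-check arcs into Xi
    return True

def revise(domains, Xi, Xj, satisfies):
    """Remove from Di every value that has no supporting value in Dj."""
    revised = False
    for vi in set(domains[Xi]):
        if not any(satisfies(Xi, vi, Xj, vj) for vj in domains[Xj]):
            domains[Xi].discard(vi)
            revised = True
    return revised
```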

Path Consistency

Consider the map-coloring problem on Australia, but with only two colors allowed, red and blue. Arc consistency can do nothing because every variable is already arc consistent: each can be red with blue at the other end of the arc (or vice versa). But clearly there is no solution to the problem: since WA, NT, and SA are mutually adjacent, we need at least three colors for them alone.

Arc consistency tightens down the domains (unary constraints) using the arcs (binary constraints). To make progress on problems like map coloring, we need a stronger notion of consistency. Path consistency tightens the binary constraints by using implicit constraints that are inferred by looking at triples of variables.

A two-variable set {Xi, Xj} is path-consistent with respect to a third variable Xm if, for every assignment {Xi = a, Xj = b} consistent with the constraints on {Xi, Xj}, there is an assignment to Xm that satisfies the constraints on {Xi, Xm} and {Xm, Xj}. This is called path consistency. (e.g., as mentioned above, in the two-color map-coloring problem every node’s domain is {red, blue}, and the graph is clearly arc-consistent, yet the CSP is just as clearly unsolvable. If we check path consistency instead, then for three mutually adjacent regions A, B, C we can see that {A, B} is not path-consistent with respect to C: once A and B take red and blue respectively, there is no assignment to C that satisfies the constraints.)

K-consistency

Stronger forms of propagation can be defined with the notion of k-consistency. A CSP is k-consistent if, for any set of k − 1 variables and for any consistent assignment to those variables, a consistent value can always be assigned to any k-th variable. (Pick any k − 1 nodes with a consistent assignment; we can always assign the k-th node a value that keeps the assignment consistent.)

Global Constraints

Remember that a global constraint is one involving an arbitrary number of variables (but not necessarily all variables). Global constraints occur frequently in real problems and can be handled by special-purpose algorithms that are more efficient than the general-purpose methods described so far.

6.3 Backtracking search for CSPs

For a real Sudoku puzzle we could, in principle, apply a standard depth-limited search, but it is quite obvious that the state space is far too large. (The core idea of CSP search is that, knowing the current state, we should not explore choices that are impossible: for example, if the current cell contains a 1, there is no point exploring 1 for another cell in the same column, because by the rules of Sudoku each digit appears only once in each row and column.)

The term backtracking search is used for a depth-first search that chooses values for one variable at a time, and backtracks when a variable has no legal values left to assign.

Just as we improved uninformed search, we can apply techniques to improve the performance of backtracking along the following directions:

  1. Which variable should be assigned next (SELECT-UNASSIGNED-VARIABLE), and in what order should its values be tried (ORDER-DOMAIN-VALUES)?
  2. What inferences should be performed at each step in the search (INFERENCE)?
  3. When the search arrives at an assignment that violates a constraint, can the search avoid repeating this failure?

Variable and value ordering

Which variable should be assigned next?

One of the most important steps in backtracking search is selecting which unassigned variable to assign next. The most common strategy is the minimum-remaining-values (MRV) heuristic, also called the most constrained variable heuristic. The general idea is to choose the variable with the fewest legal values remaining.

Another idea is known as the degree heuristic. It attempts to select the variable that is involved in the largest number of constraints on other unassigned variables. (For example, in the map-coloring problem every node initially has the domain {red, yellow, blue}, so MRV alone cannot break ties; in practice we prefer the nodes that impose the most constraints on their neighbors, such as regions near the centre of the map or regions bordering the largest number of other regions.)

In what order should its values be tried?

Once a variable has been selected, the algorithm has to decide on the order in which to examine its values. The least-constraining-value heuristic can be effective in some cases: it prefers the value that rules out the fewest choices for the neighboring variables in the constraint graph (leaving as many options open for the future as possible). A sketch of backtracking search with MRV and LCV ordering is given below.
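
A minimal sketch of backtracking search with MRV variable ordering and LCV value ordering, using the same CSP representation and `satisfies` helper assumed above; it deliberately omits inference (forward checking / MAC), which the next subsection adds.

```python
def backtracking_search(variables, domains, neighbors, satisfies, assignment=None):
    """Assigns one variable at a time and backtracks when no legal value remains."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        return assignment

    def legal_values(v):
        """Values of v consistent with the already-assigned neighbors of v."""
        return [x for x in domains[v]
                if all(satisfies(v, x, n, assignment[n])
                       for n in neighbors[v] if n in assignment)]

    # MRV: pick the unassigned variable with the fewest remaining legal values.
    var = min((v for v in variables if v not in assignment),
              key=lambda v: len(legal_values(v)))

    # LCV: try first the value that rules out the fewest choices for unassigned neighbors.
    def conflicts(value):
        return sum(1 for n in neighbors[var] if n not in assignment
                     for w in domains[n] if not satisfies(var, value, n, w))

    for value in sorted(legal_values(var), key=conflicts):
        assignment[var] = value
        result = backtracking_search(variables, domains, neighbors, satisfies, assignment)
        if result is not None:
            return result
        del assignment[var]                      # undo and try the next value
    return None
```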

Interleaving search and inference

So far we have seen how AC-3 and other algorithms can infer reductions in the domain of variables before we begin the search. But inference can be even more powerful in the course of a search: every time we make a choice of a value for a variable, we have a brand-new opportunity to infer new domain reductions on the neighboring variables.

One of the simplest forms of inference is called forward checking. Whenever a variable X is assigned, the forward-checking process establishes arc consistency for it: for each unassigned variable Y that is connected to X by a constraint, delete from Y’s domain any value that is inconsistent with the value chosen for X. (This differs from AC-3: AC-3 enforces arc consistency over the whole graph, whereas forward checking checks arc consistency only for the currently assigned node X, deleting the illegal values from the domains of the nodes Y connected to X by a constraint.)

For many problems the search will be more effective if we combine the MRV heuristic with forward checking. (In the figure above, MRV is clearly not being used.)

Although forward checking detects many inconsistencies, it does not detect all of them. The problem is that it makes the current variable arc-consistent, but doesn’t look ahead and make all the other variables arc-consistent. For example, consider the third row of the figure above: when WA is red and Q is green, both NT and SA are forced to be blue. Forward checking does not look far enough ahead to notice that this is an inconsistency: NT and SA are adjacent and so cannot have the same value.

The algorithm called MAC (for Maintaining Arc Consistency) detects this inconsistency. After a variable Xi is assigned a value, the INFERENCE procedure calls AC-3, but instead of a queue of all arcs in the CSP, we start with only the arcs (Xj, Xi) for all Xj that are unassigned variables that are neighbors of Xi. From there, AC-3 does constraint propagation in the usual way, and if any variable has its domain reduced to the empty set, the call to AC-3 fails and we know to backtrack immediately.

Intelligent Backtracking: Looking backward

The BACKTRACKING-SEARCH algorithm described above has a very simple policy for what to do when a branch of the search fails: back up to the preceding variable and try a different value for it. This is called chronological backtracking because the most recent decision point is revisited. In this subsection, we consider better possibilities.

A more intelligent approach to backtracking is to backtrack to a variable that might fix the problem, that is, a variable that was responsible for making one of the possible values of SA impossible. To do this, we keep track of the set of assignments that are in conflict with some value for SA; this set (in this case {Q = red, NSW = green, V = blue}) is called the conflict set for SA. The backjumping method backtracks to the most recent assignment in the conflict set; in this case, backjumping would jump over Tasmania and try a new value for V. This method is easily implemented by a modification to BACKTRACK such that it accumulates the conflict set while checking for a legal value to assign. If no legal value is found, the algorithm should return the most recent element of the conflict set along with the failure indicator. (As its name suggests, intelligent backtracking is a smarter variant of backtracking. In ordinary backtracking, whenever a failure is detected we back up one level, but that often does not lead us to a correct solution efficiently; intelligent backtracking instead backtracks to the value assignment that caused the current conflict and resumes the search from there.)

Chapter VII: LOGICAL AGENTS

How the intelligence of humans is achieved—not by purely reflex mechanisms but by processes of reasoning that operate on internal representations of knowledge. In AI, this approach to intelligence is embodied in knowledge-based agents.

7.1 Knowledge-based agents

The central component of a knowledge-based agent is its knowledge base, or KB. A knowledge base is a set of sentences. Each sentence is expressed in a language called a knowledge representation language and represents some assertion about the world. Sometimes we dignify a sentence with the name axiom, when the sentence is taken as given without being derived from other sentences.

There must be a way to add new sentences to the knowledge base and a way to query what is known. The standard names for these operations are TELL and ASK, respectively. Both operations may involve inference—that is, deriving new sentences from old. Inference must obey the requirement that when one ASKs a question of the knowledge base, the answer should follow from what has been told (or TELLed) to the knowledge base previously.

Like every agent, a knowledge-based agent takes a percept as input and returns an action. The agent maintains a knowledge base, KB, which may initially contain some background knowledge.

Each time the agent is called, it does 3 things:

  1. it tells the KB what it perceives.
  2. it asks the knowledge base what actions it should take
  3. it tells the KB which action has been chosen and executes the action.

7.3 Logic

syntax: the rules that the sentences of the language must obey

semantics: the actual meaning of sentences (defines the truth of each sentence in each possible world)

model: Informally, we may think of an example, having x men and y women sitting at a table playing bridge, and the sentence x + y = 4 is true when there are four people in total. Formally, the possible models are just all possible assignments of real numbers to the variables x and y. Each such assignment fixes the truth of any sentence of arithmetic whose variables are x and y. If a sentence α is true in model m, we say that m satisfies α or m is a model of α. We use the notation M(α) to mean the set of all models of α.

logical entailment [α |= β]: α |= β if and only if, in every model in which α is true, β is also true. (α |= β iff M ( α ) ⊆ M ( β ) .)

entailment vs inference:In understanding entailment and inference, it might help to think of the set of all consequences of KB as a haystack and of α as a needle. Entailment is like the needle being in the haystack; inference is like finding it.

Logical inference

Model checking

An inference algorithm that derives only entailed sentences is called sound or truth-preserving. Soundness is a highly desirable property. An unsound inference procedure essentially makes things up as it goes along—it announces the discovery of nonexistent needles.

The property of completeness is also desirable: an inference algorithm is complete if it can derive any sentence that is entailed. For real haystacks, which are finite in extent, it seems obvious that a systematic examination can always decide whether the needle is in the haystack.
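
A tiny model-checking sketch of entailment by truth-table enumeration: KB |= α holds iff α is true in every model in which KB is true. Sentences are represented here simply as Python functions from a model (a dict of truth values) to a Boolean; this is an illustration, not the book's TT-ENTAILS? pseudocode.

```python
import itertools

def tt_entails(kb, alpha, symbols):
    """Enumerates all assignments to `symbols` and checks KB |= alpha."""
    for values in itertools.product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if kb(model) and not alpha(model):
            return False              # a model of KB in which alpha is false: no entailment
    return True

# Example: KB = P ∧ Q entails P.
print(tt_entails(lambda m: m["P"] and m["Q"], lambda m: m["P"], ["P", "Q"]))  # True
```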

7.4 Propositional Logic: A very simple logic

Syntax

The syntax of propositional logic defines the allowable sentences. The atomic sentences consist of a single proposition symbol. Each symbol stands for a proposition that can be true or false.

Complex sentences are constructed from simpler sentences, by parentheses and logical connectives(¬,∧,∨,⇒,⇔).

Semantics

7.5 Propositional Theorem Proving

So far, we have shown how to determine entailment by model checking: enumerating models and showing that the sentence must hold in all models. In this section, we show how entailment can be done by theorem proving: applying rules of inference directly to the sentences in our knowledge base to construct a proof of the desired sentence without consulting models.

Logical equivalence: two sentences α and β are logically equivalent if they are true in the same set of models. [α≡β iff α|=β and β|=α]

Validity: A sentence is valid if it is true in all models. For example, the sentence P ∨ ¬P is valid.

For any sentences α and β, α |= β if and only if the sentence (α ⇒ β) is valid.

Satisfiability: A sentence is satisfiable if it is true in, or satisfied by some model.

Inference and proof

Modus Ponens: from α ⇒ β and α, we can infer β.

Proof by resolution

The current section introduces a single inference rule, resolution, that yields a complete inference algorithm when coupled with any complete search algorithm.

Conjunctive normal form (CNF): a conjunction of clauses, where each clause is a disjunction of literals.

To show that KB |= α, we show that (KB ∧ ¬α) is unsatisfiable.

Forward & backward chaining

The forward-chaining algorithm determines if a single proposition symbol q is entailed by a knowledge base of definite clauses. It begins from known facts (positive literals) in the knowledge base. If all the premises of an implication are known, then its conclusion is added to the set of known facts. For example, if L1,1 and Breeze are known and (L1,1 ∧ Breeze) ⇒ B1,1 is in the knowledge base, then B1,1 can be added.

It is easy to see that forward chaining is sound: every inference is essentially an application of Modus Ponens. Forward chaining is also complete: every entailed atomic sentence will be derived. Forward chaining is an example of the general concept of data-driven reasoning—that is, reasoning in which the focus of attention starts with the known data.
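
A sketch of forward chaining over definite clauses, using the example from the paragraph above. Each clause is represented here as an assumed (premises, conclusion) pair; when the count of unknown premises for a clause reaches zero, its conclusion is added to the agenda.

```python
def pl_fc_entails(clauses, known_facts, query):
    """Forward chaining: returns True if `query` is entailed by the definite clauses."""
    count = {i: len(premises) for i, (premises, _) in enumerate(clauses)}
    inferred = set()
    agenda = list(known_facts)                   # known positive literals
    while agenda:
        p = agenda.pop()
        if p == query:
            return True
        if p in inferred:
            continue
        inferred.add(p)
        for i, (premises, conclusion) in enumerate(clauses):
            if p in premises:
                count[i] -= 1
                if count[i] == 0:                # all premises known: conclude
                    agenda.append(conclusion)
    return False

# (L11 ∧ Breeze) ⇒ B11, with L11 and Breeze known, derives B11.
print(pl_fc_entails([({"L11", "Breeze"}, "B11")], ["L11", "Breeze"], "B11"))  # True
```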

The backward-chaining algorithm, as its name suggests, works backward from the query. If the query q is known to be true, then no work is needed. Otherwise, the algorithm finds those implications in the knowledge base whose conclusion is q. Backward chaining is a form of goal-directed reasoning. It is useful for answering specific questions such as “What shall I do now?” and “Where are my keys?”.

Chapter X: CLASSICAL PLANNING

10.1 Definition of classical planning

The problem-solving agent of Chapter 3 can find sequences of actions that result in a goal state. But it deals with atomic representations of states and thus needs good domain-specific heuristics to perform well.

The hybrid propositional logical agent of Chapter 7 can find plans without domain-specific heuristics because it uses domain-independent heuristics based on the logical structure of the problem.

In response to this, planning researchers have settled on a factored representation — one in which a state of the world is represented by a collection of variables. We use a language called PDDL, the Planning Domain Definition Language.

States: Each state is represented as a conjunction of fluents that are ground, functionless atoms. For example, a state in a package delivery problem might be At(Truck1,Melbourne) ∧ At(Truck2,Sydney). Database semantics is used: the closed-world assumption means that any fluents that are not mentioned are false, and the unique names assumption means that Truck1 and Truck2 are distinct.

Closed-world vs open-world assumption

Actions: these are described by a set of action schemas; each action corresponds to one schema, consisting of a precondition and an effect. For example, the schema for flying a plane from one location to another has precondition At(p, from) ∧ Plane(p) ∧ Airport(from) ∧ Airport(to) and effect ¬At(p, from) ∧ At(p, to).

We say that action a is applicable in state s if the preconditions are satisfied by s.

The result of executing action a in state s is defined as a state s′ which is represented by the set of fluents formed by starting with s, removing the fluents that appear as negative literals in the action’s effects (what we call the delete list or DEL(a)), and adding the fluents that are positive literals in the action’s effects (what we call the add list or ADD(a)):

RESULT(s, a) = (s − DEL(a)) ∪ ADD(a)

For example, with the action Fly(F22,SFO,JFK), we would remove At(F22,SFO) and add At(F22,JFK). (This represents the action "F22 flies from SFO to JFK". The action is applicable only when all of its preconditions are satisfied, i.e., At(F22,SFO), Plane(F22), Airport(SFO), Airport(JFK) must hold beforehand, which rules out actions such as Fly(Ironman, SFO, JFK) or Fly(F22, Moon, Sun). Once the action completes, F22 has arrived at JFK, so we update the state by removing At(F22,SFO) and adding At(F22,JFK). A small sketch of this state update is given below.)
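
A small sketch of this state update in Python: a state is a set of ground fluents, an action instance carries its precondition, delete list, and add list, and RESULT is computed exactly as in the formula above (the dictionary layout is my own, not PDDL syntax).

```python
# The ground action Fly(F22, SFO, JFK) with its precondition, DEL and ADD lists.
fly_f22_sfo_jfk = {
    "precond": {"At(F22,SFO)", "Plane(F22)", "Airport(SFO)", "Airport(JFK)"},
    "del":     {"At(F22,SFO)"},
    "add":     {"At(F22,JFK)"},
}

def applicable(state, action):
    """Action a is applicable in s if all its preconditions hold in s."""
    return action["precond"] <= state

def result(state, action):
    """RESULT(s, a) = (s - DEL(a)) ∪ ADD(a)"""
    return (state - action["del"]) | action["add"]

s = {"At(F22,SFO)", "Plane(F22)", "Airport(SFO)", "Airport(JFK)"}
if applicable(s, fly_f22_sfo_jfk):
    print(result(s, fly_f22_sfo_jfk))
    # -> {'Plane(F22)', 'Airport(SFO)', 'Airport(JFK)', 'At(F22,JFK)'}
```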

Initial State: the initial state is a conjunction of ground atoms.

Forward search is prone to exploring irrelevant actions. Consider the noble task of buying a copy of AI: A Modern Approach from an online bookseller. Suppose there is an action schema Buy(isbn) with effect Own(isbn). ISBNs are 10 digits, so this action schema represents 10 billion ground actions. An uninformed forward-search algorithm would have to start enumerating these 10 billion actions to find the one action that leads to the goal. (Problems like "how do you put an elephant into a fridge" are equally hard to solve this way: at every state there are far too many actions to search through, and the enormous state space means this kind of search usually performs poorly.)

In regression search we start at the goal and apply the actions backward until we find a sequence of steps that reaches the initial state. It is called relevant-states search because we only consider actions that are relevant to the goal (or current state). (This search method has its own limitation: starting from the goal state g, we must know what a predecessor state g′ looks like, so that we can search backward step by step along a path from the goal towards the initial state. In problems such as 8-queens, however, we do not know what the state immediately preceding the goal looks like, so this method is hard to apply there.)

Heuristics for planning

  1. Ignore preconditions heuristic: drops all the preconditions from actions
  2. Ignore delete lists heuristic: create a relaxed version of the original problem (since actions never remove fluents, states, including goal states, may contain additional atoms)
  3. Decomposition: create a set of independent subgoals

10.4 Other Classical Planning Approaches

Currently we have several solutions for classical planning, such as translating the problem into a Boolean satisfiability problem or into a CSP. In practice there are even more options.

APPENDIX

[https://www.zhihu.com/question/41176911]