We will resume our book review from previous post shortly but first lets look into
Zero Sum games as described in Dasgupta et al 2006.
We represent various conflict situation in life by matrix
games. The schoolyard game rock-paper-scissors is specified by the payoff
matrix as illustrated here. There are two players called Row and Column and they each pick a move from
{r, p, s}. They then lookup the matrix entry corresponding to their moves, and
Column pays row this amount.
For eg:
G = Column
r
|
p
|
s
|
|
r
|
0
|
-1
|
1
|
p
|
1
|
0
|
-1
|
s
|
-1
|
1
|
0
|
Row
Let the two play the game repeatedly. If Row always make the
same move, Column will quickly catch on because it will play the countermove,
winning every time. So Row should mix things up. It can model by allowing row
to have a mixed strategy, in which on each turn she plays r with probability
x1, p with x2 and s with x3. This strategy is represented by a vector x and
similarly Column’s strategy is represented by a vector y.On any given round, there is an xi, yj chance that Row and Column will play ith and jth moves. The expected average payoff is Sigma(I,j) Gij.Prob(xi,yj) which is equal to
Sum-over-I-and-j (Gij.xi.yj)
Since its Rows gain and Columns loss, Row wants to maximize
and Column wants to minimize it. If Row plays the completely random strategy where xi is each 1/3 and if column plays r, then the expected payoff is the weighted average 1/3*0+1/3*(1)+1/3(-1) = 0 Thus by playing the completely random strategy, row forces an expected payoff of zero, no matter what column does. This means that the column cannot get a negative payoff. Symmetrically, column can play the same and force Row to have a zero payoff.
Thinking about this differently, lets review the following two cases:
First, row announces her strategy, and then Column pick his.
First, column announces his strategy, and then Row chooses hers.
Now we would expect the player going next to fully exploit the known strategy of the first to get the desired payoff. But this is amazingly not the case. If both play optimally, then it doesn’t hurt a player to announce his or her strategy in advance. This remarkable property is a consequence of – and in fact equivalent to – linear programming duality.
(The duality theorem is mentioned this way: If a linear program has a bounded optimum, then so does its dual, and the two optimum values coincide
To visualize duality, we imagine a string connecting the nodes that is stretched taut. In other words, the dual problem is to stretch s and t as far apart as possible, subject to the constraint that the endpoint of any edge {u,v} are separated by a distance of at most Wuv)
Imagine a presidential election scenario, in which there are two candidates for office, and the moves they make correspond to campaign issues on which they can focus (the initials stand for economy, society, morality, and tax cut).
The payoff entries are millions of votes lost by Column.
G m t
e 3 -1
s -2 1
If Row announces that she will play the mixed strategy of x = (½, ½ ), what should the column do ? Since m incurs loss of ½ and t incurs loss of 0, Column should have a pure strategy of (0,1)
This can be generalized to Pick (x1, x2) that maximizes min (3x1-2x2, -x1 + x2) which is the payoff from Column's best response to x.
This choice of xi's gives Row the best possible guarantee about her expected payoff. And we
will now see that it can be found by an LP! The main trick is to notice that for fixed x1 and x2
the following are equivalent:
z = min(3x1 - 2x2, -x1 + x2)
max z
z <= 3x1 -2x2
z <= -x1 + x2
and row needs to choose x1 and x2 to maximize this z
max z
-3x1 + 2x2 + z < = 0
x1 -x2 + z <= 0
3 / 3
x1 + x2 = 1
x1,x2 >= 0
Symmetrically we can compose the linear equations in terms of y
and the critical observation is that these two LPs are dual to each other.
#codingexercise
Double
GetProductNthEveryThirdPower (Double [] A,Double p)
{
If ( A== null) return 0;
Return A.GetProductNthEveryThirdPower(p);
}
No comments:
Post a Comment