Let us look at performing a random walk between terms and context.
We discussed in the previous post that terms and context are different for machine data than for human text. The context in the case of machine data is the metadata fields. A random walk connects the terms and the context in a bipartite graph.
A random walk is a special kind of Markov chain. A Markov chain has time-homogeneous transition probabilities; a random walk additionally has space-homogeneous transition probabilities. This calls for a structure where p(x, y) = p(x+z, y+z) for any translation z, so a random walk is given by p(x, y) = p(y - x).
In an undirected connected graph, the degree of a vertex i is the number of neighbors of i.
For the same graph, a random walk is defined with the following transition probabilities:

p(i, j) = 1 / degree(i), if j is a neighbor of i
        = 0 otherwise
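These transition probabilities are simple to simulate: picking a uniformly random neighbor at each hop realizes p(i, j) = 1/degree(i). Here is a minimal sketch, using a hypothetical five-vertex graph (a square with a fifth vertex inside connected to all four corners) as the example; the adjacency list and function names are illustrative, not from the post.

```python
import random

# Hypothetical undirected graph as an adjacency list:
# vertices 0-3 form a square, vertex 4 sits inside, joined to all corners.
graph = {
    0: [1, 3, 4],
    1: [0, 2, 4],
    2: [1, 3, 4],
    3: [0, 2, 4],
    4: [0, 1, 2, 3],
}

def step(i):
    """One hop: each neighbor of i is chosen with probability 1/degree(i)."""
    return random.choice(graph[i])

def walk(start, hops):
    """Run a random walk for the given number of hops, returning the path."""
    path = [start]
    for _ in range(hops):
        path.append(step(path[-1]))
    return path
```

Every transition in the resulting path goes to a neighbor of the current vertex, and each neighbor is equally likely, matching the definition above.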
The stationary distribution of a random walk on a graph (RWG) is given by pi(i) = C degree(i), where C is a constant.
This comes from the detailed balance condition pi(i) p(i, j) = pi(j) p(j, i): on each side, the first factor is the stationary distribution C degree(i) and the second factor is 1/degree(i), so both sides equal C whenever i and j are neighbors.
It follows that the stationary distribution is proportional to degree(i). This is easy to work out with a sample connected graph of five vertices arranged inside a square. Moreover, since detailed balance holds, the RWG is a reversible chain. The constant C is found by normalization: given that the degrees add up to twice the number of edges, we have C = 1/(2|E|).
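The claim that the stationary distribution is proportional to degree(i), with C = 1/(2|E|), can be checked empirically on the five-vertex sample graph by comparing long-run visit frequencies of the walk against C degree(i). This is a sketch under the same assumed square-plus-center graph, which has |E| = 8 and so C = 1/16.

```python
import random
from collections import Counter

# Assumed five-vertex graph: a square (0-3) with vertex 4 inside,
# connected to all four corners.
graph = {
    0: [1, 3, 4],
    1: [0, 2, 4],
    2: [1, 3, 4],
    3: [0, 2, 4],
    4: [0, 1, 2, 3],
}

# Degrees sum to twice the number of edges, so C = 1 / (2|E|).
num_edges = sum(len(nbrs) for nbrs in graph.values()) // 2  # 8 edges
C = 1 / (2 * num_edges)                                     # 1/16

# Predicted stationary distribution: pi(i) = C * degree(i).
predicted = {i: C * len(nbrs) for i, nbrs in graph.items()}

# Empirical check: long-run visit frequencies of the walk.
random.seed(1)
counts = Counter()
state = 0
steps = 200_000
for _ in range(steps):
    state = random.choice(graph[state])
    counts[state] += 1

empirical = {i: counts[i] / steps for i in graph}
```

The four corner vertices have degree 3 and predicted probability 3/16; the center vertex has degree 4 and predicted probability 4/16, and the empirical frequencies converge to these values.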
There are several ways to express the transition probabilities, resulting in different random walks. For example:
p(x) = C 2^(-|x1| - ... - |xd|)

or

p(x) = pj if x = ej, j = 1, ..., d
     = qj if x = -ej, j = 1, ..., d
     = 0 otherwise

where all of the p(x) add up to 1.
If d = 1 then p(1) = p, p(-1) = q, and p(x) = 0 otherwise, where p + q = 1. This is the drunkard's walk, which is homogeneous in both time and space. Given that each hop has size 1, the walk must pass through all intermediate nodes, so the transitions are also skip-free.
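The d = 1 case above can be sketched directly: at each step the walker moves +1 with probability p and -1 with probability q = 1 - p, so every hop has size exactly 1 (the skip-free property). The function name is illustrative.

```python
import random

def drunkards_walk(p, steps, start=0):
    """Skip-free walk on the integers: +1 with probability p, -1 with q = 1 - p."""
    position = start
    path = [position]
    for _ in range(steps):
        position += 1 if random.random() < p else -1
        path.append(position)
    return path
```

Because every hop is +1 or -1, the walk cannot jump over any intermediate integer, which is exactly the skip-free property described above.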