Saturday, July 19, 2014

Today we will look at the Markov chain a little bit more to answer the following questions:
Will the random walk ever return to where it started from?
If yes, how long will it take?
If not, where will it go?
If we take the probability that the walk (started at 0) attains value 0 in n steps as u(n) for even n, we now want to find the probability f(n) that the walk will return to 0 for the first time in n steps.
Let X_n, n >= 1, be the i.i.d. +1/-1 steps of the walk, and let S_n = X_1 + ... + X_n (with S_0 = 0) be the walk itself; S_n can be considered a selection of n of these steps.
The theorem for the calculation of f(n) is stated this way:
Let n be even and u(n) = P_0(S_n = 0), where P_0 denotes probability for the walk started at 0.
Then f(n) = P_0(S_1 != 0, ..., S_{n-1} != 0, S_n = 0) = u(n)/(n - 1).
And here's the proof:
Since the random walk cannot change sign before becoming zero,
f(n) = P_0(S_1 > 0, ..., S_{n-1} > 0, S_n = 0) + P_0(S_1 < 0, ..., S_{n-1} < 0, S_n = 0),
which consists of two equal terms, by symmetry.
Now,
P_0(S_1 > 0, ..., S_{n-1} > 0, S_n = 0) = P_0(S_n = 0) P_0(S_1 > 0, ..., S_{n-1} > 0 | S_n = 0), which is the use of conditional probability.
The first factor is u(n). Given S_n = 0, S_{n-1} equals 1 or -1 with equal probability, so conditioning on S_{n-1} = 1 contributes a factor of 1/2; and by the Markov property at time n - 1 we can then omit the event S_n = 0.
So f(n) can be rewritten as 2 u(n) (1/2) P_0(S_1 > 0, S_2 > 0, ..., S_{n-2} > 0 | S_{n-1} = 1),
and the last factor is 1/(n - 1) by the ballot theorem, which proves that f(n) = u(n)/(n - 1).
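As a quick numerical check (a sketch, not part of the original derivation; for the simple symmetric walk, u(n) has the closed form C(n, n/2) 2^-n), the theorem can be compared against a Monte Carlo estimate:

import random
from math import comb

def u(n):
    # probability the simple +/-1 walk started at 0 is back at 0 after n steps (n even)
    return comb(n, n // 2) / 2**n

def f(n):
    # first-return probability from the theorem: f(n) = u(n)/(n - 1)
    return u(n) / (n - 1)

def first_return_estimate(n, trials=200000):
    # fraction of simulated walks whose first return to 0 happens exactly at step n
    hits = 0
    for _ in range(trials):
        s = 0
        for t in range(1, n + 1):
            s += random.choice((-1, 1))
            if s == 0:
                hits += (t == n)
                break
    return hits / trials

print(f(4), first_return_estimate(4))   # both should be close to 1/8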


To complete the post on random walks for developers, as discussed in the previous two posts, we briefly summarize the goal and the process.
A random walk is a special case of a process called a Markov chain, in which future events are conditionally dependent on the present and not on past events. It is driven by random numbers that select the next event according to fixed probabilities. The probabilities to move between a finite set of states (sometimes two, forward and backward, as in the case of a drunkard walking along a straight line) are called transition probabilities. Random walks lead to diffusion.
The iterations in a random walk are done based on the equation p_{x,y} = p_{x+z,y+z} for any translation z, where p_{x,y} is the transition probability in the state space S.
Random walks possess certain other properties:
- time homogeneity
- space homogeneity
and sometimes skip-free transitions, which we have already discussed.
The transition probabilities are the main thing to be worked out in a random walk.
Usually they are expressed in cases, such as
p(x) = p_j if x = e_j, where j = 1 to d in a d-dimensional space
     = q_j if x = -e_j
     = 0 otherwise
We have seen how this simplifies to forward and backward motion in the case of a drunkard's linear walk.
The walk is just iterative, calculating a new position from the outcome of the transition probabilities.
The walk itself may be performed k times to average out the findings (such as hitting time) from each walk, as in the sketch below.
Each step traverses the bipartite graph.
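A minimal sketch of such a walk (the per-axis probabilities p_j and q_j, the target point, and the step budget are all illustrative assumptions, not from the post):

import random

def step(p, q):
    # pick a coordinate direction: +e_j with probability p[j], -e_j with probability q[j]
    # p and q are lists with sum(p) + sum(q) == 1
    r, acc = random.random(), 0.0
    for j in range(len(p)):
        acc += p[j]
        if r < acc:
            return j, +1
        acc += q[j]
        if r < acc:
            return j, -1
    return len(p) - 1, -1   # guard against floating-point rounding

def hitting_time(p, q, target, max_steps=10000):
    # iterate from the origin until the target is hit, or give up
    x = [0] * len(p)
    for n in range(1, max_steps + 1):
        j, dz = step(p, q)
        x[j] += dz
        if x == target:
            return n
    return None

# perform the walk k times and average the hitting time over the walks that finish
k = 1000
times = [t for t in (hitting_time([0.3, 0.2], [0.3, 0.2], [1, 1]) for _ in range(k)) if t]
if times:
    print(sum(times) / len(times))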
http://1drv.ms/1nf79KL

Friday, July 18, 2014

Continuing from the previous post ...

The number of steps taken between any two nodes is called the hitting time. The hitting time between an unclassified term and a domain term can be averaged over all the walks that connect those two nodes.
Since the walks are selected based on transition probabilities, the most probable paths are selected first. The same pair of nodes can be connected by many paths, and the same unclassified term can be connected to many other domain terms.
The contextual similarity of a classification pair n, m can then be described as the relative frequency of hitting m from n, normalized over all the nodes i linked to that start node.
This is calculated as contextual similarity L(n, m) = H(n, m) / Sum_i H(n, i).
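As a small sketch (the table layout and the names are hypothetical), the normalization can be computed directly from hit counts accumulated over the walks:

def contextual_similarity(H, n, m):
    # H[n][m] = number of walks from start node n that hit node m
    # L(n, m) = H(n, m) / Sum_i H(n, i)
    total = sum(H[n].values())
    return H[n][m] / total if total else 0.0

H = {"disk_err": {"io_error": 7, "timeout": 3}}   # toy hit counts
print(contextual_similarity(H, "disk_err", "io_error"))   # 0.7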
We can also look at Stability of a random walk or a Markov chain in general.
Stability refers to the convergence of probabilities as the series becomes infinite. When the chain is positive recurrent and irreducible, the Cesaro average (1/n) Sum_{k=1..n} P(X_k = x) converges to pi(x) as n -> infinity.
Stability is interesting because a Markov chain is a simple model of a stochastic dynamical system that remains within a small region. For example, when a pendulum swings, it finally comes to a stable position with dissipation of energy. Even if there were no friction, it would be deemed stable because it cannot swing too far away.
What the stability theorem tells us is that when a Markov chain has certain properties (irreducibility, positive recurrence, a unique stationary distribution pi), the n-step transition matrix converges to a matrix with rows all equal to pi. This is called the fundamental stability theorem.
Stability is proved using coupling.
Coupling refers to the various methods for constructing a joint version of two random variables. If the random variables are independent, they can be combined in a straightforward manner by taking the product of their distributions. Coupling helps us define a third Markov chain Z from a chain X with an arbitrary initial distribution and a chain Y with the stationary distribution, where Z follows X prior to the meeting with Y and follows Y after the meeting point. This then shows that the n-step transition matrix converges to a matrix with all rows equal to pi.
By choosing the most probable paths, the random walk follows the preferred state transitions. Thus, while not every path may end within the predetermined number of steps, we know that when a path does end, it has favored the higher transition probabilities.

A simple random walk has equal probabilities to move from one state to another.
To implement a simple random walk in the (x, y) plane, we can have a naive one like this (sketched in Python; the original pseudocode's sample(step, 1) becomes random.choice):

import random

def simple_random_walk(n, steps=(-1, 1)):
    # take 2^n - 1 moves; each move picks the x or the y axis with equal probability
    x, y = [0], [0]
    for i in range(1, 2**n):
        if random.random() < 0.5:     # move in the x direction
            x.append(x[i-1] + random.choice(steps))
            y.append(y[i-1])
        else:                         # move in the y direction
            x.append(x[i-1])
            y.append(y[i-1] + random.choice(steps))
    return x, y
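For example (assuming the sketch above):

xs, ys = simple_random_walk(4)
print(list(zip(xs, ys)))   # 2^4 positions, starting at (0, 0)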

We can have the following metrics in a simple random walk:
- first return time to zero
- total number of visits to a state
For all random walks, we can have metrics like
- total number of paths between two nodes
- total number of paths in which a node appears
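The first two metrics are easy to collect from a single simulated walk; a sketch (the walk length is illustrative):

import random

def walk_metrics(n_steps):
    # simulate a +/-1 walk, recording the first return time to zero and visits per state
    s, first_return, visits = 0, None, {0: 1}
    for t in range(1, n_steps + 1):
        s += random.choice((-1, 1))
        visits[s] = visits.get(s, 0) + 1
        if s == 0 and first_return is None:
            first_return = t
    return first_return, visits

first_return, visits = walk_metrics(1000)
print("first return to zero:", first_return)
print("visits to state 0:", visits[0])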


Thursday, July 17, 2014

We mentioned that a random walk is a special case of a Markov chain. To understand Markov chains, we take the example of a mouse in a cage with two cells: the first has fresh cheese and the second has stinky cheese. If at time n the mouse is in cell 1, then at time n+1 the mouse is in either cell 1 or cell 2. Let us say the probability to move from cell 1 to cell 2 is alpha and the reverse is beta. In this case, alpha is much smaller than beta. Then the transition diagram shows transitions as follows:
1->1 with a  probability of 1- alpha
1 -> 2 with a probability of alpha
2 -> 1 with a probability of beta
and 2->2 with a probability of 1 - beta.
At each step the mouse stays or moves according to these probabilities; the time it takes for the mouse to make its first move from cell 1 to cell 2 is geometrically distributed, with mean 1/alpha.
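A quick simulation sketch (the value of alpha is illustrative) confirms the mean of 1/alpha:

import random

def first_move_time(alpha):
    # steps until the mouse first leaves cell 1; it stays with probability 1 - alpha each step
    t = 1
    while random.random() >= alpha:
        t += 1
    return t

alpha, trials = 0.05, 100000
mean = sum(first_move_time(alpha) for _ in range(trials)) / trials
print(mean, "vs expected", 1 / alpha)   # both close to 20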
Thus the moves are described in terms of random variables X_i. Let T be a subset of the integers indexing a sequence of random variables {X_n : n in T}. The Markov property is then defined as follows:
for any n in T, the future process X_m (m > n) is independent of the past process X_m (m < n), conditionally on X_n.
In the case of the mouse, the probability of the move is regardless of where the mouse was at earlier times i.e the future is independent of the past given the present.
This dependence of the future only on the present is easy to generalize from deterministic objects to random ones. The states are countable and form the state space S, and since the moves are between states, X_n is called a Markov chain.
The Markov property talks about all before and after processes so (Xn+1 .... Xn+m) and (X0, ... Xn-1) are independent conditionally on Xn.
The walk based on a Markov chain is thus dependent on transition probabilities. If the transition probabilities are defined, then the walk is just iterative.
The joint probability distribution can be expressed in terms of two functions:
one for the initial state, P(X_0 = i), for each i in the state space S,
and the other for the transitions, p_{i,j}(n, n+1) = P(X_{n+1} = j | X_n = i), for i, j in S and n >= 0.
We can generalize the transition from one step to more than one step with
p_{i,j}(m, n) = P(X_n = j | X_m = i).
We call a Markov chain time-homogeneous when the single-step transition probability does not depend on n.
In the case of the mouse, the probability of a move is independent of where it was at earlier times. The single-step transition matrix, denoted by just P, is the same at every step, so the multi-step transition matrix is P(m, n) = P * P * ... * P (n - m times), that is, P^(n-m).
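For the mouse chain this is easy to see numerically (a sketch using numpy; the values of alpha and beta are illustrative):

import numpy as np

alpha, beta = 0.05, 0.8
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

# the n-step transition matrix of a time-homogeneous chain is a matrix power
print(np.linalg.matrix_power(P, 50))
# both rows converge to the stationary distribution (beta, alpha)/(alpha + beta)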
In a random walk, we add a few more properties we discussed earlier.
Now let us consider a random walk on the bipartite graph starting at an unclassified term and arriving at another semantically equivalent domain term. The walker starts from any node and then moves to any other node with transition probability Pij.
P_ij is the normalized probability of the co-occurrence counts of the terms and the corresponding context. The transition probability between any two nodes i, j is defined as P_{i,j} = w_{i,j} / Sum_k w_{i,k}.
The terms are either unclassified or domain and cannot be connected with each other except through context.
Each walk uses the transition probabilities for the move.
Each walk starts at some unclassified term and ends at a domain term or exceeds the maximum number of moves without finishing.
Several walks, up to say K of them, can be initiated, as in the sketch below.
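A minimal sketch of such walks (the toy graph, its weights, and the function names are all hypothetical):

import random

def weighted_step(weights):
    # move to a neighbor j with probability P_ij = w_ij / Sum_k w_ik
    neighbors, w = zip(*weights.items())
    return random.choices(neighbors, weights=w)[0]

def walk_to_domain(graph, start, domain_terms, max_moves=50):
    # walk until a domain term is reached; return (term, moves), or None past the budget
    node = start
    for moves in range(1, max_moves + 1):
        node = weighted_step(graph[node])
        if node in domain_terms:
            return node, moves
    return None

# toy bipartite graph: terms on one side, context (metadata fields) on the other
graph = {
    "segfault":   {"host": 2, "sourcetype": 1},
    "host":       {"segfault": 2, "crash": 3},
    "sourcetype": {"segfault": 1, "crash": 1},
    "crash":      {"host": 3, "sourcetype": 1},
}
K = 1000
walks = [walk_to_domain(graph, "segfault", {"crash"}) for _ in range(K)]
print(sum(1 for w in walks if w) / K)   # fraction of walks that finish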

Wednesday, July 16, 2014

Let us look at performing a random walk between terms and context.
 We discussed in the previous post that terms and context are different for machine data than for human text. The context in the case of machine data is the metadata fields. A random walk connects the terms and the context in a bipartite graph.
A random walk is a special kind of Markov chain. The latter has time-homogeneous transition probabilities. A random walk has the additional property that it has space-homogeneous transition probabilities. This calls for a structure where p_{x,y} = p_{x+z,y+z} for any translation z. A random walk is given by p_{x,y} = p(y - x).
In an undirected connected graph, the degree of each vertex i is the number of neighbors of i.
For the same graph, a random walk is defined with the following transition probabilities:
p_ij = 1 / degree(i), if j is a neighbor of i
     = 0, otherwise
The stationary distribution of the RWG (random walk on a graph) is given by pi(i) = C degree(i), where C is a constant.
This comes from the fact that pi(i) p_ij = pi(j) p_ji, since both sides are equal to C: on each side of the equation, the first factor is the stationary distribution and the second is the inverse of the corresponding degree.
This shows that the stationary distribution is proportional to degree(i). This is easy to work out with a sample connected graph of five vertices arranged inside a square. Moreover, the equations show that the RWG is a reversible chain. The constant C is found by normalization: given that the degrees add up to twice the number of edges, we have C = 1/(2|E|).
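A sketch that checks pi(i) = degree(i)/(2|E|) empirically on a small connected graph (the graph itself is an arbitrary example):

import random

# small connected graph as an adjacency list
G = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1, 4], 4: [2, 3]}
num_edges = sum(len(nbrs) for nbrs in G.values()) // 2

visits = {v: 0 for v in G}
node, steps = 0, 200000
for _ in range(steps):
    node = random.choice(G[node])   # p_ij = 1/degree(i) for each neighbor j
    visits[node] += 1

for v in G:
    # empirical visit frequency vs degree(v)/(2|E|)
    print(v, round(visits[v] / steps, 3), "vs", round(len(G[v]) / (2 * num_edges), 3))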

There are several ways to express transition probabilities, resulting in different RWGs:
p(x) = C 2^(-(|x_1| + ... + |x_d|))

p(x) = p_j if x = e_j, j = 1, ..., d
     = q_j if x = -e_j, j = 1, ..., d
     = 0 otherwise
and all of the p(x) add up to 1.

If d = 1, then p(1) = p, p(-1) = q, and 0 otherwise, where p + q = 1. This is the drunkard's walk, which is homogeneous in time and space, and given that it must pass through all intermediate nodes, because each hop is of size 1, the transitions are also skip-free.

Machine data has several traits that are different from texts and discourses. When we search machine data we see fields such as messages, timestamps, applications, source, sourcetype, host, sizes, counts, protocols and protocol data formats etc. In other words, there is a set of domain terms that is finite and covers a large portion of the logs seen.
There is a high number of repetitions of these domain terms due to the fact that machines and repeaters often spit out the same set of messages at different times. At the same time, there are messages that have a high degree of variation in content such as web traffic capture. But these two different types are well delineated in terms of flows.
Machine data therefore can be separated into three different types:
Data that doesn't vary a lot in terms of content and terms
Data that varies a lot in terms of content only, metadata is still same
Data that mixes the two above
For the first two cases, when they constitute the bulk of the messages, we can treat them differently. The mixture of the two cases amounts to noise and can be ignored for the moment; we will treat it as any other arbitrary text later.
Predictive patterns in the data help with analysis, so the first two cases are interesting and lend themselves to field extraction, summarization, etc.
Terms extracted from the data fall into a frequency table that is skewed compared with a concordance built from human text, so the selection of these terms is different from that for regular text.
Moreover context for the data can be described in terms of the metadata fields. These fields are predetermined or even specified by the forwarders of the data. The indexers that store the raw and searchable data have both data and metadata from a wide variety of sources.
Association between the terms and the context let us look for those that can be considered the semantics of the machine data. These terms are special in that they tag the data in a special way. Finding these terms is helpful in knowing what the machine data is about.
We can associate the terms and the context with bipartite graphs.
When we perform a random walk to connect the terms and the context, we are performing a special case of a Markov chain, which is iterative. In a random walk, we follow a stochastic process with random variables X_1, X_2, ..., X_k such that X_1 = 0 and X_{i+1} is a vertex chosen uniformly at random from the neighbors of X_i. The number p_{v,w,k}(G) is the probability that a random walk of length k starting at v ends at w. If each edge is treated as a one-ohm resistor, the effective resistance between a point and infinity is finite exactly when the graph is transient. We will review the Aldous and Fill book on random walks on graphs next.
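The quantity p_{v,w,k}(G) can be computed exactly by powering the walk matrix; a sketch (the adjacency list is an arbitrary example):

import numpy as np

def walk_probability(adj, v, w, k):
    # build P with P[i][j] = 1/degree(i) for each neighbor j; then p_{v,w,k} = (P^k)[v][w]
    n = len(adj)
    P = np.zeros((n, n))
    for i, neighbors in adj.items():
        for j in neighbors:
            P[i][j] = 1 / len(neighbors)
    return np.linalg.matrix_power(P, k)[v][w]

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(walk_probability(adj, 0, 3, 4))   # probability a 4-step walk from 0 ends at 3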

Tuesday, July 15, 2014

I will resume my previous post, but I want to take a short break to discuss another software application. In some earlier posts I discussed a Fiddler-like application for mobile devices. In particular, I was looking for a Fiddler-like application for iOS devices, and I pointed out proxy-switching code for iOS available on the net. Changing the proxy helps with packet capture.
When we talked about the viability of a Fiddler-like application for mobile devices, we wanted to use it to test the APIs.
Before continuing any further, I have a work item to look at machine data semantics and I will return shortly.