Sunday, March 30, 2014

In today's post, we bring up a C# application with the SDK we talked about, rather than the service broker example I gave earlier. We focus on the SDK itself, specifically on the inputs used to write an event. Did you know that you can color code the event display based on how you specify the events? I didn't until recently. I will go over some of these features and explore them with the SDK.
But first I want to mention that most of Splunk's features are also available directly from the management service REST APIs. For example, we can run:
curl -k -u admin:changeme https://localhost:8089/services/apps/local/myapp -d description="My application"
Splunk SDKs provide a layer over the REST APIs so we can directly instantiate the objects and prepare them. Knowing the REST APIs is still helpful when a feature is not exposed in a particular SDK, or when an SDK is marshalling parameters to the REST APIs and you want to find out what is set and what isn't. There is an SDK for most of the languages used for web development, but that also means that not all SDKs have the same capabilities. The REST APIs provide a uniform, consistent layer over all the features and are documented similarly to the SDKs; there's a reference PDF available that describes most of the endpoints.
The endpoints are categorized as data, admin, etc. We will see a little more on these shortly, but I want to point out that the REST APIs are easy to exercise with tools like curl and Fiddler.
Splunk's REST APIs cover access control, applications, clusters, configurations, deployment, indexes, inputs, knowledge, licensing and outputs. Indexes, inputs and outputs are what we will cover first.
We will also look at the search REST API. Searches can be ad hoc or saved, saved searches can be scheduled for runs, and there's auto-complete functionality as well.
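To make the REST layer concrete, here is a minimal libcurl sketch in C++ that dispatches an ad hoc search job against the documented /services/search/jobs endpoint. The host, port and credentials are the local defaults used in the curl example above, the error handling is deliberately bare, and the response (the job's Atom entry) simply goes to stdout; treat this as an illustration, not production code.

```cpp
#include <curl/curl.h>
#include <cstdio>

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    // Local instance with default management port and credentials (assumptions).
    curl_easy_setopt(curl, CURLOPT_URL, "https://localhost:8089/services/search/jobs");
    curl_easy_setopt(curl, CURLOPT_USERPWD, "admin:changeme");
    // Equivalent of curl's -k flag: accept the default self-signed certificate.
    curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
    curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
    // POST body: the search to dispatch as an ad hoc job.
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "search=search index=_internal | head 10");

    CURLcode rc = curl_easy_perform(curl);   // response contains the new job's sid
    if (rc != CURLE_OK)
        fprintf(stderr, "request failed: %s\n", curl_easy_strerror(rc));

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```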
I will return to this post shortly. I want to cover distributed search.

Friday, March 28, 2014

Today I will continue my discussion on thread-level software instrumentation.
On Windows, there is thread local storage available. Here we create a buffer and keep all the stack-based data we collect in it. To collect CPU ticks, we read the system clock in the constructor and again in the destructor of an object we put on the stack. The difference is the time spent executing that code, and it can be accumulated into the running total on the thread to see how much time each section takes. Note that it is ideal to keep this counter on the scheduler, since threads could attach to and detach from the currently executing run.
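Here is a minimal C++11 sketch of that stack-based timer. The names are mine, and I use std::chrono::steady_clock rather than raw CPU ticks for portability; on Windows, QueryPerformanceCounter or rdtsc could be substituted.

```cpp
#include <chrono>
#include <cstdint>

// Running total of time spent in instrumented scopes on this thread.
// (The counter could equally live on the scheduler object, as noted above.)
thread_local uint64_t g_scopeTicks = 0;

class ScopeTimer {
    using Clock = std::chrono::steady_clock;
public:
    ScopeTimer() : start_(Clock::now()) {}                 // read the clock on entry
    ~ScopeTimer() {                                        // read it again on exit
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                      Clock::now() - start_).count();
        g_scopeTicks += static_cast<uint64_t>(ns);         // accumulate the delta
    }
private:
    Clock::time_point start_;
};

void DoWork() {
    ScopeTimer t;   // stack-based object: destructor accumulates the elapsed time
    // ... code being measured ...
}
```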
Similarly, it is easier to have a base method for allocations and deletions, which can take the stack trace when the corresponding system calls are made. This gives us a histogram of stack traces for allocations, and we can use it to find the biggest or most frequent users of memory.
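A sketch of such a base method for allocations, assuming Windows for CaptureStackBackTrace; the wrapper names and the histogram keyed by backtrace hash are illustrative choices, not an existing API. A fuller version would also record the frames for the first occurrence of each hash so the stack can be symbolized later.

```cpp
#include <windows.h>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <unordered_map>

struct AllocStats { std::size_t count = 0; std::size_t bytes = 0; };

static std::mutex g_histMutex;
static std::unordered_map<ULONG, AllocStats> g_allocHistogram;  // keyed by backtrace hash

// Hypothetical base method that all allocations are routed through.
void* TrackedAlloc(std::size_t bytes)
{
    void* frames[16];
    ULONG hash = 0;
    CaptureStackBackTrace(1, 16, frames, &hash);   // skip this wrapper's own frame
    {
        std::lock_guard<std::mutex> lock(g_histMutex);
        AllocStats& s = g_allocHistogram[hash];
        ++s.count;                                  // most frequent users of memory
        s.bytes += bytes;                           // biggest users of memory
    }
    return std::malloc(bytes);
}

void TrackedFree(void* p) { std::free(p); }
```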
Having thread local storage is desirable on any platform, especially given language-level support such as C++11's thread_local. Creating a pointer-sized variable enables allocation of an arbitrary buffer reserved for this thread, which allows thread-specific instrumentation that is not normally covered by routines that place a variable on the stack. Undeniably the latter is required too; for example, crash logs are written this way. But here we consider what else we can add to crash log notifications via thread local storage.
The first thing we mentioned is historical stack traces, which are not captured by the as-of-this-point-in-time stack trace taken from stack local variables.
When do we take this stack trace? For every exception encountered, even if not all of them turn into crash notifications. The suggestion is that by keeping track of the last few exceptions at any given time, we get another data point of interest whenever we receive a dump.
We add an exception handler to the chain of handlers that each thread maintains. The first of these we customize to take a stack trace, put it on our thread local ring buffer, and return 'keep looking' so that other handlers, including the most recent frame's own handler, get a chance to handle the exception.
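The post describes the per-thread chain of frame-based handlers; as one concrete approximation on Windows, a process-wide vectored exception handler that records into a thread-local ring and then returns EXCEPTION_CONTINUE_SEARCH ('keep looking') has the same effect. Buffer sizes and names below are arbitrary.

```cpp
#include <windows.h>

// Per-thread ring of recent exception backtraces; sizes are arbitrary.
constexpr int kFrames  = 10;
constexpr int kEntries = 16;
thread_local void* t_recentTraces[kEntries][kFrames];
thread_local int   t_nextTrace = 0;

LONG WINAPI RecordExceptionTrace(PEXCEPTION_POINTERS /*info*/)
{
    // Capture the current stack into the next slot of this thread's ring.
    CaptureStackBackTrace(0, kFrames, t_recentTraces[t_nextTrace], nullptr);
    t_nextTrace = (t_nextTrace + 1) % kEntries;
    // 'Keep looking': let the frame's own handler and any others run.
    return EXCEPTION_CONTINUE_SEARCH;
}

void InstallExceptionRecorder()
{
    // First-chance notification for every exception raised in the process.
    AddVectoredExceptionHandler(1 /*call first*/, RecordExceptionTrace);
}
```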
Why is this better than logging? It's complementary to logging. Dumps and logs are both relevant for diagnosis, and often every bit of information helps.
One consideration for the ring buffer is that it could be implemented as a skip list for fast traversal.
Creating a ring buffer on each thread is one part; it could be used for different purposes.
Using it to store exceptions encountered is one such purpose. Usually the operating system already meets that purpose, so using it for application-specific tasks is preferable. One such purpose could be to list the last user queries or commands.
For now we will just focus on creating a ring buffer for every thread on all platforms.
We do this whenever we start a thread or in a callback for thread initialization routines. There should be fewer of those throughout the source code.
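For the cross-platform case, here is a minimal sketch of a fixed-size ring buffer reserved per thread via thread_local; in this example it records the last few user commands, per the suggestion above. The class and names are illustrative only.

```cpp
#include <array>
#include <cstddef>
#include <string>

// A tiny fixed-capacity ring buffer; the capacity is an arbitrary choice.
template <typename T, std::size_t N>
class RingBuffer {
public:
    void push(const T& item) {
        slots_[next_ % N] = item;       // overwrite the oldest entry once full
        ++next_;
    }
    std::size_t size() const { return next_ < N ? next_ : N; }
    // Oldest-to-newest access for dump/diagnostic writers.
    const T& at(std::size_t i) const {
        std::size_t start = next_ < N ? 0 : next_ % N;
        return slots_[(start + i) % N];
    }
private:
    std::array<T, N> slots_{};
    std::size_t next_ = 0;
};

// One buffer per thread, created lazily the first time the thread touches it.
thread_local RingBuffer<std::string, 32> t_recentCommands;

void OnCommand(const std::string& cmd) {
    t_recentCommands.push(cmd);
    // ... execute the command ...
}
```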

After discussing MSMQ and how to monitor messages on the queues (without journaling and without requiring any additional switches or trigger services on a production instance), I stumbled on the MSMQ tracing listed under 'logman query providers' and confirmed that there does not seem to be a trace event for each message arrival. That gives me another argument in favor of what I would like to build for my application.
In any case, here's some more context on how I arrived at wanting to write an MSMQ monitor for my application.
First, I'm increasingly studying high memory usage in select scenarios with my application, and in that context I wanted to find non-invasive options for customers' production systems. That led me to the Windows tools logman, tracelog and xperf:
We can turn on private heap tracing on Windows either with logman (an out-of-box tool: just type 'logman query providers', http://msdn.microsoft.com/en-us/library/ms751538(v=vs.110).aspx) or with tracelog/xperf from the Windows Performance Analysis Center (http://msdn.microsoft.com/en-us/performance/default.aspx):
tracelog -start heapTrace -f c:\temp\heapTrace.etl -heap -pids 1 <pid_of_splunkd>
tracelog -stop heapTrace
OR xperf -start HeapSession -heap -PidNewProcess "myapp.exe" -BufferSize 1024 -MinBuffers 128 -MaxBuffers 128 -stackwalk HeapAlloc+HeapRealloc
xperf -stop HeapSession -stop -d c:\temp\xperf_heap.etl
We could also explore stack walking with 'xperf -help stackwalk' and, in xperfview, 'Get Stack counts by type'.
Xperf lets us view heap allocation information in addition to kernel info such as CPU, with stack traces at the time of allocation! More here: http://msdn.microsoft.com/en-us/library/windows/hardware/hh162920.aspx
And if tracing doesn't work, we can use Procdump to take a full dump of our process on production systems without stopping the process: http://technet.microsoft.com/en-us/sysinternals/dd996900.aspx
If we had the luxury, we could also use gflags, Application Verifier, pageheap and UMDH.

But anyway, getting back to my application, here's what I want to try writing next after the MSMQ monitor:
These will be thread-based monitors for resource usage and a circular ring buffer of stack trace collections.
A circular ring buffer of stack trace collections per scheduler or node is very helpful in diagnosing a variety of issues. In terms of memory used, we need only a pointer-sized value for each frame pointer, around ten of those per backtrace, and, say, ten to fifty backtraces in the ring buffer, which should not add up to much (for example, 8 bytes x 10 frames x 50 entries is about 4 KB per thread).
The stack trace capture into the circular ring buffer could be triggered by anything: exceptions, memory allocation and deallocation, etc.
In fact, CPU usage is also easy to watch: we read the CPU ticks and add the delta for the time we are on and off the stack.
 

Thursday, March 27, 2014

Today I will cover a software component I'm looking into that consolidates the configuration and administration of the dynamic Atom feeds from the different model generators that I mentioned in the post before last. When a user makes a REST request to the server, the server does some checks to determine the response to return to the user. In our case, we are looking at Atom feeds from different REST handlers, all of which conform to this security and configuration mechanism. We will study this component in some depth and try to come up with potential improvements.
First, this component is called the AdminManager. It is responsible for managing the server-side model generation that is made available for consumption by various interfaces such as the UI and the command line interface. In a way, this component automates access to and configuration of the models that the server generates.
Let us take a closer look at this component. There are several things to mention.
We begin with access control. This component lists some common actions for all the handlers responding to different REST endpoints. These are create, edit, remove, list, members, reload, new, move, enable, disable and doc. More than one action can be checked together with a bit mask. Access control is checked with capabilities; there are two capabilities specified, read and write. Actions translate to capabilities that can be demanded or set by handlers. Handlers can be enabled or disabled, and results from the handlers can be filtered. Actions and capabilities/permissions determine the type of response; we don't block execution, we just return an error or the results.
Handlers all derive from a base class. The base class has methods for the actions involved, and each derived handler can override the methods corresponding to each action to produce the result it wants. If an override is not specified, it defaults to an error. There are flags for some of the actions to indicate whether they are possible, such as whether you may move an item. There is also a custom action that handlers can use to extend their functionality. The actions can return results in JSON or XML format. The applications and users that can utilize these REST handlers can be specified with wild cards. A handler can also take a list of features, which can be considered prerequisites for the handler to execute.
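To illustrate the shape of this mechanism, here is a hypothetical C++ sketch of a handler base class with an action bit mask, capability requirements and overridable action methods that default to an error. The names and defaults are mine, not the actual classes.

```cpp
#include <string>

// Hypothetical names; a sketch of the pattern described above.
enum Action : unsigned {
    ActCreate = 1 << 0, ActEdit = 1 << 1, ActRemove = 1 << 2, ActList = 1 << 3,
    ActMembers = 1 << 4, ActReload = 1 << 5, ActNew = 1 << 6, ActMove = 1 << 7,
    ActEnable = 1 << 8, ActDisable = 1 << 9, ActDoc = 1 << 10
};

enum Capability : unsigned { CapRead = 1 << 0, CapWrite = 1 << 1 };

struct Response { int status; std::string body; };   // JSON or XML payload

class AdminHandler {
public:
    virtual ~AdminHandler() = default;
    // Bit mask of actions this handler supports; checked before dispatch.
    virtual unsigned supportedActions() const { return ActList; }
    // Capabilities demanded for a given action (read for list/members/doc, write otherwise).
    virtual unsigned requiredCapabilities(Action a) const {
        return (a == ActList || a == ActMembers || a == ActDoc) ? CapRead : CapWrite;
    }
    // Each derived handler overrides the actions it implements;
    // anything left un-overridden defaults to an error response.
    virtual Response handleList()   { return {404, "not supported"}; }
    virtual Response handleEdit()   { return {404, "not supported"}; }
    virtual Response handleRemove() { return {404, "not supported"}; }
};
```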
Access control itself involves several checks. We discuss these now.
First, the supported actions are filtered. Whether a user can read or write is determined by the capability for the corresponding read or write permission; the user is checked for the permission involved. This is not necessarily RBAC access control but merely a check against the privileges of the user; the system user, for instance, has all the privileges. The capabilities a user has are determined from the license manager, for instance, or from explicitly added capabilities. This is also necessary for audit purposes. If auditing is not necessary, the check against capabilities could always return true; on the other hand, if we are to audit all denies, we should leave a trail in the audit log.
Actions and capabilities are mapped and determine the access for the most part.
Next we talk about ACLs and how they are set and determined. Setting an ACL is itself a custom action and requires the write capability. Owner, sharing and permissions are specified when setting the ACL. The rule of thumb here has been to have just enough checks in place and nothing so complex or flexible that it becomes overkill. That shouldn't stop us from keeping it flexible for future extensions. Today we require owner, sharing and permissions for a resource; tomorrow we could have not just read and write permissions but permissions such as full control, modify, read and execute, read and write, special permissions, etc. Today we have explicit privileges; tomorrow they could be chained or inherited. Today we audit users; tomorrow we could differentiate users, groups, apps, service accounts, etc. Today we have a built-in user for the system; tomorrow we could have many different built-in users and groups. Today when we share we share all permissions; tomorrow we could set the permission levels for whom we share with.
Now let us come back to where the lines of authentication, authorization and auditing are drawn in the processing of REST requests.
First is the initialization step. The capabilities and evaluation rules are added from a configuration file. In this configuration file, for a particular endpoint, the capabilities and evaluation rules are specified when the handler is first registered with this mechanism.
The capabilities could be, for example, read, write, search, etc. The rule could be the endpoint qualifier to match.
Note that this configuration file is a great starting point for reorganization. When there are enough handlers registered to see the kinds of capabilities, matches, actions and customizations specified, a new architecture could make it easier for components to specify what they want.
Furthermore, most of the configuration variations will come in handy for provisioning the central governance that this mechanism provides. If there are enough handlers specified in the AdminManager configuration file, this is low-hanging fruit for a rewrite.
At this point we are able to filter the supported actions based on the configuration information, the passed-in user context and a lookup of the associated user's capabilities. This filtering of supported actions is where the authorization line is crossed.
After the initialization step, and depending on the authorization, we can apply ACLs to resources as described earlier.
Next we evaluate the custom actions. Actions can have arguments. These can be used together with the actions to differentiate the kind of response required.
Next we validate the arguments, their presence and values as specified in the configuration.
At this point the authorization is complete and we are ready to execute the handler.







We look at Warshall's algorithm for transitive closure. In a directed or undirected graph, we ask whether there is a path from node i to node j; that is transitive closure. As we may recall from previous posts, we represent a graph with an adjacency matrix. For a graph with N nodes, the adjacency matrix answers whether there is an edge between nodes i and j; for both directed and undirected graphs, a value of 0 indicates there is no edge. The transitive closure tells whether there is a path between nodes i and j. We could apply a brute force method, enumerating the N^2 pairs of vertices and trying to find a path for each pair. Fortunately, Warshall's algorithm does better by using dynamic programming, in particular by saving previously found reachability results so that we can compute the transitive closure efficiently. For nodes i and j, we first ask if there is a path through node 1, then through nodes 1 and 2, then through nodes 1, 2 and 3, and so on.
Warshall's algorithm has applications in network routing and reachability problems. It's also used to eliminate states.
The algorithm is elaborated as follows:
Algorithm Warshall(A[1..n, 1..n])
{
   R(0) is initialized with the adjacency matrix; // R(0) allows no intermediate nodes
   for k = 1 to n do
      for i = 1 to n do
         for j = 1 to n do
            R(k)[i, j] = R(k-1)[i, j] OR
                         (R(k-1)[i, k] AND R(k-1)[k, j])
                         // path from i to k through 1, 2, .. k-1
                         // and path from k to j through 1, 2, .. k-1
   return R(n)
}
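For reference, a compact C++ version of the same triple loop; here reach starts out as the boolean adjacency matrix and is updated in place, so a second matrix per step is not needed.

```cpp
#include <cstddef>
#include <vector>

// Transitive closure by Warshall's algorithm; reach[i][j] starts as the
// adjacency matrix and ends as "is there a path from i to j".
std::vector<std::vector<bool>> TransitiveClosure(std::vector<std::vector<bool>> reach)
{
    const std::size_t n = reach.size();
    for (std::size_t k = 0; k < n; ++k)            // allow k as an intermediate node
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                reach[i][j] = reach[i][j] || (reach[i][k] && reach[k][j]);
    return reach;
}
```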

Wednesday, March 26, 2014

I came across an interesting design pattern for web applications that I hadn't seen before. Basically, the model and the view are decoupled: the model can vary for the same view, so the view is reused while the models differ. The routes and the models are registered together, so they vary depending on the purpose, and these registrations are maintained in a configuration file; the same view can load any of these models.
There is a single dispatcher that takes care of finding the handler for the model. The handler has methods to generate the content, or model, which is then converted to a JSON document that is rendered by the web interface. The web interface is written in JavaScript, so all the rendering of the view and the highlighting is taken care of on the client side.
Compared to a conventional model-view-controller with explicit routes, this is a fairly short way to get server data to the client without any sacrifice in the form or functionality of the web interface. And it's extensible too, given the types of models supported. Data is packaged as an Atom entry, with one entry for each configuration item.
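A hypothetical C++ sketch of the dispatcher side of this pattern: routes and model generators are registered together, the dispatcher finds the handler by route, and the handler returns the model already serialized (here simply as a JSON string) for the JavaScript view to render. All names are illustrative.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

struct Request { std::string route; };

// A registered model generator produces the serialized model (JSON text here).
using ModelGenerator = std::function<std::string(const Request&)>;

class Dispatcher {
public:
    // Routes and model generators are registered together, e.g. from a config file.
    void registerRoute(const std::string& route, ModelGenerator gen) {
        handlers_[route] = std::move(gen);
    }
    std::string dispatch(const Request& req) const {
        auto it = handlers_.find(req.route);
        if (it == handlers_.end())
            return R"({"error":"no handler for route"})";
        return it->second(req);      // one view renders whatever model comes back
    }
private:
    std::map<std::string, ModelGenerator> handlers_;
};
```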
In terms of operations, GET and POST are handled as actions requiring different capabilities. These are managed by an access control mechanism called the AdminManager.
The access control is exercised based on users and resources. The web interface makes purely REST-based calls; in this way the interface is a separate layer and the server does most or all of the processing.
Dijkstra's algorithm for single-source shortest paths is a greedy algorithm. Here we find the shortest path from a single source in a weighted, directed graph where the weights are positive; in the undirected case, a single edge stands in for two directed edges. There could be multiple paths with different weights between two vertices, but the shortest path is the one with the smallest total weight. To find the shortest path, we relax the edges: we take the minimum of the current cost to a vertex v and the cost of reaching it through the edge joining u and v, and keep the smaller value. If the shortest path consists of edges v1, v2, v3, ..., vn, any contiguous subset of these is also a shortest path. We then look at the possibility of a shorter path.
Each new shortest path extends an earlier one, so we build the shortest paths in ascending order of cost. When we add a vertex, we look at its cost; if there are no edges to that vertex yet, the cost is infinity.
In the algorithm, we first initialize the costs. When we pick a source vertex, we do the following for each of the other vertices:
We choose a vertex u that is not already in the set and has minimum cost, and mark it as included in the shortest-path set. For each neighbor w of u, we compare the cost of reaching w through the set S with the cost through the edge from u to w, and relax the edge by keeping the minimum.
In this way we include all vertices one by one in ascending order of their shortest-path cost.
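A compact C++ sketch of the algorithm just described, using an adjacency list and a min-priority queue; the stale-entry check stands in for the explicit "already included in the set" test, and the function and parameter names are my own.

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Single-source shortest paths on a graph with non-negative edge weights.
// adj[u] holds pairs (v, w): an edge from u to v with weight w.
std::vector<long long> Dijkstra(const std::vector<std::vector<std::pair<int,int>>>& adj,
                                int source)
{
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> dist(adj.size(), INF);   // cost is "infinity" until reached
    using Item = std::pair<long long,int>;          // (distance, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;

    dist[source] = 0;
    pq.push({0, source});
    while (!pq.empty()) {
        long long d = pq.top().first;
        int u = pq.top().second;
        pq.pop();
        if (d != dist[u]) continue;                 // stale entry: u already finalized
        for (const auto& edge : adj[u]) {           // relax each edge (u, v) of weight w
            int v = edge.first, w = edge.second;
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                pq.push(std::make_pair(dist[v], v));
            }
        }
    }
    return dist;
}
```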