Saturday, March 1, 2014

In today's blog post we review CLI commands for Splunk. The command line is complementary to the user interface and works well for management operations and scriptability.
The command to start Splunk in debug mode is
splunk start splunkd --debug
If we want to clean all the data, we can say
splunk clean all
or splunk clean [option], where eventdata, globaldata, userdata, or inputdata can be specified as the option.
In addition, we can disable one or more components, such as
splunk disable app, boot-start, deploy-client, deploy-server, dist-search, index, listen, local-index, perfmon, web-server, web-ssl
and then toggle them back on with
splunk enable app, boot-start, deploy-client, deploy-server, dist-search, index, listen, local-index, perfmon, web-server, web-ssl
The splunk display command shows the status of each of these components.
splunk list is different from splunk display: the list command lists all the configurations and settings of a collection, whereas display shows only the state of the feature.
The CLI commands provide options for working with data.  These include:
splunk import userdata
and splunk export userdata, eventdata
If we make a change that affects how the data is filtered, such as through a conf file, we can granularly reload individual components, as with the following:
splunk reload ad, auth, deploy-server, index, monitor, registry, script, tcp, udp, perfmon, wmi
For most of these components, when we enable or disable them, we can check the logs to see that they have indeed been enabled or disabled.
The CLI commands also provide a way to work with the logs. We do this with
splunk find log
When commands need to be applied to many different machines in a cluster, the CLI provides a way to do that as well.
For example, we can type splunk apply cluster-bundle to apply (make active) a cluster bundle to all the peers in the cluster. To check the status on all the peers, the splunk show cluster-bundle-status command can be used at the master. For a silent apply, we can say ./splunk apply cluster-bundle --skip-validation --answer-yes. The CLI list command also provides other cluster-related options, such as splunk list cluster-config, cluster-generation, cluster-peers, and cluster-buckets.
The splunk rtsearch and splunk search commands are also available to search. Note that the splunk rtsearch command is used for searching events before they are indexed and for previewing reports as the events stream in. The command arguments are similar between the traditional search and rtsearch commands:
app specifies an app context to run the search in.
batch specifies how to handle updates in preview mode.
detach triggers an asynchronous search and displays the job id and ttl for the search.
header indicates whether to display a header in table output mode.
max_time indicates the length of time in seconds that a search job runs before it is finalized.
maxout indicates the maximum number of events to return or send to stdout.
output indicates how to display the job, such as rawdata, table, csv, or auto.
timeout denotes the length of time in seconds that a search job is allowed to live after running; it defaults to 0, which means the job is canceled immediately after it runs.
wrap indicates whether to wrap lines that exceed the terminal width.
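For illustration, a traditional search could be launched from the CLI with a few of these arguments (the query and the values shown here are hypothetical):
./splunk search 'error' -app search -maxout 100 -output table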
The splunk commands produce output that can be used to verify the success or failure of the command; typically an error message is provided when a command fails.

 
The event loop discussed in the previous post runs continuously and uses a time-heap data structure, which we discuss today. We maintain the timeouts on a heap, the pending heap. When we want to run them, we put them on the doubly linked list mentioned earlier. The timeouts are kept in a heap because it is an efficient way to organize them in order of expiring timeouts. In the implementation I'm looking at, the left and right children point to similar events with timeouts greater than the parent's. The left differs from the right only in the timeout count: if the timeout count is even, the new node is steered to the right.
Whenever new elements are inserted, they are bubbled up and down the tree as necessary. After every add or delete, the tree is fixed up.
Timeouts have a link pointer that indicates which list they belong to. Note that when the timeouts are on the list, we reuse the storage for the left and right pointers as the forward and backward pointers of the doubly linked list.
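As a minimal sketch of a pending heap, here is an array-backed binary min-heap keyed by expiry time. This is not the implementation described above (the left/right steering by timeout count and the pointer reuse are left out); it only illustrates the basic ordering idea in C#, with hypothetical type names.

using System;
using System.Collections.Generic;

// Hypothetical timeout record: when it expires and what to run.
class PendingTimeout
{
    public DateTime Expiry;
    public Action Callback;
}

// Array-backed min-heap ordered by expiry time.
class TimeoutHeap
{
    private readonly List<PendingTimeout> items = new List<PendingTimeout>();

    public PendingTimeout PeekEarliest()
    {
        return items.Count > 0 ? items[0] : null;
    }

    public void Add(PendingTimeout t)
    {
        items.Add(t);
        int i = items.Count - 1;
        // Bubble up while the child expires earlier than its parent.
        while (i > 0 && items[(i - 1) / 2].Expiry > items[i].Expiry)
        {
            Swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    public PendingTimeout RemoveEarliest()
    {
        if (items.Count == 0) return null;
        var earliest = items[0];
        items[0] = items[items.Count - 1];
        items.RemoveAt(items.Count - 1);
        // Bubble down to restore the heap property after the removal.
        int i = 0;
        while (true)
        {
            int left = 2 * i + 1, right = 2 * i + 2, smallest = i;
            if (left < items.Count && items[left].Expiry < items[smallest].Expiry) smallest = left;
            if (right < items.Count && items[right].Expiry < items[smallest].Expiry) smallest = right;
            if (smallest == i) break;
            Swap(i, smallest);
            i = smallest;
        }
        return earliest;
    }

    private void Swap(int a, int b)
    {
        var tmp = items[a]; items[a] = items[b]; items[b] = tmp;
    }
}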
As a test of the timeout heap, we could verify that events are added exactly once and in the correct position on the pending heap, that they are removed from the pending heap when they move to the expired list to be run, and that they are added back to the pending heap in the correct position for re-triggering. Their position as a left or right child should remain unchanged if no new nodes were added or deleted.
By doing the test on a controlled heap we know that the events are processed in sequence. In the real world, the events should be guaranteed to execute once their timeout expires, but their sequence could be arbitrary. The heaps are expected to remain small, so none of them should grow without bound.

Friday, February 28, 2014

I found an implementation of an event loop and would like to compare it with overlapped IO. Both can work on Windows, but we will leave aside the nuances of using either on any one platform. The semantics of an event loop are that events are added and, at the end of a timeout, those events are triggered. The timeout is determined by checking whether any event needs to be triggered next and, if so, what time slice to wait; otherwise it is zero. Several helper threads can wait on a set of pollable resources. The threads are kicked off by the loop, then waited on, and then shut down. The wait is for the timeout determined above. If the wait succeeds, we recalibrate the before, timeout, and after values for the next round.
If the wait on the set was not successful, it was probably because events were added or removed, so we check all the states and start over.
Three lists are maintained: a doubly linked list that is not full, a doubly linked list that is full, and a doubly linked list that is empty. The waiter thread moves between these queues.
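As a rough sketch of one pass of such a loop in C#: compute the wait from the earliest pending timeout, block on a wakeup handle, and either start over (the event set changed) or move the expired callbacks onto the run list. The wakeup event and the getNextExpiry/popExpired delegates are hypothetical stand-ins for the real bookkeeping, not the implementation being compared here.

using System;
using System.Collections.Generic;
using System.Threading;

static class EventLoopSketch
{
    // One pass of the loop: compute the wait from the earliest pending timeout,
    // wait on the wakeup handle, and either return for recalibration (the event
    // set changed) or queue the expired callbacks onto the run list.
    public static void RunOnce(AutoResetEvent wakeup,
                               LinkedList<Action> runList,
                               Func<DateTime?> getNextExpiry,                  // hypothetical: earliest pending expiry
                               Func<DateTime, IEnumerable<Action>> popExpired) // hypothetical: remove and return expired callbacks
    {
        DateTime? next = getNextExpiry();

        int waitMs = Timeout.Infinite;   // nothing pending: wait until something is added
        if (next.HasValue)
        {
            TimeSpan delta = next.Value - DateTime.UtcNow;
            waitMs = delta > TimeSpan.Zero ? (int)delta.TotalMilliseconds : 0;
        }

        // The wakeup event is signaled when events are added or removed; in that
        // case the caller recalibrates and starts the pass over.
        if (wakeup.WaitOne(waitMs))
            return;

        // The wait elapsed: move every expired timeout onto the run list.
        foreach (Action callback in popExpired(DateTime.UtcNow))
            runList.AddLast(callback);
    }
}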
Overlapped IO is very different: the events are serviced by different producers and consumers, and they can even be associated with multiple pollable resources.
There is no prediction of the order in which the events will be triggered in either case, but one proceeds in terms of beats and the other proceeds in terms of the items in the IO.

In the post on using the SQL Server Service Broker as a modular input for Splunk, we introduced a technique; we now describe the complete solution. We mentioned that we could read messages from the Broker by opening a SqlConnection and executing a SQL statement. For every such message received, we can create a Splunk modular input event wrapper and send the data off to Splunk.
The program implements the Splunk Script object and the methods required to take the configuration from the user, apply the configuration, and extract events. These events are extracted from the message broker as mentioned above.
The sampling rate is determined by the number of messages. We fork a thread to process the messages for the designated queue. The config determines the queue names, the sampling intervals, etc. The default sampling interval is zero.
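As a sketch of the queue-reading step, assuming a connection string and queue name supplied by that configuration (both hypothetical here), each thread could receive messages roughly as follows and hand them to a callback that wraps them as modular input events:

using System;
using System.Data.SqlClient;

static class BrokerReaderSketch
{
    // Receive one Service Broker message at a time and pass its body to a
    // (hypothetical) callback that emits it as a Splunk modular input event.
    public static void ReadQueue(string connectionString, string queueName, Action<string> emitEvent)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            // WAITFOR(RECEIVE ...) blocks for up to the TIMEOUT when the queue is empty.
            // The queue name is assumed to come from trusted configuration.
            string sql = "WAITFOR (RECEIVE TOP(1) CAST(message_body AS NVARCHAR(MAX)) AS body FROM "
                         + queueName + "), TIMEOUT 5000;";
            using (var command = new SqlCommand(sql, connection))
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    if (!reader.IsDBNull(0))
                        emitEvent(reader.GetString(0));
                }
            }
        }
    }
}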
We invoke the CLI command object from the SDK. Specifically, we say Command.Splunk("Search").
Then we add the parameter we want to search with. We can check cli.Opts to see whether our search command and parameter were added. After defining the command object, we create a job object to invoke it. We do this with the following:
var service = Service.Connect(cli.Opts);
var jobs = service.GetJobs();
var job = jobs.Create((string)cli.Opts["search"]);
We wait until the job is done, which we can check with the done flag on the job.
We retrieve the results of the job with job.getResults, which returns a stream. We can then open a ResultsReaderXml on this stream to retrieve the events.
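Putting these steps together, a minimal sketch might look like the following; the exact member names vary between Splunk SDK versions, so treat this as an outline of the workflow rather than the definitive API.

using System;
using System.Threading;

// Outline of the workflow described above: connect, create the job,
// poll the done flag, then read the result stream with ResultsReaderXml.
// Member names such as IsDone, Refresh, and Results are illustrative.
var service = Service.Connect(cli.Opts);
var jobs = service.GetJobs();
var job = jobs.Create((string)cli.Opts["search"]);

while (!job.IsDone)          // poll the done flag on the job
{
    Thread.Sleep(500);
    job.Refresh();           // pick up the latest job state from the server
}

using (var stream = job.Results())                       // the results stream
using (var resultsReader = new ResultsReaderXml(stream)) // event reader over the stream
{
    foreach (var result in resultsReader)
    {
        // Each result holds the fields of one returned event.
        Console.WriteLine(result);
    }
}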


Thursday, February 27, 2014

Today we discuss more on modular inputs. In this case, we talk about the properties we can set on the modular input data structure. One of the properties is the index, which determines which index the events should go to. There is a default index, but it is preferable to store our events in their own index. The second property is the stanza name, a unique source type for these kinds of events. In fact, the stanza is user specified, and the events are often retrieved based on these literals.

The idea behind modular inputs seems to be to envelop the events so that they are differentiated from others. This helps even before parsing and indexing have taken place.
When we use the script object with the input, we are able to govern the flow of these events. We can increase it gradually from a trickle to a deluge, if we want, by varying the polling interval or the sampling rate.

In addition to differentiating and regulating the flow, the modular input and scripts can filter on raw data before it is sent over. For example, the inbound and outbound messages may not only be differentiated but one selected over the other.
A Splunk app that uses a modular input uses the Script object, with methods to run and stream events. Examples are provided in the Splunk documentation, but I would like to discuss how to integrate it with a service. A script conforms to an executable, much like the service; only in this case it follows the conventions of Splunk apps. Otherwise both the script and the service perform data collection. While the service may have methods to process the incoming and outgoing messages, the script has a method for streaming events. In addition, the script is set up for polling to monitor the changes and to validate the user input. In essence, a modular input script is suited to interfacing with the user for the configuration of the monitor and to setting up the necessary index, queue, stanza, etc. for the events to be written to. With this framework, no matter what the events are, they can join the pipeline to Splunk.
The StreamingEvents method runs a while loop to poll or monitor the changes at regular intervals. The method sleeps between iterations, relieving the CPU between polls. Sleeping helps particularly with otherwise high CPU usage on uniprocessors and on earlier systems, such as what Sustaining Engineering occasionally sees with customers. Introducing the sleep at the correct place in the polling loop alleviates this; even a Sleep(0) can be sufficient.
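A bare-bones sketch of such a loop, with hypothetical pollQueue and writeEvent helpers standing in for the queue read and the modular input event write:

using System;
using System.Collections.Generic;
using System.Threading;

static class StreamingSketch
{
    // Poll at the configured interval and yield the CPU between passes.
    public static void StreamEventsLoop(int samplingIntervalMs,
                                        Func<IEnumerable<string>> pollQueue, // hypothetical: read pending messages
                                        Action<string> writeEvent)           // hypothetical: emit a modular input event
    {
        while (true)
        {
            foreach (string message in pollQueue())
                writeEvent(message);

            // Sleep between polls; even Thread.Sleep(0) yields the remainder of
            // the time slice and keeps a tight loop from pegging the CPU.
            Thread.Sleep(samplingIntervalMs);
        }
    }
}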
In the case of the service that monitors the Service Broker queue (sample code is available on MSDN), it has the same kind of run method that the Splunk Script object has: it also begins by loading the configuration, and it forks a thread to watch each queue. Each thread, for its lifetime, opens a SQL connection object and retrieves Service Broker messages in a polling loop, just like the extract-events method of the Splunk object.
Thus the easiest way to integrate the Service Broker approach is to use the queue-reading logic for all the specified queues inside the extract-events method of the modular input.

Wednesday, February 26, 2014

One of the ways we can look at logging on Windows for any component is with WPP tracing. This applies to all kinds of logging, including system components, device drivers, applications, services, and any registered trace provider. The trace providers are usually found by the GUID they register with, or that information is extracted from the PDB. The Windows DDK ships a tool called TraceView that can collect and display these traces.
This tool may not be up to date on the trace log format, but we can easily convert a trace captured in an .etl log file by using eventvwr -> Open Saved Log and saving it in the newer format.
Here's an example of what the logs look like:
00111375 drv 7128 9388 2 111374 02\26\2014-15:59:34:202 Driver::Registry call back: filter out event due to machine or user path set in config. operation = QueryValueKey
The events are displayed because we have the formatting information for them. This is usually contained in the trace format files maintained by the providers or included in their PDBs. If we don't have the formatting information, the events look something like this:
00111386 Unknown 7128 9388 2 111385 00\00\   0-00:00:00:00 Unknown( 40): GUID=bbd47d81-a1f8-551f-b37f-8ce988bb02f2 (No Format Information found).
This does not mean that we can use the same fields we see in TraceView with the filters in the Event Viewer; the latter maintains its own filter fields, attributes, and levels.
The Event Viewer logs have several features.
First, they conform to a template that is universally recognized, and they identify events by their source, ids, etc.
Second, Event Viewer can collect a variety of logs: application, system, and security. These provide a sink for all the event tracing information on the system and can be saved and viewed offline.
Third, eventvwr can connect to remote computers and display the events from the logs there. This is critical when it comes to viewing information across machines.
If our interest is only in filtering certain events, the logman tool can be helpful for filtering events based on provider GUID. There are other options available as well, such as starting, stopping, updating, and deleting data collectors, querying a data collector's properties, and importing or exporting an XML configuration file.
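For example, a trace collector filtered to a single provider could be created and controlled as follows (the session name and output path are made up; the GUID is the one from the sample trace above):
logman create trace MyDriverTrace -p {bbd47d81-a1f8-551f-b37f-8ce988bb02f2} -o c:\traces\mydriver.etl
logman start MyDriverTrace
logman stop MyDriverTrace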