Cluster computing: SIEM continued ...

The intelligence in event monitoring comes from a good-quality model that has been honed on a large data set from a variety of sources and tuned to perform well on the data in its domain. Even a stream of continuous events has a historical section and an active front where new events arrive. By periodically retraining on the ever-growing historical section, a model can be improved to provide insights into events as they occur. This is still not real-time, because the model is still built from a historical batch of events. Streaming algorithms that catch up on historical events and adapt to newer ones while being used for predictions are not as prevalent as the batch-oriented, model-prediction approach, but they open the possibility of a new form of analysis, and much machine data is suitable for a stream abstraction.

This does not prevent us from using the model-prediction approach, where the model can be made as sophisticated as necessary and used for both short-term and long-term predictions. The Microsoft Time Series algorithm is an example of using historical data to continuously make predictions on incoming data. It combines two algorithms, one for short-term predictions and another for long-term predictions, and the two can be blended. The short-term algorithm (ARTXP) is an autoregressive tree model for representing periodic time-series data, while the long-term algorithm is an autoregressive integrated moving average (ARIMA) model.
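The periodic-retraining idea above can be sketched in a few lines. This is only an illustration under simplifying assumptions: it uses a plain first-order autoregressive model, AR(1), fitted by least squares, not the ARTXP or ARIMA algorithms named above, and the names `fit_ar1`, `PeriodicallyRefitForecaster`, and `refit_every` are hypothetical.

```python
from statistics import mean


def fit_ar1(history):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, curr = history[:-1], history[1:]
    mean_prev, mean_curr = mean(prev), mean(curr)
    var = sum((p - mean_prev) ** 2 for p in prev)
    if var == 0:
        return mean_curr, 0.0  # constant history: always predict the mean
    cov = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    phi = cov / var
    return mean_curr - phi * mean_prev, phi


class PeriodicallyRefitForecaster:
    """Keeps the full history and refits the model every `refit_every` events."""

    def __init__(self, refit_every=50):
        self.history = []
        self.refit_every = refit_every
        # Until the first fit, fall back to "next value equals last value".
        self.c, self.phi = 0.0, 1.0
        self._since_fit = 0

    def observe(self, value):
        """Append a new event value; refit on the whole history when due."""
        self.history.append(value)
        self._since_fit += 1
        if self._since_fit >= self.refit_every and len(self.history) >= 3:
            self.c, self.phi = fit_ar1(self.history)
            self._since_fit = 0

    def forecast(self, steps=1):
        """Iterate the fitted recurrence forward from the latest observation."""
        if not self.history:
            raise ValueError("no observations yet")
        x = self.history[-1]
        predictions = []
        for _ in range(steps):
            x = self.c + self.phi * x
            predictions.append(x)
        return predictions
```

Between refits the model is frozen, so predictions on the active front always rest on a slightly stale historical batch, which is the sense in which this approach is not real-time. A small `steps` corresponds to short-term prediction and a large one to long-term prediction, although a single AR(1) model is a much cruder tool for either than the algorithms discussed above.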

The challenge with event monitoring lies in the type of data and the integration of tools rather than in the analysis or prediction described above. Data comes in many forms, from a variety of sources, and is received by systems that are deployed in one of several ways. Let us take a few examples of each. For deployment, SaaS is the predominant option, accounting for well over 80% of cases, followed by on-premises deployment and then by privately hosted deployment, which is in the same ballpark of popularity. Multi-tenancy options follow, lagging only slightly behind. Each of these deployment options requires versatility from the intelligence added to monitoring. Managed full-service solutions are generally preferred over the complexity of more functionally rich suites. It is typical to see the equivalent of 0.5 to 1.5 people assigned to the administration of these solutions.

The types of data are far more numerous than the deployment options. Data or alerts are almost universally collected to monitor the performance of applications and services in DevOps; this type of data pertains to performance-related events. Time-series events such as metrics constitute another type of data that demands its own solution stack. Log files or access records are a third type, and they justify their own index and analysis stacks if they are not simply archived or left lying around. Security-related events are also time-series, but they are treated differently because of the alerts they need to raise. Transaction-related events, such as those from customers, determine application performance but hold a special significance for the business, as opposed to internal-facing operational data. Internal configuration or topology events form yet another class of data. Anything that is not transactional is treated as unstructured data by default. This classification spans data that is processed in batch, micro-batch, and streaming modes.

Data collected from agents on systems, such as telemetry, instrumentation, or proprietary byte codes, forms a different type. Text data exported from disparate systems is usually in the form of comma-separated values. Web requests and responses, including those made from client-side scripts in the browser, form their own class because they are usually either in the clear or encrypted per request. The data sent over the web by the Internet of Things forms a different type. SIEM data is different again. Network traffic, such as flow-related data, packets, or wire data, forms its own classes. Web proxies have gained a lot of popularity as gateways to services and are of interest for both analytics and troubleshooting.

Infrastructure-as-code or software-defined stacks represent a type of data that requires integrations because of the tools dedicated to them. Only a quarter of the hundreds of tools available for these data types are integrated with automation.
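The batch, micro-batch, and streaming modes mentioned above can be viewed, roughly, as the same consumer being handed chunks of different sizes. The following is a minimal sketch of that view; the helper name `micro_batches` and its `size` parameter are hypothetical and not taken from any particular product.

```python
from itertools import islice


def micro_batches(events, size):
    """Yield successive lists of up to `size` events from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return  # source exhausted
        yield batch


# size=1 approximates per-event streaming; a very large size approximates
# a single batch over everything collected so far.
for batch in micro_batches(["login", "read", "write", "logout", "login"], 2):
    print(batch)
```

Real pipelines usually bound micro-batches by time window as well as by count, which this sketch leaves out.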

Monday, February 1, 2021
