Saturday, September 21, 2019

Aspects of the monitoring feature in a containerization framework.
This article describes the salient features of tools used for monitoring containers hosted on an orchestration framework. Some products are dedicated to this purpose and strive to make this use case easier for administrators. They usually fall into two categories: one provides a limited set of built-in metrics that help, say, the master manage the pods, and the other gives access to custom metrics that help with, say, horizontal scaling of resources.
Most metrics products, such as Prometheus for the Kubernetes orchestration framework, fall into the second category. The first category is rather lightweight and is served over HTTP using the resource metrics APIs.
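As a quick sketch, assuming the metrics-server add-on is installed and jq is available, the built-in resource metrics can be inspected directly over HTTP through the aggregated API:
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods" | jq .
kubectl top pods -n default    # convenience command backed by the same resource metrics API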
The use of metrics is a common theme, and the metrics are defined in a JSON format with key-value pairs and timestamps. They might also carry additional descriptive information that aids their handling. Metrics are evaluated by expressions that usually look at a time-based window, which yields slices of data points. These data points can then be aggregated with functions such as sum, min, average and so on.
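For illustration, a windowed aggregation against a Prometheus-style store might look like the following; the service address is an assumption about the deployment, and the response comes back as JSON with label key-value pairs and timestamped samples:
curl -sG "http://prometheus.monitoring.svc:9090/api/v1/query" \
     --data-urlencode 'query=sum by (pod) (rate(container_cpu_usage_seconds_total[5m]))'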
Metrics often follow their own route to a separate sink. In some deployments of the Kubernetes orchestration framework, the sink refers to an external entity that knows how to store and query metrics. The collection of metrics at the source and their forwarding to the destination follow conventional mechanisms, similar to those for logs and audit events, both of which have their own sinks.
As with all agents and services running in a container, a secret or a service account is required to control their access to resources. Role-based access control, along with namespace and global naming conventions, is a prerequisite for any agent.
The agent has to run continuously, forwarding data with little or no disruption. Some orchestrators facilitate this with constructs such as a DaemonSet, which keeps the agent running endlessly. The deployment is verified to be working correctly when a standard command produces the same output as a pre-defined expected output, so verification of the monitoring capability becomes part of the installation.
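A minimal verification step, assuming the agent ships as a DaemonSet named metrics-agent in a monitoring namespace (both names are illustrative), could be:
kubectl -n monitoring rollout status daemonset/metrics-agent
kubectl -n monitoring get daemonset metrics-agent \
  -o jsonpath='{.status.numberReady}/{.status.desiredNumberScheduled}{"\n"}'   # expect ready == desired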
Metrics become useful when they are evaluated against thresholds that trigger alerts. This mechanism completes the monitoring framework: rules are written as expressions involving thresholds, and when a threshold is crossed a suitable alert is raised. Alerts may be delivered via messages, email or any other notification service. Dashboards and mitigation tools may be provided by the product offering the full monitoring solution.
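As a sketch of such a rule, assuming a Prometheus-style setup (the metric, threshold and label values are illustrative), a threshold-based alert can be declared and validated like this:
cat > cpu-alert.yml <<'EOF'
groups:
  - name: container-alerts
    rules:
      - alert: HighPodCpu
        expr: sum by (pod) (rate(container_cpu_usage_seconds_total[5m])) > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} has sustained high CPU usage"
EOF
promtool check rules cpu-alert.yml   # validate the rule file before loading it
An alerting component such as Alertmanager then routes the raised alert to email, chat or another notification channel.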
Almost all activities of any resource in the orchestration framework can be measured. These include the core system resources, which may send their data to logs or to the audit event stream. The option to combine metrics, audit and logs effectively rests with the administrator rather than with a product designed around one or more of these.
Specific queries drive the charts on monitoring dashboards. These are well prepared and part of a standard portfolio that helps with analyzing the health of the containers.

Friday, September 20, 2019

We looked at a few examples of applying audit data to the overall product. In all these cases, the audit dashboards validate the security and integrity of the data.

  • The incident review audit dashboard provides an overview of the incidents associated with users. 
  • The Suppression audit dashboard provides an overview of notable event suppression activity. 
  • The Per-Panel Filter Audit dashboard provides information about the filters currently in use in the deployment.
  • The Adaptive Response Action Center dashboard provides an overview of the response actions initiated by adaptive response actions, including notable event creation and risk scoring.
  • The Threat Intelligence Audit dashboard tracks and displays the current status of all threat and generic intelligence sources. 
  • The product configuration health dashboard is used to compare the latest installed version of the product to prior releases and identify configuration anomalies. 

The Data model audit dashboard displays information about the state of data model accelerations in the environment. Acceleration here refers to speeding up data models that represent extremely large datasets, where operations such as pivots become faster with the use of data-summary-backed methods.
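As a rough illustration, assuming a Splunk-style accelerated data model (the data model and field names here are placeholders), a summary-backed query restricts itself to the pre-built summaries:
splunk search '| tstats summariesonly=true count from datamodel=Authentication by Authentication.user'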
The connectors audit reports on hosts forwarding data to the product. This audit is an example for all other components that participate in data handling.
The data protection dashboard reports on the status of the data integrity controls.
Audit dashboards provide a significant opportunity to show complete, rich, start-to-finish user session activity data in real time. This includes all access attempts, session commands, data accessed, resources used, and much more. Dashboards can also make it compelling and intuitive for administrators to intervene in the user experience. Security information and event management data can be combined from dedicated systems as well as from application audit. This helps to quickly and easily resolve security incidents. The data collection can be considered tamper-proof, which makes the dashboard a source of truth.

Thursday, September 19, 2019


Let us look at a few examples of applying audit data to the overall product. In all these cases, the audit dashboards validate the security and integrity of the data. The audit data must be forwarded, and it must not be tampered with.
The incident review audit dashboard provides an overview of the incidents associated with users. It displays how many incidents are associated with a specific user. The incidents may be selected based on different criteria, such as by status, by user, by kind, or by other forms of activity. Recent activity also helps determine the relevance of the incidents.
The Suppression audit dashboard provides an overview of notable event suppression activity. This dashboard shows how many events are being suppressed, and by whom, so that notable event suppression can be audited and reported on. Suppression is just as important as an audit of access to resources.
The Per-Panel Filter Audit dashboard provides information about the filters currently in use in the deployment.
The Adaptive Response Action Center dashboard provides an overview of the response actions initiated by adaptive response actions, including notable event creation and risk scoring.
The Threat Intelligence Audit dashboard tracks and displays the current status of all threat and generic intelligence sources. As an analyst, you can review this dashboard to determine if threat and generic intelligence sources are current, and troubleshoot issues connecting to threat and generic intelligence sources.
The ES configuration health dashboard is used to compare the latest installed version of the product to prior releases and identify configuration anomalies. The dashboard can be set to review against specific past versions.
The Data model audit dashboard displays information about the state of data model accelerations in the environment. Acceleration here refers to speeding up data models that represent extremely large datasets, where operations such as pivots become faster with the use of data-summary-backed methods.
The connectors audit reports on hosts forwarding data to the product. This audit is an example for all other components that participate in data handling.
The data protection dashboard reports on the status of the data integrity controls.

Wednesday, September 18, 2019

There are a few caveats to observe when processing events with webhooks, as opposed to other resources. 

Events can come as a cascade, so we cannot cause undue delay in the processing of any one of them. Events can also cause the generation of other events. This requires stopping additional events from being generated when the webhook itself transforms and creates new events that would trigger them. The same event can occur again if the transformed event triggers the original event, or if the transformed event ends up in the webhook again because it is of the same kind as the original. The webhook therefore has to skip the processing of any event it has already handled. 
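A rough sketch of this de-duplication, assuming the events arrive as JSON objects one per line on stdin and that jq is available, might track the UIDs it has already handled:
SEEN=/tmp/handled-event-uids
touch "$SEEN"
while IFS= read -r event; do
  uid=$(printf '%s' "$event" | jq -r '.metadata.uid')
  grep -qx "$uid" "$SEEN" && continue   # already handled; break the cascade
  printf '%s\n' "$uid" >> "$SEEN"
  # ... transform and forward the event here ...
done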

The core resources of Kubernetes are all subject to the same verbs of get, create, delete, and so on. However, events are far more numerous and occur at a faster clip than some of the other resources. This calls for handling and robustness equivalent to the considerations in a networking protocol or a messaging queue. The webhook is not designed to handle such a rate or latency, even though it might skip many of the events. Consequently, selector labels and processing criteria become all the more important. 

Performance considerations of event processing by webhooks aside, a transformation of an event that makes an HTTP request to an external service also suffers from outbound-side concerns such as timeouts and retries on that request. Since the performance profile of event webhooks has been called out above as significantly different from the infrequent processing of other resources, the webhook might have to make do with lossy transformation as opposed to persistence-based buffering.  
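On the outbound side, the timeout and retry budget can at least be made explicit; here the transformation endpoint is a placeholder:
curl --silent --show-error \
     --max-time 2 --retry 3 --retry-delay 1 \
     -X POST -H 'Content-Type: application/json' \
     --data @event.json \
     https://transform.example.com/enrich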

None of the above criteria should matter when this kind of webhook is used for diagnostics, which is most relevant to troubleshooting analysis. Diagnostics place no performance requirement on the webhook. When the events are already filtered down to audit events, and when the conversion of those events is even more selective, changes in the rate and delay of incoming events have little or no impact on diagnostics. 

Tuesday, September 17, 2019

Data Storage and connectors:
The nature of data storage is that it accumulates data over time for analytics. S3 APIs are a popular example of programmatic web access for storing data in the cloud. A connector works similarly, but it does not necessarily require the data storage to be remote or code to be written for the data transfer. In fact, one of the most common asks from a storage product is that it facilitate data transfer using standard shell commands. 
The purpose of the connector is to move data quickly and steadily between source and destination regardless of their type or the kind of data to be transferred. Therefore, the connector needs to be specific only to the destination storage. The connector merely automates the steps to organize and fill the containers in the data storage, sending data in fragments if necessary. The difference made by a connector is enormous in terms of convenience for stashing data and reusability of the same automation across different sources. The traditional stack of command-line tooling layered over programmable interfaces, which allows automation beyond programmatic access, is not lost. However, requiring customers to write their own wrappers around a command-line utility to send data is somewhat tedious and avoidable.
In addition to the programmatic access for receiving data, data stores need to accommodate input from different contexts such as protocols, bridging technologies like message queues, and even the eccentricities of the sender. It is in these contexts that a no-code, ready-made tool is preferred. Data transfers may also need to occur in a chain, with data relayed between systems much like piping operations in a shell environment. A new connector may also wrap another existing connector.
One of the most common examples of a connector is a TCP based data connector. The data is simply sent by opening a networking socket to make a connection. This is executed with standard command line tools as follows:
cat logfile | nc ipaddress port
The inclusion of such a data connector in a storage product is probably the most convenient form of data transfer. Even if the storage product requires programmatic access, wrapping the access APIs to provide a TCP-style connector like the one above will immensely benefit users, who then do not have to write code to send data to storage.
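A hypothetical wrapper along these lines keeps the familiar pipe-based usage while hiding the programmatic access; the endpoint and token are placeholders rather than a real product API:
send_to_store() {
  curl -sS -X POST "https://storage.example.com/ingest" \
       -H "Authorization: Bearer ${STORE_TOKEN}" \
       --data-binary @-      # stream the payload from stdin
}
cat logfile | send_to_store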
With the automation for a TCP connector written once, there will not be a need to repeat the effort elsewhere or to reinvent the wheel.


Monday, September 16, 2019

Kubernetes provides webhooks as a way to interact with all system-generated events. This is the equivalent of HTTP handlers and modules in ASP.NET in terms of the functionality to intercept and change requests and responses. The webhooks, however, are an opportunity to work on system-generated resources such as pod creation requests and so on.
There are two stages at which webhooks can run, and they are correspondingly named mutating and validating webhooks. The first is an opportunity to change requests on Kubernetes core v1 resources. The second is an opportunity to add validations to requests and responses.
Since these span a great many API requests, the webhooks are invoked frequently. Therefore they must be selective about the requests they modify, to exclude the possibility of touching requests that were not intended.
In addition to selectors, the power of webhooks is best demonstrated when they select all requests of a particular type to modify. For example, this is an opportunity for security to raise the baseline by allowing or denying all resources of a particular kind. The execution of privileged pods may be disabled in the cluster with the help of webhooks.
The webhooks are lightweight to run and serve, similar to nginx HTTP parameter modifiers. A number of them may be allowed to run.
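As a sketch of how such a webhook is scoped and registered, assuming a webhook server already deployed as a service (the names, namespace and certificate details are placeholders), the registration might look like this; the actual policy decision, such as rejecting privileged pods, lives in the server behind it:
cat <<'EOF' | kubectl apply -f -
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: deny-privileged-pods
webhooks:
  - name: pods.policy.example.com
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    failurePolicy: Fail
    clientConfig:
      service:
        namespace: policy
        name: pod-policy-webhook
        path: /validate
      # caBundle would be set here so the API server trusts the webhook's serving certificate
EOF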

Sunday, September 15, 2019

The difference between log forwarding and event forwarding becomes clear when the command-line options for kube-apiserver are considered. For example, the --audit-log-path option dumps the audit events to a log file that cannot be accessed from within the Kubernetes runtime environment inside the cluster. Therefore this option cannot be used with FluentD, because that is a containerized workload. On the other hand, the audit webhook option (--audit-webhook-config-file) allows a service to listen for callbacks from the Kubernetes control plane on the arrival of audit events. The Falco service listening on this webhook endpoint runs in its own container as a Kubernetes service. The control plane makes only a web request per audit event, and since the events are forwarded over HTTP, the Falco service can efficiently handle the rate and latency of the traffic.
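For reference, the two delivery paths correspond to kube-apiserver flags along these lines; the file paths are illustrative:
kube-apiserver --audit-policy-file=/etc/kubernetes/audit-policy.yaml \
  --audit-log-path=/var/log/kubernetes/kube-apiserver-audit.log        # file on the node, outside the cluster runtime
kube-apiserver --audit-policy-file=/etc/kubernetes/audit-policy.yaml \
  --audit-webhook-config-file=/etc/kubernetes/audit-webhook-kubeconfig.yaml   # posts events to a backend such as Falco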
The performance consideration between the two options is also notable. Log forwarding is the equivalent of running the tail command on the log file and forwarding the output over TCP with a netcat command. This transfers the same amount of data and uses a TCP connection, although it does not traverse as many layers as the webhook. It is also suitable for a syslog drain, which enables further performance improvements.
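In shell terms that path reduces to something like the following, with the host, port and file path as placeholders:
tail -F /var/log/kubernetes/kube-apiserver-audit.log | nc syslog.example.com 514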
The webhook is a push mechanism and requires packing and unpacking of data as it traverses up and down the network layers. There is no buffering on the service side, so there is a chance that some data will be lost if the service goes down. The connectivity is also more subject to faults than the syslog drain. However, HTTP is best suited for message-broker intake, which facilitates filtering and processing that can significantly improve performance.
The ability to transform events is not necessarily restricted to the audit container-based service or to services specific to the audit framework. The audit data is rather sensitive, which is why its access is restricted. The transformation of events can also occur during analysis; this lets the event queries be simpler once the events are transformed. Streaming analysis enables a holistic and continuous view of the data from its origin, and with the help of windows over the data the transformations are efficient.
Transformations can also be persisted where the computations are costly. This helps pay those costs once rather than every time the data needs to be analyzed. Persisted transformations help with reuse and sharing, which makes it convenient and efficient to maintain a single source of truth. Transformations can also be chained between operators so that they form a pipeline, which makes it easier to diagnose, troubleshoot and improve the separation of concerns.