Friday, December 13, 2019

Event Records
Kubernetes is a container orchestration framework that hosts applications. Events from these applications are saved cluster-wide by the kube-apiserver. Hosts are nodes of the cluster and can be scaled out to as many as the application workload requires. Events are different from logs. A forwarding agent forwards logs from the container to a sink, a destination where all forwarded logs can be redirected to a log service better suited to storing and analyzing them. The sink is usually external to the cluster so that it can scale independently, and it may be configured to receive data over a well-known protocol such as syslog. The sink controller forwards cluster-level data such as CPU and memory usage from the nodes and may include cluster logs from the kube-apiserver on the master node. A continuously running process forwards logs from the container to the node before they drain to the sink via the forwarding agent. This process is declared as a DaemonSet, which starts the same process on a set of nodes, usually all the nodes in the cluster. There is no audit event sink; events are instead received by webhooks.
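To make this concrete, events persisted by the kube-apiserver can be read back like any other resource. Below is a minimal sketch in Java using only the JDK's HTTP client; it assumes a local "kubectl proxy" on port 8001 so that authentication and TLS can be elided, and the class name ClusterEvents is illustrative:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class ClusterEvents {
        public static void main(String[] args) throws Exception {
            // Assumes "kubectl proxy" is running locally; against a real
            // apiserver you would add a bearer token and TLS configuration.
            HttpClient http = HttpClient.newHttpClient();
            HttpRequest req = HttpRequest.newBuilder()
                    .uri(URI.create("http://127.0.0.1:8001/api/v1/events"))
                    .GET()
                    .build();
            // The response body is the JSON EventList that the
            // kube-apiserver maintains for the cluster.
            HttpResponse<String> res = http.send(req, HttpResponse.BodyHandlers.ofString());
            System.out.println(res.body());
        }
    }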
In these events, Kubernetes provides a security-relevant, chronological set of records documenting the sequence of activities performed by individual users and the actions taken on their behalf by the system.
Web application developers often record their flow-of-control log messages in their database or storage of choice, the same place where they keep their business objects and master data. It seems convenient to persist the exceptions along with the log messages in that store too, although the norm is to keep the log separate from the data. They save themselves the hassle of having to look somewhere else with a different methodology. But is this really required? Data in a database is the bread and butter of the business. Every change to the data, be it an addition, modification, or deletion, represents something the business understands and is usually billable. The state of the data reflects the standing of the business. People's names and credit cards are maintained in the database. If this makes it clear that something as dynamic as the activities of the system, maintained in a log during interactions with a user, is not the same as the data, then it can be argued that they need not end up in the same place. And it is not just a database and a log file that we are comparing: a database comes with ACID guarantees and locking that a log file cannot provide.

In fact, we see various other usages of logs. Logs can flow to a file, an index, an emailing and alerting system, and a database, individually or all at once. Take the example of comparing the option to save logs in a SQL database with the alternative of indexing the logs as events, where key values are extracted and maintained for later searches. Events are time-series inputs to a key-value index. The association between activities and a timestamp helps determine and reconstruct the chain of events, which is meaningful for diagnosis, forensics, and even analysis for charts and post-mortem actions. The store for such events could be a NoSQL database, whose methodology differs from that of a traditional SQL database. While we don't want to play up the differences between SQL and NoSQL for something like logs, which can end up in both places either directly or after processing, we do want to recognize that the workflows associated with logs are different from, and more versatile than, the CRUD activity on data.

Logs, for instance, have made it possible to move or copy data from one place to another without interrupting or locking the source. For example, we can ship the logs of one system to another, where the entries are played back to reconstruct the data to the same state as the original. Since the records are a time series, we only need to ship what has transpired between the last shipment and the current one. Alternatively, for every change to the original, we could also write to the remote copy. The two methods can even be combined; they are not exclusive. The scope of the changes to be effected in mirroring can be set on a container-by-container basis, such as per database. In the case of log shipping, the restore operation has to be completed on each copy of the log. It is important to distinguish the log of a web application from the log of database changes. The former is about the logic, which may span both the web tier and the database. The latter is about changes to the data only, without any regard for how they were effected. In other words, the log of a database system is about changes to the state of the database.
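The replay idea behind log shipping can be sketched in a few lines of Java. The following is a minimal in-memory illustration, not tied to any particular database; names such as LogEntry and replay are made up for the example:

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // A sketch of log shipping: the primary appends changes to a
    // time-ordered log, and a replica replays only the entries it has
    // not yet seen to reach the same state as the original.
    public class LogShippingDemo {
        static class LogEntry {
            final long sequence;          // time-series position in the log
            final String key, value;      // stands in for an arbitrary row change
            LogEntry(long sequence, String key, String value) {
                this.sequence = sequence; this.key = key; this.value = value;
            }
        }

        // Replays entries after lastApplied into the replica's state and
        // returns the new cursor position.
        static long replay(List<LogEntry> log, Map<String, String> state, long lastApplied) {
            for (LogEntry e : log) {
                if (e.sequence > lastApplied) {
                    state.put(e.key, e.value);
                    lastApplied = e.sequence;
                }
            }
            return lastApplied;
        }

        public static void main(String[] args) {
            List<LogEntry> log = new ArrayList<>();
            log.add(new LogEntry(1, "alice", "gold"));
            log.add(new LogEntry(2, "bob", "silver"));

            Map<String, String> replica = new LinkedHashMap<>();
            long cursor = replay(log, replica, 0);     // initial shipment

            log.add(new LogEntry(3, "alice", "platinum"));
            cursor = replay(log, replica, cursor);     // ship only the delta

            System.out.println(replica);               // {alice=platinum, bob=silver}
        }
    }

Because the log is a time series, the second call to replay ships only the entries appended after the replica's cursor, which is exactly the delta described above.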
What we are saying here is that the logs in the latter case are more important, more relevant, and different from web logs or web access logs. That said, the flow of control, the exceptions encountered, and the successes can also provide useful information over time. What works for the logs of a database can be put to use for those of applications, should there be a requirement for it. These application-level activities and logic can remain private, while their visibility to the system allows them to be observed and recorded globally as events.
Kubernetes provides webhooks as a way to interact with all system-generated events. This is the equivalent of HTTP handlers and modules in ASP.NET in terms of the functionality to intercept and change requests and responses. The webhooks, however, are an opportunity to work on system-generated resources, such as pod creation requests.
There are two stages where webhooks can run, and they are correspondingly named mutating and validating webhooks. The first is an opportunity to change requests on Kubernetes core V1 resources. The second is an opportunity to add validations to requests and responses.
Since these stages span a large number of API calls, the webhooks are invoked frequently. They must therefore be selective in the requests they modify, to exclude the possibility of touching requests that were not intended.
In addition to using selectors, the power of webhooks is best demonstrated when they act on all requests of a particular type. For example, this is an opportunity for security to raise the baseline by allowing or denying all resources of a particular kind. The execution of privileged pods may be disabled cluster-wide with the help of webhooks, as the sketch below shows.
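Here is a minimal sketch of such a validating webhook in Java, assuming Jackson for JSON parsing and the JDK's built-in HttpServer. The /validate path and class name are illustrative, and TLS as well as the ValidatingWebhookConfiguration that registers the webhook with the cluster are omitted for brevity:

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.sun.net.httpserver.HttpServer;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    // Denies pod creation requests whose containers ask for privileged mode.
    public class DenyPrivilegedWebhook {
        private static final ObjectMapper MAPPER = new ObjectMapper();

        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8443), 0);
            server.createContext("/validate", exchange -> {
                // The apiserver posts an AdmissionReview carrying the pod spec.
                JsonNode review = MAPPER.readTree(exchange.getRequestBody());
                String uid = review.at("/request/uid").asText();

                boolean privileged = false;
                for (JsonNode c : review.at("/request/object/spec/containers")) {
                    privileged |= c.at("/securityContext/privileged").asBoolean(false);
                }

                // Echo the uid back and allow or deny the request.
                String message = privileged ? "privileged containers are not allowed" : "ok";
                String body = String.format(
                        "{\"apiVersion\":\"admission.k8s.io/v1\",\"kind\":\"AdmissionReview\","
                      + "\"response\":{\"uid\":\"%s\",\"allowed\":%s,"
                      + "\"status\":{\"message\":\"%s\"}}}",
                        uid, !privileged, message);

                byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, bytes.length);
                exchange.getResponseBody().write(bytes);
                exchange.close();
            });
            server.start();
        }
    }

A mutating webhook has the same request and response shape, except that its response additionally carries a base64-encoded JSONPatch in the patch field, with patchType set to JSONPatch.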
The webhooks are lightweight to run and serve, similar to nginx HTTP parameter modifiers, and a number of them may be allowed to run.
#codingexercise
    import org.apache.flink.api.common.functions.FilterFunction;

    // Keeps only the lines that represent events, dropping lines marked "NonEvent".
    private static class EventFilter implements FilterFunction<String> {
        @Override
        public boolean filter(String line) throws Exception {
            return !line.contains("NonEvent");
        }
    }
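For completeness, here is a minimal sketch of wiring this filter into a Flink streaming job, assuming EventFilter is declared in or visible to the job class; the sample input lines are made up:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class EventFilterJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // A made-up stream standing in for forwarded log lines.
            DataStream<String> lines = env.fromElements(
                    "Event: pod created", "NonEvent: heartbeat", "Event: pod deleted");

            lines.filter(new EventFilter()).print();
            env.execute("event-filter");
        }
    }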
