Wednesday, June 19, 2019

We continue discussing Keycloak on Kubernetes. The service catalog returns the details of the provisioned resource as a Kubernetes secret. If the application persists that secret on a mounted volume, care must be taken to mark the volume as readOnly.
Similarly, while the Keycloak configuration is internal, it should be protected from reconfiguration after deployment.
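As a minimal sketch, the readOnly flag goes on the volume mount in the pod spec; the pod name, secret name, image, and mount path below are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: keycloak-consumer
spec:
  containers:
    - name: app
      image: example/app:latest
      volumeMounts:
        - name: broker-credentials
          mountPath: /etc/broker-credentials
          readOnly: true            # prevents the application from modifying the secret
  volumes:
    - name: broker-credentials
      secret:
        secretName: keycloak-binding   # secret returned by the service catalog binding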
The service broker listens on port 9090 over http. Since this traffic is internal, it has no TLS requirement. When the token crosses the trust boundary, we rely on the kubectl interface to secure the communication with the API server. As long as clients communicate through kubectl or directly with the API server, this technique works well. In general, if the server and the clients communicate over TLS and have verified the certificate chain, there is little chance of the token falling into the wrong hands. URL logging or an https proxy remain potential exposure points, but a man-in-the-middle attack is less of an issue if the client and the server exchange a session id and keep track of each other's sessions. Session ids are largely a site or application concern rather than the API's, but it is good to validate against the session id when one is available.
Sessions are unique to the application. Even so, the client uses refresh tokens or re-authorizations to keep the session alive. If sessions were tracked at the API level, they would not be tied to OAuth revocations and re-authorizations, so relying on the session id alone is not preferable. At the same time, using the session id as an additional check alongside each authorization helps tighten security. It is reasonable to assume the same session persists until the next authorization or an explicit revocation. By tying the checks exclusively to the token, we keep this streamlined to the protocol.
In the absence of a session, we can use refresh tokens after the access token expires. Since the refresh token is intrinsic to the protocol (RFC 6749), it is already a safe way to prolong access beyond the token's expiry time. Repeatedly acquiring a refresh token is equivalent to keeping a session alive. The threat mitigation above therefore works regardless of whether a notion of session is actually implemented.
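As a sketch of the refresh-token grant (RFC 6749, section 6) against a token endpoint such as Keycloak's: the endpoint URL, realm, client id, and secret below are placeholders, and the refresh token is taken from the command line.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RefreshTokenExample {
    public static void main(String[] args) throws Exception {
        // Placeholder values; the realm, client id and secret are assumptions.
        String tokenEndpoint = "https://idp.example.com/auth/realms/demo/protocol/openid-connect/token";
        String body = "grant_type=refresh_token"
                + "&client_id=my-client"
                + "&client_secret=my-secret"
                + "&refresh_token=" + args[0];

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(tokenEndpoint))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // The response carries a fresh access token (and usually a new refresh token).
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}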

Tuesday, June 18, 2019

We continue with the discussion of Keycloak deployment on Kubernetes:
This deployment consists of an identity provider together with the broker. In the future, there may be more than one identity provider. A user goes to the identity provider to log in. This is just how OAuth operates: all requests for user tokens begin by redirecting the user to the identity provider's login screen. Since a user interface is involved, the interaction between the user and the IDP is subject to all the threats that a web interface faces. Cross-site scripting, man-in-the-middle attacks, SQL injection, and cross-origin resource sharing misconfigurations are all vulnerabilities exploited from the client side. Enabling browser traffic over https mitigates only some of these concerns, since transport is only a layer below the application logic.
We now turn to the Keycloak JSON file that describes the various configurations for Keycloak. The Java adapter for Keycloak is described with attributes such as "auth-server-url", "ssl-required", "cors-allowed-methods", "cors-exposed-headers", "bearer-only", "expose-token", "verify-token-audience", "disable-trust-manager", "trust-store", "client-keystore", "token-minimum-time-to-live" and "redirect-rewrite-rules". These options help harden security.
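A minimal sketch of what such an adapter configuration might look like; the realm, client, URL, and secret are placeholders, only a subset of the options above is shown, and the exact names and values should be confirmed against the Keycloak adapter documentation:

{
  "realm": "demo",
  "resource": "my-service",
  "auth-server-url": "https://idp.example.com/auth",
  "ssl-required": "external",
  "bearer-only": true,
  "verify-token-audience": true,
  "token-minimum-time-to-live": 30,
  "credentials": {
    "secret": "replace-with-client-secret"
  }
}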
Using any generic OpenID Connect resource provider library is an alternative to using the Java adapter for Keycloak. The advantage of the Java adapter over a generic OIDC library is that it facilitates tighter security with minimal code through configuration options. The adapter binds the platform with the framework so that the application and its clients can be secured, and the configuration options provide all the parameters needed to tighten security.
Use cases:
There are only two use cases for the OIDC resource provider. First, it allows applications to request tokens for a user. In this case an identity token containing the username and profile information is returned, along with an access token containing the role mappings and authorization information.
The second case is when a remote service requests a token on behalf of the user. The service is treated as a client and issued an access token.
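Since both tokens are JWTs, their claims can be inspected directly, which makes the difference between the two visible. A minimal sketch that only decodes the payload for inspection (it does not verify the signature):

import java.util.Base64;

public class TokenClaims {
    // Decodes the payload (second segment) of a JWT so its claims can be read.
    // This does NOT validate the signature; it only shows what the token carries,
    // e.g. username/profile claims in the identity token and role mappings
    // (such as realm_access in Keycloak tokens) in the access token.
    static String payloadOf(String jwt) {
        String[] parts = jwt.split("\\.");
        return new String(Base64.getUrlDecoder().decode(parts[1]));
    }

    public static void main(String[] args) {
        System.out.println(payloadOf(args[0]));
    }
}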

Monday, June 17, 2019

Today we continue with the threat assessment of the Keycloak deployment on a Kubernetes cluster.
The storing of data exchanged between the system and the user generally happens outside the trust boundary. If that data is tampered with, hijacked, or compromised, the activities within the trust boundary can be manipulated. This makes the system vulnerable, so validation of the data becomes an important activity within each component.
The data usually comprises a token along with attributes made available by the issuing Keycloak service broker in JSON format. Since this is part of the OAuth protocol, the mechanisms of the protocol already mitigate security vulnerabilities pertaining to the data. It is only the handling of the data after the protocol has issued it that needs to be secured. Again, this falls outside the trust boundary and is largely the user's responsibility. The system's effort goes only into validating the data. After validation, it is assumed that the token belongs to the identity to whom it was issued. If the token is not hijacked, only registered and authorized users can take action on the system.
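A sketch of that validation step, assuming the Nimbus JOSE+JWT library and an RSA-signed token; the expected issuer and audience values are placeholders:

import java.security.interfaces.RSAPublicKey;
import java.util.Date;

import com.nimbusds.jose.crypto.RSASSAVerifier;
import com.nimbusds.jwt.JWTClaimsSet;
import com.nimbusds.jwt.SignedJWT;

public class TokenValidator {
    // Returns true only if the signature, issuer, audience and expiry all check out.
    static boolean validate(String token, RSAPublicKey idpKey) throws Exception {
        SignedJWT jwt = SignedJWT.parse(token);
        if (!jwt.verify(new RSASSAVerifier(idpKey))) {
            return false;   // signature does not match the identity provider's key
        }
        JWTClaimsSet claims = jwt.getJWTClaimsSet();
        return "https://idp.example.com/auth/realms/demo".equals(claims.getIssuer())
                && claims.getAudience().contains("my-service")
                && claims.getExpirationTime() != null
                && claims.getExpirationTime().after(new Date());
    }
}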
Tokens are not restricted to users. The client credentials grant issues tokens to clients from the issuing authority. Clients and users are both registered, so their requests are honored only if they can be looked up.
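A sketch of the client credentials grant, mirroring the refresh-token sketch earlier; the endpoint and client registration are placeholders, and no user is involved:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ClientCredentialsExample {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and client registration; the client authenticates as itself.
        String form = "grant_type=client_credentials"
                + "&client_id=reporting-service"
                + "&client_secret=replace-with-secret";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://idp.example.com/auth/realms/demo/protocol/openid-connect/token"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());  // contains the client's access token
    }
}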
In the case of deploying the Keycloak service broker over Kubernetes clusters, we validate the integration between the Keycloak service broker, the service catalog, and the Open Service Broker API. We assume each component is independently assessed with its own STRIDE model and only secure the integration. Since the Keycloak service broker fits nicely into the Open Service Broker API framework of Kubernetes, we can treat it as internal.
Therefore, we evaluate only the Keycloak deployment. This deployment consists of an identity provider together with the broker. In the future, there may be more than one identity provider. A user goes to the identity provider to log in. This is just how OAuth operates: all requests for user tokens begin by redirecting the user to the identity provider's login screen. Since a user interface is involved, the interaction between the user and the IDP is subject to all the threats that a web interface faces. Cross-site scripting, man-in-the-middle attacks, SQL injection, and cross-origin resource sharing misconfigurations are all vulnerabilities exploited from the client side. Enabling browser traffic over https mitigates only some of these concerns, since transport is only a layer below the application logic.

Sunday, June 16, 2019

We continue discussing the STRIDE model of testing:
We apply it by partitioning the system into what lies within the trust boundary and what lies outside, and examining how the different threats can arise.
When user1 can behave like user2, that is considered spoofing. A possible defense involves issuing tokens specific to each user.
Tampering is when the user has successfully modified the token to her advantage.
Repudiation is when the user can hijack a valid token in a way the system cannot refute.
Denial of service is when the user can overwhelm the identity provider (IDP) or the API server.
Elevation of privilege is when the user has compromised the IDP or the API server.
When we add Keycloak to the Kubernetes authentication described above, we add the following components:

In this case the interactions are deeper within the trust boundary, where the Open Service Broker API takes the place of the API server in the earlier diagram.
When components sit deeper within the trust boundary, the security risk reduces. However, this applies only to the protocol between the components; it says nothing about the layered communication between them or how the request and response are protected.
API security mitigates most of these concerns through the validation of request parameters and the use of encryption.
However, the storing of data exchanged between the system and the user generally happens outside the trust boundary. If that data is tampered with, hijacked, or compromised, the activities within the trust boundary can be manipulated. This makes the system vulnerable, so validation of the data becomes an important activity within each component.



Saturday, June 15, 2019

This is an initial draft of the STRIDE model of threat mitigation in authentication performed on Kubernetes clusters.
STRIDE stands for:
Spoofing identity – is the threat when a user can impersonate another user.
Tampering with data – is the threat when a user can gain write access to Kubernetes resources or modify the contents of security artifacts.
Repudiation – is the threat when a user can perform an illegal action that Kubernetes cannot refute.
Information disclosure – is the threat when, say, a guest user can access resources as if the guest were the owner.
Denial of service – is the threat when a crucial component in the operation of Kubernetes is overwhelmed by requests so that others experience an outage.
Elevation of privilege – is the threat when a user has gained access to components within the trust boundary and the system is therefore compromised.
Usually we begin the process of evaluating against these factors with a control and data flow diagram.
A control flow diagram may look something like this:

Now we apply the model to this diagram by partitioning it between what lies within the trust boundary and what lies outside, and examining how the different threats can arise.
When user1 can behave like user2, that is considered spoofing. A possible defense involves issuing tokens specific to each user.
Tampering is when the user has successfully modified the token to her advantage.
Repudiation is when the user can hijack a valid token in a way the system cannot refute.
Denial of service is when the user can overwhelm the identity provider (IDP) or the API server.
Elevation of privilege is when the user has compromised the IDP or the API server.

Thursday, June 13, 2019

There are a few other troubleshooting mechanisms which we can call supportability measures. I’m listing them here.
Monitoring and Alerts – System events are of particular interest to administrators and users. A way to register monitors and their policies will help provide an overarching and comprehensive view of the operations of the product. Different kinds of sensors may be written for this purpose and packaged with the product.
Counters – Performance counters for the various operations of the product will be very helpful for diagnosing what takes a long time. When elapsed time and processing time are measured separately, they help tremendously in finding bottlenecks or long-running tasks (a sketch follows this list).
Dynamic views – If the operational data is persisted, then the current window of activity can be viewed with built-in queries. After all, the storage product stores streams, and it can ingest all activity data as append-only data.
User Interface – There are pages that can help with remote monitoring of the product and troubleshooting by viewing its logs or by setting up back channels of communication with the hosts of the application. Such an interface will be very helpful for remote troubleshooting on customer deployments.
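Here is the counters sketch referenced above, separating elapsed (wall-clock) time from processing (CPU) time around a stand-in operation:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class OperationCounters {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        long cpuStart = threads.getCurrentThreadCpuTime();   // processing time, nanoseconds
        long wallStart = System.nanoTime();                   // elapsed time, nanoseconds

        doOperation();                                         // stand-in for the measured operation

        long elapsedMs = (System.nanoTime() - wallStart) / 1_000_000;
        long cpuMs = (threads.getCurrentThreadCpuTime() - cpuStart) / 1_000_000;

        // A large gap between elapsed and CPU time points at waiting (I/O, locks)
        // rather than computation as the bottleneck.
        System.out.printf("elapsed=%dms cpu=%dms%n", elapsedMs, cpuMs);
    }

    static void doOperation() {
        for (int i = 0; i < 1_000_000; i++) { Math.sqrt(i); }
    }
}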
APIs for collecting metrics from the system will prove very helpful to other applications, which then do not need other means of access and can roll these operational monitoring workflows into their own. API invocations decouple technology stacks and help with independent monitoring. Since APIs are published over the web, they are usable across networks.
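A minimal sketch of exposing such counters over HTTP with the JDK's built-in server; the port, path, and payload shape are arbitrary choices:

import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

import com.sun.net.httpserver.HttpServer;

public class MetricsEndpoint {
    static final AtomicLong requestsServed = new AtomicLong();

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        // GET /metrics returns the counters as JSON so other applications can poll them.
        server.createContext("/metrics", exchange -> {
            byte[] body = ("{\"requestsServed\":" + requestsServed.get() + "}")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }
}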
Virtually all APIs can be packaged into an SDK for developer convenience. These will tremendously improve the possibilities for application development and open up the boundaries for custom usage. Such expanded possibilities mean the product will endear itself to developers and their organizations.

Transparency in user query execution:
Streaming queries are a new breed. Most frameworks, like Flink, require the query logic to be packaged in a module prior to execution. Although a user interface is provided, much of the execution and its errors remain latent. Consequently, the user has very limited tools for progress, debugging, and troubleshooting in general.
For example, when the Flink application is standalone and a query has been provided, the user may receive no output for a long while. When the data set is large, it can be unclear to the user whether the delay comes from processing time over the data set or from incorrectly written logic. The native web interface for Apache Flink provides some support in this regard: it gives the ability to watch for watermarks, which can indicate whether any progress is being made. If there are no watermarks, it is likely that the event-time windows never elapsed.
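A sketch of the pieces involved, using the Flink DataStream APIs of that era: a bounded-out-of-orderness timestamp extractor supplies the watermarks that let a tumbling event-time window fire. The event type and elements are placeholders:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class EventTimeSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // Placeholder events: (key, event time in epoch millis).
        DataStream<Tuple2<String, Long>> events = env.fromElements(
                Tuple2.of("sensor-1", 1_560_000_000_000L),
                Tuple2.of("sensor-1", 1_560_000_030_000L));

        events
            // Watermarks lag five seconds behind the largest timestamp seen so far;
            // without them, event-time windows never fire and no output appears.
            .assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<Tuple2<String, Long>>(Time.seconds(5)) {
                    @Override
                    public long extractTimestamp(Tuple2<String, Long> e) {
                        return e.f1;
                    }
                })
            .keyBy(0)
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            .sum(1)
            .print();

        env.execute("event-time sketch");
    }
}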
Similarly, if the logic requires an extract-transform-load of data, there is an increased likelihood of resource consumption and overall performance impact. This might manifest itself through myriad symptoms such as error messages and failed executions.
The error messages themselves usually suffer from two problems. First, they are not descriptive enough for the user to take immediate corrective action. Second, they don't generally differentiate between user error and operational error. For example, "an insufficient number of network buffers" does not immediately suggest that parallelism must be reduced. Another example is when a NotSerializableException does not indicate whether the user's query logic must be changed or the data is simply bad.
The absence of a progress bar in the UI and the requirement that the user follow Flink conventions only make it more difficult to troubleshoot. The user has Flink constructs such as savepoints to interpret progress. Users can create, own, or delete savepoints, which represent the execution state of a streaming job. These savepoints point to actual files on storage. If access to the savepoints becomes restricted or unavailable in some circumstance, troubleshooting is impaired. Contrast this with checkpoints, which Flink creates and deletes without user intervention. While checkpoints are focused on recovery, much more lightweight than savepoints, and bound to the job lifetime, they can become equally useful diagnostic mechanisms.
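A sketch of retaining checkpoints so they remain inspectable, with the savepoint commands the user drives explicitly noted in comments; paths and intervals are placeholders:

import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoints: created and cleaned up by Flink itself, bound to the job lifetime.
        env.enableCheckpointing(60_000);  // every 60 seconds
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // Savepoints, by contrast, are created, owned and deleted by the user, e.g.:
        //   flink savepoint <jobId> /path/to/savepoints
        //   flink cancel -s /path/to/savepoints <jobId>
        // The resulting files can be inspected or restored from for troubleshooting.

        // ... the job topology would be defined here before calling env.execute(...)
    }
}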