Wednesday, June 12, 2019

Earlier, archives and tarballs of executables from different languages were made available to users to run an application. However, this approach involved more user interaction than images do. Images are also convenient for sharing on registries across organizations.
However, this new packaging format poses new challenges for verification and validation compared with executables, which often shipped with both binaries and symbols, the symbols being helpful in interpreting the executables. Moreover, code analysis tools worked very well with introspection of executables. Rigorous annotations in the code, along with analyzers of these static annotations, tremendously improved confidence in executables as the end product of application development. With images, it is harder to tell whether anything has been injected into the image or otherwise contaminates it.
Therefore the only option left for organizations is to control the process by which images proliferate. Strict control of the build process, together with access control on the end product, helps establish the validity of the images. However, peer-to-peer exchange of images is just as important for general usage as images from corporations. Therefore registries differentiate classes of images: public and private, as well as internal and external-facing.
When the verification and validation of images are described as standard routines regardless of the origin of the images, they can be encapsulated in a collection where each routine corresponds to a certain security policy. These include policy definitions that are standard as well as those that are custom to specific entities, departments, or time periods. The evaluation of a routine for a policy is an automatable task and can be made available as a software product or a development library, depending on where it falls on the spectrum between full-service and do-it-yourself.
Much of the validation and verification is therefore external to the binaries and images, and the technology to perform it can itself be saved as custom images.
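As a sketch of the routine-per-policy collection described above, consider the following Java interface; the names ImagePolicy and ImageScanner are hypothetical and do not come from any particular scanning product.

import java.util.ArrayList;
import java.util.List;

// Hypothetical policy interface: one routine per security policy, evaluated
// uniformly regardless of where the image came from. Not a real scanner API.
interface ImagePolicy {
    String name();
    List<String> evaluate(String imageRef);   // returns violation messages
}

class ImageScanner {
    private final List<ImagePolicy> policies = new ArrayList<>();

    void register(ImagePolicy policy) { policies.add(policy); }

    // Run every registered policy against the image and collect violations.
    List<String> scan(String imageRef) {
        List<String> violations = new ArrayList<>();
        for (ImagePolicy policy : policies) {
            for (String v : policy.evaluate(imageRef)) {
                violations.add(policy.name() + ": " + v);
            }
        }
        return violations;
    }
}

Standard policies and custom per-department policies alike can then be registered on the same scanner, which is what lets the evaluation be packaged as a product or a library.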
One of the advantages of integration with the build pipeline is that verification and validation are triggered more often. This enables continuous integration, where every build can be considered vetted.
Scanning the end results of the build for security vulnerabilities is another benefit of doing this at build time. Fixes can not only be monitored on a build-to-build basis, but vulnerability defects will also be flagged before the binaries reach consumers.


Tuesday, June 11, 2019

An embedded Keycloak application extends the class KeycloakApplication. (Names of classes from the Keycloak source used here are italicized for better reading.) It can keep the KeycloakServerProperties with defaults. This embedded Keycloak application registers a set of users to begin with. For each user, it creates a KeycloakSession. It persists the user and the session in a corresponding Keycloak user JSON file and defines an ExportImportConfig with this temporary file. An ExportImportManager is then instantiated with the session and used to run an import.
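A minimal sketch of that import step, assuming a session factory is already available from the running KeycloakApplication; the helper name and the "singleFile" provider choice follow common embedded-Keycloak examples rather than anything mandated here.

import java.io.File;
import org.keycloak.exportimport.ExportImportConfig;
import org.keycloak.exportimport.ExportImportManager;
import org.keycloak.models.KeycloakSession;
import org.keycloak.models.KeycloakSessionFactory;

class UserImport {
    // Import users from the temporary JSON file written for the registered users.
    static void runImport(KeycloakSessionFactory sessionFactory, File tempUserFile) {
        KeycloakSession session = sessionFactory.create();
        try {
            ExportImportConfig.setAction(ExportImportConfig.ACTION_IMPORT);
            ExportImportConfig.setProvider("singleFile");   // read a single JSON file
            ExportImportConfig.setFile(tempUserFile.getAbsolutePath());
            new ExportImportManager(session).runImport();   // providers manage their own transactions
        } finally {
            session.close();
        }
    }
}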
An embedded Keycloak application does two things on instantiation. First, it creates a master realm admin user; second, it imports the existing Keycloak configuration file.
The creation of the master realm admin user is the same as in every Keycloak application. That is why it simply uses the existing ApplianceBootstrap class to create a master realm user within a transaction scope, for which it uses the session's transaction manager. The admin username and password are obtained from the KeycloakServerProperties, which we initialized with defaults in the beginning.
The second step merely finds the configuration file from which to import the users. If the embedded application has registered users in its configuration file, that file is imported; otherwise the default Keycloak application file is imported.
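A sketch of the first step; the credentials are passed in here, standing in for the values read from KeycloakServerProperties.

import org.keycloak.models.KeycloakSession;
import org.keycloak.models.KeycloakSessionFactory;
import org.keycloak.services.managers.ApplianceBootstrap;

class MasterRealmAdmin {
    // Create the master realm admin inside a transaction scope, using the
    // session's transaction manager as described above.
    static void create(KeycloakSessionFactory sessionFactory, String username, String password) {
        KeycloakSession session = sessionFactory.create();
        try {
            session.getTransactionManager().begin();
            new ApplianceBootstrap(session).createMasterRealmUser(username, password);
            session.getTransactionManager().commit();
        } finally {
            session.close();
        }
    }
}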
The Keycloak embedded application needs to be hosted on an embedded web server; Jetty or Tomcat could serve this purpose. A suitable servlet stack is added to the bootstrap, and its corresponding reactive stack gives access to connectors, server resources, and the server itself. The Keycloak server properties are used to configure the application. The Keycloak session destroyer listener is used to add a listener, and the Keycloak session servlet filter is used to add a filter. Essentially, we instantiate a container with this servlet, which brings a lot of utilities as opposed to working natively with an HTTP handler, and the container exposes a way to treat it as an HTTP handler.
A path handler routes requests to this servlet. Together, the HTTP handler and the path handler are registered to create an instance of the Jetty or Tomcat server. At this point the Keycloak web application has started.
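A minimal Jetty-flavored sketch of this hosting step. Serving the JAX-RS KeycloakApplication through RESTEasy's HttpServlet30Dispatcher is one common way to do this; EmbeddedKeycloakApplication stands for our subclass of KeycloakApplication, and the port and context path are illustrative.

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.servlet.ServletContextHandler;
import org.eclipse.jetty.servlet.ServletHolder;
import org.jboss.resteasy.plugins.server.servlet.HttpServlet30Dispatcher;

public class EmbeddedKeycloakServer {
    public static void main(String[] args) throws Exception {
        Server server = new Server(8080);
        ServletContextHandler context = new ServletContextHandler();
        context.setContextPath("/auth");   // routes /auth/* requests to the servlet

        // RESTEasy's dispatcher serves the JAX-RS application; our subclass of
        // KeycloakApplication is plugged in as an init parameter.
        ServletHolder holder = new ServletHolder(new HttpServlet30Dispatcher());
        holder.setInitParameter("javax.ws.rs.Application",
                EmbeddedKeycloakApplication.class.getName());
        context.addServlet(holder, "/*");

        server.setHandler(context);
        server.start();   // the Keycloak web application is now reachable
        server.join();
    }
}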
Now with this embedded web application, we can take the following steps. We can register a Keycloak client in the master realm and request an access token. The access token is then used to list the clients via the secure endpoint, and a successful response indicates that the token was accepted.
Optionally, a role can be defined in the master realm and the embedded application can be used to validate the role. This is optional because it involves registering a user and mapping the role to the user.
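These steps can be scripted against the embedded server with the Keycloak admin client; the URL, client id, and credentials below are illustrative and assume the bootstrap admin created earlier.

import org.keycloak.OAuth2Constants;
import org.keycloak.admin.client.Keycloak;
import org.keycloak.admin.client.KeycloakBuilder;

public class EmbeddedServerSmokeTest {
    public static void main(String[] args) {
        // Request an access token from the master realm of the embedded server.
        Keycloak keycloak = KeycloakBuilder.builder()
                .serverUrl("http://localhost:8080/auth")  // embedded server from above
                .realm("master")
                .clientId("admin-cli")
                .username("admin")   // assumed bootstrap admin credentials
                .password("admin")
                .grantType(OAuth2Constants.PASSWORD)
                .build();
        String token = keycloak.tokenManager().getAccessToken().getToken();
        System.out.println("obtained token of length " + token.length());
        // Listing clients with the secured endpoint shows the token was accepted.
        keycloak.realm("master").clients().findAll()
                .forEach(c -> System.out.println(c.getClientId()));
    }
}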



Monday, June 10, 2019

Address Book:
Most address books are proprietary in nature, or at least tied to a software stack. Active Directory, for instance, does not let you access the address book without tightly integrating with its network, enabling access only with its protocol and query language and requiring its own access routines.
Contact books, from phone applications to membership directories, are proliferating on the other hand, often taking up space and consuming resources while remaining local and frequently out of date.
Somewhere between these two extreme examples of address books, we see the need for universally accessible cloud storage for a directory representing the address book, with natural querying over the World Wide Web from any device, anywhere.
Since the address book is primarily unstructured data on the wire, there is very little need to treat it as requiring the onus of structured storage, at least for the majority of use cases. Beyond these access requirements, object storage brings out the best in storage engineering to keep the data durable and available in the cloud.
An object store offers better features and cost management than most alternatives such as relational databases, document data stores, and private web services. The existence of data library products in the market on a variety of stacks, and not just from address book software providers, indicates that this is a lucrative and commercially viable offering. MongoDB, for instance, competes well in this market. From database servers to appliances, the data library product is viewed as web-accessible storage, but it comes with constraints on its access. Object storage not only facilitates this web access but also places no restraints on it, making it the most flexible platform for dynamic queries and analysis, while applying storage engineering best practices to all the data rather than treating the physical stores as mere content.
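As a sketch of this idea, an address book entry can be stored as one JSON object keyed by contact id; the bucket name and the SDK choice (AWS SDK for Java v1) are illustrative, not prescriptive.

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class ContactStore {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        String bucket = "address-book";          // hypothetical bucket name
        String key = "contacts/jane-doe.json";   // one contact per object
        // No schema is imposed on the entry; it is just an object on the wire.
        s3.putObject(bucket, key,
                "{\"name\":\"Jane Doe\",\"email\":\"jane@example.com\"}");
        // Direct, web-accessible retrieval of the same entry.
        System.out.println(s3.getObjectAsString(bucket, key));
    }
}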
At the same time, a document store provides all aspects of data storage for storing and querying an address book. It is not a traditional address book, but it addresses most of the requirements. A traditional address book has long been bound to a database, and we have argued that it does not need to be, for non-transactional reads and writes. In JSON, the entries appear as nested fields and are pre-joined into objects. The frequently accessed objects can be kept in a cache as well. A search engine provides search over this catalog. Since the address book entries are independent, just like entries in a document or catalog, their access can be considered similar to that of a catalog. Therefore, functional data access is provided by the Catalog API. The API and the engine separately cover all operations on the catalog. The API can then be used by downstream units. A search engine allows shell-like search operators, which can be built on the Lucene/Solr architecture. A Lucene index keeps track of terms and their occurrence locations, but the index needs to be rebuilt each time the catalog changes. The Catalog API can retrieve results directly from the catalog or via the search engine. In either case, the customer issues only a single query.
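A minimal Lucene sketch of the search-engine half, indexing one address book entry and querying it back; the field names are illustrative.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;

public class AddressBookSearch {
    public static void main(String[] args) throws Exception {
        ByteBuffersDirectory dir = new ByteBuffersDirectory();
        StandardAnalyzer analyzer = new StandardAnalyzer();
        // Index one address book entry; the index must be rebuilt or updated
        // whenever the catalog changes.
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            Document doc = new Document();
            doc.add(new TextField("name", "Jane Doe", Field.Store.YES));
            doc.add(new TextField("email", "jane@example.com", Field.Store.YES));
            writer.addDocument(doc);
        }
        // The customer issues a single query; here it goes through the engine.
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new QueryParser("name", analyzer).parse("jane"), 10);
            System.out.println("matches: " + hits.totalHits);
        }
    }
}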
In a sense, this is very similar to how we want the object storage to store the catalog and make it web-accessible. The data is available for retrieval and iteration directly out of the store while hierarchy and dimensions may be maintained by a web service over the Object Storage.

Sunday, June 9, 2019

We now review the integration testing of embedded Keycloak. This is facilitated with the help of the Keycloak Spring Security adapter. The adapter is installed by declaring a dependency in the Maven POM or Gradle build using the artifact ID keycloak-spring-security-adapter. The adapter is not included in web.xml; instead, a keycloak.json is provided.
Since we are using Spring Security, Keycloak provides a convenient base class, KeycloakWebSecurityConfigurerAdapter, with which we can instantiate the security context configuration. This class provides methods to configure HttpSecurity, register the Keycloak authentication provider as the global authentication provider, and set the session authentication strategy. A NullAuthenticatedSessionStrategy, used for bearer-only applications, is sufficient for integration testing.
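A sketch of such a configuration, following the pattern in the Keycloak Spring Security adapter documentation; the /api/* matcher is an illustrative choice.

import org.keycloak.adapters.springsecurity.KeycloakConfiguration;
import org.keycloak.adapters.springsecurity.authentication.KeycloakAuthenticationProvider;
import org.keycloak.adapters.springsecurity.config.KeycloakWebSecurityConfigurerAdapter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.security.config.annotation.authentication.builders.AuthenticationManagerBuilder;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.core.authority.mapping.SimpleAuthorityMapper;
import org.springframework.security.web.authentication.session.NullAuthenticatedSessionStrategy;
import org.springframework.security.web.authentication.session.SessionAuthenticationStrategy;

@KeycloakConfiguration
public class SecurityConfig extends KeycloakWebSecurityConfigurerAdapter {

    // Register the Keycloak authentication provider globally.
    @Autowired
    public void configureGlobal(AuthenticationManagerBuilder auth) {
        KeycloakAuthenticationProvider provider = keycloakAuthenticationProvider();
        provider.setGrantedAuthoritiesMapper(new SimpleAuthorityMapper()); // adds ROLE_ prefix
        auth.authenticationProvider(provider);
    }

    // Bearer-only: no HTTP session bookkeeping is needed for integration tests.
    @Bean
    @Override
    protected SessionAuthenticationStrategy sessionAuthenticationStrategy() {
        return new NullAuthenticatedSessionStrategy();
    }

    @Override
    protected void configure(HttpSecurity http) throws Exception {
        super.configure(http);
        http.authorizeRequests()
            .antMatchers("/api/*").authenticated()
            .anyRequest().permitAll();
    }
}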
This Keycloak adapter also supports multi-tenancy, which refers to using the same application to secure multiple Keycloak realms. Each realm is supported with the help of an instance that implements the KeycloakConfigResolver interface. Typically, a resolver looks up a keycloak.json specific to a realm, and the resolver is declared in the web.xml file of the Spring application.
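A sketch of such a resolver; the URI-to-realm convention and the per-realm file names are illustrative assumptions.

import org.keycloak.adapters.KeycloakConfigResolver;
import org.keycloak.adapters.KeycloakDeployment;
import org.keycloak.adapters.KeycloakDeploymentBuilder;
import org.keycloak.adapters.spi.HttpFacade;

// Pick a per-realm keycloak.json based on the request URI.
public class PathBasedConfigResolver implements KeycloakConfigResolver {
    @Override
    public KeycloakDeployment resolve(HttpFacade.Request request) {
        String realm = request.getURI().contains("/tenant-a/") ? "tenant-a" : "tenant-b";
        return KeycloakDeploymentBuilder.build(
                getClass().getResourceAsStream("/" + realm + "-keycloak.json"));
    }
}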
All Spring applications have a naming convention, and this includes security roles. Keycloak security roles must conform by using the ROLE_ prefix. Since bearer tokens can be passed between clients, Keycloak provides an extension of Spring's RestTemplate, which can be wired up from the Spring security configuration.
Note that Keycloak as a standalone instance requires the user interface for registering users and clients and requesting access before bearer tokens can be issued. It is only in the embedded form that we avoid this configuration. Therefore, embedded Keycloak is not really a hack on the product but one sufficiently modified for testing purposes.
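Wiring that template up looks roughly like this, following the adapter documentation; the prototype scope keeps each request's bearer token separate.

import org.keycloak.adapters.springsecurity.client.KeycloakClientRequestFactory;
import org.keycloak.adapters.springsecurity.client.KeycloakRestTemplate;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.config.ConfigurableBeanFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Scope;

// Inside the KeycloakWebSecurityConfigurerAdapter subclass:
@Autowired
private KeycloakClientRequestFactory keycloakClientRequestFactory;

@Bean
@Scope(ConfigurableBeanFactory.SCOPE_PROTOTYPE)   // one template per request/token
public KeycloakRestTemplate keycloakRestTemplate() {
    // Propagates the current request's bearer token to downstream services.
    return new KeycloakRestTemplate(keycloakClientRequestFactory);
}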

Saturday, June 8, 2019

Docker is a container platform that allows applications to run on a virtual host. Spinning up containers for the duration of a set of tasks allows proper cleanup of resources for those tasks while enabling isolation between workloads, and between workloads and hosts.
Containers are fast. They are not built from the ground up; they are pre-fabricated as images. These images are binaries, a set of bits that can each be 0 or 1. They don't contain text and are very much like executables in the sense that this is machine data, useful only for immediate instantiation of a software product or platform. An image can have many layers and support different options for each layer, thereby creating several flavors of images. Images, once created, are generally not modified.
Images are then saved in a repository and reused. This has been a popular option for packaging binaries.

Friday, June 7, 2019

Logging Sidecar:
AcmeApplication is hosted on Kubernetes, and logging throughout the product can take advantage of the rich features of the runtime and its plugins. This document justifies the need to support logging, including mechanisms to do so regardless of whether pods, jobs, batches, or containers are being recycled.
AcmeApplication integrates analytics over storage. As such, all the deployers and the running instances of components log to a file, which is automatically rolled after predefined limits. This works well when the file is accessible at all times and can be used in queries such as log stream queries.
However, virtually every Kubernetes resource can have a varying lifetime, and therefore the default logging framework applies logging at cluster scope. AcmeApplication does not yet take advantage of an application block pattern for all the applications, because the applications are fundamentally different and proprietary in their logging and its format. The default Kubernetes framework can tap into the files regardless of origin and send the logs to a common sink that is a cluster-wide singleton, but this is yet to be turned on. Metrics, InfluxDB, and Grafana are some of the other applications that have their own logs, and there is nothing unifying the logs and their namespaces into a common store for analysis.
Some products, such as log indexes, can be configured for rich analytics via a search language that has been acknowledged in the industry as best for machine-generated data. The integration of a log index store with AcmeApplication is a business decision, not a technological one. Popular use cases for logging, and for queries over logs, involve significant cost reduction in production support, troubleshooting, and sustaining engineering.
Whether an index store collects the logs from all the applications via connectors, or a syslog drain is set up for consolidation, is merely a choice depending on the use case. The technologies used may vary from use case to use case, but in this document we suggest a system that is architecturally sound.
It has not yet proven as helpful to have stream queries over logs as it has to have regular interactive queries, issued over the user interface, on all the logs from the different applications stored in designated indexes. Consequently, there are various levels and areas of emphasis in the stack to support rich analytics over logging.
At the Kubernetes system level, among the popular options, we could prioritize a fluentd logging sidecar as the way to collect logs from across the pods, as sketched below. This benefits the applications because it takes the logging concern away from them while providing a consistent model across all of them.
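A minimal sketch of that sidecar arrangement, written with the fabric8 Kubernetes client (an illustrative choice, as are the image and volume names): the application writes its rolling log files to a shared emptyDir volume, and a fluentd container tails and forwards them.

import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.api.model.PodBuilder;

public class LoggingSidecarPod {
    // Build a pod spec that pairs the application with a fluentd sidecar.
    public static Pod podWithFluentdSidecar() {
        return new PodBuilder()
            .withNewMetadata().withName("acme-app").endMetadata()
            .withNewSpec()
                .addNewContainer()
                    .withName("app")
                    .withImage("acme/app:latest")   // hypothetical application image
                    .addNewVolumeMount()
                        .withName("logs").withMountPath("/var/log/app")
                    .endVolumeMount()
                .endContainer()
                .addNewContainer()                   // the logging sidecar
                    .withName("fluentd")
                    .withImage("fluent/fluentd:v1.4")
                    .addNewVolumeMount()             // same volume, read side
                        .withName("logs").withMountPath("/var/log/app")
                    .endVolumeMount()
                .endContainer()
                .addNewVolume()                      // shared scratch volume for log files
                    .withName("logs").withNewEmptyDir().endEmptyDir()
                .endVolume()
            .endSpec()
            .build();
    }
}

Because the sidecar survives only as long as the pod, pairing it with a common sink keeps the logs available even as pods, jobs, and containers are recycled.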
At the store level, we recommend that the logs have their own storage, whether that is file storage or an index store. This will allow uniform aging, archiving, and rolling of logs based on timeline, and will work across all origins.
At the analytics level, we can outsource the reporting stack, charts, graphs, and dashboards to the application that is best at analyzing machine data.
Although StorageApplication, SmartAnalytics, metrics, charts, and graphs can be used to store, search, and render results, doing so would take a longer round trip between the user query and the results. Therefore, dogfooding the StorageApplication store for logging will not be helpful at this time. Perhaps this could be considered in subsequent versions.

Thursday, June 6, 2019

This document is a summary of the key features of Keycloak, an open-source Identity and Access Management (IAM) software. Organizations use it when they want a reliable solution without reinventing some of the core modules of an IAM. It is open source and works well with major public cloud providers. It supports both OAuth2 and OpenID Connect (OIDC), where the former provides authentication and authorization while the latter provides identity on top of that.
The resources of Keycloak can be described as users, clients, roles, groups, events, and identity providers. The last item in this category can work with social media, Security Assertion Markup Language (SAML), and OpenID Connect, which are separate protocols for finding identity in an external identity provider. A module known as the user federation module allows integration with an organization's default identity provider using protocols such as the Lightweight Directory Access Protocol (LDAP) and Kerberos.
Keycloak works well with all kinds of applications, whether mobile, frontend, or backend, using what is called a Keycloak adapter. The adapter ties all these applications to the default client on the realm, which looks up the identity and authenticates and authorizes the user. There is usually one adapter for a set of backend services, but there can be many for load balancing with a given client. Sessions can be replicated across adapters. Keycloak is stateless, and this paradigm is suitable for containers.
The stateless architecture implies that the token requested by a Keycloak adapter will be validated in each of the pods hosting the backend services. There can be many pods with the same service, but each will perform the token validation by checking the header, payload, and signature typical of a JSON Web Token (JWT). The token is refreshed when it expires, with the old one discarded in favor of a new one.
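A sketch of what each pod's validation amounts to, using Keycloak's TokenVerifier; in practice the realm public key would be fetched from the server rather than passed in, and the exact set of default checks is a library detail.

import java.security.PublicKey;
import org.keycloak.TokenVerifier;
import org.keycloak.representations.AccessToken;

class BearerTokenCheck {
    // Check the signature, expiry, and standard claims of the JWT before trusting it.
    static AccessToken validate(String tokenString, PublicKey realmKey) throws Exception {
        return TokenVerifier.create(tokenString, AccessToken.class)
                .publicKey(realmKey)    // verifies the signature
                .withDefaultChecks()    // e.g., token is active (not expired)
                .verify()
                .getToken();
    }
}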
Keycloak is usually a standalone server distribution. It is designed as a single product that can be quickly set up and configured. It supports Docker registry, OpenJDK, and Spring Boot, to name some of the popular developer tools. Keycloak is hardened with IP and port restrictions. It mitigates vulnerabilities such as password guessing and brute-force attacks. In the event that an access token or refresh token is compromised, a revocation policy can be applied to all applications. It takes the hostname of the client from the request parameters, which is typical of most protocols, including S3.