Thursday, July 15, 2021

 

Azure Secret Management System:

Introduction: Azure Key Vault stores secrets consumed by users, applications, services, and devices without requiring them to manage the secrets themselves. The documentation on this service offering from the Azure public cloud reviews the features that can be leveraged. This article captures one aspect of its usage that is popular with DevOps but does not get much attention. Secrets safeguard access to resources, and access to those resources must be whitelisted. Depending on the resource, there can be many whitelists, and subscriptions or domains can be whitelisted against root folders.

Description: We begin with a root folder that can be environment-specific and includes Deployment subscriptions and Storage subscriptions. Adding a subscription to this root folder under one of the categories is equivalent to whitelisting that subscription for access to resources. Similarly, there can be many paths granting access, and the subscription may need to be added to all of them. New regions can also be part of a path; adding a subscription under the new region grants access based on this whitelist. A whitelist addition can be followed up with an approval service to complete it.

A whitelist can be used together with role-based access control. For example, setting the Azure login context to the given subscription can then be used to find the service principal and the role to which the principal needs to be added. The service principal of an app can be added to the storage key operation service role. Similarly, security group-based role assignments can be created. This completes the access control for the resources.

At the resource level, we can take the example of a storage account, a commonly-used and critical resource for many services.  The secret management system may have a path for all storage accounts and there would be a path specifier specifically for this storage account by name. 

This specific whitelisting then proceeds with the following steps:

Step 1. Determine the rootPath for the storage account and the subscriptionId that needs to be added.

Step 2. Use the DSMSProxy to check if the rootPath folder exists.

Step 3. If Step 2 succeeds, add the subscriptionId to the rootPath folder.

internal bool Whitelist(string rootPath, Guid subscriptionId)
{
    // Step 2: check that the root folder exists.
    if (DSMSProxy.Folders.Exists(rootPath))
    {
        // Step 3: add the subscription to the whitelist for that folder.
        DSMSProxy.Folders.AddToWhitelist(rootPath, subscriptionId);
        return true;
    }

    throw new Exception(string.Format("{0} Not Found", rootPath));
}
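As a quick usage sketch, the call site might look like the following; the root path layout and the subscription id shown here are hypothetical placeholders, not values from the actual service:

var rootPath = "/storageaccounts/mystorageaccount/deployment-subscriptions";   // hypothetical path for illustration only
var subscriptionId = Guid.Parse("00000000-0000-0000-0000-000000000000");       // placeholder subscription id

if (Whitelist(rootPath, subscriptionId))
{
    Console.WriteLine("Subscription {0} whitelisted under {1}", subscriptionId, rootPath);
}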

Thus, we see a ZooKeeper-like strategy of maintaining whitelists based on folder paths for resources, which complements the RBAC control.

 

Wednesday, July 14, 2021

 

Lessons Learned from Region buildouts:

Introduction: This article summarizes some of the lessons learned from building new region capabilities for a public cloud. Many public and private cloud providers expand their geographical presence in terms of datacenters. This is a strategic advantage for them because it draws business from the neighborhood of the new presence. A geographical presence for a public cloud is only somewhat different from that of a private cloud. A public cloud lists regions where the services it offers are hosted. A region may have three availability zones for redundancy and availability, and each zone may host a variety of cloud computing resources, small or large. Each availability zone may have one or more stadium-sized datacenters. When the infrastructure is established, the process of commissioning services in the new region is referred to as a buildout. This article mentions some of the lessons learned in automating new region buildouts.

Description: First, the automation must involve context switching between the platform and the task of deploying each service to the region. The platform coordinates these services and must maintain ordering, dependencies, and status across the tasks.

Second, the task for each service is itself complicated and requires region-specific parameters to be defined for an otherwise region-agnostic service model.

Third, services must declare their dependencies so that they can be validated and processed correctly. These dependencies may be on other services, on external resources, or on the availability of an event from another activity.

Fourth, service buildouts must be retryable on errors and exceptions; otherwise, the platform will require a lot of manual intervention, which increases the cost.

Fifth, the communication between automated activities and manual interventions must be captured with the help of a ticket-tracking or incident-management system.

Sixth, the workflow and state of each activity pertaining to the task must follow standard operating procedures that are defined independently of region and are available for audit.

Seventh, the technology for platform execution and the technology for service deployments might differ, requiring consolidation and coordination between the two. In such cases, the fewer the context switches between them, the better.

Eighth, the platform itself must support templates, event publishing and subscription, a metadata store, and onboarding and bootstrapping processes that can be reused.

Ninth, the platform should support parameters that allow a region to be differentiated from others, whether for the features or for the services made available to customers.

Tenth, the progress of new region buildouts must be actively tracked, with tiles for individual tasks and a row per service.

Conclusion: Together, these are only some of the takeaways from a new region buildout, but they showcase some of the issues faced and their mitigations.

 

 

 

Tuesday, July 13, 2021

 

Implementing AAA

Introduction: Implementing authentication, authorization, and auditing for software products.

Description: User interfaces and application programming interfaces must be secured so that only authenticated and authorized users can use the functionality. There are many protocols, techniques, and sign-in experiences, which vary with the target audience of the product as much as with the technology stack involved. This article discusses some of the main considerations when designing identity and authentication modules across a variety of products.

WebAPIs often use authentication methods such as basic authentication, token authentication, or even session authentication. However, applications using the webAPIs may also require sessions to associate resources with the same user using the same user-agent (as defined in RFC 2616). For example, consider a Cart API. To get the items in a user's shopping cart, the application may issue a GET request to the Cart API as follows:
GET /cart/items
X-Merchant-ID: client_key
X-Session: <user session identifier>
The application can also add or edit items in the shopping cart or delete them. This example illustrates the need for the application to associate an identifier with the scope of a set of webAPI calls.
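A minimal application-side sketch in C#, assuming a hypothetical base address, endpoint path, and the header names shown above:

using System;
using System.Net.Http;
using System.Threading.Tasks;

class CartClient
{
    // Hypothetical base address; the header names follow the example above.
    private static readonly HttpClient client = new HttpClient
    {
        BaseAddress = new Uri("https://api.example.com/")
    };

    public static async Task<string> GetCartAsync(string merchantKey, string sessionId)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, "cart/items");
        request.Headers.Add("X-Merchant-ID", merchantKey);   // identifies the calling application
        request.Headers.Add("X-Session", sessionId);         // scopes the call to the user's session

        var response = await client.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}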
This session identifier is usually obtained as a hash of the session cookie which is provided by the authentication and authorization server. The session identifier can be requested as part of the login process or by a separate API Call to a session endpoint. An active session removes the need to re-authenticate. It provides a familiar end-user experience and functionality. The session can also be used with user-agent features or extensions to assist with authentication such as password-manager or 2-factor device reader.
The session identifier is usually a hash of the session state. It is dynamic, intended for external use, and useful only for identifying something temporary. It can be requested based on predetermined client_secrets.
Sessions can time out or be explicitly torn down – either by the user or by the system forcing re-authentication. Therefore, session management must expire/clear the associated cookie and identifiers.
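With the ASP.NET forms-authentication setup used in the lookup snippet later in this section, a session teardown might be sketched as follows; the explicit cookie expiry shown here is illustrative rather than prescriptive:

using System;
using System.Web;
using System.Web.Security;

public static class SessionTeardown
{
    public static void SignOut(HttpContext context)
    {
        // Remove the forms-authentication ticket for the current user.
        FormsAuthentication.SignOut();

        // Explicitly expire the associated cookie so the browser discards it.
        var expired = new HttpCookie(FormsAuthentication.FormsCookieName, string.Empty)
        {
            Expires = DateTime.UtcNow.AddDays(-1)
        };
        context.Response.Cookies.Add(expired);
    }
}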
Sessions will need to be protected against CSRF attacks and clickjacking, just as other resources are.
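In ASP.NET MVC, for instance, anti-forgery tokens are a common CSRF mitigation; the controller and action names below are hypothetical. Clickjacking is typically addressed separately, for example with an X-Frame-Options response header.

using System.Web.Mvc;

public class CartController : Controller
{
    // The anti-forgery token emitted by @Html.AntiForgeryToken() in the form
    // must accompany the POST, or the request is rejected.
    [HttpPost]
    [ValidateAntiForgeryToken]
    public ActionResult AddItem(string itemId)
    {
        // ... add the item to the user's cart ...
        return RedirectToAction("Index");
    }
}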
Sessions are treated the same as credentials in terms of their lifetime. The user for the session can be looked up as follows:
HttpCookie cookie = HttpContext.Current.Request.Cookies[FormsAuthentication.FormsCookieName];
FormsAuthenticationTicket ticket = FormsAuthentication.Decrypt(cookie.Value);
SampleIdentity id = new SampleIdentity(ticket);
GenericPrincipal prin = new GenericPrincipal(id, null);
HttpContext.Current.User = prin;

Microservices and their popularity are a demonstration of the power of APIs, especially when the callers are mobile applications and personal desktops. Many companies implement microservices as NodeJS or Django web applications with OAuth and SSO. I have not seen the use of CAS in companies, but I have seen it in educational institutions. Django, for instance, brings the following functionality:

django-social-auth - makes social authentication simpler.
Pinax - a collection of reusable Django apps and starter projects, popular for building websites.
django-allauth - integrates authentication, registration, account management, and third-party (social) accounts.
django-userena - makes user accounts simpler.
django-social-registration - combines OpenID, OAuth, and FacebookConnect.
django-registration - probably the most widely used registration framework.
django-email-registration - claims to be very simple to use; and other such packages.
These implementations essentially facilitate user account registration via templated views and a database or other membership-provider backends.

There are other implementations as well, such as EngineAuth, SimpleAuth, and AppEngine-OAuth-Library. EngineAuth performs multi-provider authentication and saves the user id to a cookie. SimpleAuth supports OAuth and OpenID. AppEngine-OAuth provides user authentication against third-party websites.


NodeJS-style implementations even allow the use of providers as strategies, in addition to bringing some of the functionality described above for Django. If we look at a 'passport' implementation, for example, I like the fact that we can easily change the strategy to target the provider of choice. In fact, the interface makes this quite clear.
Methods used look like the following:
app.get('/login', function(req, res, next) {
  passport.authenticate('AuthBackendOfChoice', function(err, user, info) {
    // handle the authentication result here
  })(req, res, next);
});
Additional methods include:
var passport = require('passport'),
    OAuthStrategy = require('passport-oauth').OAuthStrategy;

passport.use('provider', new OAuthStrategy({
    requestTokenURL: 'https://www.provider.com/oauth/request_token',
    accessTokenURL: 'https://www.provider.com/oauth/access_token',
    userAuthorizationURL: 'https://www.provider.com/oauth/authorize',
    consumerKey: '123-456-789',
    consumerSecret: 'shhh-its-a-secret',
    callbackURL: 'https://www.example.com/auth/provider/callback'
  },
  function(token, tokenSecret, profile, done) {
    User.findOrCreate(..., function(err, user) {
      done(err, user);
    });
  }
));
There seems to be little or no django-passport implementation in public source repositories, or for that matter any python-passport implementation.
Netor Technologies mentions something by the same name, which is also an interesting read.
For example, they create tables to keep application_info and user_info. The application_info table is like the client in the OAuth protocol, in that it keeps track of the applications, while user_info keeps track of usernames and passwords. The user_applications table is the mapping between users and applications.
The authentication is handled using a challenge-response scheme. The server responds with the user's password salt along with a newly generated challenge salt and a challenge id. The client sends back a response with the hash resulting from hash(hash(password + salt) + challenge). These are read by the server and deleted after use; there is no need to keep them.
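A minimal sketch of the client-side computation, assuming SHA-256 as the hash function (the write-up does not say which hash is used):

using System;
using System.Security.Cryptography;
using System.Text;

public static class ChallengeResponse
{
    // Computes hash(hash(password + salt) + challenge) as lowercase hex strings.
    // SHA-256 is an assumption; the scheme only requires a hash both sides agree on.
    public static string Compute(string password, string salt, string challenge)
    {
        string inner = Sha256Hex(password + salt);
        return Sha256Hex(inner + challenge);
    }

    private static string Sha256Hex(string input)
    {
        using (var sha = SHA256.Create())
        {
            byte[] bytes = sha.ComputeHash(Encoding.UTF8.GetBytes(input));
            return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
        }
    }
}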
The code for create user looks like this:
def create(store, user, application=None):
    if application is None:
        application = Application.findByName(unicode('passport'))
    result = UserLogin.findExisting(user.userid, application.applicationid)
    if result is None:
        result = store.add(UserLogin(user, application))
        store.commit()
    return result
and the authentication methods are handled in the controllers. The BaseController has the method to get user login and the ServiceController has the method to authenticate via a challenge.
This seems a clean example for doing a basic registration of user accounts and integrating with the application.

Regarding audit, most membership providers and their corresponding data stores can help turn on data capture and audit trails.

It is also possible to send tags and key-value pairs as parameters to webAPI calls, which significantly enhances any custom logic to be built. For example:

[Fact]
public void TestPost3()
{
    var httpContent = new StringContent("{ \"firstName\": \"foo\" }", Encoding.UTF8, "application/json");

    var client = new HttpClient();
    var result = client.PostAsync("http://localhost:17232/api/Transformation/Post3", httpContent).GetAwaiter().GetResult();
}

[HttpPost]
[ActionName("Post3")]
public void Post3([FromBody]IDictionary<string, string> data)
{
    // do something
}

Conclusion: IAM is not a canned component that can be slapped on all products without providing a rich set of integrations and leveraging the technology stacks available. While some vendors make it easy and seamless to use, the technology stack underneath is complex and implements a variety of design requirements.

 

Monday, July 12, 2021

 

Addendum:

This is a continuation of the article on NuGet packages, their sources, and their resolution.

The recommendation to have a single package source, eliminating sources that carry duplicate packages, is enforceable by an organization to bolster the security and integrity of the packages used to build its source code. The organization can place a list of registries, both internal and external, behind a single feed, and enforcing that developers use only that feed consolidates all requests through a controlled channel. This is a desirable pattern that alleviates the concern of uncontrolled packages from different sources eventually polluting the organization's source code assets.

Developers also have a lot of tools for this purpose. First, the NuGet executable allows listing packages along with their source; the list command can be used to browse the packages on a remote feed.

Similarly, the locals command can be used with the NuGet executable to view all the local caches to which packages are downloaded. This is a very useful mechanism for trial and error: a developer can choose to clear all the packages and reinitiate the download, which is useful for trying different package feeds and sources and narrows down the problem space when troubleshooting package dependencies.

It is also possible to find out the dependency tree for the assemblies referenced via packages. Although this is not directly supported by the tool used to list the packages and their locations, it is easy to walk the dependencies iteratively until all of them have been enumerated. Visited dependencies do not need to be traversed again. The site MyGet.org allows these dependencies to be visualized with reference to their feed, but neither the compiler nor the NuGet executable provides an option to draw the dependency tree for a project, unlike the tooling available for some other languages.

A sample method that relies on the NuGet library's built-in functions to walk package dependencies recursively looks somewhat like this:

static void OutputGraph(LocalPackageRepository repository, IEnumerable<IPackage> packages, int depth)
{
    foreach (IPackage package in packages)
    {
        Console.WriteLine("{0}{1} v{2}", new string(' ', depth), package.Id, package.Version);

        IList<IPackage> dependentPackages = new List<IPackage>();
        foreach (var dependencySet in package.DependencySets)
        {
            foreach (var dependency in dependencySet.Dependencies)
            {
                var dependentPackage = repository.FindPackage(dependency.Id, dependency.VersionSpec, true, true);
                if (dependentPackage != null)
                {
                    dependentPackages.Add(dependentPackage);
                }
            }
        }

        // Recurse with increased indentation; pass depth + 3 rather than mutating
        // depth so that sibling packages keep the same indentation level.
        OutputGraph(repository, dependentPackages, depth + 3);
    }
}

 Courtesy: Stackoverflow.com

If visited nodes need to be tracked by the caller, then the code would follow conventional depth-first search, as in the pseudocode below (a C# sketch follows the pseudocode):

DFS(V, E)
  for each vertex v in V
      v.color = white
      v.d = nil
  time = 0
  for each vertex v in V
      if v.color == white
          DFS-VISIT(V, E, v)

DFS-VISIT(V, E, u)
  time = time + 1
  u.d = time
  u.color = gray
  for each vertex v adjacent to u
      if v.color == white
          DFS-VISIT(V, E, v)
      else if v.color == gray
          throw back-edge exception   // a back edge indicates a cycle
  u.color = black
  time = time + 1
  u.f = time
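A C# sketch of the same idea applied to the package walk above, with the visited set supplied by the caller; the repository and package types are the ones from the earlier sample, and the rest is illustrative rather than production code:

static void OutputGraphWithVisited(LocalPackageRepository repository, IEnumerable<IPackage> packages, int depth, HashSet<string> visited)
{
    foreach (IPackage package in packages)
    {
        // Skip packages that have already been printed anywhere in the tree.
        string key = package.Id + " " + package.Version;
        if (!visited.Add(key))
        {
            continue;
        }

        Console.WriteLine("{0}{1} v{2}", new string(' ', depth), package.Id, package.Version);

        var dependentPackages = new List<IPackage>();
        foreach (var dependencySet in package.DependencySets)
        {
            foreach (var dependency in dependencySet.Dependencies)
            {
                var dependentPackage = repository.FindPackage(dependency.Id, dependency.VersionSpec, true, true);
                if (dependentPackage != null)
                {
                    dependentPackages.Add(dependentPackage);
                }
            }
        }

        OutputGraphWithVisited(repository, dependentPackages, depth + 3, visited);
    }
}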

Sunday, July 11, 2021

 A note about naming convention:

One of the hallmarks of good automation is the granularity of reusable actions. The code behind these actions and their associated artifacts, resources, and even configurations must be specific to the action. With many actions, it can become hard to locate any one of them; a good naming convention overcomes this hurdle. Many developers are very particular about the style used in their code. There are even helper files like stylecop.json that bring consistency to how code is written by different developers, so contributions look the same. Similarly, a naming convention brings order to the madness of proliferating code snippets in automation recipes. When a playbook uses a consistent naming convention, it becomes more readable and easier to maintain and use. Top-notch automations will always bear readable and consistent naming.

There are quite a few conventions to choose from. There is Pascal case, where the first letter of every word is capitalized with no spaces or symbols between words, as in UserAccount; camel case is the same except that the first word starts in lowercase, as in userAccount. There is also snake case, where words are separated by an underscore, as in user_account. Kebab case is like snake case but avoids the difficulty some systems have with the underscore by replacing it with a hyphen. The Hungarian convention prefixes a lowercase indicator of type or intent to a Pascal-case name, as in strUserAccount.

The use of different conventions is necessary for different purposes in the system. For example, camel case and Pascal case are widely popular for readability in languages such as Pascal, Java, and .NET, while resources in environments such as the private or public cloud tend to use kebab case.
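For illustration, with hypothetical names throughout, the same concept might surface under different conventions depending on where it lives:

using System;

// Pascal case for a .NET type and method; camel case for parameters and locals.
public class UserAccountFormatter
{
    public string Describe(string displayName)
    {
        // Snake case for a config key, kebab case for a cloud resource name,
        // and a Hungarian-style prefix for a string variable (all hypothetical names).
        const string configKey = "user_account_display_name";
        const string resourceName = "user-account-service";
        string strUserAccount = displayName;
        return string.Format("{0} / {1} / {2}", configKey, resourceName, strUserAccount);
    }
}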

The architecture of any automation system is enhanced by its ability to introduce entities into existing collections. When these collections have proper naming schemes with appropriate prefixes and suffixes, the name alone gives enough information about the entity without having to look it up. This is a real saving in cost in addition to the convenience it brings. The use of naming conventions must be practiced diligently.


Saturday, July 10, 2021

 The NuGet package resolution:

Introduction: This article introduces NuGet, an essential tool for modern development platforms. It is a mechanism through which developers create, share, and consume useful code, since the code is bundled along with its DLLs and package information. A NuGet package is a single zip file with a .nupkg extension; it contains code as well as the package manifest. Organizations support sharing and publishing NuGet packages to a global, central, and public repository by the name nuget.org. The public repository can be complemented with a private repository. A package can be uploaded to either a public or a private host, and when the package is downloaded, the corresponding source is specified. The flow of packages between creators, hosts, and consumers is discussed next.

The central repository has over 100,000 unique packages, and they are downloaded by developers every day. With such a large collection of packages, package browsing, lookup, identification, versioning, and compatibility must be clearly called out. This must be done in both the public and the private repository. When a package is downloaded, it goes to the local cache on the developer's system. When the application is built, these dependencies can be copied over to the target folder where the compiled code is dropped. This allows the assembly to be found and loaded locally, creating an isolation mechanism between applications referencing those packages on the same host. Isolating application dependencies in this way is one way applications can safeguard that they will work on any host.

One frequently encountered requirement of package consumption is that the application must always use the same compatible versions of its packages. When the dependencies are updated to newer versions, sometimes they do not work well together. For the application to continue working, it must maintain compatibility between its dependencies, which can be set once and maintained as and when the packages are updated. The initial assembly compatibility is ironed out at build time in the form of compilation failures and resolutions. Subsequent package updates must always target incremental, higher versions than those the application was initially built with. If a version increment breaks compatibility, the application can choose to remain at the current version of the assembly and wait for a subsequent version that fixes the break. The dependencies must be declared clearly in the project file used to contain and build the source. Version compatibility can be made more deterministic with the help of the versions associated with those packages as well as their substitution policies.

Packages might get updated for many reasons: because the publisher found and fixed defects, or because of revisions recommended from the common vulnerabilities database. Code and binary analysis tools help with these recommendations for the packages to be updated. Different versions of a package can coexist side by side on the same host as long as they are located in different folders. Package versioning and automatic redirect of versions are techniques that help when some or all of the packages get updated for an application.

The manifests of package dependencies with their versions, as well as the binding redirects in the application configuration file, enable this compatibility to be maintained across revisions so that the application will always compile and run on any host.

Certain assemblies are part and parcel of the runtime required to execute the application. These system assemblies are resolved in the same way as application dependencies when they are specified by targets, except that their location differs from the local NuGet package cache. The system assemblies are bundled in different target frameworks, such as the .NET Framework and .NET Core. A target framework moniker such as 'net48' refers to the full .NET Framework, which provides an exhaustive collection of system assemblies so that applications find all the features they might expect. A moniker such as 'netcoreapp3.1' refers to .NET Core, which is more lightweight and geared towards the portability of applications and the latest features of the runtime; with fewer assemblies, it has a smaller footprint.

When dependencies are not found, package sources must be provided to download them. Once the packages are downloaded, they will be resolved and loaded. As discussed earlier, external and internal package sources might both be required within the development environment to resolve different assemblies. Certain organizations restrict the package sources to just one; otherwise the point of origin of a package can be obscured and spurious packages might be introduced into the build and application binaries. They provide a workaround to target different package sources via proxying, but they rely on a single authoritative source to control the overall package sourcing. The transparency of package resolution from its source is a security consideration rather than a functional one and is a matter of policy.

When the assemblies are downloaded, a different set of security considerations comes into play. The package cache on the local host must be safeguarded like any other data asset. A developer can choose to clean the cache and download everything again to improve package health and hygiene during rebuilds; this technique is called restore, and it is available as an option during builds. Since the destination of the packages on the local host where the code is built and run is always the file system, there might be issues locating them even after the dependencies are downloaded. The resolution of the path from which the assemblies are found and loaded must also be made deterministic. There are two ways to go about this: one is to specify the path for the assembly explicitly, and the other is to register it in the global assembly cache, which is unique to every host and is meant for system assemblies. Registering an assembly in the global assembly cache provides a way to resolve it independently of the file system, but it interferes with the application-isolation policy.

Finally, assemblies loaded into the runtime might still be incorrect, and some troubleshooting might need to be done. This is not always easy, but there are techniques to help. The assembly loading process can be made more transparent by requiring the app domain to log to the console how it finds and loads each assembly. Standard events can be bound, and the state of the app domain can be viewed.
If the name and version of an assembly are not enough to know about it, the assembly can also be queried for its types by a technique known as reflection. The resolution of the assembly location and the order of assembly loading can be made transparent with the help of the assembly binding log viewer. The log viewer writes to a well-known location, which can also be customized. The level of logging can be set with registry settings on the host, but care must be taken not to leave logging on for an extended period: since assembly loading is a very common occurrence, the logs grow very large very quickly, so logging should be turned on only for the duration of an investigation. Together with all these techniques, a developer can not only consume packages from well-known sources but also publish packages for others to use in a more deterministic manner.
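A minimal sketch of binding the standard AppDomain events to make assembly loading more transparent, and of using reflection to enumerate an assembly's types; the console logging shown here is illustrative:

using System;
using System.Reflection;

class AssemblyLoadDiagnostics
{
    static void Main()
    {
        // Log every assembly as it is loaded into the app domain.
        AppDomain.CurrentDomain.AssemblyLoad += (sender, args) =>
            Console.WriteLine("Loaded: {0} from {1}",
                args.LoadedAssembly.FullName, args.LoadedAssembly.Location);

        // Log resolution failures; returning null lets the failure propagate.
        AppDomain.CurrentDomain.AssemblyResolve += (sender, args) =>
        {
            Console.WriteLine("Failed to resolve: {0}", args.Name);
            return null;
        };

        // Reflection: enquire about the types an already-loaded assembly exposes.
        Assembly assembly = typeof(Uri).Assembly;
        foreach (Type type in assembly.GetExportedTypes())
        {
            Console.WriteLine(type.FullName);
        }
    }
}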


Friday, July 9, 2021

 A note about automation continued...

The landscape of automation has also evolved. At one time, automations were bound to the hosts and to the programmability offered by the components on those hosts. In the Linux world, automation relies on shell scripts, often invoked over SSH. In the Windows world, PowerShell added SSH support recently. Cross-platform support is still lacking, but organizations have inventory and core functionality deployed on both platforms. Fortunately, more and more automation now relies on microservice APIs for programmatic and shell-based access (think curl commands) to features that are not limited to the current host.

Public cloud computing infrastructure hosts customer workloads as well as the ever-increasing portfolio of services offered by the cloud to its customers. These services from publishers both external and internal to the cloud require automation over the public cloud. They write and maintain this logic and bear the cost of keeping up with the improvements and features available for this automation logic. This article investigates the implementation platform for a global multi-tenant automation-as-a-service offering from the public cloud. 

Multi-tenancy and software-as-a-service model is possible only with a cloud computing infrastructure.  The automation logic for a service for a cloud differs significantly from that for a desktop. A cloud expects more conformance than a desktop or enterprise automation justifying the need for a managed program. As Cloud service developers struggle to keep up with the speed of software development for cloud-savvy clients, they face automation as a necessary evil that draws their effort from their mission. Even when organizations pay the cost up upfront in the first version released with a dedicated staff, they realize that the cloud is evolving at a pace that rivals their own release timeframes. Some may be able to keep up with the investments year after year but for most, this is better outsourced so that they spend less time on rewriting with newer automation technologies or embracing the enhancements features to the cloud.