Thursday, July 1, 2021

 A revisit to event-driven systems: 

Introduction: Software applications often make use of event-driven systems in addition to synchronous calls between components and services. This article revisits some of the considerations in designing such systems.

Description: First, let us define the event-driven system. It is state-based: when the state changes, the system is expected to take certain actions, following a finite automaton. This kind of system knows only how to proceed from one state to another, and the state transitions must be determined beforehand and validated so that they do not run into endless loops or dead ends. The states hold meaning for the application calling the system. A minimal sketch of such an automaton follows.
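As an illustration only (the states, events, and names here are hypothetical, not taken from any particular application), the finite automaton can be reduced to a table of predetermined transitions, where any event that has no transition from the current state is rejected rather than guessed at:

using System;
using System.Collections.Generic;

enum State { Created, Running, Stopped }
enum Event { Start, Stop }

class JobStateMachine
{
    // All transitions are declared beforehand; anything else is invalid.
    private static readonly Dictionary<(State, Event), State> Transitions = new()
    {
        { (State.Created, Event.Start), State.Running },
        { (State.Running, Event.Stop),  State.Stopped },
    };

    public State Current { get; private set; } = State.Created;

    public void Handle(Event e)
    {
        if (!Transitions.TryGetValue((Current, e), out var next))
            throw new InvalidOperationException($"No transition from {Current} on {e}");
        Current = next;  // the action associated with the state change would run here
    }
}

Validating the transition table offline, for instance by checking that every state is reachable and that no cycle lacks an exit, is what keeps such a system from endless loops or dead ends.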

There are several advantages to an event-driven system. Protocol handlers are a notable example: they do not have to remember the caller or maintain a session and can be entirely event-based. Messages and requests arriving over the wire can be representations of control-plane state-transition requests. This form of system is highly scalable.

Events must be well-defined. Their scope and granularity determine the kind of actions taken. Event-driven systems can perform stateful processing. They can persist their state so that processing can pick up where it left off. The state also helps with fault tolerance, protecting against failures, including data loss. The consistency of the state can be independently validated with a checkpointing mechanism such as the one available from Flink, which persists local state to a remote store. Stream processing applications often take in their incoming events from an event log, which stores and distributes event streams written to a durable, append-only log on tier-2 storage where they remain sequential by time. Flink can recover a stateful streaming application by restoring its state from a previous checkpoint and adjusting the read position on the event log to match that checkpoint. Stateful stream processing is therefore suited not only for fault tolerance but also for reentrant processing and improved robustness with the ability to make corrections. It has become the norm for event-driven applications, data pipeline applications, and data analytics applications. The sketch below illustrates the recovery idea.
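The following is a minimal conceptual sketch, not Flink's API: it only shows the recovery idea described above, with a consumer that checkpoints its operator state together with its read offset into the append-only event log and, on restart, restores both so processing resumes where it left off. All names and the file-based checkpoint store are hypothetical.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

class Checkpoint
{
    public long Offset { get; set; }                              // read position in the event log
    public Dictionary<string, int> Counts { get; set; } = new();  // operator state
}

class CheckpointedCounter
{
    private readonly string _checkpointPath;
    private readonly Checkpoint _state;

    public CheckpointedCounter(string checkpointPath)
    {
        _checkpointPath = checkpointPath;
        // Restore state and read position from the last checkpoint, if any.
        _state = File.Exists(checkpointPath)
            ? JsonSerializer.Deserialize<Checkpoint>(File.ReadAllText(checkpointPath))!
            : new Checkpoint();
    }

    public void Run(IReadOnlyList<string> eventLog)
    {
        // Resume from the checkpointed offset rather than the beginning of the log.
        for (long i = _state.Offset; i < eventLog.Count; i++)
        {
            var key = eventLog[(int)i];
            _state.Counts[key] = _state.Counts.GetValueOrDefault(key) + 1;
            _state.Offset = i + 1;

            if (_state.Offset % 100 == 0)   // periodic checkpoint of state plus offset
                File.WriteAllText(_checkpointPath, JsonSerializer.Serialize(_state));
        }
        File.WriteAllText(_checkpointPath, JsonSerializer.Serialize(_state));
    }
}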


Wednesday, June 30, 2021

Kusto queries in Azure public cloud


Query languages have become as popular as shell scripting for data scientists and analytics professionals. While SQL has been universally used for a wide variety of purposes, it was always called out for its shortcomings with respect to features available in shell scripting, such as data pipelining and a wide range of native commands that can be used in stages between an input and a desired output. Pipelining refers to the intermediary stages of processing where the output from one operator is taken as the input to another. PowerShell scripting bridged this gap with a wide variety of cmdlets that can be used in pipelined (aka vectorized) execution. While PowerShell enjoys phenomenal support in terms of the library of commands from all Azure services, resources, and their managers, it is not always easy to work with all kinds of data, and it places no restrictions on the scope or intent of changes to the data. Kusto addresses this specifically by allowing only read operations on the data and requiring that any edits be made by copying into new tables, rows, and columns. This is the hallmark of Kusto’s design. It separates read-only queries from the control commands that are used to create the structures that hold intermediary data.

Kusto is popular both with Azure Monitor and with Azure Data Explorer. A Kusto query is a read-only request to process data and return results in plain text. It uses a data flow model that is remarkably like the slice-and-dice operators in shell commands. It can work with structured data with the help of tables, rows, and columns, but it is not restricted to schema-based entities and can be applied to unstructured data such as telemetry. A query consists of a sequence of statements delimited by semicolons and has at least one tabular query operator. The name of a table is sufficient to stream its rows to a pipe operator that separates the filtering into its own stage with the help of a SQL-like where clause. Where clauses can be chained to produce a progressively more refined set of rows, and a query can be as short as a tabular query operator, a data source, and a transformation. Any use of new tables, rows, and columns requires control commands, which are differentiated from Kusto queries because they begin with a dot character. The separation of these control commands helps with the security of the overall data analysis routines: administrators have less hesitation in letting Kusto queries run on their data. Control commands also help to manage entities and discover their metadata. A sample control command is “.show tables”, which shows all the tables in the current database. A hedged sketch of both kinds of requests follows.
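As a sketch only, assuming the Kusto .NET client (the Kusto.Data package) and the public help cluster with its Samples database, the snippet below issues a read-only query whose rows flow through chained where clauses, and then a dot-prefixed control command through the separate admin interface:

using System;
using Kusto.Data;
using Kusto.Data.Common;
using Kusto.Data.Net.Client;

var kcsb = new KustoConnectionStringBuilder("https://help.kusto.windows.net")
    .WithAadUserPromptAuthentication();

// Read-only query: a table name streamed through pipe-separated filtering stages.
const string query = @"
StormEvents
| where StartTime > ago(7d)
| where State == 'WASHINGTON'
| take 10";

using (var queryProvider = KustoClientFactory.CreateCslQueryProvider(kcsb))
using (var reader = queryProvider.ExecuteQuery("Samples", query, new ClientRequestProperties()))
{
    while (reader.Read())
        Console.WriteLine(reader[0]);
}

// Control commands begin with a dot and go through the admin provider instead.
using (var adminProvider = KustoClientFactory.CreateCslAdminProvider(kcsb))
using (var tables = adminProvider.ExecuteControlCommand("Samples", ".show tables"))
{
    while (tables.Read())
        Console.WriteLine(tables[0]);
}

Keeping the query provider and the admin provider separate mirrors Kusto’s design: the read-only provider can be handed out broadly, while the admin provider is reserved for the few places that create or alter structures.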

One of the features of Kusto is its set of immediate-visualization chart operators that render the data in a variety of plots. While the scalar operators that summarize rows of data were already quite popular for their equivalence to shell commands, the ability to project tabular data and summarize it before pipelining to a chart operator makes Kusto even more data-friendly and popular with data scientists. These visualizations are not restricted to timecharts and can include multiple series, cycles, and data types. A join operator can combine two different data sources to make it more convenient to plot and visualize the results. Distributions and percentiles are easy to compute and are often required for time-sliced windows of metrics. Results can be assigned to variables with the help of the let statement, and data from several databases can be virtualized behind a single Kusto query. An illustrative query is shown below.
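For illustration (the table and column names here are placeholders), a query of the kind described above binds a variable with let, computes percentiles over five-minute time slices, and ends with a chart operator; it would be submitted through the same query provider as in the earlier sketch:

using System;

const string latencyChart = @"
let window = 1h;
Requests
| where Timestamp > ago(window)
| summarize p50 = percentile(DurationMs, 50), p95 = percentile(DurationMs, 95)
          by bin(Timestamp, 5m)
| render timechart";

Console.WriteLine(latencyChart);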


Tuesday, June 29, 2021

Writing a PowerShell cmdlet in C# for Azure Service Bus entities.

 

Problem statement: Azure Service Bus is a Microsoft public cloud resource that is the equivalent of message brokers such as RabbitMQ, ZeroMQ, and others that implement the Advanced Message Queuing Protocol (AMQP). The entities of the Azure Service Bus include queues and topics. A queue allows a producer and a consumer to send and receive messages in first-in-first-out order. A topic allows the messages sent by one or more producers to be distributed to several subscriptions, where subscribers are registered and receive those messages. This article describes some of the issues encountered when writing a PowerShell cmdlet for migrating service bus entities and their messages to another service bus.
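To make the queue semantics concrete, here is a hedged sketch using the Azure.Messaging.ServiceBus SDK; the connection string and the queue name “orders” are placeholders:

using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

class QueueRoundTrip
{
    public static async Task Main()
    {
        const string connectionString = "<service-bus-connection-string>";
        await using var client = new ServiceBusClient(connectionString);

        // Producer side: messages land on the queue in order.
        ServiceBusSender sender = client.CreateSender("orders");
        await sender.SendMessageAsync(new ServiceBusMessage("hello"));

        // Consumer side: messages are received first-in-first-out.
        ServiceBusReceiver receiver = client.CreateReceiver("orders");
        ServiceBusReceivedMessage message = await receiver.ReceiveMessageAsync(TimeSpan.FromSeconds(5));
        if (message != null)
        {
            Console.WriteLine(message.Body.ToString());
            await receiver.CompleteMessageAsync(message);
        }
    }
}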

Description: PowerShell is the language of automation, especially for custom logic that is not supported by the features or built-ins available directly from a resource. Usually, custom logic is kept to a minimum, but resources, like products, focus on improving their offerings rather than their integration. Automating the migration of one service bus to another required the use of a built-in feature named geo-replication of Service Bus. Geo-replication has two limitations. First, it replicates only the structure and not the content of the Service Bus entities. Second, it requires the source and target Service Bus to be hosted in different geographic regions, which is quite unlike some other Azure resources. The latter can be overcome by replicating to another region and then replicating back to the original one, but that still does not replicate the messages, which is critical for the second instance to be like the first.

This calls for a custom program that enumerates the service bus entities of one instance and recreates them on another, which is possible with the SDK for NamespaceManager. The messages in these entities can be read with the help of the Software Development Kit (SDK) for Azure Service Bus. The trouble with these independent SDKs is that they require different versions of their dependencies. One such dependency is the Windows.Powershell.5.reference assembly: the namespace manager has a specific requirement for assembly version 4.0.4.1, while the Azure Service Bus SDK requires much more recent versions. Changing the program to use the lowest common denominator among the versions is not a viable option because there is none. A sketch of the enumeration follows.
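A hedged sketch of that enumeration, assuming the older WindowsAzure.ServiceBus package that provides Microsoft.ServiceBus.NamespaceManager (the connection string is a placeholder):

using System;
using Microsoft.ServiceBus;

class EntityEnumerator
{
    public static void Main()
    {
        var source = NamespaceManager.CreateFromConnectionString("<source-connection-string>");

        // Walk the structure of the source namespace so it can be recreated on a target.
        foreach (var queue in source.GetQueues())
            Console.WriteLine($"Queue: {queue.Path}");

        foreach (var topic in source.GetTopics())
        {
            Console.WriteLine($"Topic: {topic.Path}");
            foreach (var subscription in source.GetSubscriptions(topic.Path))
                Console.WriteLine($"  Subscription: {subscription.Name}");
        }
    }
}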

The error encountered appears something like this:

Azure.Messaging.ServiceBus.ServiceBusException: Could not load file or assembly 'System.Runtime.CompilerServices.Unsafe, Version=4.0.4.1, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. The system cannot find the file specified. (ServiceCommunicationProblem) ---> System.IO.FileNotFoundException: Could not load file or assembly 'System.Runtime.CompilerServices.Unsafe, Version=4.0.4.1, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. The system cannot find the file specified.

Any binding redirect issued in the application configuration to bind the runtime dependency to the higher version, such as the one below, also fails to mitigate this error.
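For illustration, a binding redirect in app.config has this general shape; the newVersion value is a placeholder and would have to match whichever assembly version of System.Runtime.CompilerServices.Unsafe is actually deployed:

<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <!-- Redirect older requests for the assembly to the version on disk (placeholder). -->
        <assemblyIdentity name="System.Runtime.CompilerServices.Unsafe"
                          publicKeyToken="b03f5f7f11d50a3a" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-4.0.4.1" newVersion="4.0.6.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>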

In addition, the target framework was changed from netstandard2.1 to netcoreapp3.1 and finally to net48, which resulted in other kinds of assembly errors. These include:

FileNotFoundException: Could not load file or assembly 'Microsoft.Bcl.AsyncInterfaces, Version=1.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51' or one of its dependencies. The system cannot find the file specified. 

And System.AggregateException, HResult=0x80131500, Message=One or more errors occurred., Source=mscorlib

Each of those framework options results in a lot of trial and error. The cmdlet also has dependencies other than the two mentioned above, and any change to the versioning changes their versions as well.

Conclusion: Registering the required assembly in the Global Assembly Cache via offline mechanisms gets us past the error; although it is not a great experience, it enables the automation to run.
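As a sketch of that offline registration, assuming the Windows SDK's gacutil tool is available and the DLL path points at the copy restored by NuGet (both assumptions, not part of the original write-up):

:: Install the specific assembly version that the NamespaceManager SDK binds to,
:: then confirm it is listed in the Global Assembly Cache.
gacutil /i System.Runtime.CompilerServices.Unsafe.dll
gacutil /l System.Runtime.CompilerServices.Unsafe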


Monday, June 28, 2021

Writing a project plan, continued...

 

A summary of a five-point plan to tackle a new project is listed below:

Step 1: Definition:

This involves defining six elements:

1.      Objectives: use the Specific, Measurable, Achievable, Realistic, and Time-bound (SMART) technique to set objectives with internal and external input.

2.      Scope: While it is tempting to define what is in scope, it is usually easier to list what is out of scope.

3.      Success criteria: determine what project success and failure mean as they pertain to the business.

4.      Deliverables: list work items in detail, if possible

5.      Requirements: This is about what you need as well as what the stakeholders need.

6.      Schedule: A baseline schedule will have milestones and deadlines.

Step 2: Identify risks, assumptions, and constraints: If risk management cannot be planned ahead of the work items, designate a person to monitor for risks to the project, such as those involving time and cost.

Step 3: Organize the people for the project into customers, stakeholders, and accountable owners, with roles and responsibilities.

Step 4: List the project resources with a bill of materials including licenses, tools and any other external dependencies.

Step 5: Establish a project communications plan including a channel for team discussions, stakeholder discussions and customer discussions. Usually, these communications will elaborate on the preceding steps.

Conclusion: Defining the project, identifying risks, assembling teams, gathering resources, and providing communication channels will get a project off the ground. You will love it when a plan comes together.

Sunday, June 27, 2021

 Writing a Project plan for a new project:

Introduction: This article talks about the preparation needed to get started with a new software project that involves improving complex software systems. There are several considerations to be made. When the project starts, new resources might have been assigned who will require ramp-up time. The mission and the execution plan for the improvements might lack clarity and might not even be summed up in a sentence. The relevant dependencies might not have been listed. The stakeholders might not all be in alignment with the agenda. The architectural enhancements for the delivery of the software features might not have been thought through. A lot of steps might need to be prioritized before the end goal is reached. This is a helpful introduction to the planning involved.

Description: One of the advantages of software development is that it is a beaten path, with plenty of templates and learnings available from the past. The objective may be new and unclear, but there are several tools available to put the plan in place. First among these are project management tools and the Gantt chart. This tool helps not only with the distribution of tasks and their alignment with the timeline, but also with determining cost as well as the critical path for the completion of milestones. The timeline can be put in perspective based on a pegged release date and worked backward, budgeting for the tasks that can be squeezed in. When a new project begins, even the tasks are not enumerated sufficiently, which makes it difficult to put them on a chart or into sprint cycles with milestones. This can be improved by gathering the stakeholders and the resources in the same room for multiple discussions. These discussions empower both sides to present the facts and keep the milestones practical. They can be held internally as well as in a biweekly stakeholder meeting where incremental deliverables are realized via two-week sprint planning, meetings, and demonstrations of experiments and prototypes. The earlier these conversations take place, the better the planning and the outcome for the team. New technologies require significant investments in learning, and convergence happens faster when there are introductions by subject matter experts from the stakeholders' organizations. It will be more equitable for everyone when this happens in a conference room.

Another tool that helps with this kind of software development is a functional specification document, which has a well-known template for organizing thoughts. It starts with critical use cases that are articulated as stories involving a person who is trying to solve a problem. The narrative for the use case outlines both what is available to the user and what would have helped. The gamut of options and the impact of the requested feature become clearer with a well-written use case. It might put a quantitative aspect on what needs to be done and elaborate on the kind of outcome that would best serve the user. When the objective becomes clearer this way, intermediate deliverables fall into place. There is no limit to the number or description of use cases as long as they make the end goal clearer. The means and mode of delivering the features are usually not hard to envision once the goal is in place. The delivery of the features takes time, and the allotment of tasks to resources must consider internal and external factors that might alter the schedule.

One technique that helps with the deliverables is determining priority and severity. The former describes the relevance each item has based on the feedback from the stakeholders. Usually, higher-priority items are in the critical path for the release and must be tackled first. A backlog can maintain the list of items that needs to be burned down, while priority determines their selection into the sprints. Severity depends on the impact of the task: if a task touches data or configuration and has the potential to affect many users, it has a higher severity than others. It is somewhat easier to determine severity when the source code and architecture are well understood. Sound architecture can handle evolving business needs with the fewest changes, and the right design helps mitigate the high-severity tasks with ease. The design part of the software development lifecycle involves multiple perspectives, calculations, and validations. With a large development team, it might be easier to distribute the design to virtual teams in which individuals wear multiple or different hats, rather than organizing by components alone. A DevOps cadence can help improve the mode in which the features are developed.

Another technique that helps remove surprises and distractions is handing out showcase user interfaces even before the actual functionality is delivered. This is a sound practice even for cases that do not involve software. Such dry runs, including emails to stakeholders, can significantly eliminate noise and refine expectations. Many tasks are done better when the communication is clear and involves the right audience. With more communication, it is possible to change course when things do not pan out. Mistakes happen easily, and corrective actions are easy to take when there is visibility.

Lastly, software development is about people and not just about process and practice. Changes, moderations, and selection of techniques depend largely on the intentions of those who benefit from them.


Saturday, June 26, 2021

 Zone Down failure simulation: 

Introduction: A public cloud enables geo-redundancy for resources by providing many geographical regions where resources can be allocated. A region comprises several availability zones, and resources are allocated redundantly across the zones. If the resources fail in one zone, they are switched out with those from another zone. A zone may comprise several datacenters, each of which may be a stadium-sized center that provisions compute, storage, and networking. When a user requests a resource from the public cloud, it usually has 99.99% availability. When the resources are provisioned with zone redundancy, their availability increases to 99.999%. Verifying that resources fail over to an alternate zone when one goes down is key to measuring that improvement in availability. This has been a manual exercise so far. This article explores the options to automate that testing from a resource perspective.

Description: 

Though they sound like availability sets, the AZs comprise datacenters with independent power, cooling, and networking, while availability sets are logical groupings of virtual machines. An AZ is a combination of a fault domain and an update domain, so changes do not occur at the same time. Services that support availability zones fall under two categories: zonal services, where a resource is pinned to a specific zone, and zone-redundant services, where the platform replicates automatically across zones. Business continuity is established with a combination of zones and Azure region pairs.

The availability zones can be queried from the SDK, and they are mere numbers within a location. For example, az vm list-skus --location eastus2 --output table will list VM SKUs by region and zone. The zones are identified by numbers such as 1, 2, and 3, and these do not mean anything other than that the zones are distinct. The numbers do not change for the lifetime of the zone, but they have no direct correlation to physical zone representations.

There are ways in which individual zone-resilient services can allow zone redundancy to be configured. 

When a service allows in-place migration of resources from zonal to zone-redundant, or allows changing the number of zones for the resource it provisions, simulating the zone-down behavior is as straightforward as asking the service to reconfigure the resource by specifying exactly which zones to use. For example, the resource could start with [“1”, “2”, “3”], and to simulate the failure of zone “3”, it could be reprovisioned with [“1”, “2”]. This in-place migration is not expected to cause any downtime for the resource because “1” and “2” remain part of the configuration. Also, the re-provisioning can revolve around the zones, requiring only a source and target zone pair; since there are three zones, the resource can always be accessed from one zone or another.

Conclusion: Zone down can be simulated when there is adequate support from the services that provide the resource. 

Reference the earlier discussion on this topic: https://1drv.ms/w/s!Ashlm-Nw-wnWzhemFZTD0rT35pTS?e=kTGWox


Friday, June 25, 2021

 Learnings from Deployment technologies: 

Introduction: The following article summarizes some learnings from deployment technologies, which take an organization’s software redistributable, package it, and deploy it so that the software may run in different environments and as separate instances. They evolve from time to time and become tailored to the environments they serve. For example, earlier there was WixSharp to build MSIs for installing and uninstalling applications, or a tarball on Linux for deploying binaries as an archive. Now there are far more involved technologies that deploy both on-premises and to the cloud. Some of the salient learnings from such technologies are included here, and this article continues from the previous discussion here.

Description: 

An introduction to WixSharp might be in order. It is a C# framework for writing Microsoft installers (MSI) that generates WiX source files describing the set of actions to be taken on a target computer for installing, upgrading, or rolling back software. The artifacts are compiled with an executable called candle, and they therefore carry the rhyming .wxs file extension. WixSharp makes it easy to author the logic for listing all the dependencies of an application for deployment. It converts the C# code to a .wxs file, which in turn is compiled to build the MSI; a minimal sketch follows below. There are a wide variety of samples in the WixSharp toolkit, and some of them require very few lines for a wide range of deployment-time actions. The appeal of such libraries is being able to get the task done sooner with fewer lines of code. The earlier way of writing and editing the WXS file by hand was error-prone and tedious. This is probably one of the most important learnings: any language or framework that allows precise and succinct code or declarations is going to be a lot more popular than verbose ones.
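A minimal sketch, with a hypothetical product name, directory, file, and GUID; WixSharp translates the object model below into a .wxs file and drives the WiX tools to produce the MSI:

using System;
using WixSharp;

class Script
{
    static void Main()
    {
        // Declare what gets installed and where; WixSharp emits the .wxs and builds the MSI.
        var project = new Project("MyProduct",
            new Dir(@"%ProgramFiles%\MyCompany\MyProduct",
                new File(@"Files\MyApp.exe")));

        project.GUID = new Guid("6f330b47-2577-43ad-9095-1861ba25889b");

        Compiler.BuildMsi(project);
    }
}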

The second learning is about the preference for declarative syntax over logic. It is very tempting to encapsulate all the deployment logic in one place as a procedure, but this tends to become monolithic and takes away the benefits of out-of-band testing and validation of artifacts. It also demands developer attention as versions change. The declarative format, on the other hand, expands the number of artifacts into self-contained declarations that can be independently validated.

The third learning is about the reduction of custom logic. Having extensive and involved custom logic per organization defeats the purpose of a general-purpose infrastructure that can bring automation, consistency, and framework-based improvements to deployment. Preventing custom logic also prevents hacks and makes the deployments more mainstream and less vulnerable to conflicts and issues. The use of general-purpose logic helps with enhancements that serve new and multiple teams as opposed to a single customer.