Saturday, May 21, 2022

 

Azure Service Fabric Cluster Instances and Replicas:

This is a continuation of the Azure Service Fabric articles, with the most recent one included here. Azure Service Fabric allows specifying the TargetReplicaSetSize and the MinReplicaSetSize for a stateful service. The first determines the number of replicas that the system creates and maintains for each replica set of a service. The MinReplicaSetSize is the minimum allowed number of replicas for each replica set of a service.

These two parameters allow a configuration that permits two concurrent failures to occur without the partition going into quorum loss. That situation can happen when there is one planned failover, such as an upgrade bringing a node or replica down, and one unplanned failover, such as a node crash.

If TargetReplicaSetSize = 5 and MinReplicaSetSize = 3, then without failures there will be five replicas in the view of the replica set, and if failures occur, Service Fabric will allow the view to shrink until it reaches MinReplicaSetSize.

Service Fabric uses the majority quorum of the number of replicas maintained in this view, and that is the minimum level of reliability for the operation of the cluster. If the replicas fall below this level, then further writes will be disallowed. For example, with five replicas in the view the write quorum is three, so two concurrent failures can be tolerated; with three replicas the quorum is two and only one failure can be tolerated. Examples of suboptimal configurations that risk quorum loss are TargetReplicaSetSize = 3 with MinReplicaSetSize = 2, or when both values are equal.

Stateless services do not specify replicas; instead they specify an instance count, which can be scaled much like a replica set size. An instance of a stateless service is a copy of the service logic that runs on one of the nodes of the cluster. An instance within a partition is uniquely identified by its InstanceId.
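As a hedged illustration of where these knobs live, the following C# sketch uses the FabricClient API (System.Fabric) to create a stateful service with TargetReplicaSetSize = 5 and MinReplicaSetSize = 3, and a stateless service with an instance count. The application, service and type names are hypothetical placeholders, and in practice these values are more commonly set in the application and service manifests or parameter files.

```csharp
using System;
using System.Fabric;
using System.Fabric.Description;
using System.Threading.Tasks;

class ReplicaSetupSketch
{
    static async Task Main()
    {
        // FabricClient with no arguments connects to the local cluster.
        var fabricClient = new FabricClient();

        // Stateful service: five replicas maintained; writes continue down to three.
        var stateful = new StatefulServiceDescription
        {
            ApplicationName = new Uri("fabric:/MyApp"),            // hypothetical
            ServiceName = new Uri("fabric:/MyApp/MyStatefulSvc"),  // hypothetical
            ServiceTypeName = "MyStatefulServiceType",             // hypothetical
            HasPersistedState = true,
            TargetReplicaSetSize = 5,
            MinReplicaSetSize = 3,
            PartitionSchemeDescription = new SingletonPartitionSchemeDescription()
        };
        await fabricClient.ServiceManager.CreateServiceAsync(stateful);

        // Stateless service: scaled by instance count rather than by replicas.
        var stateless = new StatelessServiceDescription
        {
            ApplicationName = new Uri("fabric:/MyApp"),
            ServiceName = new Uri("fabric:/MyApp/MyStatelessSvc"),
            ServiceTypeName = "MyStatelessServiceType",
            InstanceCount = 5,
            PartitionSchemeDescription = new SingletonPartitionSchemeDescription()
        };
        await fabricClient.ServiceManager.CreateServiceAsync(stateless);
    }
}
```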

The lifecycle of an instance is modeled as transitions between the InBuild, Ready, Closing and Dropped stages, with an occasional direct transition from Ready to Dropped.

In the InBuild stage, the Cluster Resource Manager determines a placement for the instance and the instance enters its lifecycle; the instance is started on the node. When the build completes, the instance transitions to the Ready state. If the application host or the node for this instance crashes, it transitions to the Dropped state.

In the Closing state, Service Fabric is in the process of shutting down the instance on the node. When the shutdown completes, the instance transitions to the Dropped state, where the metadata maintained by Service Fabric is marked for deletion.

A replica of a stateful service is a copy of the service logic running on one of the cluster nodes. The replica lifecycle has a few additional stages: Down, Opening and StandBy. The Down state is when the replica code is not running. The Opening state is when Service Fabric needs to bring the replica back up again. A replica enters the StandBy state when it was down and has since been opened but has not yet been added back to the replica set; if the StandBy keep duration expires, the replica is discarded.

The role of a replica determines its function in the replica set; the possible roles are Primary, ActiveSecondary, IdleSecondary, None and Unknown.

 

 

Friday, May 20, 2022

Technical Service Guide for Service Fabric Applications

Purpose:

 

Service Fabric provisions applications as per the manifest. While applications can range in purpose, there is no substitute for deployments with high availability, and that is not possible without multiple instances/replicas. Service Fabric (SF for short) makes this happen seamlessly, with scaling and with monitoring. The Service Fabric framework enforces consistency and provides visibility across deployments of components, instances and replicas as the application goes through its lifecycle. While the SF hosting framework was introduced in an article referred to here, this document describes the technical service guide used for troubleshooting SF applications.

 

 What are the tools for diagnosability in the SF hosting framework? 

 

The tools for diagnosability of SF hosting framework execution are the logs and Service Fabric Explorer, which is a user interface for viewing the health of the cluster, nodes, applications, instances, and replicas. The inputs to the system are the manifests and executables, which, when deployed successfully, result in the applications being launched correctly. The Service Fabric framework provides retries and configurations so that the ensemble and quorum are set up correctly for the applications.

In addition, there are several PowerShell cmdlets available from the Service Fabric module that make diagnosing a Service Fabric deployment easier.
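To complement the Explorer view and the cmdlets, the same health store can be queried programmatically. The following is a minimal C# sketch using FabricClient; the application name is a hypothetical placeholder.

```csharp
using System;
using System.Fabric;
using System.Fabric.Health;
using System.Threading.Tasks;

class HealthCheckSketch
{
    static async Task Main()
    {
        var fabricClient = new FabricClient(); // local cluster by default

        // Cluster-level health roll-up across nodes and applications.
        ClusterHealth clusterHealth = await fabricClient.HealthManager.GetClusterHealthAsync();
        Console.WriteLine($"Cluster: {clusterHealth.AggregatedHealthState}");

        // Application-level health: services, partitions and replicas roll up here.
        ApplicationHealth appHealth =
            await fabricClient.HealthManager.GetApplicationHealthAsync(new Uri("fabric:/MyApp")); // hypothetical
        Console.WriteLine($"fabric:/MyApp: {appHealth.AggregatedHealthState}");

        // Unhealthy evaluations explain which child entity caused a Warning or Error roll-up.
        foreach (var evaluation in appHealth.UnhealthyEvaluations)
        {
            Console.WriteLine(evaluation.Description);
        }
    }
}
```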

 

 How to validate the manifest, executables and packages? 

 

Out-of-band validation of manifests, executables and packages is possible via the Azure DevOps (ADO) pipeline, where the necessary validations are exercised at the time of build, test and creation of the package. Many of the artifacts are parameterized and checked into a source code repository branch, so the infusion of values into parameters must ensure that the artifacts are unique per deployment. The manifest files and parameter files contain different parameters that can determine the hosting model, the service or application distribution and the nodes. Specifying the configuration correctly greatly affects the outcome and the time to go live.
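Manifest problems that slip past the out-of-band checks typically surface when the package is copied and provisioned against a cluster. The sketch below is a hedged C# illustration of that step using FabricClient; the image store connection string shown is the common default, and the package paths are hypothetical.

```csharp
using System;
using System.Fabric;
using System.Threading.Tasks;

class ProvisionSketch
{
    static async Task Main()
    {
        var fabricClient = new FabricClient();

        // Copy the built package into the cluster's image store.
        fabricClient.ApplicationManager.CopyApplicationPackage(
            "fabric:ImageStore",        // default image store connection string
            @"C:\drop\MyAppPackage",    // hypothetical local package path
            "MyAppPackage");            // hypothetical image store path

        try
        {
            // Provisioning validates the application and service manifests;
            // malformed or inconsistent manifests fail here with a FabricException.
            await fabricClient.ApplicationManager.ProvisionApplicationAsync("MyAppPackage");
            Console.WriteLine("Application type provisioned.");
        }
        catch (FabricException ex)
        {
            Console.WriteLine($"Provisioning failed: {ex.Message}");
        }
    }
}
```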

 How to tell which application is being installed by the Service Fabric? 

 

The logs displayed on Service Fabric Explorer have detailed entries for each operation of the Service Fabric framework. A specific manifest used for provisioning will have a unique ID, made from its parts, that can be used to track and correlate the entries into a timeline of all actions taken on the manifest. The entries bear this ID, so it can be used as a filter on the log entries.
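Alongside filtering the Explorer log entries by this ID, the provisioned application types and the applications instantiated from them can be listed programmatically; a minimal C# sketch with FabricClient follows.

```csharp
using System;
using System.Fabric;
using System.Threading.Tasks;

class InstalledApplicationsSketch
{
    static async Task Main()
    {
        var fabricClient = new FabricClient();

        // Application types registered with the cluster (name + version from the manifest).
        var applicationTypes = await fabricClient.QueryManager.GetApplicationTypeListAsync();
        foreach (var type in applicationTypes)
        {
            Console.WriteLine($"{type.ApplicationTypeName} {type.ApplicationTypeVersion}");
        }

        // Applications currently instantiated from those types, with their status.
        var applications = await fabricClient.QueryManager.GetApplicationListAsync();
        foreach (var app in applications)
        {
            Console.WriteLine($"{app.ApplicationName} {app.ApplicationTypeVersion} {app.ApplicationStatus}");
        }
    }
}
```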

 How to find if the Service Fabric application provisioning is supported in different clouds?

 

Service Fabric is a framework that allows the migration of an application type from a local environment to a private cloud and to the public cloud, as long as the application works correctly in one of them.

 AuthN and AuthZ:

 

The application passes through the credentials of the user via the AAD auth client. In select cases, it may make a service-to-service call, which works based on an application id and certificate. In these cases, troubleshooting revolves around the areas below. For the interactions within a deployment, a ServiceMap can come in useful.

Issues around credentials:

 

Application user interfaces usually make an HTTP request that bears an authentication header. This has a bearer token when the request needs to be authenticated and authorized. The token is issued to an identity, and Windows lists many forms of identity including but not restricted to UserCredential, ApplicationCertificateThumbprint, ApplicationClientId, ApplicationKey, ApplicationNameForTracing, ApplicationToken, Authority, EmbeddedManagedIdentity, ApplicationCertificateSubjectDistinguishedName, ApplicationCertificateIssuerDistinguishedName, and ApplicationCertificateSendPublicCertificate. In all of these cases, the security token provider service provides a resolution of whether a token was successfully issued. The remote server will accept this token or reject it with an Unauthorized error. In the latter case, the resolution is to check that the caller is part of the required security group and that the certificates or applications used have not expired.
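For the service-to-service case based on an application id and certificate, the token acquisition and the Unauthorized check described above might look like the following hedged C# sketch using MSAL.NET (Microsoft.Identity.Client); the tenant, client id, certificate path, scope and target endpoint are hypothetical placeholders.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Security.Cryptography.X509Certificates;
using System.Threading.Tasks;
using Microsoft.Identity.Client;

class ServiceToServiceAuthSketch
{
    static async Task Main()
    {
        // The certificate would normally be loaded from the local store by thumbprint.
        var certificate = new X509Certificate2(@"C:\certs\app-cert.pfx"); // hypothetical path

        IConfidentialClientApplication app = ConfidentialClientApplicationBuilder
            .Create("00000000-0000-0000-0000-000000000000")               // hypothetical client id
            .WithAuthority("https://login.microsoftonline.com/contoso.onmicrosoft.com") // hypothetical tenant
            .WithCertificate(certificate)
            .Build();

        // Client-credential flow: a token for the application identity itself.
        AuthenticationResult result = await app
            .AcquireTokenForClient(new[] { "https://my-api.example.com/.default" }) // hypothetical scope
            .ExecuteAsync();

        // Attach the bearer token; a 401 response means the token was rejected, so check
        // group membership and certificate/application expiry as described above.
        using var http = new HttpClient();
        http.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", result.AccessToken);
        HttpResponseMessage response = await http.GetAsync("https://my-api.example.com/endpoint"); // hypothetical
        Console.WriteLine(response.StatusCode);
    }
}
```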

 

 Issues around role: 

Frequently, the controllers for security, such as the external service controller, can reject these requests if the role does not have sufficient permissions. For example, a platform service administrator role might be required to execute privileged operations as part of Service Fabric application provisioning. These can be mitigated with the PowerShell commands to add the required role.

 Issues around claims:

In some cases, Service Fabric application provisioning might require a claim to be added to the principal. This can also be done with the help of an application. 

 Troubleshooting: 

Logs and activity reports on the Service Fabric Explorer cover one or multiple manifests that have been processed by the task. A "ServiceError" term covers most of the errors. When fixing the issues, the caller must verify that the application has everything it needs to run.

 

Error message: "System.Net.WebException: Port already in use ---> System.Net.Sockets.SocketException: The port is already in use". These errors are usually transient, and the system will launch additional replicas until the minimum number is active.

 

Cause and fix: Ensure that the configuration is set up correctly as per the last article on the ServiceFabric cluster.

 

Error message: An application failed to start or to exit the InBuild stage.

 

Cause and fix: It is highly recommended to view the associated logs on the corresponding node of the application; they describe the steps that need to be taken.

 

Error message: Certificates are missing.

 

Cause and fix: This is usually a packaging error because the certificates on a cluster node must be made available for the application to launch.

 

Escalation path: 

 

For issues that have not been resolved by this document, please send mail to support@acme.com or follow up on the support channel: <link-to-support-channel>. 

 

 

 

 

Thursday, May 19, 2022

Multitenancy Part 3 

 

This is a continuation of a series of articles on Microsoft Azure from an operational point of view that surveys the different services from the service portfolio of the Azure public cloud. The most recent article discussed architecting multitenant applications on Azure. This continues to discuss tenant model and lifecycle. 

 

The choice of tenancy models is very important to designing the multi-tenant architecture. There are primarily two models - the business-to-business model and the business-to-consumer model. The former requires tenant isolation for organizations, divisions, teams, and departments while the latter is about individual consumers. The business-to-consumer model must respect privacy and security for the data and the business-to-business model must respect regulatory compliance. 

 

Tenants can be distinguished as logical and physical tenants. When the scale of tenants increases, one of the relieving measures taken is to replicate the solution or some of its components to meet the increased demand. The load from one single instance may then be spilled over to another, or the traffic can be mapped to infrastructure based on certain criteria. In a B2C model, each user can be a separate logical tenant. They can be mapped to different physical tenants using different deployed instances. This results in a one-to-many mapping between logical and physical tenants. When compared to the B2B model, the definition of the logical tenant becomes clearer. In a B2B model, the resources for a firm are isolated from the start. In this case, the logical and the physical tenants mean the same.

 

One of the key differences between logical and physical tenants is how the isolation is enforced. When multiple logical tenants share a single set of infrastructure, the solution relies on application code and a tenant identifier in the database to keep each tenant's data separate. Physical tenants have their own infrastructure, so code running on them finds it less important to be aware that this is a multi-tenant environment. Physical tenants are also referred to as deployments, supertenants, or stamps.
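As a hedged sketch of that enforcement for logical tenants on shared infrastructure, the following C# example uses a hypothetical EF Core model with a TenantId column and a global query filter so that every query is scoped to the current tenant; a physical tenant, by contrast, would simply map the tenant to a different connection string or deployment stamp.

```csharp
using Microsoft.EntityFrameworkCore;

public class Order
{
    public int Id { get; set; }
    public string TenantId { get; set; } = "";   // discriminator column for the logical tenant
    public decimal Amount { get; set; }
}

public class TenantDbContext : DbContext
{
    private readonly string _tenantId;

    public TenantDbContext(DbContextOptions<TenantDbContext> options, string tenantId)
        : base(options) => _tenantId = tenantId;

    public DbSet<Order> Orders => Set<Order>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Applied to every query against Orders, keeping each logical tenant's data separate.
        modelBuilder.Entity<Order>().HasQueryFilter(o => o.TenantId == _tenantId);
    }
}
```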

 

Tenant isolation can run deep: for example, a single set of shared infrastructure with separate instances of the application and separate databases for each tenant; sharing some common resources while keeping other resources separate for each tenant; or keeping data on separate physical infrastructure. Separate resources for each tenant is a practice in the public cloud as well and often translates to separate physical infrastructure using dedicated hosts.

 

The tenant lifecycle depends on the tenant. Solutions that are software-as-a-service may want to honor customer requests for trials with a trial tenant. Questions about rigor for trial data, infrastructure for trial tenants, purchase option after trials and limits imposed on trial tenants must be answered. Regular tenants can be onboarded as the first step of their lifecycle which involves routines for allocation and initialization that could also be automated, setting up protection of data, meeting compliance standards, preparing for disaster recovery and setting up pricing options and billing models. If the customers require a pre-production environment, onboarding might be different since expectations around availability might be relaxed. 

 

 

 

Wednesday, May 18, 2022

 

This is a continuation of a series of articles on the crowdsourcing application, including the most recent article. The original problem statement is included again for context.

Social engineering applications provide a wealth of information to the end-user, but the questions and answers received on them are always limited to just that – the social circle. Advice solicited for personal circumstances is never appropriate for forums which can remain in public view. It is also difficult to find the right forums or audience where the responses can be obtained in a short time. When we want more opinions in a discreet manner, without the knowledge of those who surround us, the options become fewer and fewer. In addition, crowd-sourcing the opinions for a personal topic is not easily available via applications. This document tries to envision an application to meet this requirement.

The previous article continued the elaboration on the usage of the public cloud services for provisioning the queue, document store and compute. It talked a bit about the messaging platform required to support this social-engineering application. The problems encountered with social engineering are well-defined and have precedents in various commercial applications. They are primarily about the feed for each user and the propagation of solicitations to the crowd. The previous article described selective fan-out. When the clients wake up, they can request their state to be refreshed. This improves on pushing every write because the data does not need to be sent out until it is requested. If the queue sends messages back to the clients, it is a fan-out process. The devices can choose to check in at selective times, and the server can be selective about which clients to update. Both methods work well in certain situations. The fan-out happens in both writing as well as loading, and it can be made selective in both. The fan-out can be limited during both pull and push. Disabling the writes to all devices can significantly reduce the cost; other devices can load these updates only when reading. It is also helpful to keep track of which clients are active over a period so that only those clients get preference.

In this section, we talk about the Chatty I/O antipattern. When I/O requests are frequent and numerous, they can have a significant impact on performance and responsiveness. Network calls and other I/O operations are much slower compared to compute tasks. Each I/O request has a significant overhead as it travels up and down the networking stack on the local and remote hosts and includes the round-trip time, and the cumulative effect of numerous I/O operations can slow down the system. There are some common causes of chatty I/O, which include:

Reading and writing individual records to a database as distinct requests – When records are fetched one at a time, a series of queries is run one after the other to get the information. It is exacerbated when the Object-Relational Mapping layer hides this behavior beneath the business logic and each entity is retrieved over several queries. The same might happen on write for an entity.

Implementing a single logical operation as a series of HTTP requests – This occurs when objects residing on a remote server are represented as proxies in the memory of the local system. The code appears as if an object is modified locally, when in fact every modification incurs at least the cost of a round trip. When there are many network round trips, the cost is cumulative and even prohibitive. It is easily observable when a proxy object has many properties and each property get/set requires a relay to the remote object. In such cases, there is also the requirement to perform validation after every access.

Reading and writing to a file on disk – File I/O also hides the distributed nature of interconnected file systems. Every byte written to a file on a mount must be relayed to the original on the remote server. When there are many writes, the cost accumulates quickly. It is even more noticeable when the writes are frequent and only a few bytes each.
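A hedged C# sketch of this last case contrasts a per-record append, which reopens the file for every write, with a single buffered writer; the record type and path are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;

record LogRecord(string Message);

static class FileIoSketch
{
    // Chatty: one open/write/close cycle per record; on a network mount every
    // small append is relayed to the remote server.
    public static void WriteChatty(IEnumerable<LogRecord> records, string path)
    {
        foreach (var record in records)
        {
            File.AppendAllText(path, record.Message + "\n");
        }
    }

    // Batched: a single buffered writer absorbs the small writes and issues fewer,
    // larger I/O requests.
    public static void WriteBuffered(IEnumerable<LogRecord> records, string path)
    {
        using var writer = new StreamWriter(path, append: true);
        foreach (var record in records)
        {
            writer.WriteLine(record.Message);
        }
    }
}
```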

There are several ways to fix the problem, and they are about detection and remedy. When I/O requests are numerous, they can be batched into coarser requests. The database can be read with one query substituting for many queries, which also gives the database an opportunity to execute the work better and faster. Web APIs can be designed with the REST best practices: instead of separate GET methods for different properties, there can be a single GET method for the resource representing the object. Even if the response body is large, it will likely be a single request. File I/O can be improved with buffering and using a cache, and files need not be opened and closed repeatedly; this also helps to reduce fragmentation of the file on disk.
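For the database case, the fix amounts to replacing a query-per-record loop with one set-based query. The following hedged C# sketch uses hypothetical EF Core types; the chatty version issues one round trip per id, while the batched version translates to a single WHERE ... IN query.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Response
{
    public int Id { get; set; }
    public int CampaignId { get; set; }
    public string Text { get; set; } = "";
}

public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }
    public DbSet<Response> Responses => Set<Response>();
}

public static class ChattyIoFix
{
    // Chatty: one round trip per id (an N+1 pattern).
    public static List<Response> LoadOneByOne(AppDbContext db, List<int> ids)
        => ids.Select(id => db.Responses.Single(r => r.Id == id)).ToList();

    // Batched: one round trip for the whole set.
    public static List<Response> LoadBatched(AppDbContext db, List<int> ids)
        => db.Responses.Where(r => ids.Contains(r.Id)).ToList();
}
```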

When more information is retrieved via fewer I/O calls, there is a risk of falling into the extraneous fetching antipattern. The right tradeoff depends on usage, and it is important to read only as much as necessary to limit both the size and the frequency of calls. Sometimes, data can also be partitioned into two chunks: frequently accessed data that accounts for most requests, and less frequently accessed data that is used rarely. When data is written, resources need not be locked at too large a scope or for too long a duration.

 

Tuesday, May 17, 2022

This is a continuation of a series of articles on the crowdsourcing application, including the most recent article. The original problem statement is included again for context.

Social engineering applications provide a wealth of information to the end-user, but the questions and answers received on them are always limited to just that – the social circle. Advice solicited for personal circumstances is never appropriate for forums which can remain in public view. It is also difficult to find the right forums or audience where the responses can be obtained in a short time. When we want more opinions in a discreet manner, without the knowledge of those who surround us, the options become fewer and fewer. In addition, crowd-sourcing the opinions for a personal topic is not easily available via applications. This document tries to envision an application to meet this requirement.

The previous article continued the elaboration on the usage of the public cloud services for provisioning the queue, document store and compute. It talked a bit about the messaging platform required to support this social-engineering application. The problems encountered with social engineering are well-defined and have precedents in various commercial applications. They are primarily about the feed for each user and the propagation of solicitations to the crowd. The previous article described selective fan-out. When the clients wake up, they can request their state to be refreshed. This improves on pushing every write because the data does not need to be sent out until it is requested. If the queue sends messages back to the clients, it is a fan-out process. The devices can choose to check in at selective times, and the server can be selective about which clients to update. Both methods work well in certain situations. The fan-out happens in both writing as well as loading, and it can be made selective in both. The fan-out can be limited during both pull and push. Disabling the writes to all devices can significantly reduce the cost; other devices can load these updates only when reading. It is also helpful to keep track of which clients are active over a period so that only those clients get preference.

In this section, we talk about the extraneous fetching antipattern. When services call data stores, they retrieve data for a business operation, but fetching more data than is needed often results in unnecessary I/O overhead and reduced responsiveness. This antipattern can occur if the application is trying to save on the number of requests by fetching more than required. This is a form of overcompensation and is commonly seen with catalog operations because the filtering is delegated to the middle tier. For example, a user may need to see only a subset of the details and probably does not need to see all the responses at once, yet a large dataset from the campaign is retrieved. Even if the user is browsing the entire campaign, paginating the results avoids this antipattern.

Another example of this problem is an inappropriate choice in design or code where, for example, a service gets all the response details via the Entity Framework and then keeps only a subset of the fields while discarding the rest. Yet another example is when the application retrieves data to perform an aggregation, such as a count of responses, that could be done by the database instead. The application calculates total sales by getting every record for all orders sold instead of executing a query where the predicates are pushed down to the store. Similar manifestations can come about when Entity Framework uses LINQ to Entities. In that case, the filtering is done in memory by retrieving the results from the table because a certain method in the predicate could not be translated to a query. A call to AsEnumerable is a hint that there is a problem, because filtering based on IEnumerable is done on the client side rather than in the database. The default for LINQ to Entities is IQueryable, which pushes the filters to the data source.

Fetching all the columns from a table, as compared to fetching only the relevant columns, is another classic example of this antipattern; even though this might have worked when the table was only a few columns wide, it changes the game when the table adds several more columns. Similarly, performing aggregation in the database, instead of in memory on the application side, overcomes this antipattern.
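A hedged C# sketch of these fixes, with hypothetical EF Core types, projects only the columns the list view needs, paginates the results, and lets the database compute the count rather than materializing every row.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class CampaignResponse
{
    public int Id { get; set; }
    public int CampaignId { get; set; }
    public string Summary { get; set; } = "";
    public string FullText { get; set; } = "";   // large column the list view does not need
}

public record ResponseSummary(int Id, string Summary);

public class CampaignDbContext : DbContext
{
    public CampaignDbContext(DbContextOptions<CampaignDbContext> options) : base(options) { }
    public DbSet<CampaignResponse> Responses => Set<CampaignResponse>();
}

public static class ExtraneousFetchingFix
{
    // Fetch only the relevant columns instead of whole rows, one page at a time.
    public static List<ResponseSummary> ListSummaries(
        CampaignDbContext db, int campaignId, int page, int pageSize)
        => db.Responses
             .Where(r => r.CampaignId == campaignId)              // stays IQueryable: translated to SQL
             .OrderBy(r => r.Id)
             .Select(r => new ResponseSummary(r.Id, r.Summary))   // projection avoids FullText
             .Skip(page * pageSize).Take(pageSize)                // pagination bounds the result set
             .ToList();

    // Let the database count rows rather than materializing every record.
    public static int CountResponses(CampaignDbContext db, int campaignId)
        => db.Responses.Count(r => r.CampaignId == campaignId);
}
```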

As with data access best practices, some considerations for performance hold true here as well. Partitioning data horizontally may reduce contention. Operations that support unbounded queries can implement pagination. Features that are built right into the data store can be leveraged. Some calculations, especially summations, need not be repeated. Queries that return a lot of results can be further filtered. Not all operations can be offloaded to the database, but those for which the database is highly optimized can be.

A few ways to detect this antipattern include identifying slow workloads or transactions, identifying behavioral patterns exhibited by the system due to limits, correlating the instances of slow workloads with those patterns, identifying the data stores being used, identifying any slow-running queries that reference these data sources, and performing a resource-specific analysis of how the data is used and consumed.

These detection steps are also the starting points for mitigating this antipattern.

Some of the metrics that help with detecting and mitigating the extraneous fetching antipattern include total bytes per minute, average bytes per transaction and requests per minute.

 

Monday, May 16, 2022

This is a continuation of a series of articles on the crowdsourcing application, including the most recent article. The original problem statement is included again for context.

Social engineering applications provide a wealth of information to the end-user, but the questions and answers received on them are always limited to just that – the social circle. Advice solicited for personal circumstances is never appropriate for forums which can remain in public view. It is also difficult to find the right forums or audience where the responses can be obtained in a short time. When we want more opinions in a discreet manner, without the knowledge of those who surround us, the options become fewer and fewer. In addition, crowd-sourcing the opinions for a personal topic is not easily available via applications. This document tries to envision an application to meet this requirement.

The previous article continued the elaboration on the usage of the public cloud services for provisioning the queue, document store and compute. It talked a bit about the messaging platform required to support this social-engineering application. The problems encountered with social engineering are well-defined and have precedents in various commercial applications. They are primarily about the feed for each user and the propagation of solicitations to the crowd. The previous article described selective fan-out. When the clients wake up, they can request their state to be refreshed. This improves on pushing every write because the data does not need to be sent out until it is requested. If the queue sends messages back to the clients, it is a fan-out process. The devices can choose to check in at selective times, and the server can be selective about which clients to update. Both methods work well in certain situations. The fan-out happens in both writing as well as loading, and it can be made selective in both. The fan-out can be limited during both pull and push. Disabling the writes to all devices can significantly reduce the cost; other devices can load these updates only when reading. It is also helpful to keep track of which clients are active over a period so that only those clients get preference.

In this section, we talk about the monolithic persistence antipattern, which must be avoided. This antipattern occurs when a single data store hurts performance due to resource contention. Additionally, the use of multiple data stores can help with virtualization of data and query.

A specific example of this antipattern is when the crowdsourced application writes transactional records, logs, metrics and events to the same database. The online transaction processing benefits from a relational store, but logs and metrics can be moved to a log index store and a time-series database respectively. Usually, a single data store works well for transactional data, but this does not mean documents need to be stored in the same data store. A blob store or document database can be used in addition to a regular transactional database to allow individual documents to be shared without any impact on the business operations. Each document can then have its own web-accessible address.

This antipattern can be fixed in one of several ways. First, the data types must be listed, and their corresponding data stores must be assigned. Many data types can be bound to the same database, but when they are different, they must be passed to the data stores that handle them best. Second, the data access patterns for each data type must be analyzed. If the data type is a document, a CosmosDB instance is a good choice. Third, if the database instance is not suitable for all the data access patterns of the given data type, it must be scaled up. A premium SKU will likely benefit this case.
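A hedged, purely structural C# sketch of the first step might route each data type to its own store behind small interfaces; the record types and store abstractions are hypothetical, standing in for a relational database, a blob or document store, and a time-series database.

```csharp
using System.Threading.Tasks;

public record OrderRecord(int Id, decimal Amount);       // transactional data
public record ResponseDocument(string Id, string Json);  // shareable document
public record MetricPoint(string Name, double Value);    // time-series metric

public interface ITransactionStore { Task SaveAsync(OrderRecord order); }       // e.g. relational DB
public interface IDocumentStore    { Task SaveAsync(ResponseDocument doc); }    // e.g. blob/document DB
public interface IMetricStore      { Task SaveAsync(MetricPoint point); }       // e.g. time-series DB

// The application composes the stores instead of funneling every write to one database.
public class PersistenceRouter
{
    private readonly ITransactionStore _transactions;
    private readonly IDocumentStore _documents;
    private readonly IMetricStore _metrics;

    public PersistenceRouter(ITransactionStore transactions, IDocumentStore documents, IMetricStore metrics)
        => (_transactions, _documents, _metrics) = (transactions, documents, metrics);

    public Task SaveOrderAsync(OrderRecord order) => _transactions.SaveAsync(order);
    public Task SaveResponseAsync(ResponseDocument doc) => _documents.SaveAsync(doc);
    public Task RecordMetricAsync(MetricPoint point) => _metrics.SaveAsync(point);
}
```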

Detection of this antipattern is easier with the monitoring tools and the built-in supportability features of the database layer. If the database activity reveals significant processing, contention and a very low data rate, it is likely that this antipattern is manifesting.

Examining the work performed by the database in terms of data types, which can be narrowed down by callers and scenarios, may reveal the culprits that are likely to be causing this antipattern.

Finally, periodic assessments must be performed on the data storage tier. 

 

 
