Sunday, April 3, 2022

Migration plan for a cloud service to Service Fabric: 

Goal: List some of the considerations for migrating a web service hosted on IaaS infrastructure to Service Fabric. A summary of Service Fabric and its features is included here. This article focuses on architectural decisions and recommended practices.

The migration cost is usually a sliding scale between two extremes: making no code changes to the existing service, and refactoring it into stateless microservices that can run on serverless platforms. Choosing the right point on that scale goes a long way toward satisfying all stakeholders.

A typical approach to migrating existing workloads is the lift-and-shift strategy. For example, the workload can be provisioned directly onto new VMs with network and storage components, and the existing applications deployed onto those VMs. Another approach is to move the application to a PaaS platform. The drawback of the lift-and-shift strategy is that it often results in overprovisioning and overpaying for compute resources.

Container orchestration frameworks have demonstrated a more cost-effective way of running applications. Containerizing an existing application enables it to run on a cluster alongside other applications, and brings improvements in resource usage, dynamic scaling of instances, shared monitoring, and DevOps.

Optimizing and provisioning the resources for containerization is not trivial. Service Fabric allows experimentation by scaling out the instances on demand. Both Windows and Linux applications can be migrated to a runtime platform without changing code and their instances can be scaled without overprovisioning VMs. The result is better density, better hardware use, simplified operations, and overall lower cloud-compute costs.

Even a large set of Windows-based web applications on an erstwhile IIS hosting infrastructure can be migrated to Service Fabric with improved density, monitoring, consistency, and DevOps, all within a secure extended private network in the cloud. The principle is to use Docker and Service Fabric's containerization support to package and host existing web applications on a shared cluster with preconfigured monitoring and operations. This results in an optimal performance-to-cost ratio.



Saturday, April 2, 2022

Migration plan for a cloud service to Service Fabric:

Goal: List some of the considerations for migrating a web service hosted on IaaS infrastructure to Service Fabric. A summary of Service Fabric and its features is included here. This article focuses on architectural decisions and recommended practices.

Service Fabric is Microsoft’s distributed platform for building, running, and maintaining microservices applications in the cloud. It is a container orchestrator, and it can provide quality of service to microservice framework models such as stateful, stateless, and actor. It differs from Azure Container Service in that it is an Infrastructure-as-a-Service offering rather than a Platform-as-a-Service offering. There is also a Service Fabric Mesh offering that provides a PaaS service for Service Fabric applications. Service Fabric provides its own specific programming model, allows guest executables, and orchestrates Docker containers. It supports both Windows and Linux, but it is primarily suited for Windows. It can scale to handle Internet-of-Things traffic. It is open to workloads and is technology-agnostic. It relies on Docker and supports both Windows and Linux containers, but it provides a more integrated, feature-rich orchestration that gives more openness and flexibility.

The table of comparisons includes:

Area                     | Cloud Services                        | Service Fabric
-------------------------|---------------------------------------|-----------------------------------------------
Application composition  | Roles                                 | Services
Density                  | One role instance per VM              | Multiple services in a single node
Minimum number of nodes  | 2 per role                            | 5 per cluster, for production deployments
State management         | Stateless                             | Stateless or stateful
Hosting                  | Azure                                 | Cloud or on-premises
Web hosting              | IIS                                   | Self-hosting
Deployment model         | Classic deployment model              | Resource Manager
Packaging                | Cloud service package files (.cspkg)  | Application and service packages
Application update       | VIP swap or rolling update            | Rolling update
Autoscaling              | Built-in service                      | Virtual machine scale sets for auto scale out
Debugging                | Local emulator                        | Local cluster

A worker role can be mapped to a Service Fabric stateless service. A web role can also be mapped to a Service Fabric stateless service. Unlike web roles, Service Fabric does not support IIS. A service must first move to a web framework that can be self-hosted, and then the service can be moved to Service Fabric.

The worker role and Service Fabric service APIs offer similar entry points. The key difference between the lifecycle of worker roles and that of Service Fabric services is that the lifecycle is tied to the VM for a worker role, whereas it is separate from the VM for a Service Fabric service.
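
As a rough sketch of that mapping (the service and type names below are placeholders, not taken from any particular migration), the body of a worker role's Run method typically moves into the RunAsync override of a stateless service, and the host process registers the service type at startup:

using System;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Runtime;

// Hypothetical service: the old worker role's Run() loop moves into RunAsync.
internal sealed class MigratedWorkerService : StatelessService
{
    public MigratedWorkerService(StatelessServiceContext context) : base(context) { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        // Unlike a worker role, returning from RunAsync does not recycle the host;
        // the service instance stays up until Service Fabric removes it.
        while (!cancellationToken.IsCancellationRequested)
        {
            // ... former worker-role work item ...
            await Task.Delay(TimeSpan.FromSeconds(30), cancellationToken);
        }
    }
}

internal static class Program
{
    private static void Main()
    {
        // The service type name must match the one declared in ServiceManifest.xml.
        ServiceRuntime.RegisterServiceAsync("MigratedWorkerServiceType",
            context => new MigratedWorkerService(context)).GetAwaiter().GetResult();

        Thread.Sleep(Timeout.Infinite); // keep the host process alive
    }
}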

The lifetime of a worker role is determined by when the Run method exits, whereas the RunAsync method of a Service Fabric service can run to completion while the instance remains. The HTTP listeners are described in the service definition files for cloud services, while they are specified in the service manifest for Service Fabric. The configuration for a Service Fabric service comes from the application manifest in the application package, the service manifest in the service package, and Settings.xml from the configuration package. Settings can be saved in the Settings.xml file, the application manifest can be used to override a setting, and environment-specific settings can be put into the parameter file.
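
For example, a minimal sketch of reading such a setting from the configuration package (the section and parameter names are invented for illustration; "Config" is the conventional configuration package name):

// Inside a service class deriving from StatelessService or StatefulService.
var configPackage = Context.CodePackageActivationContext
    .GetConfigurationPackageObject("Config");

// "MySection" and "MySetting" are placeholder names for this sketch; the value
// here can be overridden per environment via the application manifest and parameter files.
string mySetting = configPackage.Settings
    .Sections["MySection"]
    .Parameters["MySetting"]
    .Value;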

Friday, April 1, 2022

 Service Fabric Notes Summary

Service Fabric is Microsoft’s distributed platform for building, running, and maintaining microservices applications in the cloud. It is a container orchestrator, and it can provide quality of service to microservice framework models such as stateful, stateless, and actor. It differs from Azure Container Service in that it is an Infrastructure-as-a-Service offering rather than a Platform-as-a-Service offering. There is also a Service Fabric Mesh offering that provides a PaaS service for Service Fabric applications. Service Fabric provides its own specific programming model, allows guest executables, and orchestrates Docker containers. It supports both Windows and Linux, but it is primarily suited for Windows. It can scale to handle Internet-of-Things traffic. It is open to workloads and is technology-agnostic. It relies on Docker and supports both Windows and Linux containers, but it provides a more integrated, feature-rich orchestration that gives more openness and flexibility.

It is often compared to Kubernetes, which is also a container orchestration framework that hosts applications. Kubernetes extends the idea of app+container all the way to the hosts, which become nodes of a cluster. Cluster membership is maintained by requiring kubelets to keep a connection with the API server and send heartbeats at regular intervals. Service Fabric avoids this with a decentralized ring and failure detection. It is motivated by the adage that distributed consensus is at the heart of numerous coordination problems, but the problem has a trivially simple solution if there is a reliable failure detection service.

Leader election differs from algorithm to algorithm. For example, the Raft algorithm was proposed to address the long-standing issues with the understandability of the widely studied Paxos algorithm. It has a clear abstraction and presentation and can be seen as a simplified version of the original Paxos algorithm. Specifically, Paxos divides terms between servers, whereas Raft allows a follower to become a candidate in any term, but followers will vote for only one candidate per term. Paxos followers will vote for any candidate, whereas Raft followers will only vote for a candidate whose log is at least as up to date as their own. If a leader has uncommitted log entries from a previous term, Paxos will replicate them in the current term, whereas Raft will replicate them in their original term. Raft's leader election is therefore lightweight when compared to Paxos.

Service Fabric organizes nodes in rings, and heartbeats are only sent to a small subset of nodes called the neighborhood. The arbitration procedure involves more nodes besides the monitor, but it is only executed on missed heartbeats. A quorum of privileged nodes in an arbitration group helps resolve possible failure detection conflicts, isolate appropriate nodes, and maintain a consistent view of the ring. Membership and failure detection in Service Fabric rely on two key design principles: 1) a notion of strongly consistent membership, and 2) decoupling failure detection from failure decision. All nodes responsible for monitoring a node X must agree on whether X is up or down. When used in the SF-Ring, all predecessors and successors in the neighborhood of a node X agree on X's status. The SF-Ring is a distributed hash table. It provides a seamless way of scaling from small groups to large groups. SF-Ring was developed at around the same time as P2P DHTs like Pastry, Chord, and others. SF-Ring is unique because 1) routing table entries are bidirectional and symmetrical, 2) the routing is bidirectional, 3) routing tables are eventually convergent, 4) there is a decoupled mapping of nodes and keys, and 5) there are consistent routing tokens.
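
To make the neighborhood idea concrete, here is a toy sketch (not the actual Service Fabric implementation) of a ring in which each node heartbeats only a fixed number of predecessors and successors rather than the whole cluster:

using System;
using System.Collections.Generic;

// Toy model of a ring neighborhood: a node exchanges heartbeats only with its
// closest predecessors and successors on the ring, not with every other node.
class RingNeighborhoodDemo
{
    static IEnumerable<ulong> Neighborhood(List<ulong> ring, int index, int k)
    {
        int n = ring.Count;
        for (int d = 1; d <= k; d++)
        {
            yield return ring[((index - d) % n + n) % n]; // d-th predecessor, wrapping around
            yield return ring[(index + d) % n];           // d-th successor, wrapping around
        }
    }

    static void Main()
    {
        // Node ids placed on the ring (sorted); the values are arbitrary for illustration.
        var ring = new List<ulong> { 10, 25, 40, 70, 90, 120, 200 };
        foreach (var id in Neighborhood(ring, index: 3, k: 2))
            Console.WriteLine(id); // neighbors of node 70: 40, 90, 25, 120
    }
}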

Service Fabric distinguishes itself with support for strong consistency and support for stateful microservices. Each of the SF components offers strongly consistent behavior. There were two ways to do this: build consistent applications on top of inconsistent components, or use consistent components from the ground up. The end-to-end principle dictates that a functionality should only be pushed into the lower layers if the performance gain is worth the cost. If consistency were instead built only at the application layer, each distinct application would incur significant costs for maintenance and reliability. If consistency is supported at each layer, higher-layer designs can focus on their relevant notion of consistency, and both weakly consistent and strongly consistent applications can be built on top of Service Fabric. This is easier than building consistent applications over an inconsistent substrate.


Thursday, March 31, 2022

 

Service Fabric (continued)     

Part 2 compared Paxos and Raft. Part 3 discussed SF-Ring, Part 4 discussed its architecture and Part 5 described compute planning and scaling.  This article describes Service Fabric security best practices.

Azure Service Fabric makes it easy to package, deploy, and manage scalable and reliable microservices. It helps with developing and managing cloud applications. These applications and services can be stateless or stateful. They are run with high efficiency and load balancing. It supports real-time data analysis, in-memory computation, parallel transactions, and event processing in the applications.

The security best practices are described at various levels. At the level of an instance of Service Fabric, the Azure Resource Manager templates and the Service Fabric PowerShell modules create secure clusters. X.509 certificates must be used to secure the instance. Security policies must be configured, and the Reliable Actors security configuration must be implemented. TLS must be configured so that all communications are encrypted. Users must be assigned to roles, and role-based access control must be used to secure all control-plane access.

At the level of a cluster, certificates continue to secure the cluster, and client access – both read-only and admin access – is secured by Azure Active Directory. Automated deployments use scripts to generate, deploy, and roll over the secrets. The secrets are stored in Azure Key Vault, and Azure AD is used for all other client access. Authentication is required from all users. The cluster must be configured to create perimeter networks by using Azure Network Security Groups. Cluster virtual machines must be accessed via jump servers with Remote Desktop Connection.

Within the cluster, there are three scenarios for implementing cluster security by various technologies.

Node-to-node security: This scenario secures communication between the VMs or computers in the cluster. Only computers that are authorized to join the cluster can host applications and services in the cluster.

Client-to-node security: This scenario secures communication between a Service Fabric client and the individual nodes in the cluster (see the sketch after this list).

Service Fabric role-based access control: This scenario uses separate identities for each administrator and user client role that accesses the cluster. The role identities are specified when the cluster is created.
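
As one concrete illustration of the client-to-node scenario (a sketch only; the thumbprints, store, and endpoint below are placeholders), an admin or read-only client can authenticate to a certificate-secured cluster with X.509 credentials:

using System.Fabric;
using System.Security.Cryptography.X509Certificates;

// Placeholder values; use your cluster's certificate thumbprints and endpoint.
var credentials = new X509Credentials
{
    FindType = X509FindType.FindByThumbprint,
    FindValue = "<client-certificate-thumbprint>",
    StoreLocation = StoreLocation.CurrentUser,
    StoreName = "My",
    ProtectionLevel = ProtectionLevel.EncryptAndSign
};
credentials.RemoteCertThumbprints.Add("<cluster-certificate-thumbprint>");

// 19000 is the default client connection endpoint of a Service Fabric cluster.
var fabricClient = new FabricClient(credentials, "mycluster.westus.cloudapp.azure.com:19000");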

A detailed checklist for security and compliance is also included for reference: https://1drv.ms/b/s!Ashlm-Nw-wnWzR4MPnriBWYTlMY6  

 

 

 

 


Tuesday, March 29, 2022

Service Fabric (continued)    

Part 2 compared Paxos and Raft. Part 3 discussed SF-Ring and Part 4 discussed its architecture. This article describes compute planning and scaling.

Service Fabric supports a wide variety of business applications and services. These applications and services can be stateless or stateful. They are run with high efficiency and load balancing. It supports real-time data analysis, in-memory computation, parallel transactions, and event processing in the applications. Applications can be scaled in or out depending on the changing resource requirements.

Service Fabric hosts stateful services that must support large scale and low latency. It can help process data on millions of devices where the data for the device and the computation are co-located. It is equally effective for both core and edge services and scales to IoT traffic. Apps and services are all deployed in the same Service Fabric cluster through the Service Fabric deployment commands, and yet each of them is independently scaled and made reliable with guarantees for resources. This independence improves agility and flexibility.

Scalability considerations depend on the initial configuration and on whether scaling is required for the number of nodes of each node type or for the services themselves.

Initial cluster configuration is important for scalability. When the Service Fabric cluster is created, the node types are determined, and each node type can scale independently. A node type can be created for each group of services that has different scalability or resource requirements. A node type for the system services must first be configured. Then separate node types can be created for public or front-end services, and other node types as necessary for the back end. Placement constraints can be specified so that services are only deployed to the intended node types.
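
As a hedged sketch of that last point (the application, service, and node-type names are invented for illustration), a placement constraint can pin a service to its intended node type when the service is created through the client API:

using System;
using System.Fabric;
using System.Fabric.Description;

var fabricClient = new FabricClient(); // assumes a local or unsecured dev cluster for brevity

var description = new StatelessServiceDescription
{
    ApplicationName = new Uri("fabric:/MyApp"),                // placeholder
    ServiceName = new Uri("fabric:/MyApp/FrontEndService"),    // placeholder
    ServiceTypeName = "FrontEndServiceType",                   // placeholder
    InstanceCount = -1,                                        // one instance per eligible node
    PartitionSchemeDescription = new SingletonPartitionSchemeDescription(),
    // Only place instances on nodes of the "FrontEnd" node type.
    PlacementConstraints = "(NodeTypeName == FrontEnd)"
};

await fabricClient.ServiceManager.CreateServiceAsync(description);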

The durability tier for each node type represents the ability of Service Fabric to influence virtual machine scale set updates and maintenance operations. Production workloads require the Silver durability tier or higher. If the Bronze durability tier is used, additional steps are required for scale-in.

Each node type can have a maximum of 100 nodes. Anything more than that will require more node types. A virtual machine scale set does not scale instantaneously, so the delay must be tolerated during autoscaling. Automated scale-in to reduce the node count requires the Silver or Gold durability tier.

Scaling services depends on whether the services are stateful or stateless. Autoscaling of stateless services can be done by using the average partition load trigger or by setting the instance count to -1 in the service manifest. Stateful services require each node to get adequate replicas. Dynamic creation or deletion of services or whole application instances is also supported.

The average partition load trigger scales the number of service instances in a partition based on load. Setting InstanceCount to -1 in the service manifest places one instance on every node, so instances are automatically created and deleted as nodes are added or removed.
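
A rough sketch of attaching such a trigger to an existing stateless service through the client API, following the pattern in the Service Fabric documentation (the metric name, thresholds, and service URI below are illustrative values):

using System;
using System.Collections.Generic;
using System.Fabric;
using System.Fabric.Description;

var fabricClient = new FabricClient(); // assumes a local or unsecured dev cluster for brevity

// Scale the instance count of each partition between 1 and 5 based on average CPU load.
var mechanism = new PartitionInstanceCountScaleMechanism
{
    MinInstanceCount = 1,
    MaxInstanceCount = 5,
    ScaleIncrement = 1
};

var trigger = new AveragePartitionLoadScalingTrigger
{
    MetricName = "servicefabric:/_CpuCores", // built-in CPU metric when resource governance is enabled
    LowerLoadThreshold = 0.3,
    UpperLoadThreshold = 0.7,
    ScaleInterval = TimeSpan.FromMinutes(5)
};

var update = new StatelessServiceUpdateDescription();
update.ScalingPolicies = new List<ScalingPolicyDescription>
{
    new ScalingPolicyDescription(mechanism, trigger)
};

await fabricClient.ServiceManager.UpdateServiceAsync(
    new Uri("fabric:/MyApp/MyStatelessService"), update); // placeholder service URI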

 

 

 

Monday, March 28, 2022

 Service Fabric (continued)    

Part 2 compared Paxos and Raft. Part 3 discussed SF-Ring and Part 4 discussed its architecture. This article describes compute planning and scaling.

Service Fabric supports a wide variety of business applications and services. These applications and services can be stateless or stateful. They are run with high efficiency and load balancing. It supports real-time data analysis, in-memory computation, parallel transactions, and event processing in the applications. Applications can be scaled in or out depending on the changing resource requirements.

Service Fabric hosts stateful services that must support large scale and low latency. It can help process data on millions of devices where the data for the device and the computation are co-located. It is equally effective for both core and edge services and scales to IoT traffic. Apps and services are all deployed in the same Service Fabric cluster through the Service Fabric deployment commands, and yet each of them is independently scaled and made reliable with guarantees for resources. This independence improves agility and flexibility.

Capacity and scaling are two different considerations for Service Fabric and must be reviewed individually. Cluster capacity considerations include the initial number and properties of cluster node types; the durability level of each node type, which determines Service Fabric VM privileges within the Azure infrastructure; and the reliability level of the cluster, which determines the stability of Service Fabric system services and overall cluster function.

A cluster requires at least one node type. A node type defines the size, number, and properties for a set of nodes (virtual machines) in the cluster. Every node type that is defined in a Service Fabric cluster maps to a virtual machine scale set (VMSS). A primary node type is reserved to run critical system services. Non-primary node types are used for back-end and front-end services.

Node type planning considerations depend on whether the application has multiple services, whether those services have different infrastructure needs such as greater RAM or higher CPU, whether any of the application services need to scale out beyond 100 nodes, and whether the cluster spans availability zones.