Thursday, March 31, 2022

 

Service Fabric (continued)     

Part 2 compared Paxos and Raft. Part 3 discussed SF-Ring, Part 4 discussed its architecture and Part 5 described compute planning and scaling.  This article describes Service Fabric security best practices.

Azure Service Fabric makes it easy to package, deploy, and manage scalable and reliable microservices. It helps with developing and managing cloud applications. These applications and services can be stateless or stateful. They are run with high efficiency and load balancing. It supports real-time data analysis, in-memory computation, parallel transactions, and event processing in the applications.

The security best practices are described at various levels. At the level of an instance of Service Fabric, the Azure Resource Manager templates and the Service Fabric PowerShell modules create secure clusters. X.509 certificates must be used to secure the instance. Security policies must be configured and the Reliable Actors security configuration must be implemented. TLS must be configured so that all communications are encrypted. Users must be assigned to roles, and role-based access control must be used to secure all control-plane access.

At the level of a cluster, certificates continue to secure the cluster and client access – both read-only and admin access are secured by Azure Active Directory. Automated deployments use scripts to generate, deploy, and roll over the secrets. The secrets are stored in Azure Key Vault, and Azure AD is used for all other client access. Authentication is required from all users. The cluster must be configured to create perimeter networks by using Azure Network Security Groups. Cluster virtual machines must be accessed via jump servers with Remote Desktop Connection.

Within the cluster, there are three scenarios for implementing cluster security using various technologies.

Node-to-node security: This scenario secures communication between the VMs and the computers in the cluster. Only computers that are authorized to join the cluster can host applications and services in the cluster.

Client-to-node security: This scenario secures communication between a Service Fabric client and the individual nodes in the cluster.

Service Fabric role-based access control: This scenario uses separate identities for each administrator and user client role that accesses the cluster. The role identities are specified when the cluster is created.

A detailed checklist for security and compliance is also included for reference: https://1drv.ms/b/s!Ashlm-Nw-wnWzR4MPnriBWYTlMY6  

 

 

 

 

Wednesday, March 30, 2022

Service Fabric (continued)     

Part 2 compared Paxos and Raft. Part 3 discussed SF-Ring, Part 4 discussed its architecture and Part 5 described compute planning and scaling.  This article describes Service Fabric security best practices.

Azure Service Fabric makes it easy to package, deploy, and manage scalable and reliable microservices. It helps with developing and managing cloud applications. These applications and services can be stateless or stateful. They are run with high efficiency and load balancing. It supports real-time data analysis, in-memory computation, parallel transactions, and event processing in the applications.

The security best practices are described at various levels. At the level of an instance of Service Fabric, the Azure Resource Manager templates and the Service Fabric PowerShell modules create secure clusters. X.509 certificates must be used to secure the instance. Security policies must be configured and the Reliable Actors security configuration must be implemented. TLS must be configured so that all communications are encrypted. Users must be assigned to roles, and role-based access control must be used to secure all control-plane access.

At the level of a cluster, certificates continue to secure the cluster and client access – both read-only and admin access are secured by Azure Active Directory. Automated deployments use scripts to generate, deploy, and roll over the secrets. The secrets are stored in Azure Key Vault, and Azure AD is used for all other client access. Authentication is required from all users. The cluster must be configured to create perimeter networks by using Azure Network Security Groups. Cluster virtual machines must be accessed via jump servers with Remote Desktop Connection.

Within the cluster, there are three scenarios for implementing cluster security using various technologies.

Node-to-node security: This scenario secures communication between the VMs and the computers in the cluster. Only computers that are authorized to join the cluster can host applications and services in the cluster.

Client-to-node security: This scenario secures communication between a Service Fabric client and the individual nodes in the cluster.

Service Fabric role-based access control: This scenario uses separate identities for each administrator and user client role that accesses the cluster. The role identities are specified when the cluster is created.

Tuesday, March 29, 2022

Service Fabric (continued)    

Part 2 compared Paxos and Raft. Part 3 discussed SF-Ring and Part 4 discussed its architecture. This article describes compute planning and scaling.

Service Fabric supports a wide variety of business applications and services. These applications and services can be stateless or stateful. They are run with high efficiency and load balancing. It supports real-time data analysis, in-memory computation, parallel transactions, and event processing in the applications. Applications can be scaled in or out depending on the changing resource requirements.

Service Fabric hosts stateful services that must support large scale and low latency. It can help process data on millions of devices where the data for the device and the computation are co-located. It is equally effective for both core and edge services and scales to IoT traffic. Apps and services are all deployed in the same Service Fabric cluster through the Service Fabric deployment commands, and yet each of them is independently scaled and made reliable with guarantees for resources. This independence improves agility and flexibility.

Scalability considerations depend on the initial configuration and whether scaling is required for the number of nodes of each node type or if it is required for services.

Initial cluster configuration is important for scalability. When the service fabric cluster is created, the node types are determined, and each node type can scale independently. A node type can be created for each group of services that have different scalability or resource requirements. A node type for the system services must first be configured. Then separate node types can be created for public or front-end services and other node types as necessary for the backend. Placement services can be specified so that services are only deployed to the intended node types.

The durability tier for each node type represents the ability of Service Fabric to influence virtual machine scale set updates and maintenance operations. Production workloads require the Silver durability tier or higher. If the Bronze durability tier is used, additional steps are required for scale-in.

Each node type can have a maximum of 100 nodes; anything more than that requires additional node types. A VMSS does not scale instantaneously, so delays must be tolerated during autoscaling. Automatic scale-in to reduce the node count requires the Silver or Gold durability tier.

Scaling services depends on whether the services are stateful or stateless. Stateless services can be autoscaled by using the average partition load trigger or by setting instanceCount to -1 in the service manifest. Stateful services require enough nodes for their replicas. Dynamic creation or deletion of services or whole application instances is also supported.

The average partition load trigger scales the number of service instances based on reported load. Setting instanceCount to -1 in the service manifest runs one instance on every node; Service Fabric automatically creates and deletes instances as nodes join and leave.
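The effect of an average-load trigger can be sketched with a toy scale decision. This is illustrative only, not the Service Fabric API: the function name, parameters, and one-step adjustment policy are all hypothetical.

```python
def desired_instance_count(loads, current, lower, upper, max_count, min_count=1):
    """Toy sketch of an average-load scaling trigger (hypothetical API):
    compare the average reported load across instances with configured
    thresholds and step the instance count up or down by one."""
    avg = sum(loads) / len(loads)
    if avg > upper and current < max_count:
        return current + 1          # scale out: average load above upper bound
    if avg < lower and current > min_count:
        return current - 1          # scale in: average load below lower bound
    return current                  # within bounds: no change
```

For example, with per-instance loads of 80, 90, and 85 against an upper threshold of 70, the sketch would grow three instances to four.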

 

 

 

Monday, March 28, 2022

 Service Fabric (continued)    

Part 2 compared Paxos and Raft. Part 3 discussed SF-Ring and Part 4 discussed its architecture. This article describes compute planning and scaling.

Service Fabric supports a wide variety of business applications and services. These applications and services can be stateless or stateful. They are run with high efficiency and load balancing. It supports real-time data analysis, in-memory computation, parallel transactions, and event processing in the applications. Applications can be scaled in or out depending on the changing resource requirements.

Service Fabric hosts stateful services that must support large scale and low latency. It can help process data on millions of devices where the data for the device and the computation are co-located. It is equally effective for both core and edge services and scales to IoT traffic. Apps and services are all deployed in the same Service Fabric cluster through the Service Fabric deployment commands, and yet each of them is independently scaled and made reliable with guarantees for resources. This independence improves agility and flexibility.

Capacity and scaling are two different considerations for Service Fabric and must be reviewed individually. Key cluster capacity considerations include the initial number and properties of cluster node types; the durability level of each node type, which determines the Service Fabric VM privileges within the Azure infrastructure; and the reliability level of the cluster, which determines the stability of the Service Fabric system services and overall cluster function.

A cluster requires at least one node type. A node type defines the size, number, and properties for a set of nodes (virtual machines) in the cluster. Every node type that is defined in a Service Fabric cluster maps to a virtual machine scale set (VMSS). A primary node type is reserved to run critical system services. Non-primary node types are used for backend and frontend services.

Node type planning considerations depend on whether the application has multiple services, whether those services have different infrastructure needs such as greater RAM or higher CPU cycles, whether any of the application services need to scale out beyond a hundred nodes, and whether the cluster spans availability zones.

Sunday, March 27, 2022

 

Service Fabric (continued)    

Part 2 compared Paxos and Raft. Part 3 discussed SF-Ring and Part 4 discussed its architecture. This article describes its usage scenarios.

Service Fabric supports a wide variety of business applications and services. These applications and services can be stateless or stateful. They are run with high efficiency and load balancing. It supports real-time data analysis, in-memory computation, parallel transactions, and event processing in the applications. Applications can be scaled in or out depending on the changing resource requirements.

Service Fabric hosts stateful services that must support large scale and low latency. It can help process data on millions of devices where the data for the device and the computation are co-located. It is equally effective for both core and edge services and scales to IoT traffic.

Service Fabric is also useful for scenarios that require low-latency reads and writes, such as in online gaming or instant messaging. Applications can be built to be interactive and stateful without having to create a separate store or cache. Gaming and instant messaging are some examples of this scenario.

Applications that must reliably process events or streams of data run well on Service Fabric with its optimized reads and writes. Service Fabric supports application processing pipelines, where results must be reliable and passed on to the next processing stage without any loss. These pipelines include transactional and financial systems, where data consistency and computation guarantees are essential.

Stateful applications that perform intensive data computation and require the colocation of processing (computation) and data benefit from Service Fabric as well. Stateful Service Fabric services eliminate the latency of reaching an external data store, enabling more optimized reads and writes. As an example, real-time recommendation selections for customers that require a round-trip latency of less than a hundred milliseconds are handled with ease.

Service Fabric also supports highly available services and provides fast failover by creating multiple secondary service replicas. If a node, process, or individual service goes down due to hardware or other failure, one of the secondary replicas is promoted to a primary replica with minimal loss of service.
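The promotion step described above can be sketched as a toy model. This is a hypothetical illustration of the idea, not Service Fabric's actual Failover Manager logic; the function and field names are invented.

```python
def promote_on_failure(replicas, failed_node):
    """Toy failover sketch (hypothetical names): drop replicas hosted on
    the failed node and, if the primary was among them, promote one of
    the surviving secondaries to primary."""
    survivors = [r for r in replicas if r["node"] != failed_node]
    if survivors and not any(r["role"] == "primary" for r in survivors):
        survivors[0]["role"] = "primary"   # promote a secondary
    return survivors
```

If the primary's node fails, the service keeps running because a secondary already holds replicated state and takes over the primary role.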

Individual services can also be partitioned, with partitions hosted on different nodes. Individual services can also be created and removed on the fly. Services can be scaled from a few instances on a few nodes to thousands of instances on many nodes and dialed down as well. Service Fabric helps with the complete life cycle.

Examples of stateless services include Azure Cloud Services; examples of stateful microservices that must maintain authoritative state beyond the request and its response include ASP.NET and Node.js services. Service Fabric provides high availability and consistency of state through simple APIs that provide transactional guarantees backed by replication.

Stateful services in Service Fabric bring high availability to all types of applications and not just those that depend on a database or a data store. This covers both relational and big data stores. Applications can have both their state and data managed for additional performance gains without sacrificing reliability, consistency or availability

Apps and services are all deployed in the same Service Fabric cluster through the Service Fabric deployment commands and yet each of them is independently scaled and made reliable with guarantees for resources. This independence improves agility and flexibility.

Stateful microservices simplify application design because they remove the need for the additional queues and caches that have traditionally been required to address the availability and latency requirements of purely stateless applications. Service Fabric provides the Reliable Services and Reliable Actors programming models, which reduce application complexity while achieving high throughput and low latency.

 

 

 

Saturday, March 26, 2022

 

Service Scalability and Reliability

These are some observations about scalability and reliability of a cloud based service.

The primary consideration is between the tradeoffs for compute versus data optimizations.

The scale-out of the computational tasks is achieved by their discrete, isolated, and finite nature where some input is taken in raw form and processed into an output. The scale-out can be adjusted to suit the demands of the workload, and the outputs can be conflated as is customary with map-reduce problems. Since the tasks run independently and in parallel, they are loosely coupled. Network latency for message exchanges between tasks is kept to a minimum.
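A minimal scatter-gather sketch of this pattern, assuming the per-task work is a simple pure function (the chunking scheme and function names here are illustrative, not from the source):

```python
from concurrent.futures import ThreadPoolExecutor

def process(chunk):
    # Each task is discrete and isolated: raw input in, partial result out.
    return sum(x * x for x in chunk)

def scatter_gather(data, workers=4):
    """Run independent tasks in parallel, then conflate the partial
    outputs, map-reduce style."""
    chunks = [data[i::workers] for i in range(workers)]  # scatter
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(process, chunks)             # parallel map
    return sum(partials)                                 # reduce/conflate
```

Because the tasks share nothing, the number of workers can be tuned to the workload without changing the result.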

Compute-oriented improvements have the following benefits: 1) high performance due to the parallelization of tasks, 2) the ability to scale out to an arbitrarily large number of cores, 3) the ability to utilize a wide variety of compute units, and 4) dynamic allocation and deallocation of compute.

Some of the best practices demonstrated by this approach include the following: It exposes a well-designed API to the client. It can auto-scale to handle changes in the load. It caches semi-static data. It uses polyglot persistence when appropriate. It partitions data to improve scalability, reduce contention, and optimize performance. There are Kusto endpoints for read-only data in USNat, USSec and public cloud.

The storage approach leans on multiple persistence stores and a larger volume of data so that services can stage processing and analysis, both of which have different patterns. The data continues to be made available in real time, but there is a separation of read-only and read-write access. A copy-on-write mechanism is provided by default, and versioning is supported.

Some of the benefits of this approach include the following: The ability to mix technology choices, achieving performance through efficiency in data processing, queuing on the service side, and interoperability with existing service technology stacks. 

Some of the best practices with this architectural style leverage parallelism, partition data, apply schema-on read semantics, process data in place, balance utilization and time costs, separate cluster resources, orchestrate data ingestion and scrub sensitive data. 

Some of the architectural guidance and best practices for implementing cloud services are found in the reference documentation online and as presented in this article.

 

Friday, March 25, 2022

 

Service Fabric (continued)    

Part 2 compared Paxos and Raft. Part 3 discussed SF-Ring and Part 4 discussed its architecture. This article describes its support for microservices. 

 

Service Fabric provides an infrastructure to build, deploy, and upgrade microservices efficiently with options for autoscaling, managing state, monitoring health, and restarting services in case of failure. It helps developers and administrators focus on the implementation of workloads that are scalable, reliable, and manageable by avoiding the issues that are regularly caused by complex infrastructures. The major benefits it provides include deploying and evolving services at very low cost and high velocity, lowering the cost of responding to changing business requirements, exploiting the widespread skills of developers, and decoupling packaged applications from workflows and user interactions.

 

Service Fabric follows an application model where an application is a collection of microservices. The application is described in an application manifest file that defines the different types of service contained in that application, and pointers to the independent service packages. The application package also usually contains parameters that serve as overrides for certain settings used by the services. Each service package has a manifest file that describes the physical files and folders that are necessary to run that service, including binaries, configuration files, and read-only data for that service. Services and applications are independently versioned and upgradable.

 

A package can deploy more than one service, but if one service fails to upgrade, the entire application is rolled back. For this reason, the microservices architecture is best served by multiple packages. If a set of services share the same resources and configuration or have the same lifecycle, then those services can be placed in the same application type.

Service Fabric programming models can be chosen whether the services are stateful or stateless.

 

Service Fabric distinguishes itself with support for strong consistency and support for stateful microservices. Each of the SF components offers strongly consistent behavior. There were two ways to achieve this: build consistent applications on top of inconsistent components, or build with consistent components from the ground up. The end-to-end principle dictates that a functionality can be built into the middle layers only when the performance gain is worth the cost. If consistency were instead built only at the application layer, each distinct application would bear significant costs for maintenance and reliability. Instead, if consistency is supported at each layer, higher-layer designs can focus on their own relevant notion of consistency, and both weakly consistent and strongly consistent applications can be built on top of Service Fabric. This is easier than building consistent applications over an inconsistent substrate.

 

A stateless service is chosen when it must scale and the data or state can be stored externally. There is also an option to run an existing service as a guest executable, which can be packaged in a container with all its dependencies. Service Fabric models both containers and guest executables as stateless services.

 

An API gateway (ingress) sits between external clients and the microservices and acts as a reverse proxy, routing requests from clients to microservices. As an HTTP proxy, it can handle authentication, SSL termination, and rate limiting.

Thursday, March 24, 2022

 

Service Fabric (continued)   

Part 2 compared Paxos and Raft. Part 3 discussed SF-Ring.

This article continues the discussion on Service Fabric with a focus on its architecture. Service Fabric is built with layered subsystems, which enables us to write applications that are highly available, scalable, manageable, and testable.

The major subsystems in Service Fabric include the following:

- the transport subsystem, which secures point-to-point communication as the base layer

- the federation subsystem, which federates a set of nodes to form a consistent, scalable fabric

- the communication subsystem, which performs service discovery

- the reliability subsystem, which offers reliability, availability, replication, and service orchestration

- the hosting and activation subsystem, which manages the application lifecycle

- the management subsystem, which performs deployment, upgrade, and monitoring

- the testability subsystem, which performs fault injection and testing in production

- the application model, which provides a declarative application description

- the native and managed APIs, which support reliable, scalable applications

Service Fabric provides the ability to resolve service locations through its communication subsystem. The application programming models exposed to the developers are layered on top of these subsystems along with the application model to enable tooling.

 

The transport subsystem implements a point-to-point datagram communication channel, which is used for communication within Service Fabric clusters and between a cluster and its clients. It enables broadcast and multicast in the federation layer and provides encrypted communication. It is not exposed to users.

 

The federation subsystem stitches the various nodes into a single unified cluster. It provides the distributed systems primitives needed by the other subsystems - failure detection, leader election, and consistent routing. It is built on top of distributed hash tables with a 128-bit token space which is a ring topology over the nodes.

The reliability subsystem consists of a Replicator, a Failover Manager, and a Resource Balancer. The Replicator ensures that state changes in the primary service replica are replicated and kept in sync. The Failover Manager ensures that the load is automatically redistributed across the nodes on additions and removals. The Resource Balancer places service replicas across failure domains in the cluster and ensures that all failover units are operational.

The management subsystem consists of the Cluster Manager, the Health Manager, and the Image Store. The Cluster Manager places applications on the nodes based on service placement constraints. The Health Manager enables health monitoring of applications, services, and cluster entities. The Image Store service provides storage and distribution of the application binaries.

The hosting and activation subsystem manages the lifecycle of an application on a node.

The communication subsystem provides reliable messaging within the cluster and service discovery through the Naming service.

The testability subsystem is a suite of tools specifically designed for testing services built on Service Fabric.

Wednesday, March 23, 2022

 

Service Fabric (continued) 

Part 2 compared Paxos and Raft. Part 3 discussed SF-Ring. This article continues the discussion on Service Fabric with its support for microservices. Microsoft Azure Service Fabric is a distributed systems platform to package, deploy, and manage scalable and reliable microservices and containers while supporting native cloud development. Service Fabric helps developers and administrators focus on the implementation of workloads that are scalable, reliable, and manageable by avoiding the issues that are regularly caused by complex infrastructures. The major benefits it provides include deploying and evolving services at very low cost and high velocity, lowering the cost of responding to changing business requirements, exploiting the widespread skills of developers, and decoupling packaged applications from workflows and user interactions.

SF provides first-class support for full Application Lifecycle Management (ALM) of cloud applications, from development, deployment, and daily management to eventual decommissioning. It provides system services to deploy, upgrade, detect, and restart failed services; discover service location; manage state; and monitor health. In production environments, there can be hundreds of thousands of microservices running in an unpredictable cloud environment. SF is an automated system that provides support for the complex task of managing these microservices.

An application is a collection of constituent microservices (stateful or stateless) in Service Fabric. Each of these performs a complete and standalone function and is composed of code, configuration and data. The code consists of the executable binaries, the configurations consist of service settings that can be loaded at run time, and the data consists of arbitrary static data to be consumed by the microservice. A powerful feature of SF is that each component in the hierarchical application model can be versioned and upgraded independently.

Service Fabric distinguishes itself with support for strong consistency and support for stateful microservices. Each of the SF components offers strongly consistent behavior. There were two ways to achieve this: build consistent applications on top of inconsistent components, or build with consistent components from the ground up. The end-to-end principle dictates that a functionality can be built into the middle layers only when the performance gain is worth the cost. If consistency were instead built only at the application layer, each distinct application would bear significant costs for maintenance and reliability. Instead, if consistency is supported at each layer, higher-layer designs can focus on their own relevant notion of consistency, and both weakly consistent and strongly consistent applications can be built on top of Service Fabric. This is easier than building consistent applications over an inconsistent substrate.

Support for stateful microservices that maintain a mutable authoritative state beyond the service request and its response is a notable achievement of Service Fabric. Stateful microservices can demonstrate high-throughput, low-latency, fault-tolerant online transaction processing services by keeping code and data close on the same machine. It also simplifies the application design by removing the need for additional queues and caches.

Tuesday, March 22, 2022

 

Service Fabric (continued)

Part 2 compared Paxos and Raft. This article continues the discussion with a comparison to Service Fabric.

Kubernetes requires distributed consensus, but Service Fabric relies on its ring. Distributed consensus algorithms like Paxos and Raft must perform leader election; Service Fabric doesn’t. Service Fabric avoids this with a decentralized ring and failure detection. It is motivated by the adage that distributed consensus is at the heart of numerous coordination problems, but it has a trivially simple solution if there is a reliable failure detection service. Service Fabric does not use a distributed consensus protocol like Raft or Paxos or a centralized store for cluster state. Instead, it proposes a federation subsystem which answers the most important question on membership: whether a specific node is part of the system. The nodes are organized in rings, and heartbeats are only sent to a small subset of nodes called the neighborhood. The arbitration procedure involves more nodes besides the monitor, but it is only executed on missed heartbeats. A quorum of privileged nodes in an arbitration group helps resolve possible failure detection conflicts, isolate appropriate nodes, and maintain a consistent view of the ring.

Nodes in a Service Fabric cluster are organized in a virtual ring with 2^m points, where m = 128. Nodes and keys are mapped onto points in the ring. A key is owned by the node closest to it, with ties won by the predecessor. Each node keeps track of a configured number of its immediate successor and predecessor nodes in the ring, called the neighborhood set. Neighbors are used to run SF's membership and failure detection protocol.
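The key-ownership rule can be sketched as follows. This is an illustrative model only: the function name and tie-breaking helper are hypothetical, and a small m is used in the example rather than the real 128-bit space.

```python
def owner(nodes, key, m=128):
    """Toy sketch of ring key placement: node IDs and keys live on a ring
    of 2**m points; a key is owned by the node closest to it, with ties
    going to the predecessor (the counter-clockwise neighbor)."""
    ring = 2 ** m

    def distance(a, b):
        # Shortest distance between two points on the ring, either direction.
        d = (b - a) % ring
        return min(d, ring - d)

    best = None
    for n in sorted(nodes):
        d = distance(n, key)
        # Strictly closer wins; on a tie, prefer the predecessor, i.e. the
        # node with the smaller clockwise distance to the key.
        if (best is None or d < distance(best, key)
                or (d == distance(best, key)
                    and (key - n) % ring < (key - best) % ring)):
            best = n
    return best
```

For instance, on a 16-point ring (m = 4) with nodes at 2 and 8, key 5 is equidistant from both, so the predecessor, node 2, owns it.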

Membership and failure detection in Service Fabric rely on two key design principles: 1) a notion of strongly consistent membership, and 2) decoupling failure detection from failure decision. All nodes responsible for monitoring a node X must agree on whether X is up or down. In the SF-Ring, all predecessors and successors in the neighborhood of a node X agree on X's status. This forms a consistent neighborhood. Failure detection protocols can lead to conflicting decisions, so the decision of which nodes have failed is decoupled from the failure detection itself.

Monitoring is implemented with periodic leasing. Heartbeating is fully decentralized: each node is monitored by a subset of other nodes, called its monitors. Node X periodically sends a lease renewal request, which is a heartbeat message with a unique sequence number, to each of its monitors. Each acknowledgment carries a timeout value T0. If a node X suspects a neighbor Y, it sends a fail(Y) message to the arbitrator and, after receiving the accept message, waits for time T0 before reclaiming Y's portion of the ring. Any routing requests received for Y are queued and processed only after the range has been inherited by Y's neighbors.
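The lease bookkeeping on a monitor can be sketched as a toy model. The class and method names are hypothetical, and arbitration is omitted: the monitor only reaches a "suspected" verdict here, mirroring the decoupling of detection from the final failure decision.

```python
class LeaseMonitor:
    """Toy sketch of lease-based failure detection: the monitored node
    renews its lease with numbered heartbeats; the monitor suspects the
    node only after the lease expires. The down/up decision is left to
    a separate arbitrator (not modeled here)."""

    def __init__(self, lease_seconds):
        self.lease = lease_seconds
        self.last_seq = -1       # highest heartbeat sequence number seen
        self.expires = 0.0       # wall-clock time at which the lease lapses

    def heartbeat(self, seq, now):
        if seq > self.last_seq:  # ignore stale or duplicate renewals
            self.last_seq = seq
            self.expires = now + self.lease
            return True
        return False

    def suspected(self, now):
        return now > self.expires
```

A node whose heartbeats keep arriving before `expires` is never suspected; once the lease lapses, the monitor would escalate to the arbitrator rather than act unilaterally.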

The SF-Ring is a distributed hash table. It provides a seamless way of scaling from small groups to large groups. SF-Ring was developed at around the same time as P2P DHTs like Pastry, Chord, and others. SF-Ring is unique because 1) routing table entries are bidirectional and symmetrical, 2) routing is bidirectional, 3) routing tables are eventually convergent, 4) there is a decoupled mapping of nodes and keys, and 5) there are consistent routing tokens.

1.       SF-Ring maintains routing partners in routing tables at exponentially increasing distances in the ring. Routing partners are maintained both clockwise and anticlockwise. Most routing partners are symmetric due to bidirectionality

2.       A bidirectional routing table enables a node looking to forward a message for a key to find another node whose ID is closest to the key so that it may forward the message. It is a distributed form of binary search and is greedy in nature.

3.       SF nodes use a chatter protocol to continuously exchange routing table information. The symmetric nature of the routing enables failure information to propagate quickly, which leads to eventual convergence of the affected routing table entries.

4.       Nodes and services are mapped onto the ring in a decoupled way. Nearby nodes on the ring are selected from different fault domains, and services are mapped in a near-optimal, load-balanced way.

5.       Each SF node owns a portion of the ring as encoded in its token. There is no overlap among tokens owned by nodes and every token range is eventually owned by at least one node. When a node leaves, its successor and predecessor split the range between them halfway. With these criteria, SF routing will eventually succeed.
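The greedy, bidirectional routing step described in items 1 and 2 can be sketched as follows. This is illustrative only: the function name is hypothetical and a small ring is used for readability.

```python
def next_hop(node, key, partners, m=8):
    """Toy sketch of greedy ring routing: from the current node's routing
    partners (kept at exponentially increasing distances in both
    directions), forward toward whichever known node is closest to the
    key. This behaves like a distributed binary search."""
    ring = 2 ** m

    def distance(a, b):
        # Shortest ring distance, in either direction (bidirectional routing).
        d = (b - a) % ring
        return min(d, ring - d)

    # The current node is a candidate too: if it is already closest,
    # the message has arrived.
    candidates = [node] + list(partners)
    return min(candidates, key=lambda n: distance(n, key))
```

Each hop at least halves the remaining distance to the key, so a lookup completes in O(log N) hops, as in other DHTs.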

Monday, March 21, 2022

 

Service Fabric (continued)

Part 1 introduced the Service Fabric. This article continues the discussion with a comparison of the messaging protocols:

Service Fabric is not like Kubernetes, which is a heavily centralized system relying on an API server, multiple Kubelets, a central etcd cluster repository, and heartbeats collected every period. Service Fabric avoids this with a decentralized ring and failure detection. It is motivated by the adage that distributed consensus is at the heart of numerous coordination problems but has a trivially simple solution if there is a reliable failure detection service. Service Fabric does not use a distributed consensus protocol like Raft or Paxos, nor a centralized store for cluster state. Instead, it proposes a Federation subsystem that answers the most important membership question: whether a specific node is part of the system. The nodes are organized in rings, and heartbeats are sent only to a small subset of nodes called the neighborhood. The arbitration procedure involves more nodes besides the monitor, but it is executed only on missed heartbeats. A quorum of privileged nodes in an arbitration group helps resolve possible failure-detection conflicts, isolate the appropriate nodes, and maintain a consistent view of the ring.
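A minimal sketch of the neighborhood idea: a node’s heartbeats go only to its nearest ring neighbors on each side rather than to a central server. The `neighborhood` helper and the neighborhood size k are illustrative assumptions, not Service Fabric’s actual parameters:

```python
def neighborhood(node_id, ring, k=2):
    """Return the k nearest ring neighbors on each side of node_id --
    the only nodes that receive its heartbeats in this sketch."""
    ring = sorted(ring)
    i = ring.index(node_id)
    n = len(ring)
    # k successors clockwise and k predecessors anticlockwise, wrapping around
    succ = [ring[(i + j) % n] for j in range(1, k + 1)]
    pred = [ring[(i - j) % n] for j in range(1, k + 1)]
    return succ + pred
```

With five nodes 10, 20, 30, 40, 50 and k = 2, node 10’s heartbeats reach only 20, 30, 50, and 40, however large the cluster grows, which is what keeps the scheme decentralized.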

Kubernetes requires distributed consensus, but Service Fabric relies on its ring. Distributed consensus algorithms have evolved into many forms, but two primarily dominate the market: Paxos and Raft. Both take a similar approach, differing only in leader election. Raft only allows servers with up-to-date logs to become leaders, whereas Paxos allows any server to become leader provided it then updates its log to ensure it is up-to-date. Raft is more efficient since it does not require log entries to be exchanged during leader elections.

Some of the other differences can be compared along the following dimensions:

1.       How does it ensure that each term has at most one leader?

a.       Paxos: A server s can only be a candidate in a term t if t mod n = s. There will be only one candidate per term, so there will be only one leader per term.

b.       Raft: A follower can become a candidate in any term. Each follower will only vote for one candidate per term, so only one candidate can get a majority of votes and become the leader.

2.       How does it ensure that a new leader’s log contains all committed entries?

a.       Paxos: Each RequestVote reply includes the follower’s log entries. Once a candidate has received RequestVote responses from a majority of followers, it adds the entries with the highest term to its log.

b.       Raft: A vote is granted only if the candidate’s log is at least as up-to-date as the follower’s. This ensures that a candidate only becomes a leader if its log is at least as up-to-date as a majority of its followers.

3.       How does it ensure that the leader safely commits log entries from previous terms?

a.       Paxos: Log entries from previous terms are added to the leader’s log with the leader’s term. The leader then replicates these entries as if they were from its own term.

b.       Raft: The leader replicates the log entries to the other servers without changing the term. The leader cannot consider those entries committed until it has replicated a subsequent log entry from its own term.
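The candidacy and voting rules in points 1 and 2 can be sketched as two predicates. This is a simplified model in which a log is just a list of term numbers, and both function names are hypothetical:

```python
def paxos_candidate(server, term, n):
    """Paxos rule above: server s may stand in term t only if
    t mod n == s, so each term has at most one candidate."""
    return term % n == server

def raft_grants_vote(candidate_log, voter_log):
    """Raft rule above: grant the vote only if the candidate's log is at
    least as up-to-date, comparing (last entry's term, log length)."""
    c_term = candidate_log[-1] if candidate_log else 0
    v_term = voter_log[-1] if voter_log else 0
    if c_term != v_term:
        return c_term > v_term          # higher last term wins
    return len(candidate_log) >= len(voter_log)  # same term: longer log wins
```

In a five-server cluster, only server 2 may stand in term 7 under the Paxos rule (7 mod 5 = 2), while under the Raft rule any server may stand but a voter with log [1, 1] refuses a candidate whose log is only [1].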

The Raft algorithm was proposed to address long-standing issues with the understandability of the widely studied Paxos algorithm. It has a clearer abstraction and presentation and can be viewed as a simplified version of the original Paxos algorithm. Specifically, Paxos divides terms between servers, whereas Raft allows a follower to become a candidate in any term, though followers will vote for only one candidate per term. Paxos followers will vote for any candidate, whereas Raft followers will only vote for a candidate whose log is at least as up-to-date. If a leader has uncommitted log entries from a previous term, Paxos will replicate them in the current term, whereas Raft will replicate them in their original term. Raft’s leader election is therefore lightweight compared to Paxos.
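Raft’s rule for committing prior-term entries (point 3b above) can also be sketched. This is a simplified model, assuming the log is a list of term numbers and `match_index` records the highest replicated index on each follower; the function name is hypothetical:

```python
def raft_commit_index(log, match_index, current_term, majority):
    """Advance commitIndex only to an entry that is replicated on a
    majority AND belongs to the leader's current term; earlier-term
    entries then become committed indirectly."""
    commit = 0
    for idx in range(1, len(log) + 1):
        # +1 counts the leader itself as holding the entry
        replicated = 1 + sum(1 for m in match_index if m >= idx)
        if replicated >= majority and log[idx - 1] == current_term:
            commit = idx
    return commit
```

With a five-server cluster (majority 3) and a log of terms [1, 1, 2], the two term-1 entries cannot be committed on their own; only once the term-2 entry reaches a majority does the commit index jump to 3, committing the older entries with it.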

 

Sunday, March 20, 2022

 Azure Service Fabric: 

Introduction: This is a continuation of a series of articles on Azure services from an operational engineering perspective and the role their design and algorithms play in the field. Most recently we discussed Azure Functions with the link here. This article turns to Microsoft Service Fabric with the intention of discussing the comparisons between Paxos, Raft and Service Fabric’s protocol. 

Discussion: 

Service Fabric is Microsoft’s distributed platform for building, running, and maintaining microservices applications in the cloud. It is a container orchestrator, and it is able to provide quality of service to microservice framework models such as stateful, stateless, and actor. It differs from Azure Container Service in that it is a Platform-as-a-Service offering rather than an Infrastructure-as-a-Service offering. There is also a Service Fabric Mesh offering that provides a serverless PaaS for Service Fabric applications. Service Fabric provides its own specific programming model, allows guest executables, and orchestrates Docker containers. It supports both Windows and Linux but is primarily suited for Windows. It can scale to handle Internet-of-Things traffic. It is open to workloads and technology-agnostic: it relies on Docker and supports both Windows and Linux containers, while providing a more integrated, feature-rich orchestration with greater openness and flexibility. 

Cloud-native container applications are evaluated against the twelve-factor methodology for building web applications and software-as-a-service, which demands the following: 

  • Use of declarative frameworks for setup automation, minimizing time and cost for new developers joining the project 

  • Use of a clean contract with the underlying operating system, offering maximum portability between execution environments 

  • Suitability for deployment on modern cloud platforms and avoiding the need for servers and system administration 

  • Ability to minimize divergence between development and production and to enable continuous deployment for maximum agility 

  • Ability to scale up without significant changes to tooling, architecture or development practices. 

Service Fabric encourages all of these so that its workloads can focus more on their business requirements. 

 

It is often compared to Kubernetes, which is also a container orchestration framework that hosts applications. Kubernetes extends the idea of app+container all the way to the nodes of a cluster. Kubernetes evolved as an industry effort from the operating system’s native Linux container support and can be considered a step toward a truly container-centric development environment. Containers decouple applications from infrastructure, which separates dev from ops, and they demonstrate better resource isolation and improved resource utilization. Kubernetes is not a traditional, all-inclusive PaaS. Unlike a PaaS, which restricts applications, dictates the choice of application frameworks, restricts supported language runtimes, or distinguishes apps from services, Kubernetes aims to support an extremely diverse variety of workloads: if an application can be compiled to run in a container, it will work with Kubernetes. A PaaS provides databases, message buses, and cluster storage systems, but those can run on Kubernetes. There is also no click-to-deploy service marketplace. Kubernetes does not build user code or deploy it, but it facilitates CI workflows that run on it. Kubernetes allows users to choose their own logging, monitoring, and alerting tools. Kubernetes also does not require a comprehensive application language or system and is independent of machine configuration or management. A PaaS, however, can run on Kubernetes and extend its reach to different clouds. 

Service Fabric is used by some of the largest services in the Azure cloud service portfolio, but it comes with a different history, different goals, and a different design. The entire Microsoft Azure Stack hybrid offering relies on Service Fabric to run all the platform core services. Kubernetes is a heavily centralized system: it has an API server in the middle and agents called Kubelets installed on all worker nodes. All Kubelets communicate with the API server, which saves the cluster state in a centralized repository, the etcd cluster, a Raft-backed distributed key-value store. Cluster membership is maintained by requiring Kubelets to maintain a connection with the API server and send heartbeats every period. Service Fabric avoids this with a decentralized ring and failure detection. It is motivated by the adage that distributed consensus is at the heart of numerous coordination problems but has a trivially simple solution if there is a reliable failure detection service. Service Fabric does not use a distributed consensus protocol like Raft or Paxos, nor a centralized store for cluster state. Instead, it proposes a Federation subsystem that answers the most important membership question: whether a specific node is part of the system. The nodes are organized in rings, and heartbeats are sent only to a small subset of nodes called the neighborhood. The arbitration procedure involves more nodes besides the monitor, but it is executed only on missed heartbeats. A quorum of privileged nodes in an arbitration group helps resolve possible failure-detection conflicts, isolate the appropriate nodes, and maintain a consistent view of the ring. 

Conclusion: 

Service Fabric recognizes different types of workloads and is particularly well-suited for stateful workloads.