Cluster computing

Thursday, May 27, 2021

The modus operandi for using Azure Cache for Redis.

Performant and cost-effective use of an Azure Cache for Redis instance comes from following a few best practices:

The Basic tier is a single-node system with no data replication and no SLA, so prefer the Standard or Premium tier.

Expect some data loss, because the cache is an in-memory store and patching or failovers can occur.

Use a connect timeout of at least 15 seconds.

The default eviction policy is volatile-lru, which means that only keys with a TTL value set are eligible for eviction. If no keys have a TTL value, the system won't evict any keys. To extend eviction to all keys, use the allkeys-lru policy. Keys can also be given an expiration value, after which they are removed.
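To make the difference between the two policies concrete, here is a minimal Python sketch built on a toy in-process cache. ToyCache and OutOfMemory are hypothetical names, not part of any Redis client; a key-count limit stands in for maxmemory, TTL expiry itself is not simulated, and eviction follows exact LRU order, whereas real Redis approximates LRU by sampling.

```python
from collections import OrderedDict


class OutOfMemory(Exception):
    """Raised when the toy cache is full and no key is eligible for eviction."""


class ToyCache:
    """Hypothetical toy cache: a key-count limit stands in for maxmemory."""

    POLICIES = ("volatile-lru", "allkeys-lru")

    def __init__(self, max_keys, policy="volatile-lru"):
        if policy not in self.POLICIES:
            raise ValueError(f"unsupported policy: {policy}")
        self.max_keys = max_keys
        self.policy = policy
        self._data = OrderedDict()  # ordered from least to most recently used
        self._has_ttl = set()  # keys that were written with an expiration

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # reading a key makes it most recently used
        return self._data[key]

    def set(self, key, value, ttl_seconds=None):
        if key not in self._data and len(self._data) >= self.max_keys:
            self._evict_one()
        self._data[key] = value
        self._data.move_to_end(key)
        # Like a plain Redis SET, writing without a TTL clears any previous one.
        if ttl_seconds is None:
            self._has_ttl.discard(key)
        else:
            self._has_ttl.add(key)  # only eligibility is tracked, not expiry

    def _evict_one(self):
        for key in self._data:  # least recently used first
            eligible = self.policy == "allkeys-lru" or key in self._has_ttl
            if eligible:
                del self._data[key]
                self._has_ttl.discard(key)
                return  # stop iterating right after mutating the dict
        raise OutOfMemory("no key is eligible for eviction")


if __name__ == "__main__":
    cache = ToyCache(max_keys=2, policy="volatile-lru")
    cache.set("a", 1)
    cache.set("b", 2)
    try:
        cache.set("c", 3)
    except OutOfMemory as err:
        print("volatile-lru with no TTLs:", err)
```

With volatile-lru and no TTLs, the third write fails, much as Redis rejects writes with an OOM error once maxmemory is reached and nothing is evictable; switching the policy to allkeys-lru lets the least recently used key go instead.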

A performance tool called redis-benchmark.exe is available for load testing, and it is recommended to run it from a Dv2-series VM.
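As a rough sketch of a load-test invocation from that VM, the Python helper below assembles a redis-benchmark command line. The flags (-h, -p, -a, -c, -n, -d, -P, -t, -q) are standard redis-benchmark options; the helper name benchmark_command, the contoso host name, and the default values are assumptions for illustration. It also assumes the cache's non-TLS port 6379 has been enabled, since Azure Cache for Redis serves TLS on port 6380.

```python
import shlex


def benchmark_command(host, access_key, port=6379, clients=50, requests=100_000,
                      data_size=1024, pipeline=1, tests=("set", "get")):
    """Hypothetical helper: build an argument list for redis-benchmark."""
    return [
        "redis-benchmark.exe",
        "-h", host,             # cache host name
        "-p", str(port),        # non-TLS port, assumed to be enabled
        "-a", access_key,       # cache access key used as the password
        "-c", str(clients),     # number of parallel connections
        "-n", str(requests),    # total number of requests
        "-d", str(data_size),   # SET/GET value size in bytes
        "-P", str(pipeline),    # requests pipelined per connection
        "-t", ",".join(tests),  # run only these tests
        "-q",                   # quiet: print only requests per second
    ]


if __name__ == "__main__":
    cmd = benchmark_command("contoso.redis.cache.windows.net", "<access-key>")
    print(shlex.join(cmd))
    # On the benchmark VM this could be run with subprocess.run(cmd, check=True).
```

Building the arguments as a list rather than a single string avoids shell-quoting problems with the access key.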

The INFO command reports these numbers. Its Stats section shows the total number of expired keys and the number of evicted keys, and its Keyspace section gives the number of keys with timeouts and the average timeout value.
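The same figures can be read programmatically from the raw INFO text. The Python sketch below is a minimal parser; parse_info and key_report are hypothetical names, and SAMPLE_INFO is a hand-written fragment invented for illustration, not output captured from a real cache.

```python
def parse_info(text):
    """Parse raw INFO output into {section: {field: value}}."""
    sections = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            current = line.lstrip("#").strip().lower()  # e.g. "stats", "keyspace"
            sections[current] = {}
            continue
        if current is None:
            continue  # ignore fields that appear before any section header
        name, _, value = line.partition(":")
        if current == "keyspace":
            # e.g. "db0:keys=3,expires=1,avg_ttl=29512" (avg_ttl is in milliseconds)
            value = {k: int(v) for k, v in (pair.split("=", 1) for pair in value.split(","))}
        sections[current][name] = value
    return sections


def key_report(info):
    """Pull out the expiry, eviction, and per-database key counts."""
    stats = info.get("stats", {})
    return {
        "expired_keys": int(stats.get("expired_keys", 0)),
        "evicted_keys": int(stats.get("evicted_keys", 0)),
        "keyspace": info.get("keyspace", {}),
    }


# Hand-written, illustrative INFO fragment (not from a real cache).
SAMPLE_INFO = (
    "# Stats\r\n"
    "expired_keys:42\r\n"
    "evicted_keys:0\r\n"
    "\r\n"
    "# Keyspace\r\n"
    "db0:keys=3,expires=1,avg_ttl=29512\r\n"
    "db1:keys=7,expires=0,avg_ttl=0\r\n"
)

if __name__ == "__main__":
    print(key_report(parse_info(SAMPLE_INFO)))
```

The per-database lines in the Keyspace section also help with the missing-keys scenario described next: keys counted under db1 while the application reads db0 point to the non-default-database cause rather than real data loss.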

If all the keys appear to be lost, the cause is probably one of three things: the keys were purged manually, the Azure Cache for Redis instance is set to use a non-default database, or the Redis server is unavailable.

Redis is an in-memory data store. In the Basic tier, it is hosted on a single VM, and if that VM goes down, all data in the cache is lost. Caches in the Standard or Premium tier offer much higher resiliency against data loss by using two VMs in a replicated configuration. These VMs are placed in separate fault and update domains to minimize the chance of both becoming unavailable at the same time. If a major datacenter outage happens, however, the VMs might still go down together. Data persistence and geo-replication are used to protect data against such failures.

A cache is constructed of one or more virtual machines, each with a separate, private IP address. Each virtual machine, also known as a node, is connected to a shared load balancer with a single virtual IP address. Each node runs the Redis server process and is accessible through the cache's hostname and the Redis ports. Each node acts as either a primary or a replica. When a client application connects to the cache, its traffic goes through this load balancer and is automatically routed to the primary node.

A Basic cache has a single node, which is always the primary. A Standard or Premium cache has two nodes: one primary and one replica. Clustered caches have multiple shards, each with its own primary and replica nodes.
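In a clustered cache, each key belongs to exactly one shard. Assuming the cache follows the open-source Redis Cluster model, a key is hashed with CRC16 into one of 16384 slots, and a non-empty {hash tag} in the key restricts hashing to the tag so related keys share a slot. The Python sketch below implements that slot calculation as the Redis Cluster specification describes it. The shard_for_key helper is hypothetical: it assumes slots are split into equal contiguous ranges, whereas the real slot-to-shard assignment is managed by the service and can be inspected with CLUSTER SLOTS.

```python
HASH_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, initial value 0), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Map a key to a hash slot, hashing only a non-empty {tag} if one is present."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end != -1 and end != start + 1:  # an empty "{}" means hash the whole key
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % HASH_SLOTS


def shard_for_key(key: str, shard_count: int) -> int:
    """Hypothetical: assumes slots are divided into equal contiguous ranges."""
    return key_hash_slot(key) * shard_count // HASH_SLOTS


if __name__ == "__main__":
    for k in ("user:{1000}:profile", "user:{1000}:cart", "session:42"):
        print(k, key_hash_slot(k), shard_for_key(k, 3))
```

Because user:{1000}:profile and user:{1000}:cart hash only the tag 1000, they share a slot, which is what lets multi-key commands operate on them in a clustered cache.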

A failover occurs when the primary node goes offline and the replica promotes itself to primary. When the old primary comes back, it notices the change in roles: it sees the new primary, demotes itself to a replica, and then connects to the new primary to synchronize data.
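The routing and failover sequence can be modeled in a few lines of Python. This is a toy sketch, not how the service is implemented: Node and ReplicatedCache are hypothetical names, promotion happens on an explicit call instead of through failure detection, and writes are copied to the replica immediately, whereas real Redis replication is asynchronous (one reason some data loss is expected).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    role: str  # "primary" or "replica"
    online: bool = True
    data: dict = field(default_factory=dict)


class ReplicatedCache:
    """Toy model of a Standard/Premium cache: two nodes behind one virtual IP."""

    def __init__(self):
        self.nodes = [Node("node0", "primary"), Node("node1", "replica")]

    def node(self, name):
        return next(n for n in self.nodes if n.name == name)

    def roles(self):
        return {n.name: n.role for n in self.nodes}

    def route(self):
        # The load balancer forwards every request to the reachable primary.
        for n in self.nodes:
            if n.role == "primary" and n.online:
                return n
        raise ConnectionError("no reachable primary (failover in progress)")

    def set(self, key, value):
        self.route().data[key] = value
        # Simplification: copy to online replicas immediately.
        for n in self.nodes:
            if n.role == "replica" and n.online:
                n.data[key] = value

    def get(self, key):
        return self.route().data.get(key)

    def crash_primary(self):
        self.route().online = False

    def promote_replica(self):
        # Stands in for the replica detecting the outage and promoting itself.
        replica = next(n for n in self.nodes if n.role == "replica" and n.online)
        replica.role = "primary"

    def rejoin(self, name):
        # The old primary sees the new primary, demotes itself, and resynchronizes.
        node = self.node(name)
        node.role = "replica"
        node.online = True
        node.data = dict(self.route().data)


if __name__ == "__main__":
    cache = ReplicatedCache()
    cache.set("greeting", "hello")
    cache.crash_primary()
    try:
        cache.get("greeting")
    except ConnectionError as err:
        print("during failover:", err)
    cache.promote_replica()
    print("after promotion:", cache.get("greeting"), cache.roles())
    cache.rejoin("node0")
    print("after rejoin:", cache.roles())
```

The window between crash_primary and promote_replica is the period in which clients see connection errors, which is why the retry guidance later in this post matters.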

A planned failover takes place during system updates. The nodes receive advance notice, so they can swap roles and update the load balancer in an orderly way. It finishes in less than 1 second.

An unplanned failover might happen because of hardware failure or unexpected outages. The replica node promotes itself to primary, but the process takes longer because the replica must first detect that the primary is offline and confirm that the failover is really necessary. This takes 10 to 15 seconds.

Patching involves failovers, which the management service coordinates.

Clients handle the effects of a failover with retry and backoff. If errors persist for longer than a preconfigured amount of time, the connection object should be recreated. A Lazy<T> pattern makes it possible to recreate the connection without restarting the application.
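A minimal Python sketch of this approach follows. It does not reproduce any Redis client library's API: LazyConnection plays the role of Lazy<T> (create on first use, discard so the next use rebuilds it), and ResilientClient retries with capped exponential backoff and full jitter, recreating the connection once errors have persisted for reconnect_after seconds. All names, the 30-second threshold, and the backoff constants are assumptions; the factory stands for whatever opens a connection in your client library, ideally with the 15-second connect timeout recommended earlier. The clock and sleep parameters are injectable only so the logic can be tested without waiting.

```python
import random
import threading
import time


class LazyConnection:
    """Creates the connection on first use; reset() discards it so the next use rebuilds it."""

    def __init__(self, factory):
        self._factory = factory
        self._conn = None
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            if self._conn is None:
                self._conn = self._factory()
            return self._conn

    def reset(self):
        with self._lock:
            old, self._conn = self._conn, None
        close = getattr(old, "close", None)
        if callable(close):
            close()  # release the broken connection's resources


class ResilientClient:
    """Retries with backoff and recreates the connection after persistent errors."""

    def __init__(self, factory, reconnect_after=30.0, clock=time.monotonic, sleep=time.sleep):
        self._lazy = LazyConnection(factory)
        self._reconnect_after = reconnect_after
        self._clock = clock
        self._sleep = sleep
        self._first_error_at = None  # start of the current run of errors

    def _record_failure(self):
        now = self._clock()
        if self._first_error_at is None:
            self._first_error_at = now
        elif now - self._first_error_at >= self._reconnect_after:
            # Errors have persisted too long: recreate the connection object.
            self._lazy.reset()
            self._first_error_at = None

    def execute(self, operation, attempts=5, base_delay=0.5, max_delay=8.0):
        for attempt in range(attempts):
            try:
                result = operation(self._lazy.get())
                self._first_error_at = None  # a success ends the error run
                return result
            except (ConnectionError, TimeoutError):
                self._record_failure()
                if attempt == attempts - 1:
                    raise
                # Capped exponential backoff with full jitter.
                delay = min(max_delay, base_delay * 2 ** attempt)
                self._sleep(random.uniform(0, delay))
```

Jitter spreads out the retries of many clients after a failover, so they do not all reconnect to the new primary at the same instant. The error-run timestamp is not lock-protected here, which is acceptable for a sketch but worth tightening for heavily concurrent use.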

Rebooting the cache and scheduling updates are good ways to test a client's resiliency and how well the retry-and-backoff mitigations work.
