Data Modernization – continued
This article picks up the discussion on data modernization with an emphasis on the expanded opportunities to restructure data. Legacy systems were typically built as online transaction processing and online analytical processing systems, usually on a monolithic server. With the shift to microservices for application modernization, data can now be owned by individual microservices, each of which can choose the technology stack, and specifically the database, that makes the most sense for it without undue influence or encumbrance from other services. The popularity of unstructured storage, both big data for batch processing and event storage for streaming applications, is evident from the shift to data lakes. That said, relational storage is still required in many cases.
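As a minimal sketch of that database-per-microservice idea, the snippet below shows two services that each own their own store and pick the technology that fits their access pattern. The service names, schemas, and the use of an in-memory dictionary as a stand-in for a document database are illustrative assumptions, not part of the original discussion.

import sqlite3
import json

class OrderService:
    """Owns a relational store because orders need transactional guarantees."""
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")

    def place_order(self, order_id: str, total: float) -> None:
        with self.db:  # commits or rolls back as a single transaction
            self.db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))

class CatalogService:
    """Owns a document-style store because catalog entries are schema-flexible."""
    def __init__(self):
        self.docs: dict[str, str] = {}  # stand-in for a document database

    def upsert_product(self, product_id: str, attributes: dict) -> None:
        self.docs[product_id] = json.dumps(attributes)

if __name__ == "__main__":
    orders, catalog = OrderService(), CatalogService()
    catalog.upsert_product("sku-1", {"name": "widget", "tags": ["blue", "small"]})
    orders.place_order("o-100", 19.99)

Each service here can evolve its schema and storage choice independently, which is the point of moving data ownership into the service.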
Event-driven architecture consists of event producers and event consumers. Producers generate a stream of events, and consumers listen for and react to those events.
The scale-out can be adjusted to suit the demands of the workload, and the events can be responded to in real time. Producers and consumers are isolated from one another. In some extreme cases, such as IoT, events must be ingested at very high volumes. There is scope for a high degree of parallelism since the consumers run independently and in parallel, although each consumer is coupled to the event stream it subscribes to. Network latency for message exchanges between producers and consumers is kept to a minimum. Consumers can be added as necessary without impacting existing ones.
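The following is a minimal Python sketch of that producer/consumer decoupling, using an in-process queue as a stand-in for an event broker. The event shape, the sentinel-based shutdown, and the single consumer instance are assumptions made only for illustration.

import queue
import threading
import time

events: queue.Queue = queue.Queue()  # stand-in for a broker topic

def producer(n: int) -> None:
    # The producer only appends events; it knows nothing about the consumers.
    for i in range(n):
        events.put({"id": i, "ts": time.time()})
    events.put(None)  # sentinel to stop the consumer in this sketch

def consumer(name: str) -> None:
    # Each consumer pulls independently; more instances can be added to scale out.
    while True:
        event = events.get()
        if event is None:
            break
        print(f"{name} handled event {event['id']}")

if __name__ == "__main__":
    t_prod = threading.Thread(target=producer, args=(5,))
    t_cons = threading.Thread(target=consumer, args=("consumer-1",))
    t_prod.start(); t_cons.start()
    t_prod.join(); t_cons.join()

Because the producer never calls the consumer directly, new consumer types can be attached to the same stream without changing the producer.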
Some of the benefits of this architecture include the following: publishers and subscribers are decoupled; there are no point-to-point integrations; it is easy to add new consumers to the system; consumers can respond to events immediately as they arrive; the system is highly scalable and distributed; and subsystems can maintain independent views of the event stream.
Some of the challenges faced with this architecture include the following: event loss may be tolerated by the platform, so guaranteed delivery poses a challenge when it is required, and some IoT traffic mandates guaranteed delivery. Processing events in order, or exactly once, is another difficulty. Each consumer type typically runs in multiple instances for resiliency and scalability, which can pose a challenge if the processing logic is not idempotent or if the events must be processed in order.
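One common way to cope with redelivery is to make the handler idempotent. The sketch below, with a hypothetical deposit event and an in-memory set of seen IDs, shows the idea; a production system would persist the deduplication state.

seen_ids: set[str] = set()
account_balance = 0.0

def handle_deposit(event: dict) -> None:
    global account_balance
    if event["id"] in seen_ids:      # duplicate delivery: ignore it
        return
    seen_ids.add(event["id"])
    account_balance += event["amount"]

if __name__ == "__main__":
    deposit = {"id": "evt-42", "amount": 25.0}
    handle_deposit(deposit)
    handle_deposit(deposit)  # redelivered; balance stays 25.0 because handling is idempotent
    print(account_balance)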
Some of the best practices for this architecture include the following: events should be lean and not bloated; services should share only IDs and/or a timestamp, since transferring large volumes of data between services is an antipattern; and loosely coupled event-driven systems are best.
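A minimal sketch of the lean-event practice follows: the event carries only an identifier and a timestamp, and the consumer fetches the full record from the owning service when it needs it. The event class, store, and handler names are hypothetical.

import time
from dataclasses import dataclass

@dataclass
class OrderPlaced:
    order_id: str       # reference only; the payload itself stays small
    occurred_at: float  # timestamp of the event

# stand-in for the service that owns the order data
ORDER_STORE = {"o-100": {"total": 19.99, "items": ["sku-1"]}}

def on_order_placed(event: OrderPlaced) -> None:
    # Fetch the large record on demand instead of shipping it inside the event.
    order = ORDER_STORE[event.order_id]
    print(f"order {event.order_id} total={order['total']}")

if __name__ == "__main__":
    on_order_placed(OrderPlaced(order_id="o-100", occurred_at=time.time()))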
The Big Compute architectural style refers to workloads that require many cores, such as image rendering, fluid dynamics, financial risk modeling, oil exploration, drug design, and engineering stress analysis. The computational tasks can be scaled out because they are discrete, isolated, and finite: some input is taken in raw form and processed into an output. The scale-out can be adjusted to suit the demands of the workload, and the outputs can be combined, as is customary with map-reduce problems.
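The sketch below shows that scatter/gather shape in Python: independent, finite tasks are fanned out across cores and their outputs are combined in a reduce step. The workload (summing squares) is a toy stand-in for real jobs such as rendering frames or pricing scenarios.

from concurrent.futures import ProcessPoolExecutor

def simulate(task_input: int) -> int:
    # Each task is discrete and isolated: raw input in, one output out.
    return task_input * task_input

if __name__ == "__main__":
    inputs = range(1, 1001)
    with ProcessPoolExecutor() as pool:   # scale out across available cores
        partial_results = pool.map(simulate, inputs)
    total = sum(partial_results)          # the reduce step combines the outputs
    print(total)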
The tasks may run independently and in parallel, or they may be tightly coupled, in which case network latency for message exchanges between tasks must be kept to a minimum. The commodity VMs used from the infrastructure are usually at the higher end of the compute in that tier. Simulations and number crunching, such as for astronomical calculations, can involve hundreds if not thousands of such compute cores.
Some of the benefits of this architecture include the following: 1) high performance due to the parallelization of tasks, 2) the ability to scale out to an arbitrarily large number of cores, 3) the ability to utilize a wide variety of compute units, and 4) dynamic allocation and deallocation of compute.
Some of the challenges faced with this architecture include the following: managing the VM infrastructure, handling the volume of number crunching, provisioning thousands of cores in a timely manner, and the diminishing returns from adding more cores.
Some of the best practices for this architecture include the following: it exposes a well-designed API to the client, it can autoscale to handle changes in load, it caches semi-static data, it uses a CDN to host static content, it uses polyglot persistence when appropriate, and it partitions data to improve scalability, reduce contention, and optimize performance.
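As a minimal sketch of the data-partitioning practice listed above, the snippet below routes records to one of several shards by hashing their key, which spreads load and reduces contention on any single store. The shard count, key format, and in-memory dictionaries standing in for separate stores are illustrative assumptions.

import hashlib

NUM_SHARDS = 4
shards: list[dict] = [{} for _ in range(NUM_SHARDS)]  # stand-ins for separate stores

def shard_for(key: str) -> dict:
    digest = hashlib.sha256(key.encode()).digest()
    return shards[digest[0] % NUM_SHARDS]  # stable mapping from key to shard

def put(key: str, value: object) -> None:
    shard_for(key)[key] = value

def get(key: str) -> object:
    return shard_for(key).get(key)

if __name__ == "__main__":
    for i in range(10):
        put(f"user-{i}", {"visits": i})
    print(get("user-3"))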