Data Modernization – continued
This article picks up the discussion on data modernization with an emphasis on the expanded opportunities to restructure data. Legacy systems were typically built as online transaction processing and online analytical processing systems, usually on a monolithic server. With the shift to microservices for application modernization, data can now be owned by individual microservices, each of which can choose the technology stack, and specifically the database, that makes the most sense for it without undue influence or encumbrance from other services. The popularity of unstructured storage, both big data for batch processing and event storage for streaming applications, is evident from the shift to data lakes. That said, relational storage is still required in many cases.
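As a minimal sketch of that database-per-microservice idea, the snippet below shows two services that each own their own store and pick the technology that fits their access pattern. The service names, schemas, and the use of an in-memory dictionary as a stand-in for a document database are illustrative assumptions, not part of the original discussion.

import sqlite3
import json

class OrderService:
    """Owns a relational store because orders need transactional guarantees."""
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")

    def place_order(self, order_id: str, total: float) -> None:
        with self.db:  # commits or rolls back as a single transaction
            self.db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))

class CatalogService:
    """Owns a document-style store because catalog entries are schema-flexible."""
    def __init__(self):
        self.docs: dict[str, str] = {}  # stand-in for a document database

    def upsert_product(self, product_id: str, attributes: dict) -> None:
        self.docs[product_id] = json.dumps(attributes)

if __name__ == "__main__":
    orders, catalog = OrderService(), CatalogService()
    catalog.upsert_product("sku-1", {"name": "widget", "tags": ["blue", "small"]})
    orders.place_order("o-100", 19.99)

Each service here can evolve its schema and storage choice independently, which is the point of moving data ownership into the service.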
Event-driven architecture consists of event producers and event consumers. Producers generate a stream of events, and consumers listen for and react to those events.
The scale-out can be adjusted to suit the demands of the workload, and the events can be responded to in real time. Producers and consumers are isolated from one another. In some extreme cases, such as IoT, events must be ingested at very high volumes. There is scope for a high degree of parallelism since the consumers run independently and in parallel, although each consumer is coupled to the event stream it subscribes to. Network latency for message exchanges between producers and consumers is kept to a minimum. Consumers can be added as necessary without impacting existing ones.
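The following is a minimal Python sketch of that producer/consumer decoupling, using an in-process queue as a stand-in for an event broker. The event shape, the sentinel-based shutdown, and the single consumer instance are assumptions made only for illustration.

import queue
import threading
import time

events: queue.Queue = queue.Queue()  # stand-in for a broker topic

def producer(n: int) -> None:
    # The producer only appends events; it knows nothing about the consumers.
    for i in range(n):
        events.put({"id": i, "ts": time.time()})
    events.put(None)  # sentinel to stop the consumer in this sketch

def consumer(name: str) -> None:
    # Each consumer pulls independently; more instances can be added to scale out.
    while True:
        event = events.get()
        if event is None:
            break
        print(f"{name} handled event {event['id']}")

if __name__ == "__main__":
    t_prod = threading.Thread(target=producer, args=(5,))
    t_cons = threading.Thread(target=consumer, args=("consumer-1",))
    t_prod.start(); t_cons.start()
    t_prod.join(); t_cons.join()

Because the producer never calls the consumer directly, new consumer types can be attached to the same stream without changing the producer.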
Some of the benefits of this architecture include the following: publishers and subscribers are decoupled; there are no point-to-point integrations; it is easy to add new consumers to the system; consumers can respond to events immediately as they arrive; the system is highly scalable and distributed; and subsystems can maintain independent views of the event stream.
Some of the challenges faced with this architecture include the following: event loss may be tolerated by the platform, so guaranteed delivery poses a challenge when it is required, and some IoT traffic mandates guaranteed delivery. Processing events in order, or exactly once, is another difficulty. Each consumer type typically runs in multiple instances for resiliency and scalability, which can pose a challenge if the processing logic is not idempotent or if the events must be processed in order.
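One common way to cope with redelivery is to make the handler idempotent. The sketch below, with a hypothetical deposit event and an in-memory set of seen IDs, shows the idea; a production system would persist the deduplication state.

seen_ids: set[str] = set()
account_balance = 0.0

def handle_deposit(event: dict) -> None:
    global account_balance
    if event["id"] in seen_ids:      # duplicate delivery: ignore it
        return
    seen_ids.add(event["id"])
    account_balance += event["amount"]

if __name__ == "__main__":
    deposit = {"id": "evt-42", "amount": 25.0}
    handle_deposit(deposit)
    handle_deposit(deposit)  # redelivered; balance stays 25.0 because handling is idempotent
    print(account_balance)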
Some of the best practices for this architecture include the following: events should be lean and not bloated; services should share only IDs and/or a timestamp, since transferring large volumes of data between services is an antipattern; and loosely coupled event-driven systems are best.
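A minimal sketch of the lean-event practice follows: the event carries only an identifier and a timestamp, and the consumer fetches the full record from the owning service when it needs it. The event class, store, and handler names are hypothetical.

import time
from dataclasses import dataclass

@dataclass
class OrderPlaced:
    order_id: str       # reference only; the payload itself stays small
    occurred_at: float  # timestamp of the event

# stand-in for the service that owns the order data
ORDER_STORE = {"o-100": {"total": 19.99, "items": ["sku-1"]}}

def on_order_placed(event: OrderPlaced) -> None:
    # Fetch the large record on demand instead of shipping it inside the event.
    order = ORDER_STORE[event.order_id]
    print(f"order {event.order_id} total={order['total']}")

if __name__ == "__main__":
    on_order_placed(OrderPlaced(order_id="o-100", occurred_at=time.time()))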
The Big Compute architectural style refers to workloads that require many cores, such as image rendering, fluid dynamics, financial risk modeling, oil exploration, drug design, and engineering stress analysis. The computational tasks can be scaled out because they are discrete, isolated, and finite: some input is taken in raw form and processed into an output. The scale-out can be adjusted to suit the demands of the workload, and the outputs can be combined, as is customary with map-reduce problems.
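The sketch below shows that scatter/gather shape in Python: independent, finite tasks are fanned out across cores and their outputs are combined in a reduce step. The workload (summing squares) is a toy stand-in for real jobs such as rendering frames or pricing scenarios.

from concurrent.futures import ProcessPoolExecutor

def simulate(task_input: int) -> int:
    # Each task is discrete and isolated: raw input in, one output out.
    return task_input * task_input

if __name__ == "__main__":
    inputs = range(1, 1001)
    with ProcessPoolExecutor() as pool:   # scale out across available cores
        partial_results = pool.map(simulate, inputs)
    total = sum(partial_results)          # the reduce step combines the outputs
    print(total)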
The tasks may run independently and in parallel, or they may be tightly coupled, in which case network latency for message exchanges between tasks must be kept to a minimum. The commodity VMs used from the infrastructure are usually at the higher end of the compute in that tier. Simulations and number crunching, such as for astronomical calculations, can involve hundreds if not thousands of such compute cores.
Some of the benefits of this architecture include the following: 1) high performance due to the parallelization of tasks, 2) the ability to scale out to an arbitrarily large number of cores, 3) the ability to utilize a wide variety of compute units, and 4) dynamic allocation and deallocation of compute.
Some of the challenges faced with this architecture include the following: managing the VM infrastructure, handling the volume of number crunching, provisioning thousands of cores in a timely manner, and the diminishing returns from adding more cores.
Some of the best practices for this architecture include the following: it exposes a well-designed API to the client, it can autoscale to handle changes in load, it caches semi-static data, it uses a CDN to host static content, it uses polyglot persistence when appropriate, and it partitions data to improve scalability, reduce contention, and optimize performance.
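As a minimal sketch of the data-partitioning practice listed above, the snippet below routes records to one of several shards by hashing their key, which spreads load and reduces contention on any single store. The shard count, key format, and in-memory dictionaries standing in for separate stores are illustrative assumptions.

import hashlib

NUM_SHARDS = 4
shards: list[dict] = [{} for _ in range(NUM_SHARDS)]  # stand-ins for separate stores

def shard_for(key: str) -> dict:
    digest = hashlib.sha256(key.encode()).digest()
    return shards[digest[0] % NUM_SHARDS]  # stable mapping from key to shard

def put(key: str, value: object) -> None:
    shard_for(key)[key] = value

def get(key: str) -> object:
    return shard_for(key).get(key)

if __name__ == "__main__":
    for i in range(10):
        put(f"user-{i}", {"visits": i})
    print(get("user-3"))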