Data in motion – IoT solution and data replication
The transition of data from edge sensors to the cloud is a data engineering pattern that does not always get a proper resolution with the boilerplate Event-Driven architectural design proposed by the public clouds because much of the fine tuning is left to the choice of the resources, event hubs and infrastructure involved in the streaming of events. This article explores the design and data in motion considerations for an IoT solution beginning with an introduction to the public cloud proposed design, the choices between products and the considerations for the handling and tuning of distributed, real-time data streaming systems with particular emphasis on data replication for business continuity and disaster recovery. A sample use case can include the continuous events for geospatial analytics in fleet management and its data can include driverless vehicles weblogs.
Event Driven architecture consists of event producers and consumers. Event producers are those that generate a stream of events and event consumers are ones that listen for events. The right choice of architectural style plays a big role in the total cost of ownership for a solution involving events.
The scale out can be adjusted to suit the demands of the workload and the events can be responded to in real time. Producers and consumers are isolated from one another. IoT requires events to be ingested at very high volumes. The producer-consumer design has scope for a high degree of parallelism since the consumers are run independently and in parallel, but they are tightly coupled to the events. Network latency for message exchanges between producers and consumers is kept to a minimum. Consumers can be added as necessary without impacting existing ones.
Some of the benefits of this architecture include the following: The publishers and subscribers are decoupled. There are no point-to-point integrations. It's easy to add new consumers to the system. Consumers can respond to events immediately as they arrive. They are highly scalable and distributed. There are subsystems that have independent views of the event stream.
Some of the challenges faced with this architecture include the following: Event loss is tolerated so if there needs to be guaranteed delivery, this poses a challenge. IoT traffic mandates a guaranteed delivery. Events are processed in exactly the order they arrive. Each consumer type typically runs in multiple instances, for resiliency and scalability. This can pose a challenge if the processing logic is not idempotent, or the events must be processed in order.
The benefits and the challenges suggest some of these best practices. Events should be lean and mean and not bloated. Services should share only IDs and/or a timestamp. Large data transfer between services is an antipattern. Loosely coupled event driven systems are best.
IoT Solutions can be proposed either with an event driven stack involving open-source technologies or via a dedicated and optimized storage product such as a relational engine that is geared towards edge computing. Either way capabilities to stream, process and analyze data are expected by modern IoT applications. IoT systems vary in flavor and size. Not all IoT systems have the same certifications or capabilities.
No comments:
Post a Comment