Cluster computing: Application troubleshooting continued

Thursday, June 4, 2020

Application troubleshooting continued

6) Integration: The analytical and storage software stacks do not necessarily agree in their nature. However, data that can be linked gains in value and usage. A data library that spans heterogeneous data sources and their software stacks is preferred, yet not every stack can be plugged into a library layer.
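A library layer of this kind can be sketched in a few lines. This is only an illustration under stated assumptions: the names DataSource, KeyValueSource, CsvSource and DataLibrary are hypothetical, the two sources are in-memory stand-ins for real stacks, and records are assumed to share an "id" field that makes linking possible.

```python
import csv
import io
from typing import Dict, Iterable, Protocol


class DataSource(Protocol):
    """Hypothetical adapter interface: every stack exposes its records as dicts."""

    def records(self) -> Iterable[dict]: ...


class KeyValueSource:
    """Stand-in for a key-value or document store."""

    def __init__(self, rows: Dict[str, dict]) -> None:
        self._rows = rows

    def records(self) -> Iterable[dict]:
        for key, value in self._rows.items():
            yield {"id": key, **value}


class CsvSource:
    """Stand-in for a flat-file export from another stack."""

    def __init__(self, text: str) -> None:
        self._text = text

    def records(self) -> Iterable[dict]:
        yield from csv.DictReader(io.StringIO(self._text))


class DataLibrary:
    """Links records from all registered sources that share a key value."""

    def __init__(self) -> None:
        self._sources: Dict[str, DataSource] = {}

    def register(self, name: str, source: DataSource) -> None:
        self._sources[name] = source

    def linked(self, key: str = "id") -> Dict[str, dict]:
        merged: Dict[str, dict] = {}
        for name, source in self._sources.items():
            for record in source.records():
                if key not in record:
                    continue  # this record cannot be linked, so skip it
                entry = merged.setdefault(record[key], {})
                entry[name] = {k: v for k, v in record.items() if k != key}
        return merged
```

A stack that cannot implement records() is exactly the case where it cannot be plugged into the library layer.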

7) Reliability: Data must always remain available. Unfortunately, storage engineering best practices for data do not always make it into every product or deployment. Object storage has been found to solve the durability and availability of data, which makes it easier to use with the storage servers. But production data does not always use object storage. That data lives in different data stores or content stores and needs to be fetched.

8) Push versus pull semantics: Queries pull the data into the analytics, and the same query can be repeated against a time series data store to keep the charts and graphs on a dashboard constantly refreshed. However, message queues may need to be involved so that push and pull semantics can be delegated to the publishers and subscribers themselves. This generally means that the tools of the trade may lag behind the source of truth.
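The difference between the two semantics, and the delay it introduces, can be shown with a small sketch. Everything here is an assumption made for illustration: TimeSeriesStore, Broker and Dashboard are hypothetical in-memory stand-ins, not the API of any particular product.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

Point = Tuple[int, float]  # (timestamp, value)


class TimeSeriesStore:
    """Hypothetical in-memory stand-in for a time series data store."""

    def __init__(self) -> None:
        self._points: List[Point] = []

    def append(self, timestamp: int, value: float) -> None:
        self._points.append((timestamp, value))

    def query(self, since: int) -> List[Point]:
        # Pull: the caller decides when to ask for points newer than `since`.
        return [(t, v) for t, v in self._points if t > since]


class Broker:
    """Hypothetical message queue: publishers push, subscribers are called back."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Push: delivery happens as soon as the publisher sends the message.
        for handler in self._handlers[topic]:
            handler(message)


class Dashboard:
    """Repeats the same query on every poll and only sees data at poll time."""

    def __init__(self, store: TimeSeriesStore) -> None:
        self._store = store
        self._last_seen = 0
        self.points: List[Point] = []

    def poll(self) -> int:
        fresh = self._store.query(since=self._last_seen)
        if fresh:
            self._last_seen = max(t for t, _ in fresh)
            self.points.extend(fresh)
        return len(fresh)
```

If a subscriber writes each pushed message into the store, the dashboard still shows nothing new until its next poll(), which is the lag from the source of truth described above.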

9) Pipelines: Batch processing and stream processing end up with separate pipelines. Where message queues may be used with batch processing, Apache Kafka is used for stream processing. While pipelines may be converted to Kafka, legacy pipelines will continue to be used. However, Apache Flume enables both RabbitMQ flows and Kafka events to be journaled to S3.
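The split between the two kinds of pipeline, and the way both can land in the same journal, can be sketched as follows. This is a minimal illustration and deliberately avoids the real Kafka, RabbitMQ and Flume APIs: enrich, batch_pipeline, stream_pipeline and Journal are hypothetical names, and Journal is an in-memory stand-in for an S3 bucket.

```python
import json
from typing import Dict, Iterable, Iterator, List


def enrich(event: dict) -> dict:
    """Shared transformation applied by both pipelines (hypothetical)."""
    return {**event, "size": len(event.get("payload", ""))}


def batch_pipeline(queue: List[dict]) -> List[dict]:
    """Batch: drain everything currently queued, then process it in one pass."""
    drained = list(queue)
    queue.clear()
    return [enrich(event) for event in drained]


def stream_pipeline(events: Iterable[dict]) -> Iterator[dict]:
    """Stream: process each event as it arrives, without waiting for a batch."""
    for event in events:
        yield enrich(event)


class Journal:
    """In-memory stand-in for an object store bucket holding JSON lines."""

    def __init__(self) -> None:
        self.objects: Dict[str, str] = {}

    def write(self, key: str, records: Iterable[dict]) -> None:
        self.objects[key] = "\n".join(
            json.dumps(record, sort_keys=True) for record in records
        )
```

Because both pipelines share enrich and write to the same journal, a legacy batch pipeline and a newer streaming one can coexist while producing records in the same shape.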

10) Analysis stacks: Each stack may have its own quirks. Grafana works well with time series databases, which means that getting good charts and graphs on a dashboard requires data to flow from its source into a time series database. Object storage is not itself a full-fledged time series database, but it can be made to work as a junction for data in transit between different sources and destinations. A stream store, unlike queues, has rich support for analytics via stream processing languages and frameworks.
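One way object storage can act as such a junction is to partition object keys by source and time, so that a loader feeding a time series database can list one time window at a time. The sketch below assumes that key layout. object_key and Bucket are hypothetical names, and Bucket is an in-memory stand-in rather than a real object storage client.

```python
from datetime import datetime, timezone
from typing import Dict, List


def object_key(source: str, epoch_seconds: int) -> str:
    """Build a time-partitioned key: source/YYYY/MM/DD/HH/<epoch>.json (UTC)."""
    t = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{source}/{t:%Y/%m/%d/%H}/{epoch_seconds}.json"


class Bucket:
    """In-memory stand-in for an object storage bucket."""

    def __init__(self) -> None:
        self._objects: Dict[str, str] = {}

    def put(self, key: str, body: str) -> None:
        self._objects[key] = body

    def list_keys(self, prefix: str) -> List[str]:
        # A prefix listing is the only "query" an object store offers here,
        # which is why it is a junction and not a time series database.
        return sorted(k for k in self._objects if k.startswith(prefix))
```

A loader could then call list_keys("cpu/2020/06/04/") to pick up one day of one source and forward it to the time series database that Grafana reads from.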

These examples show some of the application roles in vectorized execution.
