Cluster computing

Saturday, January 26, 2019

Today we continue discussing best practices from storage engineering:
371) Data management software such as Cloudera can be deployed and run on any cloud. It offers an enterprise data hub, an analytics DB, an operational DB, data science and engineering, and essentials. It is elastic and flexible, provides high-performance analytics, can easily provision across multiple clouds, and can be used for automated metering and billing. Essentially, such a big data platform supports different data models, real-time data pipelines, and streaming applications. It lets data models break free from vendor lock-in, with the flexibility to be community defined.

372) The data science workbench from Cloudera provides a console in a web browser, where users authenticate with Kerberos against the cluster KDC. Engines are spun up based on engine kernels and profiles, and they connect seamlessly with Spark, Hive, and Impala.

373) Cloudera Data Science Workbench uses Docker and Kubernetes. Cloudera is supported on dedicated Hadoop hosts. Cloudera also adds a data engineering service called Altus. It is a platform that works against a cloud by allowing clusters to be set up and torn down and jobs to be submitted to those clusters. Clusters may be Apache Spark, MR2, or Hive.
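The set-up, submit, and tear-down lifecycle described in 373 can be sketched as a small in-memory model. This is only an illustration of the pattern: `TransientCluster`, `transient_cluster`, and `run_jobs` are hypothetical names invented here, and nothing below calls Altus or any Cloudera API.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field

# Hypothetical in-memory model of a transient cluster.
# None of these names come from Altus or any Cloudera API.
SUPPORTED_ENGINES = {"spark", "mr2", "hive"}


@dataclass
class TransientCluster:
    name: str
    engine: str
    running: bool = False
    completed_jobs: list = field(default_factory=list)

    def submit(self, job_name: str) -> str:
        # Jobs can only be submitted while the cluster is up.
        if not self.running:
            raise RuntimeError(f"cluster {self.name} is not running")
        self.completed_jobs.append(job_name)
        return f"{job_name} ran on {self.engine} cluster {self.name}"


@contextmanager
def transient_cluster(name: str, engine: str):
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported engine: {engine}")
    cluster = TransientCluster(name, engine)
    cluster.running = True  # set up
    try:
        yield cluster
    finally:
        cluster.running = False  # tear down, even if a job fails


def run_jobs(name: str, engine: str, jobs: list):
    # The cluster exists only for the duration of the submitted jobs.
    with transient_cluster(name, engine) as cluster:
        results = [cluster.submit(job) for job in jobs]
    return cluster, results
```

Calling `run_jobs("nightly-etl", "spark", ["ingest", "aggregate"])` returns the results of both jobs along with a cluster object that is no longer running. Tying the teardown to a `finally` clause is one way to make sure a failed job does not leave a cluster behind.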

374) Containerization technologies and function as a service, also known as lambda functions, can also be supported by products such as Cloudera, which makes them usable with existing public clouds while also offering an on-premise solution.

375) Most storage products do not differentiate between human and machine data because that distinction belongs to the upper layers of data management. However, dedicated differentiation between human and machine data can make the products more customized for each purpose.
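As a rough sketch of what such differentiation might look like inside a storage layer, the snippet below tags each object as human or machine data and picks a policy accordingly. Everything here is an assumption made for illustration: the suffix-based rule, the `StoragePolicy` fields, and the retention values do not come from any particular product.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    HUMAN = "human"
    MACHINE = "machine"


@dataclass(frozen=True)
class StoragePolicy:
    compress: bool
    retention_days: int
    index_full_text: bool


# Illustrative values only; a real product would tune these per workload.
POLICIES = {
    Origin.HUMAN: StoragePolicy(compress=False, retention_days=3650, index_full_text=True),
    Origin.MACHINE: StoragePolicy(compress=True, retention_days=90, index_full_text=False),
}

# Assumed heuristic: objects with these suffixes are machine generated.
MACHINE_SUFFIXES = (".log", ".metrics", ".trace")


def classify(object_name: str) -> Origin:
    if object_name.lower().endswith(MACHINE_SUFFIXES):
        return Origin.MACHINE
    return Origin.HUMAN


def policy_for(object_name: str) -> StoragePolicy:
    # The storage layer applies a different policy depending on the origin.
    return POLICIES[classify(object_name)]
```

With these assumptions, `policy_for("app-server.log")` selects the compressed, short-retention policy, while `policy_for("quarterly-report.docx")` selects the indexed, long-retention one.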
