Sunday, July 30, 2017

Today we continue the discussion of Snowflake architecture. The Snowflake execution engine is columnar, vectorized and push-based. Columnar storage and execution suit analytical workloads because they make more effective use of CPU caches and SIMD instructions. Vectorized execution means that data is processed in a pipelined fashion, in batches of rows, without materializing intermediate results as map-reduce does. Push-based execution means that relational operators push their results to their downstream operators, rather than waiting for those operators to pull data. This removes control-flow logic from tight loops.
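To make the push-based, vectorized model concrete, here is a minimal Python sketch under simplifying assumptions. Tables are held as columnar batches (a dict mapping each column name to a list of values), and the names `Filter`, `SumAggregate` and `scan` are hypothetical, not Snowflake internals. Each operator hands its output batch to the next one by calling `push()`, instead of the downstream operator repeatedly pulling rows from its input as in the classic iterator (Volcano) model. Plain Python lists cannot show the cache or SIMD benefits; the sketch only illustrates the control flow.

```python
from typing import Callable, Dict, List, Optional

# Columnar batch: each column name maps to a list holding that column's values.
Batch = Dict[str, List[int]]


class Operator:
    """An operator accepts batches via push() and forwards results downstream."""

    def __init__(self, downstream: Optional["Operator"] = None) -> None:
        self.downstream = downstream

    def push(self, batch: Batch) -> None:
        raise NotImplementedError

    def finish(self) -> None:
        # Propagate end-of-input down the pipeline.
        if self.downstream is not None:
            self.downstream.finish()


class Filter(Operator):
    def __init__(self, column: str, predicate: Callable[[int], bool],
                 downstream: Operator) -> None:
        super().__init__(downstream)
        self.column = column
        self.predicate = predicate

    def push(self, batch: Batch) -> None:
        # Evaluate the predicate over one whole column of the batch.
        keep = [i for i, v in enumerate(batch[self.column]) if self.predicate(v)]
        if keep:
            out = {name: [vals[i] for i in keep] for name, vals in batch.items()}
            self.downstream.push(out)  # push the result; nobody pulls from us


class SumAggregate(Operator):
    def __init__(self, column: str) -> None:
        super().__init__()
        self.column = column
        self.total = 0

    def push(self, batch: Batch) -> None:
        self.total += sum(batch[self.column])


def scan(table: Batch, batch_size: int, sink: Operator) -> None:
    """Source operator: slice the table into batches and push each one."""
    num_rows = len(next(iter(table.values())))
    for start in range(0, num_rows, batch_size):
        batch = {name: col[start:start + batch_size] for name, col in table.items()}
        sink.push(batch)
    sink.finish()


table = {"region": [1, 2, 1, 3, 1, 2], "amount": [10, 20, 30, 40, 50, 60]}
agg = SumAggregate("amount")
scan(table, batch_size=4, sink=Filter("region", lambda r: r == 1, agg))
print(agg.total)  # 90
```

The scan drives the whole plan: each batch flows through the filter into the aggregate in a single call chain, and no intermediate table is materialized between the operators.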
The cloud services layer is always on and comprises the services that manage virtual warehouses, queries, transactions and all the metadata. Virtual warehouses consist of elastic clusters of virtual machines, which are instantiated on demand to scale query processing. Data storage spans availability zones and is replicated so that it can tolerate failures in those zones.
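The split between an always-on services layer and elastic compute can be pictured with a toy Python model. Everything here is an assumption for illustration: `CloudServices`, `VirtualWarehouse`, `submit`, `suspend` and `resize` are hypothetical names, not Snowflake's API, and a "warehouse" is just an object with a node count. What it shows is that metadata such as the query log stays in the services layer, while compute clusters can be created, suspended, resumed and resized on demand.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VirtualWarehouse:
    """Toy stand-in for an elastic cluster of VMs; it holds no persistent state."""
    name: str
    nodes: int
    running: bool = True

    def run(self, query: str) -> str:
        return f"{query} ran on {self.name} with {self.nodes} node(s)"


@dataclass
class CloudServices:
    """Always-on layer: keeps metadata and starts warehouses on demand."""
    warehouses: Dict[str, VirtualWarehouse] = field(default_factory=dict)
    query_log: List[str] = field(default_factory=list)

    def submit(self, warehouse: str, query: str, nodes: int = 1) -> str:
        wh = self.warehouses.get(warehouse)
        if wh is None:
            # First use: instantiate a new cluster of the requested size.
            wh = VirtualWarehouse(warehouse, nodes)
            self.warehouses[warehouse] = wh
        elif not wh.running:
            # Suspended: resume it on demand, keeping its configured size.
            wh.running = True
        # Query history is metadata, so it lives here, not in the warehouse.
        self.query_log.append(query)
        return wh.run(query)

    def suspend(self, warehouse: str) -> None:
        self.warehouses[warehouse].running = False

    def resize(self, warehouse: str, nodes: int) -> None:
        # In this toy, scaling compute only changes the node count.
        self.warehouses[warehouse].nodes = nodes


svc = CloudServices()
print(svc.submit("etl_wh", "Q1", nodes=4))  # Q1 ran on etl_wh with 4 node(s)
svc.suspend("etl_wh")
svc.resize("etl_wh", 8)
print(svc.submit("etl_wh", "Q2"))           # Q2 ran on etl_wh with 8 node(s)
```

Because the warehouses carry no persistent state in this model, suspending or resizing one loses nothing; in the real system the table data lives in the separate storage layer.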
We now review the security features of Snowflake, which is designed to protect user data with two-factor authentication, encrypted data import and export, secure data transfer and storage, and role-based access control for database objects. Data is encrypted in transit and before it is written to storage. Key management is built on a key hierarchy, so that keys can be rotated and data re-encrypted under new keys. Encryption and key management together form the basis of this security model. The key hierarchy has four levels: root keys, account keys, table keys and file keys. Each layer encrypts (wraps) the keys of the layer below it. Each account key corresponds to one user account, each table key to one database table, and each file key to one table file. Using a hierarchy narrows the scope of each key, limiting the amount of data that any single key protects.
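The following Python sketch illustrates why a key hierarchy makes rotation cheap. It is a toy under loud assumptions: the `encrypt`/`decrypt` helpers use a SHA-256 keystream with XOR, which is NOT a secure production scheme (it has no integrity protection; real systems use vetted primitives such as AES), and `KeyHierarchy` and its methods are hypothetical names. Each key is stored only in wrapped form, encrypted by its parent, mirroring the root, account, table and file levels. Rotating an account key only re-wraps that account's table keys; the file keys and the data files are left untouched.

```python
import hashlib
import os
from typing import Dict, Tuple

NONCE_LEN = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: concatenated SHA-256(key || nonce || counter) blocks.
    # Illustrative only; NOT a substitute for a vetted cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(NONCE_LEN)
    ks = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    ks = _keystream(key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, ks))


class KeyHierarchy:
    """Root -> account -> table -> file keys; each child key is stored wrapped by its parent."""

    def __init__(self) -> None:
        self.root = os.urandom(32)  # top of the hierarchy (toy: kept in memory)
        self.account_keys: Dict[str, bytes] = {}                 # wrapped by root
        self.table_keys: Dict[Tuple[str, str], bytes] = {}       # wrapped by account key
        self.file_keys: Dict[Tuple[str, str, str], bytes] = {}   # wrapped by table key
        self.files: Dict[Tuple[str, str, str], bytes] = {}       # encrypted by file key

    @staticmethod
    def _unwrap_or_create(store: dict, ident, parent_key: bytes) -> bytes:
        # Create a fresh child key on first use and store only its wrapped form.
        if ident not in store:
            store[ident] = encrypt(parent_key, os.urandom(32))
        return decrypt(parent_key, store[ident])

    def write_file(self, account: str, table: str, name: str, data: bytes) -> None:
        acct = self._unwrap_or_create(self.account_keys, account, self.root)
        tbl = self._unwrap_or_create(self.table_keys, (account, table), acct)
        fk = self._unwrap_or_create(self.file_keys, (account, table, name), tbl)
        self.files[(account, table, name)] = encrypt(fk, data)

    def read_file(self, account: str, table: str, name: str) -> bytes:
        acct = decrypt(self.root, self.account_keys[account])
        tbl = decrypt(acct, self.table_keys[(account, table)])
        fk = decrypt(tbl, self.file_keys[(account, table, name)])
        return decrypt(fk, self.files[(account, table, name)])

    def rotate_account_key(self, account: str) -> None:
        old = decrypt(self.root, self.account_keys[account])
        new = os.urandom(32)
        self.account_keys[account] = encrypt(self.root, new)
        # Re-wrap only this account's table keys; file keys and data stay as they are.
        for (acct, table), wrapped in list(self.table_keys.items()):
            if acct == account:
                self.table_keys[(acct, table)] = encrypt(new, decrypt(old, wrapped))


kh = KeyHierarchy()
kh.write_file("acme", "orders", "part-0001", b"order data")
before = kh.files[("acme", "orders", "part-0001")]
kh.rotate_account_key("acme")
print(kh.files[("acme", "orders", "part-0001")] == before)  # True: file not re-encrypted
print(kh.read_file("acme", "orders", "part-0001"))          # b'order data'
```

Only the small wrapped keys one level down are rewritten, so rotating a key high in the hierarchy stays cheap even when an account holds many files.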
