Monday, July 31, 2017

Today we continue the discussion on Snowflake architecture. The engine for Snowflake is columnar, vectorized and push-based. Columnar storage and execution suit analytical workloads because they make more effective use of CPU caches and SIMD instructions. Vectorized execution means data is processed in a pipelined fashion, in batches, without materializing intermediate results the way MapReduce does. Push-based execution means that relational operators push their results to their downstream operators rather than waiting for those operators to pull data. This removes control flow logic from tight loops.
Data is encrypted in transit and before being written to storage. Key management is built on a key hierarchy so that keys can be rotated and data re-encrypted. Encryption and key management together complete the security. By using a hierarchy, we reduce the scope of each key and the amount of data it secures.
Encryption keys go through four stages in their life cycle. First they are created, then they are used to encrypt or decrypt, then they are marked as no longer in use, and finally they are decommissioned. Keys are rotated at periodic intervals. Retired keys can still be used to decrypt data, but only the newest key is used to encrypt. Before a retired key is destroyed, the data it protects is re-encrypted with the latest key. This is called rekeying.
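The key life cycle described above can be sketched as a small state machine. This is only a bookkeeping model under stated assumptions: the names KeyState, ManagedKey and KeyRing are hypothetical and not part of any Snowflake API, and no real cryptography is performed. The sketch just tracks which key protects which file, so that rotation and rekeying are easy to follow.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of the four life cycle stages; not Snowflake's API.
enum KeyState { Created, Active, Retired, Destroyed }

class ManagedKey
{
    public int Id;
    public KeyState State = KeyState.Created;
}

class KeyRing
{
    private readonly List<ManagedKey> keys = new List<ManagedKey>();
    // Maps a file name to the id of the key that currently protects it.
    private readonly Dictionary<string, int> fileKey = new Dictionary<string, int>();

    public ManagedKey Active => keys.LastOrDefault(k => k.State == KeyState.Active);

    public int KeyIdOf(string file) => fileKey[file];

    // Rotation: retire the current active key and activate a fresh one.
    public ManagedKey Rotate()
    {
        var old = Active;
        if (old != null) old.State = KeyState.Retired;
        var fresh = new ManagedKey { Id = keys.Count + 1 };
        fresh.State = KeyState.Active;
        keys.Add(fresh);
        return fresh;
    }

    // New data is always protected by the active key.
    public void Write(string file)
    {
        if (Active == null) throw new InvalidOperationException("no active key; call Rotate first");
        fileKey[file] = Active.Id;
    }

    // Reads may use an active or a retired key, never a destroyed one.
    public bool CanRead(string file)
    {
        var key = keys.First(k => k.Id == fileKey[file]);
        return key.State == KeyState.Active || key.State == KeyState.Retired;
    }

    // Rekeying: move every file off a retired key onto the active key, then destroy the retired key.
    public void Rekey(int retiredId)
    {
        var key = keys.First(k => k.Id == retiredId);
        if (key.State != KeyState.Retired)
            throw new InvalidOperationException("only retired keys can be rekeyed");
        var files = fileKey.Where(p => p.Value == retiredId).Select(p => p.Key).ToList();
        foreach (var file in files) fileKey[file] = Active.Id;
        key.State = KeyState.Destroyed;
    }
}
```

In this model, a file written under key 1 stays readable after a rotation to key 2, and calling Rekey(1) moves it to key 2 before key 1 is destroyed.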
Generally, changing compute resources requires data redistribution. However, Snowflake allows users to scale up or down and even pause resources without any data movement.
Snowflake draws inspiration from BigQuery, Google's approach to fast SQL processing at very large scale. However, BigQuery does not adhere strictly to SQL, and its tables are append-only and require schemas. Snowflake provides ACID guarantees and full DML, and does not require schemas for semi-structured data.
#codingexercise
Find the length of the longest subsequence of consecutive integers in a given array
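int GetLongest(List<int> A)
{
    if (A == null || A.Count == 0) return 0;
    if (A.Count == 1) return 1;
    // Sorts the list in place so consecutive values become adjacent.
    A.Sort();
    int max = 1;
    int cur = 1;
    for (int i = 1; i < A.Count; i++)
    {
        // A duplicate neither extends nor breaks the current run.
        if (A[i] == A[i - 1]) continue;
        if (A[i - 1] + 1 == A[i])
        {
            cur = cur + 1;
        }
        else
        {
            max = Math.Max(max, cur);
            cur = 1;
        }
    }
    max = Math.Max(max, cur);
    return max;
}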
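
The sorting approach above costs O(n log n). As an alternative, here is a minimal sketch that runs in expected linear time by using a HashSet<int> and only counting from the start of each run. The class name Exercise and the method name GetLongestLinear are made up for illustration, and the input list is not modified.

```csharp
using System;
using System.Collections.Generic;

static class Exercise
{
    public static int GetLongestLinear(List<int> A)
    {
        if (A == null || A.Count == 0) return 0;
        // The set removes duplicates and gives expected O(1) lookups.
        var seen = new HashSet<int>(A);
        int max = 0;
        foreach (int x in seen)
        {
            // Skip values that are not the start of a run.
            if (x != int.MinValue && seen.Contains(x - 1)) continue;
            int len = 1;
            // Use long arithmetic so x + len cannot overflow.
            long next = (long)x + 1;
            while (next <= int.MaxValue && seen.Contains((int)next))
            {
                len++;
                next++;
            }
            max = Math.Max(max, len);
        }
        return max;
    }
}
```

For example, GetLongestLinear(new List<int> { 100, 4, 200, 1, 3, 2 }) returns 4 for the run 1, 2, 3, 4. Each value is visited a constant number of times because the inner loop only runs from the first element of a run.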
