Thursday, August 3, 2017

In previous blog posts we have been discussing Snowflake computing. Today we conclude that discussion.

The Snowflake execution engine is columnar, vectorized and push-based. Columnar storage suits analytical workloads because it makes more effective use of CPU caches and SIMD instructions. Vectorized execution means that, unlike MapReduce, the engine does not materialize intermediate results; instead, data is processed in a pipelined fashion, in batches of rows in columnar format. Push-based execution means that relational operators push their results to their downstream operators, rather than waiting for those operators to pull data as in the classic Volcano model. This removes control-flow logic from tight loops.

Data is encrypted in transit and before it is written to storage. Keys are managed in a hierarchy so that they can be rotated and the data re-encrypted. Encryption and key management together complete the security model. Using a hierarchy narrows the scope of each key, so every key protects only a limited portion of the data.
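A minimal C# sketch of the columnar, vectorized, push-based model described in this post. It is not Snowflake's code: the two-column table, the batch size and the three operators (scan, filter, aggregate) are hypothetical. Each column lives in its own array, the scan pushes fixed-size batches downstream, and no operator materializes the full intermediate result.

```csharp
using System;

// Hypothetical table stored column by column: each column is its own contiguous array.
int[] quantity = { 5, 12, 7, 30, 1, 18, 9, 25, 3, 14 };
double[] price = { 2.0, 1.5, 4.0, 0.5, 10.0, 3.0, 2.5, 1.0, 6.0, 2.0 };

// Tiny batch size so the demo is readable; real engines use far larger batches.
const int BatchSize = 4;

// Sink operator: accumulates quantity * price for every row pushed into it.
double total = 0;
void Aggregate(int[] qty, double[] pr, int count)
{
    for (int i = 0; i < count; i++) total += qty[i] * pr[i];
}

// Filter operator: keeps rows with quantity >= 10 and pushes the survivors downstream.
void Filter(int[] qty, double[] pr, int count)
{
    var outQty = new int[count];
    var outPr = new double[count];
    int n = 0;
    for (int i = 0; i < count; i++)
    {
        if (qty[i] >= 10) { outQty[n] = qty[i]; outPr[n] = pr[i]; n++; }
    }
    if (n > 0) Aggregate(outQty, outPr, n); // push now; nobody has to ask for it
}

// Scan operator: slices each column into batches and pushes every batch to the filter.
// (Array ranges copy here; a real engine would hand over views instead of copies.)
for (int start = 0; start < quantity.Length; start += BatchSize)
{
    int count = Math.Min(BatchSize, quantity.Length - start);
    Filter(quantity[start..(start + count)], price[start..(start + count)], count);
}

Console.WriteLine(total);
```

Each operator calls the next one directly, so control flows from the scan toward the sink. In a pull-based (Volcano-style) engine, the aggregate would instead repeatedly ask the filter for its next row.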
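To make the key-hierarchy idea concrete, here is a hedged sketch, not Snowflake's implementation. It assumes a two-level hierarchy in which a root key wraps (encrypts) a per-file data key, and it uses .NET's AesGcm (.NET 6 or later) at both levels. Seal and Open are hypothetical helper names. The point it shows is scope: rotating the root key only re-wraps the small data key, while the file encrypted under that data key is left as it is.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

const int NonceSize = 12, TagSize = 16;

// Hypothetical helper: encrypts plaintext with AES-GCM under key; returns nonce | tag | ciphertext.
byte[] Seal(byte[] key, byte[] plaintext)
{
    byte[] nonce = RandomNumberGenerator.GetBytes(NonceSize);
    byte[] tag = new byte[TagSize];
    byte[] cipher = new byte[plaintext.Length];
    using var aes = new AesGcm(key);
    aes.Encrypt(nonce, plaintext, cipher, tag);
    byte[] output = new byte[NonceSize + TagSize + cipher.Length];
    nonce.CopyTo(output, 0);
    tag.CopyTo(output, NonceSize);
    cipher.CopyTo(output, NonceSize + TagSize);
    return output;
}

// Hypothetical helper: reverses Seal; throws CryptographicException on a wrong key or tampered data.
byte[] Open(byte[] key, byte[] sealedData)
{
    byte[] nonce = sealedData[..NonceSize];
    byte[] tag = sealedData[NonceSize..(NonceSize + TagSize)];
    byte[] cipher = sealedData[(NonceSize + TagSize)..];
    byte[] plain = new byte[cipher.Length];
    using var aes = new AesGcm(key);
    aes.Decrypt(nonce, cipher, tag, plain);
    return plain;
}

// Top of the hierarchy. A real deployment would keep the root key in a hardware
// security module, not in application memory as this demo does.
byte[] rootKey = RandomNumberGenerator.GetBytes(32);

// Lower level: a data key that encrypts one file; only its wrapped form would be stored.
byte[] dataKey = RandomNumberGenerator.GetBytes(32);
byte[] wrappedDataKey = Seal(rootKey, dataKey);
byte[] encryptedFile = Seal(dataKey, Encoding.UTF8.GetBytes("order 42: 3 widgets"));

// Rotate the root key: unwrap the data key with the old root, re-wrap it with the new one.
// The encrypted file itself is not touched.
byte[] newRootKey = RandomNumberGenerator.GetBytes(32);
wrappedDataKey = Seal(newRootKey, Open(rootKey, wrappedDataKey));

string recovered = Encoding.UTF8.GetString(Open(Open(newRootKey, wrappedDataKey), encryptedFile));
Console.WriteLine(recovered);
```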
Snowflake introduced three semi-structured data types: VARIANT, ARRAY and OBJECT. These let it serve as a document store as well. With the help of these data types, it made data ingestion far more efficient. This "schema later" approach also allows data to be loaded as-is and parsed and transformed later.
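As a rough illustration of "schema later" (it does not model how Snowflake stores VARIANT internally), the C# snippet below keeps raw JSON documents exactly as they arrived, without declaring a schema, and extracts only the fields a query needs at query time using System.Text.Json. The event documents and the TotalClickMs helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Ingest: raw JSON documents are stored as-is, even though their shapes differ.
var rawEvents = new List<string>
{
    "{\"user\": \"alice\", \"action\": \"click\", \"ms\": 120}",
    "{\"user\": \"bob\", \"action\": \"purchase\", \"amount\": 19.99}",
    "{\"user\": \"alice\", \"action\": \"click\", \"ms\": 80, \"tags\": [\"mobile\"]}"
};

// Query time: parse each document and read only the fields this query cares about.
// Documents without an "ms" field are simply skipped rather than rejected at load time.
int TotalClickMs(string user)
{
    int sum = 0;
    foreach (string raw in rawEvents)
    {
        using JsonDocument doc = JsonDocument.Parse(raw);
        JsonElement root = doc.RootElement;
        if (root.TryGetProperty("user", out JsonElement u) && u.GetString() == user &&
            root.TryGetProperty("ms", out JsonElement ms))
        {
            sum += ms.GetInt32();
        }
    }
    return sum;
}

Console.WriteLine(TotalClickMs("alice"));
```

The trade-off is that parsing cost moves from load time to query time, which is why a real engine would want to store such data in a more query-friendly form.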
#codingexercise
Find the length of the longest subsequence of consecutive integers in a given array
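using System;
using System.Collections.Generic;

// Sorts a copy of the list, then scans it once, tracking the current run of
// consecutive values and the longest run seen so far. O(n log n) time.
int GetLongest(List<int> A)
{
    if (A == null || A.Count == 0) return 0;
    var sorted = new List<int>(A);    // copy, so the caller's list is not reordered
    sorted.Sort();
    int max = 1;
    int cur = 1;
    for (int i = 1; i < sorted.Count; i++)
    {
        if (sorted[i] == sorted[i - 1])
        {
            continue;                 // a duplicate neither extends nor breaks the run
        }
        if (sorted[i - 1] + 1 == sorted[i])
        {
            cur = cur + 1;            // the run continues
        }
        else
        {
            max = Math.Max(max, cur); // the run ended; remember its length
            cur = 1;
        }
    }
    max = Math.Max(max, cur);         // the last run may reach the end of the list
    return max;
}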
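The sort-based solution runs in O(n log n) time. A common alternative, swapped in here rather than taken from the original exercise, uses a hash set to reach O(n) expected time: it only starts counting at values whose predecessor is absent, so each run is walked exactly once. LongestConsecutiveRun is a hypothetical name for this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hash-set variant: count a run only from its smallest element.
int LongestConsecutiveRun(IEnumerable<int> values)
{
    if (values == null) return 0;
    var set = new HashSet<int>(values);   // also discards duplicates
    int best = 0;
    foreach (int x in set)
    {
        // Skip x if x - 1 is present: x is not the start of a run.
        if (x != int.MinValue && set.Contains(x - 1)) continue;
        int length = 1;
        int current = x;
        while (current != int.MaxValue && set.Contains(current + 1))
        {
            current++;
            length++;
        }
        best = Math.Max(best, length);
    }
    return best;
}

Console.WriteLine(LongestConsecutiveRun(new[] { 100, 4, 200, 1, 3, 2 })); // 4
```

The bounds checks against int.MinValue and int.MaxValue keep x - 1 and current + 1 from wrapping around at the edges of the int range.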

