Wednesday, August 2, 2017

Today we continue the discussion on Snowflake computing. The engine for Snowflake is columnar, vectorized and push-based. Columnar storage is suitable for analytical workloads because it makes more effective use of CPU caches and SIMD instructions. Vectorized execution means data is processed in a pipelined fashion, in batches, without materializing intermediate results as map-reduce does. Push-based execution means that the relational operators push their results to their downstream operators, rather than waiting for these operators to pull data. This removes control flow from tight loops.
Data is encrypted in transit and before being written to storage. Key management is supported with a key hierarchy so that keys can be rotated and data re-encrypted. Encryption and key management together complete the security. By using a hierarchy, we reduce the scope of each key and the amount of data it secures.
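As an illustration of the push-based, batch-at-a-time execution described above, here is a minimal sketch in C. It is not Snowflake code: the names Batch, Operator, scan_column and sum_greater_than are made up for this example, and the batch size of 1024 is an arbitrary assumption. The point is only that the source drives the pipeline and that each operator hands a whole batch to its downstream operator, so the per-row loops contain no calls between operators.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; an illustrative sketch, not Snowflake's engine. */
#define BATCH_CAPACITY 1024

/* A batch holds up to BATCH_CAPACITY values taken from a single column. */
typedef struct {
    int    values[BATCH_CAPACITY];
    size_t count;
} Batch;

/* An operator consumes a batch and pushes its output to `next`. */
typedef struct Operator Operator;
struct Operator {
    void (*consume)(Operator *self, const Batch *in);
    Operator *next;  /* downstream operator, NULL for a sink */
    long state;      /* filter threshold, or running total for the sink */
};

/* Filter: keep values greater than the threshold, then push them downstream. */
void filter_consume(Operator *self, const Batch *in)
{
    Batch out;
    out.count = 0;
    for (size_t i = 0; i < in->count; i++) {   /* tight loop, no operator calls */
        if (in->values[i] > self->state)
            out.values[out.count++] = in->values[i];
    }
    if (out.count > 0 && self->next != NULL)
        self->next->consume(self->next, &out); /* one call per batch, not per row */
}

/* Sink: add every value of the batch to the running total. */
void sum_consume(Operator *self, const Batch *in)
{
    for (size_t i = 0; i < in->count; i++)
        self->state += in->values[i];
}

/* Source: cut the column into batches and push each one into the pipeline. */
void scan_column(const int *column, size_t n, Operator *first)
{
    Batch batch;
    for (size_t start = 0; start < n; start += BATCH_CAPACITY) {
        size_t len = (n - start < BATCH_CAPACITY) ? n - start : BATCH_CAPACITY;
        memcpy(batch.values, column + start, len * sizeof(int));
        batch.count = len;
        first->consume(first, &batch);
    }
}

/* Wires scan -> filter -> sum and returns the sum of values above threshold. */
long sum_greater_than(const int *column, size_t n, int threshold)
{
    Operator sink   = { sum_consume, NULL, 0 };
    Operator filter = { filter_consume, &sink, threshold };
    scan_column(column, n, &filter);
    return sink.state;
}
```

Calling sum_greater_than on the column {1, 5, 10, 3, 8} with a threshold of 4 pushes a single batch through the filter into the sink. In a pull-based design the sink would instead ask the filter for rows, and the filter would ask the scan.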
Snowflake stores both semi-structured and schemaless data. Three data types are added for this: variant, array and object. The variant type can hold any native data type as well as arrays and objects, so variants can store documents. Arrays and objects are specializations of variants.
The variant data type is a self-describing binary serialization which supports fast key-value lookup as well as efficient type tests, comparison and hashing. The variant type also helps Snowflake to perform Extract-Load-Transform instead of Extract-Transform-Load operations. We saw that this significantly reduced data ingestion time for a Snowflake customer compared to their existing processes using MongoDB. The ability to load JSON directly while allowing parsing or type inference is called the "schema later" approach. This approach decouples the producers and consumers. In a traditional warehouse, changes to the schema required coordination between departments and time-consuming operations.
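To make the idea of a self-describing value concrete, here is a small sketch in C of a tagged value that carries its own type, so a consumer can test types and look up keys without a schema declared up front. This is an in-memory illustration only, not Snowflake's binary format. The names Variant, Field, variant_is and variant_get are hypothetical, and the key lookup is a plain linear scan for brevity, whereas a real format would presumably use something faster.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of a self-describing value; not Snowflake's format. */
typedef enum { V_NULL, V_INT, V_STRING, V_ARRAY, V_OBJECT } VariantTag;

typedef struct Variant Variant;

/* One key/value pair of an object (a "document"). */
typedef struct {
    const char    *key;
    const Variant *value;
} Field;

/* Every value carries a tag that says which union member is valid. */
struct Variant {
    VariantTag tag;
    union {
        long long   integer;
        const char *string;
        struct { const Variant *items;  size_t count; } array;
        struct { const Field   *fields; size_t count; } object;
    } as;
};

/* Type test: does v exist and hold the given type? */
int variant_is(const Variant *v, VariantTag tag)
{
    return v != NULL && v->tag == tag;
}

/* Key lookup on an object; returns NULL if obj is not an object
   or the key is absent. Linear scan, chosen only for brevity. */
const Variant *variant_get(const Variant *obj, const char *key)
{
    if (!variant_is(obj, V_OBJECT))
        return NULL;
    for (size_t i = 0; i < obj->as.object.count; i++) {
        if (strcmp(obj->as.object.fields[i].key, key) == 0)
            return obj->as.object.fields[i].value;
    }
    return NULL;
}
```

A consumer in the "schema later" style would call variant_get(doc, "age") and then variant_is(result, V_INT) at read time, instead of relying on a column type fixed when the data was loaded.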
#codingexercise
Find the sum of the first n magic numbers
Magic numbers can be expressed as a power of 5 or a sum of unique powers of 5. The n-th magic number follows from the binary representation of n: 001, 010, 011 etc. give 5, 25, 30 ... because the i-th bit from the right (counting from 1) contributes 5^i when it is set.
Therefore,


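// Returns the n-th magic number: bit i of n (counting from 1) contributes 5^i.
int GetMagicN(int n)
{
    int power = 1;
    int result = 0;
    while (n)
    {
        power = power * 5;      // next power of 5 for the current bit
        if (n & 1)
            result += power;
        n >>= 1;                // move on to the next bit
    }
    return result;
}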

// Returns the sum of the first k magic numbers.
long GetMagicSum(int k)
{
    long result = 0;
    for (int i = 1; i <= k; i++)
        result += GetMagicN(i);   // add the i-th magic number, not the k-th
    return result;
}
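One way to sanity-check the routines above is a brute-force sketch that does not use the bit trick at all. It relies on the observation that a sum of unique powers of 5 (starting from 5^1) is exactly a positive multiple of 5 whose base-5 digits are all 0 or 1. The names IsMagic and BruteForceMagicSum are made up for this check.

```c
/* Returns 1 if x is a magic number: a positive multiple of 5
   whose base-5 digits are all 0 or 1. */
int IsMagic(long x)
{
    if (x <= 0 || x % 5 != 0)
        return 0;
    while (x > 0)
    {
        if (x % 5 > 1)      /* a digit of 2, 3 or 4 means a repeated power */
            return 0;
        x /= 5;
    }
    return 1;
}

/* Sums the first k magic numbers by testing every multiple of 5 in order. */
long BruteForceMagicSum(int k)
{
    long sum = 0;
    int found = 0;
    for (long x = 5; found < k; x += 5)
    {
        if (IsMagic(x))
        {
            sum += x;
            found++;
        }
    }
    return sum;
}
```

For k = 3 this adds 5 + 25 + 30 = 60, which is the value the sum of the first three magic numbers should take.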
