Thursday, May 31, 2018

We mentioned BigTable and BigQuery in earlier posts, where we discussed the purpose of each as a form of storage and processing: BigTable for low-latency access and BigQuery for analytics. In this regard, a few considerations about BigQuery are not quite obvious:
1) BigQuery uses columnar data storage to perform analytics over its data. But data does not need to reside in BigQuery if performance is not a constraint. External data sources, such as log files in Cloud Storage and records in BigTable, can be queried directly. Some of this analysis can therefore be done against external data sources, with or without ETL jobs and streaming support.

2) BigQuery is not just for slower analytical processing. Even near real-time analysis can be achieved by streaming data into BigQuery.

3) BigQuery is not limited to SQL. It also supports user-defined functions written in JavaScript.

4) Query quotas and pricing may affect the choice between interactive and batch queries, but the two can be mixed, say, for a dashboard.

5) Queries can be prioritized so that ad hoc queries and prepared queries run separately, with performance favoring the latter.

6) BigQuery can work based on a snapshot of the data from a point in time.

7) Since the queries serve organizational needs, they can be arranged according to the organization's requirements, which means they tend to reflect the organization's well-known structure.

8) Since SQL is available as the query language, we can use joins and normalized schemas, which broadens what a query can express.

9) Similarly, data can be secured with fine-grained access control, such as at the row and column level.

10) We can monitor and audit the usage of these queries, so we always know how they are being used.
#codingexercise
Partition an array into two contiguous subarrays such that the value that must be added to the smaller sum to make the two sums equal is minimal.
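// requires: using System; using System.Collections.Generic;
int GetPartition(List<int> A)
{
    int n = A.Count;
    var prefixes = new int[n];
    var suffixes = new int[n];

    // prefixes[i] = A[0] + ... + A[i]
    prefixes[0] = A[0];
    for (int i = 1; i < n; i++) {
        prefixes[i] = prefixes[i - 1] + A[i];
    }

    // suffixes[i] = A[i] + ... + A[n - 1]
    suffixes[n - 1] = A[n - 1];
    for (int i = n - 2; i >= 0; i--) {
        suffixes[i] = suffixes[i + 1] + A[i];
    }

    // start from the total sum as the initial best difference
    int min = suffixes[0];
    int index = 0;

    // try every split between i and i + 1
    for (int i = 0; i < n - 1; i++) {
        if (Math.Abs(suffixes[i + 1] - prefixes[i]) < min) {
            min = Math.Abs(suffixes[i + 1] - prefixes[i]);
            // report the boundary element on the side with the smaller sum
            if (suffixes[i + 1] < prefixes[i]) {
                index = i + 1;
            } else {
                index = i;
            }
        }
    }
    return index;
}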
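Alternatively, for every candidate position in the array, we can compare the sum before and after it:
int before = 0;
for (int i = 0; i < A.Count - 1; i++) {
    before = before + A[i];
    int after = A.Sum() - before; // compare Math.Abs(after - before) with the best so far
}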
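The running-sum alternative can be sketched as a complete method. This is a minimal sketch rather than a definitive implementation: the names PartitionSketch and GetSplitIndex are hypothetical, the array is assumed to have at least two elements, and the method returns the last index of the left part, which differs slightly from what GetPartition above returns.

```csharp
using System;
using System.Collections.Generic;

public static class PartitionSketch
{
    // Returns the last index of the left part for the split that minimizes
    // |sum(left) - sum(right)|. Assumes A has at least two elements.
    public static int GetSplitIndex(IList<int> A)
    {
        // Compute the total once instead of calling Sum() for every candidate.
        long total = 0;
        foreach (int value in A) total += value;

        long before = 0;            // sum of A[0..i]
        long best = long.MaxValue;  // smallest difference seen so far
        int index = 0;
        for (int i = 0; i < A.Count - 1; i++)
        {
            before += A[i];
            long after = total - before;  // sum of A[i+1..n-1]
            long diff = Math.Abs(after - before);
            if (diff < best)
            {
                best = diff;
                index = i;
            }
        }
        return index;
    }
}
```

For the input {1, 2, 3, 4, 5} this picks index 2: the parts {1, 2, 3} and {4, 5} sum to 6 and 9, so 3 would have to be added to the left part. Keeping a single running sum avoids the two auxiliary arrays.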
#codingexercise
In a matrix of distinct integers sorted in increasing order along every row and every column, find whether a given integer exists.
Tuple<int, int> GetPositionRowWise(int[,] A, int Rows, int Cols, int X)
{
    // (row, column) of X, or (-1, -1) if X is not found
    var ret = Tuple.Create(-1, -1);
    for (int i = 0; i < Rows; i++)
    {
        // binary_search (not shown) is assumed to search row i and return the column of X, or -1
        int index = binary_search(A, Cols, i, X);
        if (index != -1)
        {
            ret = Tuple.Create(i, index);
            break;
        }
    }
    return ret;
}
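The binary_search helper is called above but not shown. A minimal sketch of what it could look like follows, assuming it searches a single row and returns the column of X or -1. The class name MatrixSearchSketch is hypothetical, and the parameter order simply mirrors the call above.

```csharp
public static class MatrixSearchSketch
{
    // Binary search for X within one sorted row of A.
    // Returns the column index of X, or -1 if the row does not contain it.
    public static int binary_search(int[,] A, int cols, int row, int X)
    {
        int lo = 0;
        int hi = cols - 1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;  // avoids overflow of lo + hi
            if (A[row, mid] == X) return mid;
            if (A[row, mid] < X) lo = mid + 1;
            else hi = mid - 1;
        }
        return -1;
    }
}
```

With a helper like this, the row-by-row search costs O(Rows × log Cols) comparisons, since each row is searched independently and the column ordering is not exploited.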