Friday, April 19, 2024

 

This is a continuation of previous articles on allowing Spark/Scala/Snowflake code to execute on Azure Machine Learning Compute. The built-in Jupyter kernel of “Azure ML – Python 3.8” does not have pyspark and we discussed the choices of downloading version compatible jars as well as alternative code to get data from Snowflake.

In this article, we will review the steps to set up a Jupyter notebook for Snowpark Scala. An “Almond” kernel can be used to setup Scala and coursier can be used to install the Almond Kernel to specify a supported version of Scala. The Almond Kernel has a prerequisite that a Java Virtual Machine needs to be installed on the system. This can be done by installing AdoptOpenJDK version 8. Then Almond can be fetched with coursier, by downloading its release and running the executable with the command line parameters to install Almond and Scala. Coursier is a Scala application that makes it easy to manage artifacts. It can setup the Scala development environment by downloading and caching artifacts from the web.

The Jupyter notebook for Snowpark can then be configured by defining a variable for the path and directory for classes generated by the Scala REPL and creating it. The Scala REPL generates classes for the Scala code that user writes. This process of configuring the compiler is not complete without adding the directory created earlier as a dependency of the REPL interpreter. Next we create a new session in SnowPark with an example as below:

import $ivy.`com.snowflake:snowpark:1.12.0`

import com.snowflake.snowpark._

import com.snowflake.snowpark.functions._

val session = Session.builder.configs(Map(

    "URL" -> "https://<account_identifier>.snowflakecomputing.com",

    "USER" -> "<username>",

    "PASSWORD" -> "<password>",

    "ROLE" -> "<role_name>",

    "WAREHOUSE" -> "<warehouse_name>",

    "DB" -> "<database_name>",

    "SCHEMA" -> "<schema_name>"

)).create

session.addDependency(replClassPath)

and then the Ammonite Kernel classes can be added for the code.

The session can be used to run a SQL query and populate a dataframe which can then be used independent of the data source.

Previous articles: IaCResolutionsPart107.docx

Thursday, April 18, 2024

 

This is a continuation of previous articles on IaC shortcomings and resolutions. No infrastructure is useful without considerations for usability. As with the earlier example of using Azure Machine Learning workspace to train models using Snowflake data source, some consideration must be given to allow connections to data source and importing data. We cited resolving versions between Spark, Scala and Snowflake libraries within the kernel to allow data to be imported into a dataframe for use with SQL and this could be difficult for end-users if they were to locate the jars and download themselves.

One way to resolve this would be to use a different coding style as shown below:

import snowflake.connector

conn = snowflake.connector.connect(

    account=’<snowflake_account>’,

    host='<account>.east-us-2.azure.snowflakecomputing.com',

    user='<login>',

    private_key=<bytes-to-private_key>,

    role='<data-scientist-role>',

    warehouse='<name-of-warehouse>',

    database='<demo_db>',

    schema='<demo_table>'

)

 

cursor = conn.cursor()

cursor.execute('select ID from <schema> limit 10')

rows = cursor.fetchall()

for row in rows:

    print(row)

cursor.close()

conn.close()

 

When compared to the following code:

spark = (

    SparkSession.builder \

    .appName('SnowflakeSample') \

    .config("spark.jars","/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pyspark/jars/snowflake-jdbc-3.12.2.jar,/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pyspark/jars/snowflake-ingest-sdk-0.9.6.jar,/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pyspark/jars/spark-snowflake_2.13-2.11.3-spark_3.3.jar,/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pyspark/jars/scala-library-2.12.19.jar,/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pyspark/jars/hadoop-azure-3.2.1.jar,/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pyspark/jars/azure-storage-7.0.0.jar")

    .config(conf = conf) \

    .getOrCreate()

)

print(spark.version)

 

sfOptions = {

            "sfUser" : "<login>",

            "sfURL" : "<account>.east-us-2.azure.snowflakecomputing.com",

            "sfRole" : "<data-scientist-role>",

            "sfWarehouse" : "<warehouse>",

            "sfDatabase" : "<demo_database>",

            "sfSchema" : "<demo_table>",

            "pem_private_key" : <private-key-bytes>

            }

 

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

query = 'select ID from <schema> limit 10'

df = spark.read.format(SNOWFLAKE_SOURCE_NAME) \

    .options(**sfOptions) \

    .option("query",query) \

    .load()

 

It becomes clear that the spark.read.format can be difficult to run without the proper jars passed in the config.

Therefore, samples are important to be provided with infrastructure.

Previous articles: IaCResolutionsPart106.docx

Wednesday, April 17, 2024

 Given a wire grid of size N * N with N-1 horizontal edges and N-1 vertical edges along the X and Y axis respectively, and a wire burning out every instant as per the given order using three matrices A, B, C such that the wire that burns is 

(A[T], B[T] + 1), if C[T] = 0 or

(A[T] + 1, B[T]), if C[T] = 1

Determine the instant after which the circuit is broken 

     public static boolean checkConnections(int[] h, int[] v, int N) {

        boolean[][] visited = new boolean[N][N];

        dfs(h, v, visited,0,0);

        return visited[N-1][N-1];

    }

    public static void dfs(int[]h, int[]v, boolean[][] visited, int i, int j) {

        int N = visited.length;

        if (i < N && j < N && i>= 0 && j >= 0 && !visited[i][j]) {

            visited[i][j] = true;

            if (v[i * (N-1) + j] == 1) {

                dfs(h, v, visited, i, j+1);

            }

            if (h[i * (N-1) + j] == 1) {

                dfs(h, v, visited, i+1, j);

            }

            if (i > 0 && h[(i-1)*(N-1) + j] == 1) {

                dfs(h,v, visited, i-1, j);

            }

            if (j > 0 && h[(i * (N-1) + (j-1))] == 1) {

                dfs(h,v, visited, i, j-1);

            }

        }

    }

    public static int burnout(int N, int[] A, int[] B, int[] C) {

        int[] h = new int[N*N];

        int[] v = new int[N*N];

        for (int i = 0; i < N*N; i++) { h[i] = 1; v[i] = 1; }

        for (int i = 0; i < N; i++) {

            h[(i * (N)) + N - 1] = 0;

            v[(N-1) * (N) + i] = 0;

        }

        System.out.println(printArray(h));

        System.out.println(printArray(v));

        for (int i = 0; i < A.length; i++) {

            if (C[i] == 0) {

                v[A[i] * (N-1) + B[i]] = 0;

            } else {

                h[A[i] * (N-1) + B[i]] = 0;

            }

            if (!checkConnections(h,v, N)) {

                return i+1;

            }

        }

        return -1;

    }

        int[] A = new int[9];

        int[] B = new int[9];

        int[] C = new int[9];

        A[0] = 0;    B [0] = 0;    C[0] = 0;

        A[1] = 1;    B [1] = 1;    C[1] = 1;

        A[2] = 1;    B [2] = 1;    C[2] = 0;

        A[3] = 2;    B [3] = 1;    C[3] = 0;

        A[4] = 3;    B [4] = 2;    C[4] = 0;

        A[5] = 2;    B [5] = 2;    C[5] = 1;

        A[6] = 1;    B [6] = 3;    C[6] = 1;

        A[7] = 0;    B [7] = 1;    C[7] = 0;

        A[8] = 0;    B [8] = 0;    C[8] = 1;

        System.out.println(burnout(9, A, B, C));

1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 

8

Alternatively,

    public static boolean burnWiresAtT(int N, int[] A, int[] B, int[] C, int t) {

        int[] h = new int[N*N];

        int[] v = new int[N*N];

        for (int i = 0; i < N*N; i++) { h[i] = 1; v[i] = 1; }

        for (int i = 0; i < N; i++) {

            h[(i * (N)) + N - 1] = 0;

            v[(N-1) * (N) + i] = 0;

        }

        System.out.println(printArray(h));

        System.out.println(printArray(v));

        for (int i = 0; i < t; i++) {

            if (C[i] == 0) {

                v[A[i] * (N-1) + B[i]] = 0;

            } else {

                h[A[i] * (N-1) + B[i]] = 0;

            }

        }

        return checkConnections(h, v, N);

    }

    public static int binarySearch(int N, int[] A, int[] B, int[] C, int start, int end) {

        if (start == end) {

            if (!burnWiresAtT(N, A, B, C, end)){

                return end;

            }

            return  -1;

        } else {

            int mid = (start + end)/2;

            if (burnWiresAtT(N, A, B, C, mid)) {

                return binarySearch(N, A, B, C, mid + 1, end);

            } else {

                return binarySearch(N, A, B, C, start, mid);

            }

        }

    }

1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 

8


Monday, April 15, 2024

 #codingexercise 

There are N points (numbered from 0 to N−1) on a plane. Each point is colored either red ('R') or green ('G'). The K-th point is located at coordinates (X[K], Y[K]) and its color is colors[K]. No point lies on coordinates (0, 0).

We want to draw a circle centered on coordinates (0, 0), such that the number of red points and green points inside the circle is equal. What is the maximum number of points that can lie inside such a circle? Note that it is always possible to draw a circle with no points inside.

Write a function that, given two arrays of integers X, Y and a string colors, returns an integer specifying the maximum number of points inside a circle containing an equal number of red points and green points.

Examples:

1. Given X = [4, 0, 2, −2], Y = [4, 1, 2, −3] and colors = "RGRR", your function should return 2. The circle contains points (0, 1) and (2, 2), but not points (−2, −3) and (4, 4).

class Solution {

    public int solution(int[] X, int[] Y, String colors) {

        // find the maximum

        double max = Double.MIN_VALUE;

        int count = 0;

        for (int i = 0; i < X.length; i++)

        {

            double dist = X[i] * X[i] + Y[i] * Y[i];

            if (dist > max)

            {

                max = dist;

            }

        }

 

        for (double i = Math.sqrt(max) + 1; i > 0; i -= 0.1)

        {

            int r = 0;

            int g = 0;

            for (int j = 0; j < colors.length(); j++)

            {

                if (Math.sqrt(X[j] * X[j] + Y[j] * Y[j]) > i) 

                {

                    continue;

                }

 

                if (colors.substring(j, j+1).equals("R")) {

                    r++;

                }

                else {

                    g++;

                }

            }

            if ( r == g && r > 0) {

                int min = r * 2;

                if (min > count)

                {

                    count = min;

                }

            }

        }

 

        return count; 

    }

}

 

Compilation successful.

 

Example test:   ([4, 0, 2, -2], [4, 1, 2, -3], 'RGRR')

OK

 

Example test:   ([1, 1, -1, -1], [1, -1, 1, -1], 'RGRG')

OK

 

Example test:   ([1, 0, 0], [0, 1, -1], 'GGR')

OK

 

Example test:   ([5, -5, 5], [1, -1, -3], 'GRG')

OK

 

Example test:   ([3000, -3000, 4100, -4100, -3000], [5000, -5000, 4100, -4100, 5000], 'RRGRG')

OK

 


Sunday, April 14, 2024

 Write a SkipList using arbitrary choices for skip levels:

import java.util.Random;

class Skiplist {

    Skiplist next1;

    Skiplist next2;

    Skiplist next3;

    Skiplist next4;

    int data;

 

    public Skiplist() {

        next1 = null;

        next2 = null;

        next3 = null;

        next4 = null;

        data = Integer.MIN_VALUE;

    }

    

    public Skiplist searchInternal(int target) {

        Skiplist skipList = this;

        while (skipList != null && skipList.next4 != null && skipList.next4.data < target ) {

            skipList = skipList.next4;

        }

        // if (skipList == null) {skipList = next3;}

        while (skipList != null && skipList.next3 != null && skipList.next3.data < target) {

            skipList = skipList.next3;

        }

        // if (skipList == null) {skipList = next2;}

        while (skipList != null && skipList.next2 != null && skipList.next2.data < target ) {

            skipList = skipList.next2;

        }

        // if (skipList == null) {skipList = next;}

        while (skipList != null && skipList.next1 != null && skipList.next1.data < target) {

            skipList = skipList.next1;

        }

        // if (skipList != null && skipList.data == target) { return this; }

        if (skipList != null && skipList.data > target) { return null; }

        return skipList;

    }

    public boolean search(int target) {

        Skiplist skipList = searchInternal(target);

        if (skipList == null) return false;

        if (skipList.data == target) return true;

        if ( skipList.next1 != null && skipList.next1.data == target) return true;

        return false;

    }

    

    public void add(int num) {

        Skiplist skipList = searchInternal(num);

        Skiplist obj = new Skiplist();

        obj.data = num;

        if (skipList != null) {

            obj.next1 = skipList.next1;

            skipList.next1 = obj;

            if (skipList.data == Integer.MIN_VALUE) {

                 skipList.next1 = obj;

                 skipList.next2 = obj;

                 skipList.next3 = obj;

                 skipList.next4 = obj;

                 return;

            }

        } else {

            obj.next1 = this;

            obj.next2 = this;

            obj.next3 = this;

            obj.next4 = this;

            return;

        }

        Random r = new Random();

        r.nextInt(1);

        int coinFlip = r.nextInt(1);

        if (coinFlip == 0) {return;}

        if (skipList.next2 != null) {obj.next2 = skipList.next2;}

        coinFlip = r.nextInt(1);

        if (coinFlip == 0) {return;}

        if (skipList.next3 != null) {obj.next3 = skipList.next3;}

        coinFlip = r.nextInt(1);

        if (coinFlip == 0) {return;}

        if (skipList.next4 != null) {obj.next4 = skipList.next4;}

    }

    

    public boolean erase(int num) {

        Skiplist skipList = searchInternal(num);

        if (skipList == null) {

                return false;

        }

        if (skipList.data == num) {

            skipList.data = Integer.MIN_VALUE;

            return true;

        }

        if (skipList.next1 == null && skipList.data != num) {

            return false;

        }

        if (skipList.next1 != null && skipList.next1.data == num) {

            skipList.next1 = skipList.next1.next1;

            return true;

        }

        return false;

    }

}

Given an integer n, return any array containing n unique integers such that they add up to 0.

class Solution {

    public int[] sumZero(int n) {

        int[] A = new int[n];

        int start = 0-n/2;

        for (int i = 0; i < n/2; i++) {

            A[i] = start;

            start++;

        }

        int next  = n/2;

        if (n%2 == 1) {

            A[next] = 0;

            next++;

        } 

        start = 1;

        for (int i = next; i < n; i++) {

            A[i] = start;

            start++;

        }

        return A;

    }

}


Friday, April 12, 2024

 Given clock hands positions for different points of time as pairs A[I][0] and A[I][1] where the order of the hands does not matter but their angle enclosed, count the number of pairs of points of time where the angles are the same

    public static int[] getClockHandsDelta(int[][] A) {

        int[] angles = new int[A.length];

        for (int i = 0; i < A.length; i++){

            angles[i] = Math.max(A[i][0], A[i][1]) - Math.min(A[i][0],A[i][1]);

        }

        return angles;

    }

    public static int NChooseK(int n, int k)

    {

        if (k < 0 || k > n || n == 0) return 0;

        if ( k == 0 || k == n) return 1;

        return Factorial(n) / (Factorial(n-k) * Factorial(k));

    }

 

    public static int Factorial(int n) {

        if (n <= 1) return 1;

        return n * Factorial(n-1);

    }


    public static int countPairsWithIdenticalAnglesDelta(int[] angles){

        Arrays.sort(angles);

        int count = 1;

        int result = 0;

        for (int i = 1; i < angles.length; i++) {

            if (angles[i] == angles[i-1]) {

                count += 1;

            } else {

                if (count > 0) {

                    result += NChooseK(count, 2);

                }

                count = 1;

            }

        }

        if (count > 0) {

            result += NChooseK(count, 2);

            count = 0;

        }

        return result;

    }


        int [][] A = new int[5][2];

         A[0][0] = 1;    A[0][1] = 2;

         A[1][0] = 2;    A[1][1] = 4;

         A[2][0] = 4;    A[2][1] = 3;

         A[3][0] = 2;    A[3][1] = 3;

         A[4][0] = 1;    A[4][1] = 3;

 1 2 1 1 2 

1 1 1 2 2 

4

#codingexercise: https://1drv.ms/w/s!Ashlm-Nw-wnWhOw-gkwgI-LEfc-p1A?e=x18YPl