Tuesday, April 30, 2024

This is a continuation of previous articles on IaC shortcomings and resolutions. One of the primary concerns with cloud-based deployment is cost, and there are built-in features at every level of the resource hierarchy and the management portal to improve efficiency. Some of the mitigations translate back into the IaC where, for example, existing app services in the Azure public cloud that were behind several regional Application Gateways may need to be directly associated with a consolidated global Front Door. Such transitions must be carefully planned because they can affect ongoing traffic. Both source and destination might have their own DNS aliases, and callers may eventually need to move to the global Front Door.

The steps are easily articulated as Azure CLI commands: create a new origin group within the Front Door profile, add the app services as origins within that group, and then create the rule set and route to associate with the origin group. The commands are listed in the addendum below.

However, care must be taken not to mix origins that use private links with those that do not, so the grouping of app services might differ from the source. Another difference might be the creation of an appropriate rule set whose rules articulate more fine-grained redirects than were possible earlier. That said, Front Door offers fewer rewriting capabilities than Application Gateway, so some selection might be involved.

Finally, it is important to prepare for the contingency of regional failures so that Front Door can divert traffic between regions; a configuration that prevents this will not help with Business Continuity and Disaster Recovery initiatives. Costs will also be incurred for health probes, logging, private network access, and continuous monitoring of usage.


Addendum: steps for automation


# assuming a Front Door profile already exists that can be displayed with:
# az afd profile show --profile-name my-fd-01 --resource-group rg-afd-01

az afd origin-group create \
    --resource-group rg-afd-01 \
    --origin-group-name my-fd-01-og-02 \
    --profile-name my-fd-01 \
    --probe-request-type GET \
    --probe-protocol Https \
    --probe-interval-in-seconds 120 \
    --probe-path / \
    --sample-size 4 \
    --successful-samples-required 3 \
    --additional-latency-in-milliseconds 50

az afd origin create \
    --resource-group rg-afd-01 \
    --host-name web-app-01.azurewebsites.net \
    --profile-name my-fd-01 \
    --origin-group-name my-fd-01-og-02 \
    --origin-name web-app-01 \
    --origin-host-header web-app-01.azurewebsites.net \
    --priority 2 \
    --weight 1000 \
    --enabled-state Enabled \
    --http-port 80 \
    --https-port 443

az afd origin create \
    --resource-group rg-afd-01 \
    --host-name web-app-02.azurewebsites.net \
    --profile-name my-fd-01 \
    --origin-group-name my-fd-01-og-02 \
    --origin-name web-app-02 \
    --origin-host-header web-app-02.azurewebsites.net \
    --priority 2 \
    --weight 1000 \
    --enabled-state Enabled \
    --http-port 80 \
    --https-port 443

az afd route create \
    --resource-group rg-afd-01 \
    --endpoint-name my-fd-01-ep \
    --profile-name my-fd-01 \
    --route-name my-fd-01-route-02 \
    --https-redirect Enabled \
    --origin-group my-fd-01-og-02 \
    --supported-protocols Https Http \
    --link-to-default-domain Enabled \
    --forwarding-protocol MatchRequest \
    --patterns-to-match "/*" \
    --custom-domains my-fd-01-cd

az afd rule-set create \
    --profile-name my-fd-01 \
    --resource-group rg-afd-01 \
    --rule-set-name ruleset02

az afd rule create \
    --resource-group rg-afd-01 \
    --rule-set-name ruleset02 \
    --profile-name my-fd-01 \
    --order 1 \
    --match-variable UrlPath \
    --operator Contains \
    --match-values web-app-01 \
    --rule-name rule01 \
    --action-name UrlRedirect \
    --redirect-protocol Https \
    --redirect-type Moved \
    --custom-hostname web-app-01.azurewebsites.net

az afd rule create \
    --resource-group rg-afd-01 \
    --rule-set-name ruleset02 \
    --profile-name my-fd-01 \
    --order 2 \
    --match-variable UrlPath \
    --operator Contains \
    --match-values web-app-02 \
    --rule-name rule02 \
    --action-name UrlRedirect \
    --redirect-protocol Https \
    --redirect-type Moved \
    --custom-hostname web-app-02.azurewebsites.net


Monday, April 29, 2024

This is an article on when to host automation and apps locally before eventually moving them to the cloud. Even in the era of cloud-first development, this still holds value. We take the specific example of building copilots locally. The alternative paradigm to local data processing is federated learning and inference, which helps with privacy preservation, improved data diversity, and decentralized data ownership, but works best with mature machine learning models.

As a recap, a Copilot is an AI companion that can communicate with a user over a prompt and a response. It can be used with various services such as Azure and Security, and it respects subscription filters. Copilots help users figure out workflows, queries, code, and even links to documentation. They can even obey commands such as changing the theme to light or dark mode. Copilots integrate with many connectors and support many types of data sources. They implement different Natural Language Processing models and are available in various flagship products such as Microsoft 365 and GitHub. They can help create emails, code, and collaboration artifacts faster and better.

  

This article delves into the creation of a copilot that suggests IaC code relevant to a query. It follows the same precedent as GitHub Copilot, which helps developers write code in programming languages. GitHub Copilot is powered by the OpenAI Codex model, a modified production version of the Generative Pre-trained Transformer 3 (GPT-3). The GPT-3 model created by OpenAI features 175 billion parameters for language processing. It is a collaborative effort between OpenAI, Microsoft, and GitHub.

  

A copilot can be developed with no code using Azure OpenAI Studio. We just need to instantiate a studio, associate a model, add the data sources, and allow the model to train. The models differ in syntactic versus semantic search. The latter uses a concept called embedding, which discovers the latent meaning behind the occurrences of tokens in the given data, so it is more inclusive than the former. A search for "time" will match only that keyword with a syntactic model, but a search for "clock" will also surface references to "time" with a model that leverages embeddings. Either way, a search service is required to create an index over the dataset because it facilitates fast retrieval. A database such as Azure Cosmos DB can be used to assist with vector search.
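To make the embedding comparison concrete, here is a minimal sketch of semantic similarity scoring, assuming the openai Python package (v1) against an Azure OpenAI embedding deployment; the endpoint, key, deployment name, and sample documents are all placeholders rather than anything from the original setup:

import numpy as np
from openai import AzureOpenAI

client = AzureOpenAI(azure_endpoint="https://<resource>.openai.azure.com",
                     api_key="<api-key>", api_version="2023-05-15")

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="<embedding-deployment>", input=[text])
    return np.array(resp.data[0].embedding)

# "clock" scores close to "time" by cosine similarity even without a keyword match
docs = {d: embed(d) for d in ["time", "calendar", "teapot"]}
q = embed("clock")
for d, v in docs.items():
    print(d, float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))))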

  

At present, all these resources are created in the cloud, but their functionality can also be recreated on a local Windows machine with the upcoming release of the Windows AI Studio. This helps to train the model on documents that are available only locally. Usually, the resources take only a couple of minutes to set up, and training the model on all the data takes the bulk of the duration, after which the model can start responding to the queries posed by the user. Once trained, the model usually responds within a couple of seconds. A cloud storage account has the luxury of retaining documents indefinitely and without a size limit, but the cost of training a model on the corresponding data grows with the size of the data ingested to form an index.


Sunday, April 28, 2024

This is an article about the hosting of automation and apps in the public cloud versus private datacenters, given that the source code repository and pipelines are not part of the cloud. When it comes to hosting automation and apps, there are some key differences between the public cloud and private datacenters. Let's break it down:

1. Infrastructure Ownership:
- Public Cloud: In the public cloud, the infrastructure is owned and managed by a third-party provider like Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP). Users can access and use this shared infrastructure on a pay-as-you-go basis.
- Private Datacenters: Private datacenters, on the other hand, are owned and operated by the organization itself. This means that the organization has full control over the infrastructure and can customize it based on its specific requirements.

2. Scalability:
- Public Cloud: Public cloud providers offer virtually unlimited scalability, allowing users to easily scale their resources up or down based on demand. This elasticity enables organizations to handle spikes in traffic or resource requirements without having to invest in additional hardware.
- Private Datacenters: Private datacenters typically have finite resources, and scaling can be more complex and time-consuming. Organizations need to plan and provision resources in advance, which may require upfront investments.

3. Maintenance and Management:
- Public Cloud: Public cloud providers handle the maintenance and management of the underlying infrastructure, including server hardware, networking, and security. This allows organizations to focus on developing and deploying their applications without worrying about infrastructure management.
- Private Datacenters: In private datacenters, the organization is responsible for maintaining and managing the infrastructure, which includes tasks like hardware maintenance, security, and software updates. This requires dedicated IT personnel and can be more resource-intensive.

4. Security and Compliance:
- Public Cloud: Public cloud providers invest heavily in security and compliance measures to protect customer data. They offer various security services and certifications, making it easier for organizations to achieve compliance with industry standards and regulations.
- Private Datacenters: With private datacenters, organizations have more control over security measures and can implement their own security protocols. However, this also means they are solely responsible for ensuring compliance and maintaining a secure environment.

It's worth mentioning that while the hosting of automation and apps can be done in the public cloud or private datacenters, the source code repository and pipelines are not inherently part of the cloud infrastructure. These components are typically managed separately, either on-premises or using cloud-based services like GitLab, GitHub, or Bitbucket. Ultimately, the choice between public cloud and private datacenters depends on factors such as scalability needs, control requirements, security considerations, and budgetary constraints. Organizations often opt for a hybrid cloud approach, leveraging both public and private infrastructure to achieve a balance between flexibility, control, and cost-effectiveness.

Sample automation: https://github.com/raja0034/booksonsoftwarets


Saturday, April 27, 2024

This is a summary of the book “Anatomy of a Breakthrough – How to Get Unstuck When It Matters Most,” written by Adam Alter and published by Simon and Schuster in 2023. This book presents a framework for getting unstuck, given that everyone can get stuck, including celebrities like Brie Larson, Brian Chesky, and Jeff Bezos. It is full of psychological research, anecdotes, and practical tips. Alter's roadmap to getting energized for the long run positions us for our breakthrough. This involves not giving up too quickly, identifying and solving problems now, focusing on reducing anxiety, challenging ourselves in the right increments, simplifying complex problems, and remaining curious and questioning our views. We regain momentum by getting active and boosting our motivation.

Everyone gets stuck at some point in life, whether in a job, hobby, relationship, creatively or personally. Even successful individuals, such as Brie Larson, Airbnb founders, Amazon founder Jeff Bezos, and Game of Thrones fans, have experienced long periods of stuckness. People often feel isolated and believe they are victims of terrible luck, not realizing the universal experience. Psychological phenomena underpin this skewed perception, as people tend to focus on their own difficulties while overlooking those facing others. Headwinds/tailwinds asymmetry is a phenomenon that causes people to overestimate their hardships and underestimate their good fortune.

People usually get stuck in the middle of a project or when they reach a plateau. To avoid a midcourse slump, eliminate the middle as much as possible by using narrow bracketing techniques. However, be cautious of relying on these techniques for too long, as they can lose their power over time. Success takes time and effort, and people's best ideas usually come late in the process.

Identify and solve problems early so they do not trap you later. Three common traps can hinder problem-solving: failing to see a problem exists, assuming a problem is too small to need attention, and believing the problem is too remote to matter. To avoid these traps, slow down, notice snags, and challenge assumptions. Perform frequent reviews to prevent small problems from growing too large. Focus on reducing anxiety to move beyond paralysis and avoid perfectionism. Instead of aiming for perfection, strive for excellence and set achievable standards. Avoid the expectation of complete originality and focus on finding optimally distinct ideas. Challenge yourself in the right increments, not too much too fast. Researchers found a sweet spot in the ratio of success to failure, where one failure out of every five or six attempts is optimal. Accept failure as a natural part of progress and view it as a signal for stepping out of your comfort zone and engaging with challenges that foster learning.

Challenges can come with hardships, such as discomfort or fear. To increase your capacity to undertake challenges, apply the "hardship inoculation" approach, which, like inoculation with a small dose of a virus before full exposure to the disease, builds tolerance through small doses of the hardship. This approach can help overcome discomforts like fear and disappointment.

Simplify complex problems by conducting a "friction audit," which helps identify friction points that cause bottlenecks, struggles, waste, or failure. By removing or simplifying these points, you can move towards a solution.

Remaining curious and questioning your views can reduce the chances of getting stuck. Experimenting to reveal new techniques and strategies can lead to breakthroughs. Questioning and wondering can help revive curiosity and help you learn.

Regain momentum by becoming active and boosting motivation. Focus on taking actions where you excel, such as observing, identifying problems, and collecting data. By applying these strategies, you can overcome obstacles and move forward in life.


Friday, April 26, 2024

 

Integer arrays A and B are grid-line coordinates along the x and y axes respectively, bounded by a rectangle of size X * Y. If the sizes of the sub-rectangles within the bounding rectangle were arranged in descending order, return the size of the Kth rectangle.

   

import java.util.Arrays;

public static int getKthRectangle(int X, int Y, int K, int[] A, int[] B) {
    int[] width = calculateSideLength(A, X);
    int[] height = calculateSideLength(B, Y);
    Arrays.sort(width);
    Arrays.sort(height);
    // binary search on the answer: the largest size with at least K rectangles of that size or larger
    int start = 1;
    int end = width[width.length - 1] * height[height.length - 1];
    int result = 0;
    while (start <= end) {
        int mid = (start + end) / 2;
        if (countGreaterOrEqual(mid, width, height) >= K) {
            start = mid + 1;
            result = mid;
        } else {
            end = mid - 1;
        }
    }
    return result;
}

// counts the sub-rectangles whose area is >= mid with a two-pointer sweep;
// j only moves down because widths are ascending and products are monotonic
public static int countGreaterOrEqual(int mid, int[] width, int[] height) {
    int M = height.length;
    int result = 0;
    int j = M - 1;
    for (int i = 0; i < width.length; i++) {
        while (j >= 0 && width[i] * height[j] >= mid) {
            j--;
        }
        result += M - 1 - j;
    }
    return result;
}

// converts grid-line coordinates into consecutive side lengths, including the
// segment between the last grid line and the bounding edge M
public static int[] calculateSideLength(int[] X, int M) {
    int[] length = new int[X.length + 1];
    for (int i = 0; i < X.length; i++) {
        length[i] = (i == 0) ? X[i] : X[i] - X[i - 1];
    }
    length[X.length] = M - X[X.length - 1];
    return length;
}

Example: A = [1, 3] with X = 6 and B = [1, 5] with Y = 7 give
width: 1 2 3
height: 1 4 2
Kth rectangle size (here K = 3): 6
The calculation of the side length remains the same for both the X-axis and the Y-axis.

 

Thursday, April 25, 2024

Generative Artificial Intelligence (AI) refers to a subset of AI algorithms and models that can generate new and original content, such as images, text, music, or even entire virtual worlds. Unlike other AI models that rely on pre-existing data to make predictions or classifications, generative AI models create new content based on patterns and information they have learned from training data.

One of the most well-known examples of generative AI is Generative Adversarial Networks (GANs). GANs consist of two neural networks: a generator and a discriminator. The generator network creates new content, while the discriminator network evaluates the content and provides feedback to the generator. Through an iterative process, both networks learn and improve their performance, resulting in the generation of more realistic and high-quality content.
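As an illustration of that generator-discriminator loop, here is a minimal sketch, assuming PyTorch is available; the networks, the one-dimensional target distribution, and the hyperparameters are illustrative choices, not anything prescribed above:

import torch
import torch.nn as nn

# Generator: maps 8-dimensional noise to a single scalar sample
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
# Discriminator: scores a scalar sample as real (1) or generated (0)
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
loss = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 2 + 5          # "real" data drawn from N(5, 2)
    fake = G(torch.randn(64, 8))

    # discriminator update: push real toward 1, generated toward 0
    opt_d.zero_grad()
    d_loss = loss(D(real), torch.ones(64, 1)) + loss(D(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    opt_d.step()

    # generator update: fool the discriminator into predicting 1 for fakes
    opt_g.zero_grad()
    g_loss = loss(D(fake), torch.ones(64, 1))
    g_loss.backward()
    opt_g.step()

print(G(torch.randn(5, 8)).detach())  # samples should drift toward N(5, 2)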

Generative AI has made significant advancements in various domains. In the field of computer vision, generative models can create realistic images or even generate entirely new images based on certain prompts or conditions. In natural language processing, generative models can generate coherent and contextually relevant text, making them useful for tasks like text summarization, translation, or even creative writing.

However, it is important to note that generative AI models can sometimes produce biased or inappropriate content, as they learn from the data they are trained on, which may contain inherent biases. Ensuring ethical and responsible use of generative AI is an ongoing challenge in the field.

Generative AI also presents exciting opportunities for creative industries. Artists can use generative models as tools to inspire their work or create new forms of art. Musicians can leverage generative AI models to compose music or generate novel melodies.

Overall, generative AI holds great potential for innovation and creativity, but it also raises important ethical considerations that need to be addressed to ensure its responsible and beneficial use in various domains.

Some examples of text generation models include ChatGPT, Copilot, Gemini, and LLaMA, which are often collectively referred to as chatbots. They generate human-like responses to queries. Image generation models include Stable Diffusion, Midjourney, and DALL-E, which create images from textual descriptions. Video generation models include Sora, which can produce videos based on prompts. Other domains where Generative AI finds applications are software development, healthcare, finance, entertainment, customer service, sales, marketing, art, writing, fashion, and product design.



Wednesday, April 24, 2024

This is a continuation of previous articles on IaC shortcomings and resolutions. In this section, we discuss the IP restrictions between sender and receiver in cloud resources. Let us take the example of an application gateway and app services behind the gateway. As a layer 7 resource with HTTP proxy and HTTP request-rewrite capabilities, the gateway routes traffic by path to different app services. These app services can allow promiscuous web traffic or restrict it via origin IP restriction rules. With rules specifying the gateway as the origin, there is some hardening, but there are no restrictions on who the callers behind the gateway are, and app services have the potential to determine that as well.

This is where some co-operation is needed between the application gateway and the app services. The gateway automatically adds headers to indicate the source IP of the traffic. The app services would have a rule for the gateway indicating the source IP block of the gateway, typically in ip-address/32 prefix notation, and additional filters matching the values of anticipated source IP ranges behind the gateway. There are four such headers in all, of which X-Forwarded-Host and X-Forwarded-For bear the hostnames and the IP ranges.

The X-Forwarded-Host header seldom works when the gateway hostname is specified. This is not a shortcoming: the header identifies the original host requested by the client in the Host HTTP request header, and since name-to-IP resolution might involve several layers, the values clients specify in this header might not agree. It is useful where reverse proxies such as load balancers or CDNs are involved and the host names and ports differ from those of the origin server handling the request. With SSL termination, the source IP becomes that of the gateway, so this header preserves the original URL's location. Leaving this header blank and using the other header works in most deployments.

The X-Forwarded-For header provides information about the client's IP address (or a chain of proxy IP addresses) that initiated the request. When requests pass through multiple proxies or load balancers, each proxy adds its own IP address to the X-Forwarded-For header. Azure App Services can use this header to determine the original client IP address, even if the request came through intermediate proxies. Specifying the CIDRs for IP ranges in a comma-separated string works in most deployments to match clients with requests and allow them selectively through the gateway's firehose of traffic.
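As a sketch of what this matching amounts to, here is a minimal example using only the Python standard library; the allowed CIDR list is a placeholder for the anticipated source ranges, and it assumes IPv4 addresses:

import ipaddress

# placeholder CIDRs for the anticipated source ranges behind the gateway
ALLOWED = [ipaddress.ip_network("10.1.0.0/16"), ipaddress.ip_network("203.0.113.0/24")]

def client_allowed(x_forwarded_for: str) -> bool:
    # the left-most entry is the original client; later entries are proxies
    client = x_forwarded_for.split(",")[0].strip().split(":")[0]  # drop any port (IPv4 only)
    addr = ipaddress.ip_address(client)
    return any(addr in net for net in ALLOWED)

print(client_allowed("203.0.113.7, 10.0.0.4"))  # True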

Together with source IP restrictions, these headers enable sufficient hardening of the web traffic at the app service.


Tuesday, April 23, 2024

This is a continuation of previous articles on IaC shortcomings and resolutions. While IaC code can be used deterministically to repeatedly create, update, and delete cloud resources, there are some dependencies that are managed by the resources themselves and become a concern for the end user when they are not properly cleaned up. Take, for instance, the load balancers created when compute instances and clusters are provisioned in Azure Machine Learning workspaces. These are automatically provisioned. The purpose of this load balancer is to manage traffic even when the compute instance or cluster is stopped. Each compute instance has one load balancer associated with it, and for every 50 nodes in a compute cluster, one standard load balancer is billed. The load balancer ensures that requests are distributed evenly across the available compute resources, improving performance and availability. Each load balancer is billed at approximately $0.33 per day. If we have multiple compute instances, each one will have its own load balancer. For compute clusters, the load balancer cost is based on the total number of nodes in the cluster. One way to avoid load balancer costs on stopped compute instances and clusters is to delete the compute resources when they are not in use. The IaC can help with deleting the resources, but whether the action is automated or manual, it is contingent on the deletion of the load balancers, and when that deletion fails for reasons such as locks on the load balancers, the user is left with a troublesome situation.
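As a worked example of that billing rate: a workspace with three compute instances and one 100-node compute cluster carries 3 + (100 / 50) = 5 standard load balancers, or roughly 5 × $0.33 ≈ $1.65 per day, even while the computes are stopped.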

An understanding of the load balancer might help put things in perspective, especially when trying to find them to unlock or delete. Many cloud resources and Azure Batch services create load balancers, and the ways to distinguish them vary from resource groups to tags to properties.

These load balancers play a crucial role in distributing network traffic evenly across multiple compute resources to optimize performance and ensure high availability. They use various algorithms, such as round-robin, least connections, or source IP affinity, to distribute incoming traffic to the available compute resources. This helps in maintaining a balanced workload and preventing any single resource from being overwhelmed. They also contribute to high availability by continuously monitoring the health of the compute resources: if a resource becomes unhealthy or unresponsive, the load balancer automatically redirects traffic to other healthy resources. They can seamlessly handle an increase in traffic by automatically scaling up the number of compute resources, and Azure Machine Learning workspace load balancers can scale up or down based on predefined rules or metrics, ensuring that the resources can handle the workload efficiently. Load balancing rules determine how traffic should be distributed; rules can be configured based on protocols, ports, or other attributes to ensure that the traffic is routed correctly. Load balancers continuously monitor the health of the compute resources by sending health probes to check their responsiveness. If a resource fails the health probe, it is marked as unhealthy, and traffic is redirected to other healthy resources.

Azure Machine Learning workspaces support both internal and public load balancers. Internal load balancers are used for internal traffic within a virtual network, while public load balancers handle traffic from the internet. They can be seamlessly integrated with other Azure services, such as virtual networks, virtual machines, and container services, to build scalable and highly available machine learning solutions. Overall, load balancers in an Azure Machine Learning workspace play a critical role in optimizing performance, ensuring high availability, and handling increased traffic by distributing it evenly across multiple compute resources.

Creating the compute with node public IP set to false and disabling local auth can prevent load balancers from being created, but if endpoints are involved, the Azure Batch service will create them. Load balancers, public IP addresses, and associated dependencies are created in the resource group of the virtual network, not in the resource group of the machine learning workspace. Finding the load balancers and taking appropriate action on them can allow the compute resources to be cleaned up. This can be done on an ad hoc or scheduled basis, as in the sketch below.
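Here is a minimal sketch of the ad hoc route, listing load balancers across a subscription so the stray ones can be spotted; it assumes the azure-identity and azure-mgmt-network packages are installed, and the subscription id is a placeholder:

from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

client = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")

# list every load balancer in the subscription with its resource group and tags,
# since workspace-created ones land in the virtual network's resource group
for lb in client.load_balancers.list_all():
    rg = lb.id.split("/")[4]    # resource-group segment of the resource id
    print(rg, lb.name, lb.tags or {})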

Monday, April 22, 2024

 This is a continuation of a previous article on IaC shortcomings and resolutions. With regard to Azure Machine Learning Workspace, here is a sample request and response:

1. Go to https://learn.microsoft.com/en-us/rest/api/azureml/compute/create-or-update?view=rest-azureml-2023-10-01&tabs=HTTP#code-try-0 and sign in with your secondary account:


Specify the following:

PUT https://management.azure.com/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>/providers/Microsoft.MachineLearningServices/workspaces/<ml-workspace-name>/computes/<compute-name>?api-version=2023-10-01
Authorization: Bearer <automatically created access token>
Content-type: application/json

{
  "properties": {
    "properties": {
      "vmSize": "STANDARD_DS11_V2",
      "subnet": {
        "id": "/subscriptions/<subscription-id>/resourceGroups/<rg-vnet-name>/providers/Microsoft.Network/virtualNetworks/<vnet-name>/subnets/<subnet-name>"
      },
      "applicationSharingPolicy": "Shared",
      "computeInstanceAuthorizationType": "personal",
      "enableNodePublicIp": false,
      "disableLocalAuth": true,
      "location": "centralus",
      "scaleSettings": {
        "maxNodeCount": 1,
        "minNodeCount": 0,
        "nodeIdleTimeBeforeScaleDown": "PT60M"
      }
    },
    "computeType": "AmlCompute",
    "disableLocalAuth": true
  },
  "location": "centralus",
  "disableLocalAuth": true
}



2. Check the response code to match as shown:

Response Code: 201
azure-asyncoperation: https://management.azure.com/subscriptions/<subscription-id>/providers/Microsoft.MachineLearningServices/locations/centralus/computeOperationsStatus/f6dcbe07-99cf-4bf7-aa71-0fdcfc542941?api-version=2023-10-01&service=new
cache-control: no-cache
content-length: 1483
content-type: application/json; charset=utf-8
date: Sat, 20 Apr 2024 02:28:50 GMT
expires: -1
pragma: no-cache
request-context: appId=cid-v1:2d2e8e63-272e-4b3c-8598-4ee570a0e70d
strict-transport-security: max-age=31536000; includeSubDomains
x-aml-cluster: vienna-centralus-02
x-content-type-options: nosniff
x-ms-correlation-request-id: f15d6510-5d21-426a-98e5-aa800322da83
x-ms-ratelimit-remaining-subscription-writes: 1199
x-ms-request-id: f15d6510-5d21-426a-98e5-aa800322da83
x-ms-response-type: standard
x-ms-routing-request-id: NORTHCENTRALUS:20240420T022850Z:f15d6510-5d21-426a-98e5-aa800322da83
x-request-time: 0.257
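The same call can be scripted instead of using the try-it portal. Here is a minimal sketch assuming the requests and azure-identity packages, with DefaultAzureCredential standing in for the interactive sign-in and the body abbreviated to the JSON document shown above:

import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
url = ("https://management.azure.com/subscriptions/<subscription-id>"
       "/resourceGroups/<resource-group-name>/providers/Microsoft.MachineLearningServices"
       "/workspaces/<ml-workspace-name>/computes/<compute-name>?api-version=2023-10-01")
body = {"location": "centralus", "properties": {"computeType": "AmlCompute"}}  # abbreviated; use the full body above
resp = requests.put(url, json=body, headers={"Authorization": f"Bearer {token}"})
print(resp.status_code)  # expect 201, with an azure-asyncoperation header for polling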


Sunday, April 21, 2024

Given clock hand positions for different points of time as pairs A[I][0] and A[I][1], where the order of the hands does not matter but the angle they enclose does, count the number of pairs of points of time where the angles are the same.

import java.util.Arrays;

// reduce each pair of hand positions to their (order-independent) difference
public static int[] getClockHandsDelta(int[][] A) {
    int[] angles = new int[A.length];
    for (int i = 0; i < A.length; i++) {
        angles[i] = Math.max(A[i][0], A[i][1]) - Math.min(A[i][0], A[i][1]);
    }
    return angles;
}

public static int NChooseK(int n, int k) {
    if (k < 0 || k > n || n == 0) return 0;
    if (k == 0 || k == n) return 1;
    return Factorial(n) / (Factorial(n - k) * Factorial(k));
}

public static int Factorial(int n) {
    if (n <= 1) return 1;
    return n * Factorial(n - 1);
}

// sort the deltas, then add C(count, 2) for every run of identical deltas
public static int countPairsWithIdenticalAnglesDelta(int[] angles) {
    Arrays.sort(angles);
    int count = 1;
    int result = 0;
    for (int i = 1; i < angles.length; i++) {
        if (angles[i] == angles[i - 1]) {
            count += 1;
        } else {
            result += NChooseK(count, 2);
            count = 1;
        }
    }
    result += NChooseK(count, 2);
    return result;
}


int[][] A = new int[5][2];
A[0][0] = 1;    A[0][1] = 2;
A[1][0] = 2;    A[1][1] = 4;
A[2][0] = 4;    A[2][1] = 3;
A[3][0] = 2;    A[3][1] = 3;
A[4][0] = 1;    A[4][1] = 3;

Deltas: 1 2 1 1 2
Sorted: 1 1 1 2 2
Count of pairs: 4


Saturday, April 20, 2024

This is a continuation of previous articles on IaC shortcomings and resolutions. No infrastructure is useful without considerations for usability. As with the earlier example of using an Azure Machine Learning workspace to train models against a Snowflake data source, some consideration must be given to allowing connections to the data source and importing data. We cited resolving versions between the Spark, Scala, and Snowflake libraries within the kernel to allow data to be imported into a dataframe for use with SQL, and this can be difficult for end-users if they have to locate and download the jars themselves. While the infrastructure could provide pre-configured kernels, such as the Almond kernel with the appropriate jars for Scala, some samples might ease the task for data scientists wrangling with Snowflake data on existing workspaces.

For example, they could stage their action in multiple steps: pulling the data from Snowflake and then loading it into a dataframe. Here is sample code to do so:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("BytesToDataFrame").getOrCreate()

# Sample raw bytes (replace with your actual data from Snowflake using a snowflake-connector cursor)
# https://docs.snowflake.com/en/developer-guide/python-connector/python-connector-example
# either using
# a)               df = pd.DataFrame(cursor.fetchall())
# or
# b)               df = cursor.fetch_pandas_all()
# or
# c)
raw_bytes = [b'\xba\xed\x85\x8e\x91\xd4\xc7\xb0', b'\xba\xed\x85\x8e\x91\xd4\xc7\xb1']
schema = StructType([StructField("id", StringType(), True)])
# each row must be a tuple, and the bytes are hex-encoded to fit the StringType column
rows = [(rb.hex(),) for rb in raw_bytes]
rdd = spark.sparkContext.parallelize(rows)
df = spark.createDataFrame(rdd, schema=schema)
df.show()


In this example, the data is retrieved first with a cursor and then loaded into a dataframe.
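A minimal sketch of that staged approach itself, assuming the snowflake-connector-python package with its pandas extras installed; all connection parameters are placeholders and `spark` is the session created above:

import snowflake.connector

conn = snowflake.connector.connect(account="<account>", user="<login>", password="<password>",
                                   warehouse="<warehouse>", database="<database>", schema="<schema>")
cursor = conn.cursor()
cursor.execute("select ID from <table> limit 10")
pdf = cursor.fetch_pandas_all()    # step 1: pull from Snowflake into a pandas dataframe
sdf = spark.createDataFrame(pdf)   # step 2: promote to a Spark dataframe
sdf.show()
cursor.close()
conn.close()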


Friday, April 19, 2024

 

This is a continuation of previous articles on allowing Spark/Scala/Snowflake code to execute on Azure Machine Learning compute. The built-in Jupyter kernel of “Azure ML – Python 3.8” does not have pyspark, and we discussed the choices of downloading version-compatible jars as well as alternative code to get data from Snowflake.

In this article, we will review the steps to set up a Jupyter notebook for Snowpark Scala. An “Almond” kernel can be used to set up Scala, and coursier can be used to install the Almond kernel with a supported version of Scala. The Almond kernel has a prerequisite that a Java Virtual Machine be installed on the system, which can be done by installing AdoptOpenJDK version 8. Then Almond can be fetched with coursier by downloading its release and running the executable with command-line parameters to install Almond and Scala. Coursier is a Scala application that makes it easy to manage artifacts; it can set up the Scala development environment by downloading and caching artifacts from the web.

The Jupyter notebook for Snowpark can then be configured by defining a variable for the path to the directory for classes generated by the Scala REPL and creating that directory. The Scala REPL generates classes for the Scala code that the user writes. Configuring the compiler is not complete without adding the directory created earlier as a dependency of the REPL interpreter. Next, we create a new session in Snowpark with an example as below:

import $ivy.`com.snowflake:snowpark:1.12.0`
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._

val session = Session.builder.configs(Map(
    "URL" -> "https://<account_identifier>.snowflakecomputing.com",
    "USER" -> "<username>",
    "PASSWORD" -> "<password>",
    "ROLE" -> "<role_name>",
    "WAREHOUSE" -> "<warehouse_name>",
    "DB" -> "<database_name>",
    "SCHEMA" -> "<schema_name>"
)).create

session.addDependency(replClassPath)

and then the Ammonite Kernel classes can be added for the code.

The session can be used to run a SQL query and populate a dataframe which can then be used independent of the data source.
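For comparison, here is a minimal sketch of the same query-to-dataframe flow in Snowpark for Python, assuming the snowflake-snowpark-python package; the connection values are placeholders:

from snowflake.snowpark import Session

session = Session.builder.configs({
    "account": "<account_identifier>", "user": "<username>", "password": "<password>",
    "role": "<role_name>", "warehouse": "<warehouse_name>",
    "database": "<database_name>", "schema": "<schema_name>",
}).create()
df = session.sql("select ID from <schema> limit 10")
print(df.collect())   # the dataframe can now be used independently of the data source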

Previous articles: IaCResolutionsPart107.docx

Thursday, April 18, 2024

 

This is a continuation of previous articles on IaC shortcomings and resolutions. No infrastructure is useful without considerations for usability. As with the earlier example of using an Azure Machine Learning workspace to train models against a Snowflake data source, some consideration must be given to allowing connections to the data source and importing data. We cited resolving versions between the Spark, Scala, and Snowflake libraries within the kernel to allow data to be imported into a dataframe for use with SQL, and this can be difficult for end-users if they have to locate and download the jars themselves.

One way to resolve this would be to use a different coding style as shown below:

import snowflake.connector

conn = snowflake.connector.connect(
    account='<snowflake_account>',
    host='<account>.east-us-2.azure.snowflakecomputing.com',
    user='<login>',
    private_key=<bytes-to-private_key>,
    role='<data-scientist-role>',
    warehouse='<name-of-warehouse>',
    database='<demo_db>',
    schema='<demo_table>'
)

cursor = conn.cursor()
cursor.execute('select ID from <schema> limit 10')
rows = cursor.fetchall()
for row in rows:
    print(row)
cursor.close()
conn.close()

 

When compared to the following code:

spark = (
    SparkSession.builder
    .appName('SnowflakeSample')
    .config("spark.jars","/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pyspark/jars/snowflake-jdbc-3.12.2.jar,/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pyspark/jars/snowflake-ingest-sdk-0.9.6.jar,/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pyspark/jars/spark-snowflake_2.13-2.11.3-spark_3.3.jar,/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pyspark/jars/scala-library-2.12.19.jar,/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pyspark/jars/hadoop-azure-3.2.1.jar,/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pyspark/jars/azure-storage-7.0.0.jar")
    .config(conf=conf)
    .getOrCreate()
)
print(spark.version)

sfOptions = {
    "sfUser": "<login>",
    "sfURL": "<account>.east-us-2.azure.snowflakecomputing.com",
    "sfRole": "<data-scientist-role>",
    "sfWarehouse": "<warehouse>",
    "sfDatabase": "<demo_database>",
    "sfSchema": "<demo_table>",
    "pem_private_key": <private-key-bytes>
}

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
query = 'select ID from <schema> limit 10'
df = spark.read.format(SNOWFLAKE_SOURCE_NAME) \
    .options(**sfOptions) \
    .option("query", query) \
    .load()

It becomes clear that spark.read.format can be difficult to run without the proper jars passed in the config. Therefore, it is important to provide samples along with the infrastructure.

Previous articles: IaCResolutionsPart106.docx

Wednesday, April 17, 2024

Given a wire grid of size N * N with N-1 horizontal edges and N-1 vertical edges along the X and Y axes respectively, and a wire burning out every instant as per the given order using three arrays A, B, C such that the wire that burns at instant T is
(A[T], B[T] + 1), if C[T] = 0, or
(A[T] + 1, B[T]), if C[T] = 1,
determine the instant after which the circuit is broken.

// is the top-left corner still connected to the bottom-right corner?
public static boolean checkConnections(int[] h, int[] v, int N) {
    boolean[][] visited = new boolean[N][N];
    dfs(h, v, visited, 0, 0);
    return visited[N - 1][N - 1];
}

public static void dfs(int[] h, int[] v, boolean[][] visited, int i, int j) {
    int N = visited.length;
    if (i < N && j < N && i >= 0 && j >= 0 && !visited[i][j]) {
        visited[i][j] = true;
        if (v[i * (N - 1) + j] == 1) {                    // right, across a vertical wire
            dfs(h, v, visited, i, j + 1);
        }
        if (h[i * (N - 1) + j] == 1) {                    // down, across a horizontal wire
            dfs(h, v, visited, i + 1, j);
        }
        if (i > 0 && h[(i - 1) * (N - 1) + j] == 1) {     // up
            dfs(h, v, visited, i - 1, j);
        }
        if (j > 0 && v[i * (N - 1) + (j - 1)] == 1) {     // left
            dfs(h, v, visited, i, j - 1);
        }
    }
}

public static int burnout(int N, int[] A, int[] B, int[] C) {
    int[] h = new int[N * N];
    int[] v = new int[N * N];
    for (int i = 0; i < N * N; i++) { h[i] = 1; v[i] = 1; }
    for (int i = 0; i < N; i++) {      // mark the grid boundary as non-traversable
        h[(i * N) + N - 1] = 0;
        v[(N - 1) * N + i] = 0;
    }
    System.out.println(printArray(h));
    System.out.println(printArray(v));
    for (int i = 0; i < A.length; i++) {   // burn wires one instant at a time
        if (C[i] == 0) {
            v[A[i] * (N - 1) + B[i]] = 0;
        } else {
            h[A[i] * (N - 1) + B[i]] = 0;
        }
        if (!checkConnections(h, v, N)) {
            return i + 1;
        }
    }
    return -1;
}

int[] A = new int[9];
int[] B = new int[9];
int[] C = new int[9];
A[0] = 0;    B[0] = 0;    C[0] = 0;
A[1] = 1;    B[1] = 1;    C[1] = 1;
A[2] = 1;    B[2] = 1;    C[2] = 0;
A[3] = 2;    B[3] = 1;    C[3] = 0;
A[4] = 3;    B[4] = 2;    C[4] = 0;
A[5] = 2;    B[5] = 2;    C[5] = 1;
A[6] = 1;    B[6] = 3;    C[6] = 1;
A[7] = 0;    B[7] = 1;    C[7] = 0;
A[8] = 0;    B[8] = 0;    C[8] = 1;
System.out.println(burnout(9, A, B, C));

Output:
1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
8

Alternatively,

// burn the first t wires, then report whether the circuit still holds
public static boolean burnWiresAtT(int N, int[] A, int[] B, int[] C, int t) {
    int[] h = new int[N * N];
    int[] v = new int[N * N];
    for (int i = 0; i < N * N; i++) { h[i] = 1; v[i] = 1; }
    for (int i = 0; i < N; i++) {
        h[(i * N) + N - 1] = 0;
        v[(N - 1) * N + i] = 0;
    }
    System.out.println(printArray(h));
    System.out.println(printArray(v));
    for (int i = 0; i < t; i++) {
        if (C[i] == 0) {
            v[A[i] * (N - 1) + B[i]] = 0;
        } else {
            h[A[i] * (N - 1) + B[i]] = 0;
        }
    }
    return checkConnections(h, v, N);
}

// binary search for the first instant at which the circuit is broken
public static int binarySearch(int N, int[] A, int[] B, int[] C, int start, int end) {
    if (start == end) {
        if (!burnWiresAtT(N, A, B, C, end)) {
            return end;
        }
        return -1;
    } else {
        int mid = (start + end) / 2;
        if (burnWiresAtT(N, A, B, C, mid)) {
            return binarySearch(N, A, B, C, mid + 1, end);
        } else {
            return binarySearch(N, A, B, C, start, mid);
        }
    }
}

1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
8


Monday, April 15, 2024

 #codingexercise 

There are N points (numbered from 0 to N−1) on a plane. Each point is colored either red ('R') or green ('G'). The K-th point is located at coordinates (X[K], Y[K]) and its color is colors[K]. No point lies on coordinates (0, 0).

We want to draw a circle centered on coordinates (0, 0), such that the number of red points and green points inside the circle is equal. What is the maximum number of points that can lie inside such a circle? Note that it is always possible to draw a circle with no points inside.

Write a function that, given two arrays of integers X, Y and a string colors, returns an integer specifying the maximum number of points inside a circle containing an equal number of red points and green points.

Examples:

1. Given X = [4, 0, 2, −2], Y = [4, 1, 2, −3] and colors = "RGRR", your function should return 2. The circle contains points (0, 1) and (2, 2), but not points (−2, −3) and (4, 4).

class Solution {
    public int solution(int[] X, int[] Y, String colors) {
        // find the maximum squared distance from the origin
        double max = Double.MIN_VALUE;
        int count = 0;
        for (int i = 0; i < X.length; i++) {
            double dist = X[i] * X[i] + Y[i] * Y[i];
            if (dist > max) {
                max = dist;
            }
        }

        // shrink the radius in small steps and track the best balanced count
        for (double i = Math.sqrt(max) + 1; i > 0; i -= 0.1) {
            int r = 0;
            int g = 0;
            for (int j = 0; j < colors.length(); j++) {
                if (Math.sqrt(X[j] * X[j] + Y[j] * Y[j]) > i) {
                    continue;   // outside the current circle
                }
                if (colors.substring(j, j + 1).equals("R")) {
                    r++;
                } else {
                    g++;
                }
            }
            if (r == g && r > 0) {
                int min = r * 2;
                if (min > count) {
                    count = min;
                }
            }
        }

        return count;
    }
}

 

Compilation successful.

Example test:   ([4, 0, 2, -2], [4, 1, 2, -3], 'RGRR')
OK

Example test:   ([1, 1, -1, -1], [1, -1, 1, -1], 'RGRG')
OK

Example test:   ([1, 0, 0], [0, 1, -1], 'GGR')
OK

Example test:   ([5, -5, 5], [1, -1, -3], 'GRG')
OK

Example test:   ([3000, -3000, 4100, -4100, -3000], [5000, -5000, 4100, -4100, 5000], 'RRGRG')
OK


Sunday, April 14, 2024

 Write a SkipList using arbitrary choices for skip levels:

import java.util.Random;

class Skiplist {
    // four lanes; next1 is the dense base list, next4 the sparsest express lane
    Skiplist next1;
    Skiplist next2;
    Skiplist next3;
    Skiplist next4;
    int data;

    public Skiplist() {
        next1 = null;
        next2 = null;
        next3 = null;
        next4 = null;
        data = Integer.MIN_VALUE;
    }

    // descend from the sparsest level to the densest, stopping just before target
    public Skiplist searchInternal(int target) {
        Skiplist skipList = this;
        while (skipList != null && skipList.next4 != null && skipList.next4.data < target) {
            skipList = skipList.next4;
        }
        while (skipList != null && skipList.next3 != null && skipList.next3.data < target) {
            skipList = skipList.next3;
        }
        while (skipList != null && skipList.next2 != null && skipList.next2.data < target) {
            skipList = skipList.next2;
        }
        while (skipList != null && skipList.next1 != null && skipList.next1.data < target) {
            skipList = skipList.next1;
        }
        if (skipList != null && skipList.data > target) { return null; }
        return skipList;
    }

    public boolean search(int target) {
        Skiplist skipList = searchInternal(target);
        if (skipList == null) return false;
        if (skipList.data == target) return true;
        if (skipList.next1 != null && skipList.next1.data == target) return true;
        return false;
    }

    public void add(int num) {
        Skiplist skipList = searchInternal(num);
        Skiplist obj = new Skiplist();
        obj.data = num;
        if (skipList != null) {
            obj.next1 = skipList.next1;
            skipList.next1 = obj;
            if (skipList.data == Integer.MIN_VALUE) {   // empty list: seed all levels
                skipList.next1 = obj;
                skipList.next2 = obj;
                skipList.next3 = obj;
                skipList.next4 = obj;
                return;
            }
        } else {
            obj.next1 = this;
            obj.next2 = this;
            obj.next3 = this;
            obj.next4 = this;
            return;
        }
        // promote to higher levels on successive coin flips
        Random r = new Random();
        int coinFlip = r.nextInt(2);
        if (coinFlip == 0) { return; }
        if (skipList.next2 != null) { obj.next2 = skipList.next2; }
        coinFlip = r.nextInt(2);
        if (coinFlip == 0) { return; }
        if (skipList.next3 != null) { obj.next3 = skipList.next3; }
        coinFlip = r.nextInt(2);
        if (coinFlip == 0) { return; }
        if (skipList.next4 != null) { obj.next4 = skipList.next4; }
    }

    public boolean erase(int num) {
        Skiplist skipList = searchInternal(num);
        if (skipList == null) {
            return false;
        }
        if (skipList.data == num) {     // tombstone the node itself
            skipList.data = Integer.MIN_VALUE;
            return true;
        }
        if (skipList.next1 == null && skipList.data != num) {
            return false;
        }
        if (skipList.next1 != null && skipList.next1.data == num) {
            skipList.next1 = skipList.next1.next1;
            return true;
        }
        return false;
    }
}

Given an integer n, return any array containing n unique integers such that they add up to 0.

class Solution {
    public int[] sumZero(int n) {
        int[] A = new int[n];
        // fill the negative half: -n/2 .. -1
        int start = 0 - n / 2;
        for (int i = 0; i < n / 2; i++) {
            A[i] = start;
            start++;
        }
        // for odd n, place a zero in the middle
        int next = n / 2;
        if (n % 2 == 1) {
            A[next] = 0;
            next++;
        }
        // fill the positive half: 1 .. n/2
        start = 1;
        for (int i = next; i < n; i++) {
            A[i] = start;
            start++;
        }
        return A;
    }
}
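For example, sumZero(5) returns [-2, -1, 0, 1, 2] and sumZero(4) returns [-2, -1, 1, 2].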