Sunday, March 31, 2024

 #codingexercise

Given a string S and a library of strings B = {b1, ..., bm}, construct an approximation of the string S by using copies of strings in B. For example, B = {abab, bbbaaa, ccbb, ccaacc} and S = abaccbbbaabbccbbccaabab.  

The solution consists of taking strings from B and assigning them to non-overlapping segments of S. Strings from B may be used multiple times. There is a cost alpha for each unmatched character in S and a cost beta for each mismatched character in B. The mismatch between positions i and j is the number of mismatched characters in b[j] when it is aligned starting at position i in S. We compute the optimal costs of positions 1 to n in the given string, determining the optimal cost of position k as a function of the costs of the previous positions.

void GetCost(String S, List<String> B, int alpha, int beta, int[] Optimals)
{
    Optimals[0] = 0;
    for (int k = 1; k <= S.length(); k++)
    {
        // Default: leave character k of S unmatched, at cost alpha.
        Optimals[k] = Optimals[k - 1] + alpha;
        for (int j = 0; j < B.size(); j++)
        {
            int p = k - B.get(j).length();
            if (p >= 0)
                Optimals[k] = Math.min(Optimals[k], Optimals[p] + beta * Mismatch(p, j));
        }
    }
}

The purpose of the above code is to show that we find the optimal solution by evaluating the cost of each position as a candidate for the use of a string from the given list. Either the current position continues the previous span or begins a new one. We assume the former by default, even if it is a mismatch, and note down the cost as alpha. Then, for each of the strings, we take the substring that ends at the current location and add the number of its mismatched characters times the unit cost beta to the previously evaluated cost at the position p where the substring begins. The previous computations aid subsequent computations, and we use memoization to avoid recomputing the cost for a position.
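The recurrence above can be sketched in Python; the function name approximate_cost and the inline mismatch count (standing in for the Mismatch helper) are illustrative assumptions, not from the original:

```python
def approximate_cost(s, library, alpha, beta):
    """Minimum cost of covering s with copies of strings from library,
    paying alpha per unmatched character of s and beta per mismatched
    character inside a placed library string."""
    n = len(s)
    opt = [0] * (n + 1)  # opt[k] = optimal cost of the first k characters
    for k in range(1, n + 1):
        # Default: leave character k unmatched at cost alpha.
        opt[k] = opt[k - 1] + alpha
        for b in library:
            p = k - len(b)
            if p >= 0:
                # Align b against s[p:k] and count mismatched characters.
                mismatches = sum(1 for x, y in zip(s[p:k], b) if x != y)
                opt[k] = min(opt[k], opt[p] + beta * mismatches)
    return opt[n]
```

Because opt[k] only depends on earlier entries, a single left-to-right pass suffices and each position's cost is computed exactly once.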


Saturday, March 30, 2024

 

When securing outbound access with a NAT Gateway in the Azure public cloud, we can choose between two routing options: Microsoft routing and user-defined routing. Let's discuss the benefits and drawbacks of each:

  1. Microsoft Routing: Benefits:
    • Simplicity: Microsoft routing is the default option, and it requires minimal configuration. It automatically handles routing between subnets and virtual networks.
    • Ease of management: As Microsoft handles the routing, we don't need to manage any routing tables or configurations manually.
    • Automatic failover: Microsoft routing provides built-in redundancy and automatic failover, ensuring high availability.

Drawbacks:

    • Limited control: With Microsoft routing, we have limited control over the routing decisions. We can't customize the routing paths or add specific routing rules.
    • Less flexibility: It may not be suitable for complex networking scenarios where more advanced routing options are required.
  2. User-Defined Routing: Benefits:
    • Enhanced control: User-defined routing allows us to have granular control over the routing decisions. We can define custom routing tables and specify the desired paths for outbound traffic.
    • Advanced routing capabilities: With user-defined routing, we can implement complex routing scenarios, such as policy-based routing and route filtering.
    • Integration with on-premises networks: User-defined routing enables us to establish connectivity between Azure and on-premises networks, using VPN or ExpressRoute.

Drawbacks:

    • Increased management complexity: User-defined routing requires manual configuration and management of routing tables, which can be more complex and time-consuming.
    • Potential for misconfiguration: If not properly configured, user-defined routing can lead to connectivity issues or suboptimal routing.
    • Higher cost: User-defined routing may incur additional costs due to the need for more resources and increased management effort.

Ultimately, the choice between Microsoft routing and user-defined routing depends on our specific requirements and the complexity of our networking setup. If we prefer simplicity and don't require advanced routing capabilities, Microsoft routing can be a suitable option. On the other hand, if we need more control and flexibility over routing decisions, or if we have complex networking requirements, user-defined routing may be more appropriate.

Previous articles: IaCResolutionsPart100.docx

Friday, March 29, 2024

 

This is a continuation of articles on IaC shortcomings and resolutions. Specifically, we discuss the differences between a managed virtual network and a bring-your-own virtual network for shared resources like an Azure Data Factory or an analytics workspace.

In Azure Data Factory, there are two options for creating a virtual network: Managed Virtual Network and Bring Your Own Virtual Network (BYOVN). Let's discuss the differences between these two options:

  1. Managed Virtual Network (MVNet):
    • This is the default option provided by Azure Data Factory.
    • When you create a Data Factory, Azure automatically creates a new virtual network and subnet for it.
    • MVNet allows you to manage and secure the network resources within your Data Factory, such as private endpoints, firewall rules, and network security groups.
    • You have control over the subnet address range and can configure network settings, like DNS servers and custom routes.
    • It simplifies the setup process, as Azure handles the network infrastructure for you.
  2. Bring Your Own Virtual Network (BYOVN):
    • This option allows you to use an existing virtual network in your Azure subscription.
    • With BYOVN, you can connect your Data Factory to your existing network infrastructure, making it easier to integrate with other resources and services within your network.
    • It provides more control and flexibility over your network configuration and allows you to leverage your existing network security measures.
    • BYOVN enables you to use features like service endpoints, network security groups, and custom routes that are already configured in your virtual network.
    • However, you need to ensure that your virtual network meets the necessary requirements and is compatible with Azure Data Factory.

Key differences between Managed Virtual Network and Bring Your Own Virtual Network include:

  1. Ownership and Management: With MVNet, Azure manages the virtual network and subnet for you, while with BYOVN, you own and manage the virtual network.
  2. Setup Complexity: BYOVN requires you to have an existing virtual network, which may involve more initial setup and configuration, whereas MVNet simplifies the setup process by automatically creating the necessary network resources.
  3. Integration and Flexibility: BYOVN allows for better integration with existing network resources and provides more control over network configuration, while MVNet offers a standardized and managed network environment.
  4. Network Security: Both options offer network security features like network security groups, private endpoints, and firewall rules. However, BYOVN allows you to leverage your existing network security measures, while MVNet provides a dedicated network environment managed by Azure.

We must consider our specific requirements, existing network infrastructure, and the level of control and integration we need when choosing between Managed Virtual Network and Bring Your Own Virtual Network for Azure Data Factory, but going with the default managed virtual network will be beneficial in most cases.

Thursday, March 28, 2024

 

Find the sums of subsequences of a given sequence of positive integers that are divisible by m.

We could translate this problem as: find whether there is any subset of the sequence whose sum is 1*m, 2*m, 3*m, ..., n*m, where n*m <= the total sum of the sequence.

void PrintSubsequenceSumDivisibleBy(List<int> seq, int m)
{
    for (int sum = m; sum <= seq.Sum(); sum += m)
        if (IsSubSeqSum(seq, seq.Count, sum))
            Console.WriteLine(sum);
}

bool IsSubSeqSum(List<int> seq, int n, int sum)
{
    if (sum == 0) return true;
    if (sum < 0) return false;
    if (n == 0) return false;
    // Skip the last element when it is larger than the remaining sum.
    if (seq[n - 1] > sum)
        return IsSubSeqSum(seq, n - 1, sum);
    // Either exclude or include the last element.
    return IsSubSeqSum(seq, n - 1, sum) || IsSubSeqSum(seq, n - 1, sum - seq[n - 1]);
}
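A direct Python translation of the exclude/include recursion above; the names subsequence_sums_divisible_by and is_subset_sum are illustrative assumptions:

```python
def subsequence_sums_divisible_by(seq, m):
    """Return every multiple of m achievable as a subset sum of seq."""
    def is_subset_sum(n, target):
        if target == 0:
            return True
        if target < 0 or n == 0:
            return False
        # Either exclude seq[n-1] or include it.
        return (is_subset_sum(n - 1, target)
                or is_subset_sum(n - 1, target - seq[n - 1]))

    total = sum(seq)
    return [s for s in range(m, total + 1, m) if is_subset_sum(len(seq), s)]
```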

 

There is an alternative way to find the subset sums that are divisible by m. It relies on recognizing that subset sums, taken modulo m, fall in the range 0 to m-1. We maintain a boolean DP table over this range, where DP[j] is true when some subset seen so far has a sum congruent to j modulo m. Each element of the input either counts by itself, setting the cell for its own modulo, or extends an existing subset: for every modulo j already marked true, including the current element makes the new modulo (j + element) mod m reachable. We collect these newly reachable modulos in a temporary array during the inner loop, so that an element is not combined with modulos it produced itself, and merge them into the DP table afterwards. Since the inner loop ranges over the m residues only once per element, it is linear in m. Eventually we return the value of the cell corresponding to modulo 0; in fact, we can bail out early if we encounter it any sooner.

static bool HasSubSetSumModuloMEqualsZero(List<int> arr, int n, int m)
{
    // Pigeonhole: with more than m elements, some subset sum is divisible by m.
    if (n > m)
        return true;
    var DP = new bool[m];
    for (int i = 0; i < n; i++)
    {
        if (DP[0])
            return true;
        var temp = new bool[m];
        for (int j = 0; j < m; j++)
            if (DP[j] && !DP[(j + arr[i]) % m])
                temp[(j + arr[i]) % m] = true;
        for (int j = 0; j < m; j++)
            if (temp[j])
                DP[j] = true;
        DP[arr[i] % m] = true;
    }
    return DP[0];
}
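The same modulo DP reads compactly in Python; the function name has_subset_sum_divisible_by_m is an illustrative assumption:

```python
def has_subset_sum_divisible_by_m(arr, m):
    # Pigeonhole: among m or more prefix sums, either one is divisible
    # by m or two share a remainder, so some subset sum is divisible.
    if len(arr) >= m:
        return True
    dp = [False] * m  # dp[j]: some subset sum is congruent to j (mod m)
    for x in arr:
        if dp[0]:
            return True  # bail out early once a divisible subset exists
        # Collect newly reachable remainders before merging, so x is
        # not combined with remainders it produced itself.
        new_sums = [(j + x) % m for j in range(m) if dp[j]]
        for j in new_sums:
            dp[j] = True
        dp[x % m] = True
    return dp[0]
```

Each element triggers one linear scan over the m residues, so the whole check runs in O(min(n, m) * m) time and O(m) space.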

 

Tuesday, March 26, 2024

 You are given a 0-indexed integer array nums of length n.

You can perform the following operation as many times as you want:

Pick an index i that you haven’t picked before, and pick a prime p strictly less than nums[i], then subtract p from nums[i].

Return true if you can make nums a strictly increasing array using the above operation and false otherwise.

A strictly increasing array is an array whose each element is strictly greater than its preceding element.

 

Example 1:

Input: nums = [4,9,6,10]

Output: true

Explanation: In the first operation: Pick i = 0 and p = 3, and then subtract 3 from nums[0], so that nums becomes [1,9,6,10].

In the second operation: i = 1, p = 7, subtract 7 from nums[1], so nums becomes equal to [1,2,6,10].

After the second operation, nums is sorted in strictly increasing order, so the answer is true.

Example 2:

Input: nums = [6,8,11,12]

Output: true

Explanation: Initially nums is sorted in strictly increasing order, so we don't need to make any operations.

Example 3:

Input: nums = [5,8,3]

Output: false

Explanation: It can be proven that there is no way to perform operations to make nums sorted in strictly increasing order, so the answer is false.

 

Constraints:

1 <= nums.length <= 1000

1 <= nums[i] <= 1000

nums.length == n

class Solution {
    public boolean primeSubOperation(int[] nums) {
        for (int i = 0; i < nums.length; i++) {
            int min = 0;
            if (i > 0) min = Math.max(nums[i - 1], 0);
            int max = nums[i];
            int prime = getPrime(min, max);
            nums[i] -= prime;
        }
        return isIncreasing(nums);
    }

    public boolean isIncreasing(int[] nums) {
        for (int i = 1; i < nums.length; i++) {
            if (nums[i] <= nums[i - 1]) {
                return false;
            }
        }
        return true;
    }

    // Largest prime p < max such that max - p stays strictly above min.
    public int getPrime(int min, int max) {
        for (int i = max - 1; i >= 2; i--) {
            if (isPrime(i) && (max - i > min)) {
                return i;
            }
        }
        return 0;
    }

    public boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; i < n; i++) {
            if (n % i == 0) {
                return false;
            }
        }
        return true;
    }
}
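The same greedy idea can be sketched in Python, with a sieve standing in for the trial-division primality test; this is a sketch under those assumptions, not the original solution:

```python
def prime_sub_operation(nums):
    limit = max(nums)
    # Sieve of Eratosthenes up to the largest value in nums.
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for k in range(i * i, limit + 1, i):
                is_prime[k] = False
    prev = 0  # value the current element must strictly exceed
    for x in nums:
        # The largest prime p < x with x - p > prev yields the smallest
        # achievable value, which leaves the most room for later elements.
        best = x
        for p in range(x - 1, 1, -1):
            if is_prime[p] and x - p > prev:
                best = x - p
                break
        if best <= prev:
            return False
        prev = best
    return True
```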


 

Business Continuity and Disaster Recovery (BCDR) is a crucial aspect of any organization's IT strategy, and Azure Public Cloud offers robust solutions to support these needs. Azure provides a comprehensive set of services and features designed to ensure the availability, resilience, and recoverability of our business-critical applications and data.

 

Here are some key aspects of Azure's BCDR capabilities:

 

High Availability: Azure offers a highly available infrastructure with redundant data centers spread across different regions globally. This ensures that our applications remain accessible even in the event of a localized disaster or outage.

 

Azure Site Recovery: Azure Site Recovery (ASR) is a disaster recovery solution that replicates and orchestrates the failover of on-premises virtual machines, Azure virtual machines, and physical servers to Azure. It provides a seamless failover experience and enables rapid recovery in case of disruptions.

 

Azure Backup: Azure Backup is a scalable and cost-effective backup solution that allows us to protect our on-premises and cloud-based workloads. It provides automated backups, incremental backups, long-term retention, and flexible recovery options.

 

Azure VM Resiliency: Azure provides various features to ensure the resiliency of our virtual machines (VMs). Availability Sets allow us to distribute VMs across multiple fault domains and update domains to minimize downtime during planned and unplanned events. Azure Availability Zones offer even higher levels of resiliency by providing physically separate data centers within an Azure region.

 

Azure Storage Replication: Azure offers multiple storage replication options, including Locally Redundant Storage (LRS), Zone Redundant Storage (ZRS), Geo-Redundant Storage (GRS), and Read-Access Geo-Redundant Storage (RA-GRS). These options allow us to choose the level of data durability and availability that best suits our needs.

 

Azure Traffic Manager: Azure Traffic Manager is a DNS-based traffic load balancer that distributes incoming traffic across multiple Azure regions or endpoints. It helps to ensure high availability and performance by directing users to the closest and most available endpoint.

 

Azure DevTest Labs: Azure DevTest Labs allows us to quickly create, deploy, and manage lab environments in Azure. It helps us test our BCDR plans and validate the recoverability of our applications and infrastructure.

 

Azure Governance and Security: Azure provides a robust set of governance and security features to protect our applications and data. These include role-based access control (RBAC), Azure Security Center, Azure Active Directory, Azure Policy, and Azure Firewall.

 

Overall, Azure Public Cloud offers a comprehensive suite of services and features to support Business Continuity and Disaster Recovery, ensuring that our critical applications and data remain available and recoverable in the event of disruptions or disasters.

Monday, March 25, 2024

 

Azure Machine Learning Data Governance:

The following is a list of practices and features to consider:

  1. Restrict Access to Resources and Operations:
  2. Authentication and Identity Management:
  3. Data Encryption:
    • Azure Machine Learning uses various compute resources and data stores on the Azure platform.
    • Data is encrypted both in transit and at rest.
    • Each resource supports encryption to maintain data security and comes with documentation.
  4. Vulnerability Scanning:
    • Vulnerabilities can be scanned in the Azure Machine Learning environment and associated container registry.
    • All mitigations for vulnerabilities follow the same method as for existing registries.
  5. Controlling Notebooks, Jobs, Assets, and Access:
    • Segregate notebooks by requiring GitHub integration.
    • Secure GitHub independently to allow selective people and teams.
    • Enable Jobs to use datastores.
    • Provide custom permissions to data scientists to use them in jobs
    • Ensure access control on the shared ADLS is respected to isolate pod users.
  6. Configuration Policies:
    • Compliance is enforced by enabling audit.
    • These policies can be implemented with security postures and common modules.
    • Insights and resource graph queries can be added.
    • Azure Monitor based alerts can be setup.
    • A Metadata repository in the form of a structured database can provide knowledge management capabilities and integrate with Kusto query language.
  7. Microsoft Purview and Azure Machine Learning (AzureML) have a powerful integration that enhances data governance and responsible AI practices:
    1. ML Assets can be brought to the Microsoft Purview Data Map:
      • Azure Machine Learning introduces ML assets as a new object in Purview.
      • This integration allows you to associate ML models with the data used for training, enabling emerging ML and AI risk and governance scenarios.
      • ML models are essentially representations of data, and by linking them to their training data, there is better visibility and control over the entire ML lifecycle.
    2. Automatic Metadata Push:
      • When Azure Machine Learning workspace is registered in Microsoft Purview, metadata from the workspace is automatically pushed to Purview on a daily basis.
      • No manual scanning is required; the integration handles metadata synchronization seamlessly.
    3. Supported Capabilities:
      • When scanning the Azure Machine Learning source, Microsoft Purview supports:
        • Metadata Extraction: Extracting technical metadata from Azure Machine Learning, including workspace details, models, datasets, and jobs.
        • Lineage Tracking: Understand the lineage of ML assets and their connections to data.
        • Data Sharing: Facilitate collaboration by sharing metadata across teams.
        • Access Policy: Control access to ML assets.
        • View: ML assets can be visualized within the Purview Data Map
    4. Unified Data Governance:
      • Microsoft Purview provides a holistic view of the data landscape.
      • Features include automated data discovery, sensitive data classification, and end-to-end data lineage.
      • Data consumers can access trustworthy data management, ensuring compliance and security.

        Data Scientists and Infrastructure teams can discover, track and govern ML assets throughout the MLOps Lifecycle within the context of Microsoft Purview.
        NOTE: Connecting Azure ML workspaces to Purview requires subscription owner permissions. This is true even for connecting ADF and ADLS to Purview.
        The Data Curator role on the root collection of the Microsoft Purview must be assigned to the managed identity of the connected resource.


Sunday, March 24, 2024

 

Spark code execution on Azure Machine Learning workspace allows us to leverage the power of Apache Spark for big data processing and analytics tasks. Here are some key points to know about Spark code execution on Azure Machine Learning workspace:

1.      Integration: Azure Machine Learning workspace provides seamless integration with Apache Spark, allowing us to run Spark code within the workspace. This integration simplifies the process of running Spark jobs, as we don't need to set up and manage a separate Spark cluster.

2.      Scalability: Azure Machine Learning workspace enables us to scale our Spark jobs easily. We can choose the appropriate cluster size based on our workload requirements, and Azure will automatically provision and manage the necessary resources. This scalability ensures that we can handle large-scale data processing tasks efficiently.

3.      Notebook support: Azure Machine Learning workspace supports Jupyter notebooks, which are commonly used for interactive data exploration and analysis with Spark. We can write and execute Spark code in a Jupyter notebook within the workspace, making it convenient to prototype and experiment with our Spark code.

4.      Parallelism and distributed computing: Spark code execution on Azure Machine Learning workspace takes advantage of the parallel processing capabilities of Spark. It allows us to distribute our data across multiple nodes in a cluster and perform computations in parallel, thereby accelerating the processing of large datasets.

5.      Data integration: Azure Machine Learning workspace provides easy integration with various data sources, including Azure Data Lake Storage, Azure Blob Storage, and Azure SQL Database. We can seamlessly read data from these sources into Spark, perform transformations and analytics, and write the results back to the desired output location.

6.      Monitoring and management: Azure Machine Learning workspace offers monitoring and management capabilities for Spark code execution. We can track the progress of our Spark jobs, monitor resource usage, and diagnose any issues that may arise. Additionally, we can schedule and automate the execution of Spark jobs using Azure Machine Learning pipelines.

7.      Collaboration and version control: Azure Machine Learning workspace enables collaboration and version control for Spark code. We can work with our team members on Spark projects, track changes made to the code, and manage different versions of our Spark scripts. This facilitates teamwork and ensures that we can easily revert to previous versions if needed.

Overall, Spark code execution on Azure Machine Learning workspace provides a powerful and flexible platform for running large-scale data processing and analytics workloads using Apache Spark. It simplifies the management of Spark clusters, provides integration with other Azure services, and offers monitoring and collaboration capabilities to streamline our Spark-based projects.

Sample Spark Session with downloaded jars can be invoked using:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName('SnowflakeSample')
    .config("spark.jars", "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pyspark/jars/snowflake-jdbc-3.13.29.jar,/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pyspark/jars/spark-snowflake_2.13-2.13.0-spark_3.3.jar,/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pyspark/jars/snowflake-common-3.1.19.jar,/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pyspark/jars/scala-library-2.13.9.jar")
    .config(conf=conf)
    .getOrCreate()
)

 

Saturday, March 23, 2024

 

This is a continuation of a series of articles on Infrastructure-as-Code (IaC), its shortcomings and resolutions. IaC full service does not stop at just the provisioning of resources. The trust that clients place in an IaC-based deployment service is that their use cases will be enabled and remain operational without hassles. As an example, since we were discussing the Azure Machine Learning workspace, one of the use cases is to draw data from sources other than Azure-provided storage accounts, such as Snowflake. Executing Snowflake workloads on this workspace requires the PySpark library, support from Java and Scala, as well as jars specific to Snowflake.

This means that the workspace deployment will only be complete when the necessary prerequisites are installed. If the built-in environment does not support them, some customization is required. And in many cases, these come back to IaC configurations, as much as there is automation possible via the inclusion of scripts.

In this case of the machine learning workspace, a custom kernel might be required to support Snowflake workloads. Such a kernel can be installed by passing in an initialization script that writes out a kernel specification in a yaml file, which can in turn be used to initialize and activate the kernel. Additionally, the jars specific to Snowflake can be downloaded, including their common library, support for Spark code execution, and the official Scala language jars.

Such a kernel might look something like this:

name: customkernel
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - numpy
  - pyspark
  - pip
  - pip:
    - azureml-core
    - ipython
    - ipykernel
    - pyspark==3.5.1

When the Spark session is started, the configuration specified can include the path to the jars. These additional steps must be taken to go the full length of onboarding customer workloads. Previous article references: IacResolutionsPart97.docx

Friday, March 22, 2024

 

You are given a 0-indexed integer array nums of length n.

You can perform the following operation as many times as you want:

  • Pick an index i that you haven’t picked before, and pick a prime p strictly less than nums[i], then subtract p from nums[i].

Return true if you can make nums a strictly increasing array using the above operation and false otherwise.

A strictly increasing array is an array whose each element is strictly greater than its preceding element.

 

Example 1:

Input: nums = [4,9,6,10]

Output: true

Explanation: In the first operation: Pick i = 0 and p = 3, and then subtract 3 from nums[0], so that nums becomes [1,9,6,10].

In the second operation: i = 1, p = 7, subtract 7 from nums[1], so nums becomes equal to [1,2,6,10].

After the second operation, nums is sorted in strictly increasing order, so the answer is true.

Example 2:

Input: nums = [6,8,11,12]

Output: true

Explanation: Initially nums is sorted in strictly increasing order, so we don't need to make any operations.

Example 3:

Input: nums = [5,8,3]

Output: false

Explanation: It can be proven that there is no way to perform operations to make nums sorted in strictly increasing order, so the answer is false.

 

Constraints:

  • 1 <= nums.length <= 1000
  • 1 <= nums[i] <= 1000
  • nums.length == n

class Solution {
    public boolean primeSubOperation(int[] nums) {
        for (int i = 0; i < nums.length; i++) {
            int min = 0;
            if (i > 0) min = Math.max(nums[i - 1], 0);
            int max = nums[i];
            int prime = getPrime(min, max);
            nums[i] -= prime;
        }
        return isIncreasing(nums);
    }

    public boolean isIncreasing(int[] nums) {
        for (int i = 1; i < nums.length; i++) {
            if (nums[i] <= nums[i - 1]) {
                return false;
            }
        }
        return true;
    }

    // Largest prime p < max such that max - p stays strictly above min.
    public int getPrime(int min, int max) {
        for (int i = max - 1; i >= 2; i--) {
            if (isPrime(i) && (max - i > min)) {
                return i;
            }
        }
        return 0;
    }

    public boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; i < n; i++) {
            if (n % i == 0) {
                return false;
            }
        }
        return true;
    }
}