Monday, February 26, 2024

 

Part 2 – Reducing operational costs of chatbot model deployment.

This is the second part of the chatbot application discussion here.

The following strategies help reduce operational costs for the deployed chat model; without them, even an idle deployment can incur about a thousand dollars per month.

1.       The app service plan for the app service that hosts the chat user interface must be reviewed for CPU, memory, and storage.

2.       The plan should be set to scale dynamically.

3.       Caching mechanisms must be implemented to reduce the load on the app service. Azure Cache for Redis can help in this regard.

4.       If the user interface has significant assets such as JavaScript files and images, Content Delivery Networks (CDNs) can be leveraged.

5.       CDNs reduce latency and offload traffic from the app service to distributed mirrors.

6.       It might be hard to envision the model as a database, but vector storage is used and there is an index as well; it is not just an embeddings matrix. Choosing the appropriate database tier and SKU and optimizing the queries can help control the cost.

7.       Monitoring and alerts can help to proactively identify performance bottlenecks, resource spikes and anomalies.

8.       Azure Monitor and Application Insights can track metrics, diagnose issues, and optimize resource usage.

9.       If the chat model experiences idle periods, then the associated resources can be stopped and scaled down during those times.

10.   You don’t need the OpenAI service APIs. You only need the model APIs. Note the following:

a.       Azure OpenAI Model API: this is the API to the GPT models used for text similarity, chat and traditional completion tasks.

b.       Azure OpenAI service API: this encompasses not just the models but also the security, encryption, deployment and management functionalities to deploy models, manage endpoints and control access.

c.       Azure OpenAI Search API: this allows the chatbot model to retrieve data from various sources.

11.   Storing the vectors and embeddings and querying the search APIs do not require the service API. The model APIs are a must, so include them in the deployment, but trim the data sources to just your data.

12.   Sample deployment:
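The original sample is not reproduced here; the following is a minimal sketch using the azure-mgmt-cognitiveservices Python SDK that deploys only a model, per item 10. The subscription, resource group, account, model version, and capacity below are placeholder assumptions, not a definitive configuration.

from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment,
    DeploymentModel,
    DeploymentProperties,
    Sku,
)

client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",  # placeholder
)

# Deploy only the model; no additional data sources are attached.
poller = client.deployments.begin_create_or_update(
    resource_group_name="<resource-group>",
    account_name="<azure-openai-account>",
    deployment_name="chat-model",
    deployment=Deployment(
        properties=DeploymentProperties(
            model=DeploymentModel(format="OpenAI", name="gpt-35-turbo", version="0613"),
        ),
        sku=Sku(name="Standard", capacity=1),  # capacity is in thousands of tokens per minute
    ),
)
print(poller.result().name)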

 

Sunday, February 25, 2024

 

Exporting and using Word Embeddings:

The following example generates word embeddings with the Azure OpenAI Python SDK (0.x) so that they can be used externally:

import openai

# Set up Azure OpenAI configuration
openai.api_type = "azure"
openai.api_version = "2022-12-01"
openai.api_base = "https://YOUR_RESOURCE_NAME.openai.azure.com"  # Replace with your resource endpoint
openai.api_key = "YOUR_API_KEY"  # Replace with your API key

# Generate embeddings for your text; engine is the name of your
# embedding model deployment, e.g. text-embedding-ada-002.
response = openai.Embedding.create(input="Sample Document goes here", engine="YOUR_DEPLOYMENT_NAME")
embeddings = response['data'][0]['embedding']
print(embeddings)
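For text-embedding-ada-002, the returned vector has 1,536 dimensions and can be stored in any external vector store or index for similarity search.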

Saturday, February 24, 2024

 

Azure Machine Learning Datastore differentiations:

This is probably going to be an easy read compared to the previous articles referenced below. The problem an Azure ML workspace administrator wants to tackle is creating different datastore objects so that user A gets one datastore but not the others, and user B gets another datastore but not the others. Permissions are granted by roles, and both users A and B have custom roles that grant the permission to read datastores with the following enumeration:

-          Microsoft.MachineLearningServices/workspaces/datastores/read

 

This permission does not say Datastore1/read but not Datastore2/read. In fact, both users must get the generic datastores/read permission, which they cannot do without. Access controls cannot be granted on individual datastores the way they can be on files.

The solution to this problem is fairly simple. No datastores are created by the administrator. Instead, the users create the datastores programmatically, passing either a Shared Access Signature (SAS) token to an external data storage or an account key. Either way, they must have access to their storage-account/container/path/to/file and can create the SAS token at their choice of scope, as in the sketch below.
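A minimal sketch with the azure-ai-ml (v2) Python SDK, assuming user A already holds a SAS token for their own container; all names below are placeholders:

from azure.ai.ml import MLClient
from azure.ai.ml.entities import AzureBlobDatastore, SasTokenConfiguration
from azure.identity import DefaultAzureCredential

# User A connects to the shared workspace with their own credentials.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# The datastore wraps a container that only user A can reach.
datastore = AzureBlobDatastore(
    name="user_a_datastore",
    account_name="<storage-account>",
    container_name="<user-a-container>",
    credentials=SasTokenConfiguration(sas_token="<sas-token>"),
)
ml_client.create_or_update(datastore)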

The creation and use of datastores are just like that of credentials or connection objects required for a database. As long as the users manage them themselves, they can reuse them at will.

If the administrator must be tasked with isolating users' access to their workspace components and objects, then two workspaces can be created and assigned to groups to which these users subscribe individually.

If we consult the copilots for information on this topic, they will give a false positive that custom roles and Role-based Access Control will solve this for us. They are not wrong in asserting that "By properly configuring RBAC, we can control access to datastores and other resources within the workspace," but they simply do not recognize that the differentiation is being made between objects of the same kind. That said, there will be a full commentary on the other mechanisms available, which include Role-based Access Control, access control at the external resource, generating and assigning different SAS tokens as secrets, generating virtual network service endpoints, exposing datastores with fine-grained common access, and using monitoring and alerts to detect and mitigate potential security threats. It is also possible to combine a few of the above techniques to achieve the desired isolation of user access.

Previous articles: IaCResolutionsPart81.docx 

 

Friday, February 23, 2024

 

Shared workspaces and isolation

In a shared Azure Machine Learning workspace, achieving isolation of user access to datastores involves implementing a combination of access control mechanisms. This helps ensure that each user can only access the specific datastores they are authorized to use. Here are the key steps to achieve isolation of user access to datastores in a shared Azure Machine Learning workspace:

1.      Role-based Access Control (RBAC): Azure Machine Learning supports RBAC, which allows us to assign roles to users or groups at various levels of the workspace hierarchy. By properly configuring RBAC, we can control access to datastores and other resources within the workspace. For example:

Built-in role: AzureML Data Scientist

Custom role: AzureML Data Scientist Datastore Access:

    actions:

-        Microsoft.MachineLearningServices/workspaces/datastores/read

-        Microsoft.MachineLearningServices/workspaces/datastores/write

-        Microsoft.MachineLearningServices/workspaces/datastores/delete

-        Microsoft.MachineLearningServices/workspaces/datastores/listsecrets/action

    data_actions: []
    not_actions: []
    not_data_actions: []

(Datastore operations are control-plane actions, so the data_actions list stays empty.)

2.      Azure Data Lake Storage (ADLS) Data Access Control: If we're using Azure Data Lake Storage Gen2 as a datastore, we can utilize its built-in access control mechanisms. This includes setting access control lists (ACLs) on directories and files, as well as defining access permissions for users and groups.

3.      Shared Access Signatures (SAS): Azure Blob Storage, another commonly used datastore, supports SAS. SAS allows us to generate a time-limited token that grants temporary access to specific containers or blobs. By using SAS, we can control access to data within the datastore on a per-user or per-session basis, as shown in the sketch after this list.

4.      Virtual Network Service Endpoints: To further isolate access to datastores, we can leverage Azure Virtual Network (VNet) Service Endpoints. By configuring service endpoints, we can ensure that datastores are accessible only from specific VNets, thereby restricting access from outside the network.

5.      Workspace-level Datastore Configuration: Within the Azure Machine Learning workspace, we can define multiple datastores and associate them with specific storage accounts or services. By carefully configuring each datastore's access control settings, we can enforce granular access controls and limit user access to specific datastores.

6.      Monitoring and Auditing: It's important to monitor and audit user access to datastores within the shared Azure Machine Learning workspace. Azure provides various monitoring and auditing tools, such as Azure Monitor and Azure Sentinel, which can help us track and analyze access patterns and detect any potential security threats or unauthorized access attempts.

By following these steps and implementing a combination of RBAC, access control mechanisms within datastores, and network-level isolation, we can achieve effective isolation of user access to datastores in a shared Azure Machine Learning workspace.
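As a sketch of step 3 above, the following generates a short-lived, read-only container SAS with the azure-storage-blob Python package; the account name, container, and key are placeholders, and in practice the key would come from a key vault:

from datetime import datetime, timedelta, timezone
from azure.storage.blob import ContainerSasPermissions, generate_container_sas

# Placeholder account details.
sas_token = generate_container_sas(
    account_name="<storage-account>",
    container_name="<container>",
    account_key="<account-key>",
    permission=ContainerSasPermissions(read=True, list=True),  # read-only
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),    # time-limited
)

# Hand the token to exactly one user; other users never see it.
print(f"https://<storage-account>.blob.core.windows.net/<container>?{sas_token}")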

 

Previous articles: IaCResolutionsPart81.docx 

Thursday, February 22, 2024

 

This is a summary of the book "How Successful Engineers Become Great Business Leaders" written by Paul Rulkens and published by BEP, 2018. As an engineer who transitioned to becoming a boardroom advisor, he draws on his experience and provides tips and valuable insights to others making the leap from the technical to the business domain. He proposes three power laws as a framework and uses them as building blocks for "clarity, focus and execution" to achieve business goals. His tools leverage the pragmatism that engineers are trained in, applied toward excelling in a business world that is rightfully focused on revenue growth. He explains that problem solving is central to both disciplines.
Nearly one in three Fortune 500 CEOs have an engineering background, and they can become business leaders by gaining non-engineering business experience or broadening their knowledge with additional education or training, such as obtaining an MBA. However, both career strategies can carry major downsides for engineers, such as the need for decades of hard work and cookie-cutter curricula. Engineers can make a smooth transition from the engineering side to the business side by carefully positioning themselves within their corporation or industry. To leverage their engineering talents and skills in business, engineers should embrace three "power laws": "prime location, prime time, and prime knowledge."
Prime location refers to where skills can have the greatest impact and gain the greatest recognition. Prime time refers to when and how your skills can have the greatest impact. Prime knowledge is the value of extra know-how that can have a multiplier effect on your business leadership career.
To achieve business goals, engineers should have clarity, focus, and execution. Achieving ambitious goals requires better skills and behaviors to solve different problems. In today's business world, corporate leaders, including engineers, should focus on revenue growth, strategic planning, innovative practices, and organizational performance. Engineers can excel in the business world due to their practical nature and ability to help organizations execute. However, developing effective execution cultures requires considerable planning, vision, and communication. Engineers can use storytelling and evocative language to encourage an execution culture and become models of attuned, disciplined, aware, and focused executive behavior. Regularly testing their developmental abilities and monitoring internal activities can help them become accomplished business leaders. Strategic quitting, a process where a company abandons a failed project, is essential when things don't work out as expected. Engineers are strategic problem solvers, making them perfect for executives who have obstacles to overcome and challenges to surmount. Control is essential in engineering training, but ambitious engineers need to let go of control and step into the unknown to achieve business success.
Engineers should identify their best fit for business and focus on their controllable talents and skills. They should focus on higher-risk development activities, recognizing that larger goals come with more obstacles; expand their capabilities in reality-based thinking, process design, and accelerated learning; improve leadership behavior by adopting new conduct and modeling it to employees; and build a referral network to secure new customers. Be mindful of biases and achieve strategic goals quickly with minimal resources and energy. As a business leader, consider available time, extra knowledge, business operations, employee rewards, legacy, and growth goals. Monitor progress, provide value to customers, and abandon dogmatic thinking. Embrace the importance of establishing a legacy, achieving one growth goal, and implementing single behaviors to achieve strategic goals.
Summarizing Software: SummarizerCodeSnippets.docx.

Tuesday, February 20, 2024

 

Databricks is a unified data analytics platform that combines big data processing, machine learning, and collaborative analytics tools in a cloud-based environment. As a collaborative workspace for authoring data-driven workflows, it is usually quick to be adopted in any organization and prone to staggering costs as it ages. This article explains that it need not be so and enumerates optimizations and best practices that reduce infrastructure costs.

One of the advantages of being a cloud resource is that Databricks workspaces can be spun up as many times and for as many purposes as needed. Given the large number of features, the mixed-use cases of data engineering and analytics, and diverse compute and storage intensive usages such as machine learning and ETL, some segregation of workloads to workspaces even within business divisions and workspace lifetimes is called for.

Using Databricks to leverage Apache Spark, a powerful open-source distributed computing framework, differs significantly from usages involving Delta Lake, an open-source storage layer that brings ACID transactions to heterogeneous data sources. The former drives compute utilization costs, and the latter drives Databricks Unit (DBU) and network costs. Some of the problems encountered include:

Data skew, with uneven distribution of data over partitions, leads to bottlenecks and poor execution. This is often addressed by repartitioning using salting or bucketing, as in the sketch following this list of problems.

Inefficient data formats increase overhead and prolong query completion. This is addressed with more efficient formats such as Parquet, which offers built-in compression, columnar storage, and predicate pushdown.

Inadequate caching leads to repeated disk access that is costly. Spark’s in-memory caching features can help speed up iterative algorithms.

Large shuffles lead to network congestion, latency, and slower execution. This can be resolved with broadcast joins, filtering data early, and using partition-aware operations.

Inefficient queries occur when the query parameters and hints are not fully leveraged. Predicate pushdowns, partition pruning and query rewrites can resolve these.

Suboptimal resource allocation occurs when CPU, memory or storage is constrained. Monitoring resource usage and adjusting resource limits accordingly mitigate this.

Improper garbage collection settings. Much like resources, these can be monitored and tuned.

Outdated runtime versions and missing bug fixes. These can be solved with patching and upgrades.
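A minimal PySpark sketch of the salting technique mentioned above, assuming a fact table skewed on customer_id and a small dimension table; the table names and bucket count are hypothetical:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("skew-mitigation").getOrCreate()

facts = spark.table("sales")      # hypothetical fact table, skewed on customer_id
dims = spark.table("customers")   # hypothetical small dimension table
SALT_BUCKETS = 8

# Spread each hot key across SALT_BUCKETS partitions with a random salt.
salted_facts = facts.withColumn("salt", (F.rand() * SALT_BUCKETS).cast("long"))

# Replicate the dimension rows once per salt value so every match survives.
salted_dims = dims.crossJoin(spark.range(SALT_BUCKETS).withColumnRenamed("id", "salt"))

# The join key now includes the salt, so no single partition carries a hot key.
joined = salted_facts.join(salted_dims, on=["customer_id", "salt"]).drop("salt")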

Similarly, the best practices can be enumerated as:

Turning off compute that is not in use and enabling auto-termination (see the cluster sketch after this list).

Sharing compute between different groups via consolidation at relevant scope and level.

Tracking costs against usages so that they can be better understood.

Auditing usages against users and principals to take corrective action.

Leveraging spot instances, which come with a discount, for compute.

Using Photon acceleration, which speeds up SQL queries and the Spark SQL API.

Using built-in and custom mitigations for patterns of problems encountered at resource and component levels.

Lastly, turning off features that are not actively used and using appropriate features for their recommended use also help significantly.
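A sketch that combines several of these practices (autoscaling, auto-termination, spot instances, and Photon) in one cluster definition submitted to the Databricks Clusters API; the workspace URL, token, runtime version, and node sizes are placeholder assumptions:

import requests

DATABRICKS_HOST = "https://<workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                            # placeholder

cluster_spec = {
    "cluster_name": "shared-etl-cluster",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "autoscale": {"min_workers": 1, "max_workers": 4},  # scale with load
    "autotermination_minutes": 30,                      # turn off idle compute
    "runtime_engine": "PHOTON",                         # Photon acceleration
    "azure_attributes": {
        "first_on_demand": 1,                           # keep the driver on-demand
        "availability": "SPOT_WITH_FALLBACK_AZURE",     # discounted spot workers
    },
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print(resp.json()["cluster_id"])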


Monday, February 19, 2024

 

One of the more recent additions to Azure resources has been Azure Machine Learning Studio. This is a managed machine learning environment that allows you to create, manage, and deploy ML models and applications. It is part of the Azure AI resource, which provides access to multiple Azure AI services with a single setup. Some of the features of the Azure Machine Learning Studio resource are:

- It has a GUI-based integrated development environment for building machine learning workflows on Azure.

- It supports both no-code and code-first experiences for data science.

- It lets you use various tools and components for data preparation, feature engineering, model training, evaluation, and deployment.

- It enables you to collaborate with your team and share datasets, models, and projects.

- It allows you to configure and manage compute targets, security settings, and external connections.

When provisioning this resource for use by data scientists, it is important to consider the following best practices:

- The workspace itself must allow outbound connectivity to the public network. One way to do this is to allow it to be accessible from all or selected public IP addresses.

- The clusters must be provisioned with no node public IP addresses, conforming to the well-known no-public-IP (NPIP) pattern. This is done by adding the compute to a subnet in a virtual network with service endpoints for Azure Storage, Key Vault, and Container Registry and with default routing, as in the sketch after this list.

- Since the workspace and its dependent resources, namely the storage account, key vault, container registry, and application insights, are independently created, it is helpful to have the same user-assigned managed identity associated with all of them. This also makes it convenient to customize data plane access to resources outside this list, such as a different storage account or key vault. The same goes for compute, which can also be launched with this identity.

- Permissions granted to various roles on this resource can be customized to be further restricted since this is a shared workspace.

- Code that is executed by data scientists in this studio can be categorized into one of many kinds, such as regular interactive Python notebooks, Spark code, and non-interactive jobs. Permissions necessary to run each of them must be verified independently.

- There are various kernels and serverless Spark compute available to execute the user-defined code in a notebook. The user-assigned managed identity used to facilitate data access for this code must have both control plane read access, to perform actions such as getAccessControl, and data plane access, such as blob data read and write. The logged-in user's credentials are automatically used over the session created with the managed identity when the user performs data access.

- The non-interactive jobs require a specific permission for any user to submit a run within an experiment.
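A sketch of provisioning such an NPIP cluster with the azure-ai-ml (v2) Python SDK follows; the subscription, network, identity, and sizing values are placeholder assumptions rather than a definitive configuration:

from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    AmlCompute,
    IdentityConfiguration,
    ManagedIdentityConfiguration,
    NetworkSettings,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

cluster = AmlCompute(
    name="npip-cpu-cluster",
    size="STANDARD_DS3_V2",
    min_instances=0,  # scale to zero when idle
    max_instances=4,
    # No node public IPs: nodes are reachable only inside the virtual network.
    enable_node_public_ip=False,
    network_settings=NetworkSettings(vnet_name="<vnet-name>", subnet="<subnet-name>"),
    # The same user-assigned managed identity as the workspace's dependent resources.
    identity=IdentityConfiguration(
        type="user_assigned",
        user_assigned_identities=[
            ManagedIdentityConfiguration(
                resource_id="/subscriptions/<sub>/resourceGroups/<rg>/providers/"
                            "Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>"
            )
        ],
    ),
)
ml_client.compute.begin_create_or_update(cluster).result()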

Together, the built-in capabilities and customizations of this resource can immensely benefit data scientists as they train their models. Previous articles: IacResolutionsPart77.docx