Saturday, September 30, 2023

 

This is a continuation of a previous article on the use of Artificial Intelligence in product development. This article discusses the concerns raised against AI, and the bias within AI systems, as outlined in reputable journals.

In summary, some of the concerns stem from inaccurate information produced by generative AI; others come from bias served up by the AI tools themselves. These are mitigated with a wider range of datasets. AI4ALL, for instance, works to feed AI a broad range of content so that it is more inclusive of the world. Another concern has been over-reliance on AI. A straightforward way to address this is to balance automated tasks with those requiring skilled human supervision.

The methodical approach to managing bias involves three steps: first, data and design must be decided; second, outputs must be checked; and third, problems must be monitored.

Complete fairness is impossible, in part because decision-making committees are rarely diverse enough and because choosing an acceptable threshold for fairness, and deciding whom to prioritize, is hard. This makes a single blueprint for fairness in AI that works across companies and situations daunting. An algorithm can check for adequate representation or apply a weighted threshold, and both checks are in common use, but unless each class appears in equal numbers in the input data, these selection methods are mutually exclusive. The choice of approach is therefore critical. Along with choosing which groups to protect, a company must determine which issue is most important to mitigate: differences could stem from the sizes of the groups or from the accuracy rates between them. The choices may form a decision tree, and the decisions must align with company policy.

Missteps remain common. Voice recognition, for example, can leverage AI to reroute sales calls but might be prone to failures with regional accents. In this case, fairness could be checked by creating a more diverse test group. The final algorithm and its fairness tests need to consider the whole population and not just those who made it past the early hurdles. Model designers must accept that data is imperfect.

The second step, checking outputs, involves testing fairness across intersections and overlaps of data types. Even when companies have good intentions, an ill-considered approach can do more harm than good: an algorithm deemed neutral can still have a disparate impact on different groups. One effective strategy is a two-model solution such as the generative adversarial network approach, which balances the original model against a second model that checks for fairness to individuals; the two converge to produce a more appropriate and fair solution.

The third step is to create a feedback loop. It is important to examine the output frequently and look for suspicious patterns on an ongoing basis, especially where the input evolves over time; because bias usually goes unnoticed, this is what catches it. A fully diverse outcome can look surprising, so people may inadvertently reinforce bias when developing AI. This is evident with rare events, where people may object when one occurs but not when it fails to happen. Metrics such as precision and recall are helpful here, since predictive factors and error rates are both affected. Ongoing monitoring can be rewarding: demand forecasting, for example, can show improved accuracy by adapting to changes in the data and correcting for historical bias.
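As a minimal illustration of this kind of monitoring (a sketch, not the article's own tooling; the file and column names are hypothetical), precision and recall can be computed per group on each new batch of predictions and compared:

import pandas as pd
from sklearn.metrics import precision_score, recall_score

# Hypothetical export of recent model predictions with a protected-group column.
df = pd.read_csv("recent_predictions.csv")
for group, rows in df.groupby("group"):
    p = precision_score(rows["y_true"], rows["y_pred"])
    r = recall_score(rows["y_true"], rows["y_pred"])
    print(f"{group}: precision={p:.2f}, recall={r:.2f}")
# Large, persistent gaps between groups are the suspicious patterns worth investigating.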

The conclusion is that bias may not be eliminated, but it can be managed.

 

Friday, September 29, 2023

This is a summary of the book “The Power of Not Thinking” by Simon Roberts, a business anthropologist, which describes the embodied knowledge that Artificial Intelligence does not possess. Embodied knowledge derives from the body through movement, muscle memory, sight, hearing, taste, smell, and touch. It includes experiences that evoke deep sensory memories and pattern recognition, allowing us to act without deliberate thought. These embedded memories enable us to feel, rather than merely reason, our way through many decisions. He makes a case for companies to pair data with experiential learning.

A long tradition has created a dichotomy between mind and body in which thinking belongs to the brain alone, yet we learn in ways that differ from computers. He takes the example of driving: we take the wheel, feel the road, engage body, brain, and common sense, and master the skill over time until we can drive on autopilot. AI, on the other hand, depends on sensors and pattern recognition, processing inputs in milliseconds and responding immediately. Neither can cope with every driving situation, but experienced drivers handle most of them automatically.

The idea that mind and body are separate, also called Cartesian dualism, regards the body as a thing the mind operates. By dismissing the senses and emotions as unreliable inputs, this worldview gave rise to the scientific method, experimentation, and evidence-based thinking. Yet human intellect is not merely a product of the brain; it is also the body’s engagement with its surroundings that forges comprehension of the world. Both the body and the brain gain knowledge, and experience and routine help us create embodied knowledge.

Embodied knowledge is acquired through the following five methods:

Observation – watching is a whole-body experience: observing a tennis player, for example, we feel the grip and hear the racket hitting the ball, triggering the same reactions in the brain and body as when we actually play.

Practice – We may start learning to ride a bike by watching others ride, but acquiring new skills like skiing or sailing demands experience, practice, observation, and instruction. With enough experience and practice, we can perform the activity without thinking.

Improvisation – AI is still governed by supervised learning and big data, whereas human judgement based on incomplete information proves crucial. Firefighters, for example, learn to sense how a structure will collapse because they can feel it.

Empathy – experiencing how another person uses a tool or navigates the world goes beyond reading about it or talking to them.

Retention – when we taste or smell, memories flood the mind, demonstrating that recollection resides in the body as well as the brain.

Firms spend heavily to collect and crunch data, but decision makers can use that data better when they pair it with experience. When leaders at Duracell wanted to understand their outdoor-adventure market, they pitched tents in the dark, cooked in the rain, and slept in a range of temperatures. Pairing these insights with data analysis produced one of their most successful campaigns. The author asserts that statistics can tell a relevant story but have limited ability to tell a nuanced human story. Policymakers, like business leaders, can benefit from this dual approach, and the author provides examples there as well.

Software developers are improving AI and robots by introducing state carried across sequences of inputs, and they have found that AI which learns through trial and error can outperform some humans at the most complex games. Even so, it is our embodiment that makes our intelligence hard to reproduce.


Wednesday, September 27, 2023

 

This is a continuation of a previous article on AI for product development.  Since marketing is one of the core influences on product development, this article reviews how AI is changing marketing and driving rapid business growth.

Marketers use AI to create product descriptions. Typically, these descriptions draw on words and phrases that come from research on the target audience, but when marketers reuse the same language over and over, it becomes repetitive. AI rephrasing tools can help teams find new ways of describing the most prominent features of their products.

Content marketers are often caught up in creating more content, but it is equally important to optimize the content already on the site. As content ages it becomes dated and less useful, which drags down its position on search engine results pages (SERPs). Given a particular URL, AI can report which keywords that URL is ranking for and which keywords need a boost, helping marketers get more out of existing pages.

AI is used most heavily in data analytics. Measuring the performance of various content types, campaigns, and initiatives used to be time consuming simply because the data had to be sourced from many origins and the tools varied widely. Now teams can quickly retrieve and analyze the data they are interested in. Business Intelligence teams continue to tackle complex data, but getting started with data analytics is easier for most users.

AI can also help optimize marketing activities by providing insights into customer behavior and preferences, identifying trends and patterns, and automating processes such as content creation, customer segmentation and more. AI initiatives achieve better results and help the marketing strategy better connect with the customers.

Website building, personalized targeting, content optimization, and even chatbot assistance for customer support are well-known areas for AI-based enhancements. AI content generation can help accelerate content creation, but fact-checking the information in the articles and ensuring that messaging and tone align with the brand voice continue to require human supervision.

The “right tool for the right job” adage holds truer than ever for AI applications. Technology and infrastructure can evolve with the business as it grows, and long-term investments help establish the practice. Text-to-text and text-to-image generators have been popularized by tools like ChatGPT and DALL-E 2, which make use of large language models, natural language processing, and artificial neural networks. The caveat is that different tools are trained on different models. It is also possible to mix and match, for example using ChatGPT to create a prompt and then passing that prompt to DALL-E 2 or Midjourney, as sketched below. Social media platforms like Facebook and Instagram offer ad targeting and audience insights, and email marketing platforms like Mailchimp provide AI-powered recommendations for subject lines and send times.
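A rough sketch of that mix-and-match pattern, assuming the openai Python client and an API key in the environment (the model names and the prompt are placeholders, not recommendations):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
# Ask a chat model to draft an image prompt for the campaign.
chat = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user",
               "content": "Write a one-sentence image prompt for a product launch banner."}],
)
image_prompt = chat.choices[0].message.content
# Hand the generated prompt to an image model.
image = client.images.generate(model="dall-e-2", prompt=image_prompt, n=1, size="1024x1024")
print(image.data[0].url)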

Some of the wariness about AI comes from inaccurate information produced by generative AI; some comes from bias served up by the AI tools themselves. These are mitigated with a wider range of datasets. AI4ALL, for instance, works to feed AI a broad range of content so that it is more inclusive of the world. Another concern has been over-reliance on AI, and a straightforward way to address it is to balance automated tasks with those requiring skilled human supervision.

 

Tuesday, September 26, 2023

 Continued from previous post...

Third, AI can change how customer feedback is collected. A minimum viable product is nothing more than a good start; a feedback loop with the target audience is essential to taking it to completion. Until recently, product analytics was largely restricted to structured or numerical data. Notable AI experts argue that this is merely 20% of the data, and that the remainder sits unstructured in documents, emails, and social media chatter. AI is very good at analyzing large amounts of data and benefits from being tuned with more training data. Focus groups, by contrast, are not always accurate representations of customer sentiment, which leaves the product team vulnerable to building a product that does not serve its customers well. These same experts also make a case for generative AI to help convert customer feedback into data the business can use, as sketched below.
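As a minimal sketch of mining that unstructured feedback (using an off-the-shelf sentiment model from the transformers library; the file name and column are hypothetical):

import pandas as pd
from transformers import pipeline

feedback = pd.read_csv("customer_feedback.csv")   # hypothetical export of emails, reviews, chatter
classifier = pipeline("sentiment-analysis")       # downloads a default pretrained model
results = classifier(feedback["text"].tolist())
feedback["sentiment"] = [r["label"] for r in results]
print(feedback["sentiment"].value_counts())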

Fourth, AI can help redefine the way teams develop products, starting with how engineers and product managers interact with the software. In the past, professionals were trained on a suite of software products until they became designated experts who understood how each piece worked and passed that knowledge on through training. With AI, new team members can be onboarded rapidly by letting the AI generate the necessary boilerplate or prefabricated units, or by giving them a more interactive way of getting help with software and hardware tools. What used to require wire diagrams and prototyping can now be replaced with design examples produced by chatbots from a set of constraints. The interface is as natural as a chat, so users need to know nothing about the internals of machine learning.

Finally, AI helps with creativity as well. Machine learning algorithms already learn patterns that transform inputs to outputs and then apply those patterns to unseen data. The new generative models take this a step further by encoding state across a constant stream of inputs, which not only helps to capture things like sentiment but also to generate suitable output without necessarily interpreting each unit of input. This is at the core of capturing how a software engineer creates software, a designer creates a design, or an artist creates art.

By participating in the thinking behind the creation, AI is poised to extend human abilities past their current limits. Terms like “co-pilot” are beginning to be used to describe this cooperative behavior that comes to the aid of product managers, software engineers, and designers.

The ways in which AI and humans can improve each other's contributions to product development form a horizon full of possibilities, and some trends are already being embraced in the industry. Customer experience is shifting toward self-service with near-human interactions via chat, and industrial applications that leveraged machine learning models are actively replacing their v1.0 models with generative v2.0 models. More interactive and engaging experiences, in the form of recommendations spanning content, products, and frameworks, are being envisioned. By virtue of both the data and the analysis models, AI can not only improve but redefine the product development process.

Experimentation at various scopes and levels is one way to deepen our understanding of the role AI can play, and it is getting much easier to start. It is even possible to delegate the machine learning expertise to tools that work across programmatic interfaces regardless of the purpose or domain of the application. Just as prioritizing use cases improves the return on investment for a product, AI initiatives must be deliberated to determine the high-value engagements. Likewise, leadership and stakeholder buy-in is necessary to articulate the value added in the bigger picture and to field questions that dispel concerns such as privacy and data leakage. When persuading leadership to invest, it helps to limit the role of AI to that of a trusted co-pilot. Lastly, the risks of not investing in AI can also be called out.

 


Monday, September 25, 2023

AI and Product development - Part 1.

 


This article focuses on the role of Artificial Intelligence in product development. In both business and engineering, new product development covers the complete process from concept to realization and introduction to the market. Getting a product, and thereby a venture, off the ground involves many interdisciplinary endeavors. A central aspect of this process is product design, which involves various business considerations and is broadly described as the transformation of a market opportunity into a product available for sale. A product is meant to generate income, and technology companies leverage innovation in a rapidly changing market. Cost, time, and quality are the main variables that drive customer needs. Business and technology professionals find product-market fit one of the most challenging aspects of starting a business, and startups are often strained by this long and expensive process. This is where Artificial Intelligence holds promise for startups and SMBs.

Since product design involves predicting the right product to build and investing in prototypes, experimentation, and testing, Artificial Intelligence can help us navigate the product development course more intelligently. Research studies cite that 35% of SMBs and startups fail because there is no market need. AI-powered data analysis can give them a more accurate, well-rounded view of quantitative and qualitative data to determine whether the product will meet customer needs, or even whether the right audience has been selected in the first place. Collecting and analyzing data are strengths of AI, and here they help connect with customers at a deeper level. One such technique, latent semantic analysis, helps articulate customers' real needs; techniques like it, along with softmax classification, only became mainstream in product analytics over the past decade. The traditional, technology-driven way of creating software products contributed to the high failure rate, and this is an opportunity to correct that.
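For illustration, a bare-bones latent semantic analysis over customer feedback can be expressed as TF-IDF followed by a truncated SVD (a sketch; the corpus and the number of components are placeholders):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["battery drains too fast", "love the battery life", "setup was confusing"]  # placeholder corpus
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
lsa = TruncatedSVD(n_components=2, random_state=0)    # latent "themes"
weights = lsa.fit_transform(tfidf)
print(weights)   # each row maps one piece of feedback onto the latent themes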

Second, AI shortens iteration and time-to-market cycles by plugging into CI/CD pipelines and reports. Mockups and prototypes often take at least a few weeks as teams work through friction and unexplored territory, which is a long time for all participants to wait before seeing the same outcome, and the time and money spent creating and testing a prototype can end up jeopardizing the initiative itself. If this period can be collapsed through better insight into what works and what doesn't, by reprioritizing efforts, aligning with a strategy that has a better chance of success, and avoiding avenues of waste or unsatisfactory returns, the net result is shorter, faster product innovation cycles.

One specific ability of AI deserves attention in this regard: so-called generative AI can create content from scratch with high speed and even accuracy. This ability is easy to see in copywriting, which can be considered a content production strategy; in copywriting the goal is to convince the reader to take a specific action, achieved through its persuasive character and through triggers that arouse the reader's interest to generate conversions and sales. Copywriting is also an essential part of a digital marketing strategy, with the potential to increase brand awareness, generate higher-quality leads, and acquire new customers. Good copywriting articulates the brand's messaging and image while tuning into the target audience, a process with parallels to product development. AI has demonstrated the potential to generate content from scratch; the difference between content writing and copywriting remains for product developers to bridge.

Sunday, September 24, 2023

 

Apache Cassandra is an open-source NoSQL distributed database trusted by thousands of companies for scalability and high availability without compromising performance. Its linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it a strong platform for mission-critical data.

Azure Managed Instance for Apache Cassandra is a managed distributed-database environment that automates the deployment, management (patching and node health), and scaling of nodes within an Apache Cassandra cluster. It also supports hybrid clusters, so Apache Cassandra datacenters deployed in Azure can join an existing on-premises or third-party-hosted Cassandra ring. The service is deployed using Azure Virtual Machine Scale Sets.

However, Cassandra is not limited to any one compute platform. Kubernetes, for example, runs distributed applications, and Cassandra and Kubernetes can be run together; one advantage is the use of containers and another is interactive management of Cassandra from the command line. The Azure Managed Instance for Apache Cassandra allows only a limited form of connection and interactivity for managing the Cassandra instance. Most database administration options are restricted to the Azure command-line interface, which takes the invoke-command option to pass the actual commands to the Cassandra instance. There is no native invocation of commands by reaching an IP address directly, because the Azure Managed Instance for Apache Cassandra does not create nodes with public IP addresses; to connect to a newly created Cassandra cluster, one will need to create another resource inside the VNet. This could be an application, or a Virtual Machine with Apache's open-source query tool CQLSH installed. The Azure Portal may also provide connection strings with all the necessary credentials to connect with the instance using this tool. Native support for Cassandra is thus not limited to the nodetool and sstable commands that are permitted via the Azure CLI command options. CQLSH is a command-line shell for interacting with Cassandra using CQL (the Cassandra Query Language). It ships with every Cassandra package and can be found in the bin/ directory. It is implemented with the Python native-protocol driver and connects to a single specified node, which greatly reduces the overhead of managing the Cassandra control and data planes.
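For example, an application inside the VNet could use the DataStax Python driver instead of CQLSH (a sketch only; the node address, credentials, and TLS settings are placeholders and depend on the deployment):

import ssl
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ssl_context.check_hostname = False
ssl_context.verify_mode = ssl.CERT_NONE   # for a quick test only; supply the CA certificate in practice

auth = PlainTextAuthProvider(username="cassandra-admin", password="<password>")   # placeholder credentials
cluster = Cluster(["10.0.0.5"], port=9042, auth_provider=auth, ssl_context=ssl_context)  # placeholder node IP
session = cluster.connect()
print(session.execute("SELECT release_version FROM system.local").one())
cluster.shutdown()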

Containers are a blessing for developers deploying applications in the cloud, and Kubernetes helps with container orchestration. Unlike managed Kubernetes instances in Azure, where a client can populate the .kubeconfig file using the az cli get-credentials and kubectl context-switching commands, the Azure Managed Instance for Apache Cassandra does not offer kubectl access. Adding or removing nodes in a Cassandra cluster is managed with the help of the cassandra.yaml file, found in the /etc/cassandra folder on each node. One cannot access the node directly from the Azure Managed Instance for Cassandra, so a shell prompt on the node is out of the question, and the nodetool bootstrap option is not available via invoke-command, but it is possible to edit this file. One of its most important properties is the seed provider for existing datacenters, which allows a new node to become ready quickly by importing the necessary information from the existing datacenter. The seed provider must not point to the new node but to nodes in the existing datacenter, roughly as shown below.
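The relevant fragment of cassandra.yaml looks roughly like this (the addresses are placeholders; the seeds list should name nodes in the existing datacenter, not the node being added):

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.4,10.0.0.5"   # existing datacenter nodes, not the new node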

The Cassandra service on a node must be stopped before some commands execute and restarted afterwards, and the database must be set to read-write for certain commands to run. These options can be passed as command-line parameters to the Azure CLI's managed-cassandra set of commands.

Saturday, September 23, 2023

 

This is a continuation of the previous articles on Azure Databricks usage and Overwatch analysis. While those articles covered the configuration and deployment of Overwatch, the data ingestion for analysis was assumed to be the Event Hub, which in turn collects diagnostics from the Azure Databricks resource. This article covers the collection of cluster logs, including output from logging and print statements in the notebooks that run on the clusters.

The default cluster logs directory is 'dbfs:/cluster-logs'; the Databricks instance delivers logs there every five minutes and archives them every hour. The Spark driver logs are saved in this directory, under a sub-directory named after each cluster. When a cluster is created to attach a notebook to, the user sets the cluster's logging destination to dbfs:/cluster-logs under the advanced configuration section of the cluster creation parameters.

The policy under which the cluster is created is also determined by the users, and it can be administered so that users only create policy-compliant clusters. In such a policy, the logging destination can be preset to a path like 'dbfs:/cluster-logs'. It can also be substituted with a path like '/mnt/externalstorageaccount/path/to/folder' if a remote storage location is provided, but the built-in location is preferable.

The Azure Databricks instance transmits cluster logs along with all other opted-in logs to the Event Hub; this requires a diagnostic setting specifying the namespace and the Event Hub to send to. Overwatch can read this Event Hub data, but reading from the dbfs:/cluster-logs location is not covered in the documentation.

There are a couple of ways to do that. First, the cluster log destination can be specified in the mapped-path parameter of the Overwatch deployment csv, so that the deployment knows about this additional location to read data from. Although the documentation suggests the parameter was introduced for workspaces with more than fifty external storage accounts, it is possible to include just the one location Overwatch needs to read. This option is convenient for reading the default location, but the customers or the administrator must still ensure that clusters are created to send their logs there.

While the above works for new clusters, the second option works for both new and existing clusters: a dedicated Databricks job is created to read the cluster log locations and copy them to the location that Overwatch reads from. This job would use a shell command such as 'rsync' or 'rclone' to perform a copy that can resume after intermittent network failures and report progress, roughly as sketched below. When this job runs periodically the clusters are unaffected, and alongside the Overwatch jobs it ensures that all relevant logs not covered by the streams to the Event Hub are also read by Overwatch.
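The core step of such a job might look like the following sketch (the paths are placeholders; rclone must be installed on the cluster, and the destination should be whatever location Overwatch is configured to read):

import subprocess

src = "/dbfs/cluster-logs"                    # FUSE path for dbfs:/cluster-logs on the driver
dst = "/dbfs/mnt/overwatch-wd/cluster_logs"   # placeholder for the Overwatch-readable location
subprocess.run(["rclone", "copy", src, dst, "--checksum", "--progress"], check=True)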

Finally, the out-of-the-box dashboards that report the analysis performed by Overwatch can be scheduled to run nightly so that all the logs collected and analyzed are included on a regular cadence.


Friday, September 22, 2023

 

This is a continuation of the previous articles on Azure Databricks and Overwatch analysis. This section focuses on the role-based access control required for the setup and deployment of Overwatch.

Using a storage account as a working directory for Overwatch implies that it must be accessible from the Databricks workspace. There are two ways to do this: one involves Azure Active Directory credentials passthrough with 'abfss://container@storageaccount.dfs.core.windows.net' name resolution, and the other mounts the remote storage account as a folder on the local file system.

The former requires the cluster to be enabled for Active Directory credentials passthrough and works for directly resolving the deployment and reports folders, but for contents whose layout is determined dynamically, the resolution is expensive every time. The abfss scheme can also fail with a 403 error when tokens are demanded for certain activities. The second approach, mounting, is instead a one-time setup: the mount is created with the help of a service principal that obtains OAuth tokens from Active Directory, and it becomes the prefix for all the temporary files and folders.

Using Azure Active Directory credentials only works when there are corresponding role assignments and container/blob access control lists. The role assignments for the control plane differ from those for the data plane, so there are roles for both. This separation allows access to certain containers and blobs without necessarily allowing changes to the storage account or container organization and management. With ACLs applied to individual files/blobs and folders/containers, authentication, authorization, and auditing are completely covered and scoped at the finest granularity.

Queries like the following can then be very helpful:

1.       Frequent operations can be queried with: 

StorageBlobLogs 

| where TimeGenerated > ago(3d) 

| summarize count() by OperationName 

| sort by count_ desc 

| render piechart  

2.       High latency operations can be queried with: 

StorageBlobLogs 

| where TimeGenerated > ago(3d) 

| top 10 by DurationMs desc 

| project TimeGenerated, OperationName, DurationMs, ServerLatencyMs, ClientLatencyMs = DurationMs - ServerLatencyMs 

3.       Operations causing the most errors can be queried with: 

  StorageBlobLogs 

| where TimeGenerated > ago(3d) and StatusText !contains "Success" 

| summarize count() by OperationName 

| top 10 by count_ desc 

4.       The number of read transactions and the number of bytes read on each container can be queried with:

StorageBlobLogs

| where OperationName  == "GetBlob"

| extend ContainerName = split(parse_url(Uri).Path, "/")[1]

| summarize ReadSize = sum(ResponseBodySize), ReadCount = count() by tostring(ContainerName)

Thursday, September 21, 2023

 

The following is a list of some errors and resolutions encountered with deploying Overwatch dashboards:

1.       Overwatch dashboard fails with errors mentioning missing tables.

The databases that Overwatch needs are the consumer database, usually named overwatch, and the ETL database, usually named overwatch_etl. These databases are deployed with the Overwatch notebook runners, which come in two versions, 70 and 71. The latter requires a storage account to be created and a csv to be uploaded to the deployment folder within the overwatch container or bucket in the public cloud storage account. The csv requires a mount location, referred to as the storage prefix, where all the files associated with the creation and use of the databases are kept. There are two files there, one each for the overwatch consumer database and the overwatch_etl database, which persist the databases outside the catalog of the Databricks instance.

When the notebook runs, the tables are created within the catalog along with the associated files on the storage account. Over sixty jobs run to create these tables, and eventually all the tables appear in the catalog. Due to the high number of jobs, failures are common and not all tables are populated. Rerunning the notebook a few times helps close the gap toward a complete database.

 

2.       Overwatch has mismatched files and/or databases and must be redeployed, but the starting point is not clean

Due to the different notebook versions used and the intermittent failures when executing any one of them, it is quite likely that a redeploy from a clean slate is required. Deleting just the persistence files from the storage account will not help, because the catalog and the Databricks instance might retain stale configuration. Although a cleanup script ships with the Overwatch deployment notebooks, the following commands offer a speedy resolution:

DROP DATABASE overwatch_etl CASCADE;

DROP DATABASE overwatch CASCADE;

-- CLEAR CACHE;

This will delete the associated files from the storage account as well. If Overwatch is being upgraded, even for a stale deployment, it is also advisable to follow up by recreating the storage account container and mounting it on the Databricks cluster.

 

3.       When the storage prefix refers to the remote location via the abfss://container@storageaccount.dfs.core.windows.net naming scheme, an unauthorized error frequently appears.

Although mounts are deprecated and abfss is the newer approach, creating a mount up front avoids repeated resolution on every name lookup. This can be done with the following script:

configs = {"fs.azure.account.auth.type": "OAuth",

          "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",

          "fs.azure.account.oauth2.client.id": "<application-id>,

          "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-key>",key="<key-name>"),

          "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"}

#dbutils.fs.unmount("/mnt/overwatch-wd")

dbutils.fs.mount(

  source = "abfss://container@storageaccountname.dfs.core.windows.net/",

  mount_point = "/mnt/overwatch-wd",

  extra_configs = configs)

 

dbutils.fs.ls("/mnt/overwatch-wd")

Wednesday, September 20, 2023

 

This is a continuation of the articles on infrastructure deployments. One of the popular instruments for exercising governance over Azure resources is Azure Policy. A policy consists of a definition and an assignment. Assignments define which resources are evaluated against which policies or initiatives, and an assignment can set parameter values for that group of resources at assignment time, which makes the definition reusable across different compliance needs.

Among the properties of the assignment, enforcement mode stands out. It lets customers test the outcome of a policy on existing resources without triggering the policy effect or creating entries in the Azure Activity Log. It is also referred to as a “What If” scenario and aligns with safe deployment practices. When the mode is set to Enabled, the JSON value is ‘Default’ and the policy effect is enforced during resource creation or update. When the mode is set to Disabled, the JSON value is ‘DoNotEnforce’ and the policy effect is not enforced during resource creation or update. If the enforcement mode is not specified, ‘Default’ applies.

The scope of the assignment includes all child resource containers and child resources. If a child resource container or child resource should not have the definition applied, it can be excluded from evaluation by setting notScopes, which defaults to an empty array [].

The effects currently supported in a policy definition include Append, Audit, AuditIfNotExists, Deny, DenyAction, DeployIfNotExists, Disabled, Manual, and Modify. When the policy definition effect is Modify, Append, or DeployIfNotExists, Policy alters the request or adds to it. When the effect is Audit or AuditIfNotExists, Policy causes an Activity Log entry to be created for new and updated resources. And when the effect is Deny or DenyAction, Policy stops the creation or alteration of the request. Effects should always be tried out: validation of a policy ensures that non-compliant resources are correctly reported and that false positives are excluded. The recommended approach to validating a new policy definition is to follow these steps: tightly define the policy, audit existing resources, audit new or updated resource requests, deploy the policy to resources, and monitor continuously.

The distinction between Audit and AuditIfNotExists must be called out. Audit generates a warning event in the activity log for a non-compliant resource but does not fail the request, while AuditIfNotExists generates a warning event in the activity log if a resource related to the one being evaluated does not exist. The if condition evaluates a field, so a value must be provided for the field name; it references fields on the resources that are being evaluated.

 

Tuesday, September 19, 2023

 Pure and mixed templates: 

Infrastructure-as-code is a declarative paradigm: a language for describing infrastructure and the state it must achieve. The service that understands this language supports tags, RBAC, declarative syntax, locks, policies, and logs for resources and their create, update, and delete operations, all of which can be exposed via the command-line interface, scripts, web requests, and the user interface. The declarative style also helps boost agility, productivity, and quality of work within organizations. 

Template providers often go to great lengths to determine the convention, syntax and semantics that authors can use to describe the infrastructure to be setup. Many provide common forms of expressing infrastructure and equivalents that are similar across providers. Authors, however, rely on tools to import and export infrastructure. Consequently, they must mix and match templates. 

One such template provider is AWS cloud’s CloudFormation. Terraform is the open-source equivalent that helps the users with the task of setting up and provisioning datacenter infrastructure independent of clouds. These cloud configuration files can be shared among team members, treated as code, edited, reviewed and versioned.  

Terraform allows JSON and YAML to be included in templates and state files using the built-in functions jsonencode and yamlencode, respectively. Since other tools export templates in one of these two well-known forms, importing into Terraform with these functions is easy. Terraform can also read and export existing cloud infrastructure in its own syntax, but the output is often compressed and hard to read; these two built-in functions allow a multi-line display of the same content, which makes it more readable. 

AWS CloudFormation has a certain appeal for being AWS native with a common language to model and provision AWS and third-party resources. It abstracts the nuances in managing AWS resources and their dependencies making it easier for creating and deleting resources in a predictable manner. It makes versioning and iterating of the infrastructure more accessible. It supports iterative testing as well as rollback.  

Terraform’s appeal is that it can be used for multi-cloud deployment. For example, it deploys serverless functions with AWS Lambda, manage Microsoft Azure Active Directory resources, and provision a load balancer in Google cloud.  

Both facilitate state management. With CloudFormation, users can perform drift detection on all of their assets and get notifications when something changes. It also determines dependencies and performs certain validations before a delete command is honored. Terraform stores the state of the infrastructure on the provisioning computer or at a remote site, in a JSON format that describes and configures the resources. CloudFormation manages state automatically with no user involvement, whereas Terraform requires you to specify a remote store or fall back to the local disk to save state.  

Both have their unique ways for addressing flexibility for changing requirements. Terraform has modules which are containers for multiple resources that are used together and CloudFormation utilizes a system called “nested stacks” where templates can be called from within templates. A benefit of Terraform is increased flexibility over CloudFormation regarding modularity.  

They also differ in how they handle configuration and parameters. Terraform uses provider specific data sources. The implementation is modular allowing data to be fetched and reused. CloudFormation uses up to 60 parameters per template that must be of a type that CloudFormation understands. They must be declared or retrieved from the System Manager parameter store and used within the template.  
Both are powerful cloud infrastructure management tools, but one is more favorable for cloud-agnostic support. It also ties in very well with DevOps automations such as GitLab. Finally, having an abstraction over cloud lock-ins might also be beneficial to the organization in the long run. 


Monday, September 18, 2023

Overwatch deployment issues and resolutions continued.

  

·        Issue #6) Dropping the database does not work
The workspace configuration for the etl database might be hardcoded, and the file location for the database might linger even after the db file and the database were dropped. A cleanup script is available from Overwatch that explains the correct order and even comes with a dry-run option.

·        Issue #7) There are stale entries for locations of the etl or consumer databases or there are intermittent errors when reading the data.

The location that was specified as a mount is only accessible by using a service account or a dbx connector. It is not using the same credentials as the logged in user. Access to the remote storage for the purposes of Overwatch must always maintain both the account and the access control. Switching between credentials will not help in this case. It is preferred that Overwatch continues to run with admin credentials while the data is accessed with the token for storage access.

·        Issue #8) DB Name is not unique or the locations do not match.

The primordial date must be specified in the form yyyy-MM-dd. Excel saves the date in a different format, and while this may look consistent to the user, the error manifests in different forms, mostly with complaints about name and location. Specifying the date correctly, and making sure the validations pass and the databases are correctly created, helps smooth out Overwatch operations.

Sunday, September 17, 2023

Drone Formation

 

Teaching a drone to fly is different from teaching a swarm of drones to fly. A central controller can issue a group command, and when each drone executes it, the formation flies. If the formation is unchanged, the group command is merely relayed to the group members; the swarm acts as one group for the purpose of relaying the same command. When the fleet changes formation, the command becomes specific to individual members, and each unit must move from one position to another without colliding with the others.

The movement has as many degrees of freedom as a particle. A drone is often represented as a volumetric pixel, or voxel for short, and an altimeter plus a GPS coordinate is sufficient for a unit to maintain its position. When the group command is issued, the movement of the group is specified, and consensus algorithms help with the group behavior without worrying about the exact position of each unit. The flight of any one unit can be written as a unicycle model, with u1 as the forward velocity and u2 as the change in heading, the angle relative to the Cartesian coordinates; the term unicycle refers to the cosine and sine of the heading giving the x- and y-axis displacements. Unicycle consensus algorithms can then help the group achieve the intended formation.
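Written out, the standard unicycle kinematics referenced here are

\dot{x} = u_1 \cos\theta, \qquad \dot{y} = u_1 \sin\theta, \qquad \dot{\theta} = u_2

where (x, y) is the unit's position, theta its heading, u_1 the forward velocity, and u_2 the turn rate; a consensus update then drives each unit's heading and position toward agreement with its neighbors.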

One of the most used approaches to drone fleet navigation is Simultaneous Localization and Mapping (SLAM), which provides a framework within which the drones can plan their paths. A drone only needs to know its location, build or acquire a map of its surroundings, and plan a path as a series of positions, if not simply the next linear displacement. Consensus helps ensure that paths do not conflict, and without imminent collision, units can take their time arriving at their final formation.

Conditions are not always ideal, even for the most direct displacements; wind and obstructions are some of the challenges encountered. A unit might not have the flexibility to move in any direction and must coordinate its moving parts to achieve the intended effect. When the current position is hard to maintain and the movement toward the final position is thrown off by external influence, the path can be adjusted through intermediate positions that reduce the sum of squared errors on the way to the designated position. The points along the alternate path are determined by the combination of external influence and the internal drive to reduce those errors. An obstruction to a linear displacement would then produce a path whose positions trace a rough semicircle around the obstruction.

Depth estimation is another navigation technique, in which the unit's sensors are enhanced to give it a better reference for its surroundings before the flight path is optimized. The term comes from traditional image processing, where it refers to the task of measuring the distance of each pixel relative to the camera. Depth is extracted from either monocular or stereo images, and multi-view geometry helps find the relationships between images.

A cost function helps minimize the error between the current and the final location, which is not a predesignated point but an iterative transition state determined by steepest gradient descent, as in the sketch below.
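A toy version of that correction loop, purely illustrative with made-up positions and gain, is a repeated gradient step on the squared position error:

import numpy as np

pos = np.array([0.0, 0.0, 10.0])       # current position (x, y, altitude), placeholder values
target = np.array([5.0, 3.0, 12.0])    # designated slot in the formation, placeholder values
gain = 0.1                             # descent step size
for _ in range(100):
    error = pos - target
    cost = np.sum(error ** 2)          # the sum-of-squares error mentioned above
    pos = pos - gain * 2 * error       # gradient of the cost with respect to pos is 2 * error
print(pos)                             # approaches the target in the absence of disturbances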

Saturday, September 16, 2023

Overwatch deployment issues and resolutions:

 


·        Issue #1) Parameter names have changed

The ETL_STORAGE_PREFIX used to point to the location where the ETL database and the consumer database were stored. However, since the underlying storage account is used for a wide variety of tasks, including calculations and report generation, this parameter has changed to STORAGE_PREFIX. Earlier, the value would typically be a dbfs file location or a /mnt/ folder; it now also accepts values in the 'abfss://container@storageaccount.dfs.core.windows.net' convention for locating reports and deployment directories. The /mnt/ folder is still the best route for Overwatch jobs, although the use of mounts is being deprecated in Databricks.

·        Issue #2) Location migrations with different versions of the Overwatch deployment notebook

Occasionally, the 70 version of the Overwatch deployment notebook is run before the 71 version, and even the location specified for the storage prefix might change as users become aware of the different ways in which the notebooks deploy the schema. The two are independent, but the first location reflects what the hive_metastore will show. Although the table names remain the same between notebook versions, version consistency between the notebook, the databases, and the dashboards is still a requirement.

·        Issue #3) Missing tables or the generic table or view not found error is encountered when using Overwatch

Even though the results of the notebook execution might appear successful, there may be messages about the validations that were performed. A false value for any validation indicates that the database tables will not be as pristine as they would be if all the rules had passed. Some executions also do not create all the tables in the consumer database, so repeated runs of the deployment notebook are required whenever there are warnings or messages. If all warnings and errors cannot be cleared, it is better to drop and recreate the databases.

·        Issue #4) There are stale entries for locations of the etl or consumer databases or there are intermittent errors when reading the data.

The location that was specified as a mount is only accessible by using a service account or a dbx connector. It is not using the same credentials as the logged in user. Access to the remote storage for the purposes of Overwatch must always maintain both the account and the access control. Switching between credentials will not help in this case. It is preferred that Overwatch continues to run with admin credentials while the data is accessed with the token for storage access.

·        Issue #5) DB Name is not unique or the locations do not match.

The primordial date must be specified in the form yyyy-MM-dd. Excel saves the date in a different format, and while this may look consistent to the user, the error manifests in different forms, mostly with complaints about name and location. Specifying the date correctly, and making sure the validations pass and the databases are correctly created, helps smooth out Overwatch operations.

Friday, September 15, 2023

 

This is a summary of a book titled “Win from Within: Build organizational culture for Competitive Advantage” written by James Heskett who is a professor emeritus of Business Logistics at the Harvard Business School. The book was published by Columbia Business School Publishing in 2022. It provides an applicable overview with concrete examples.

The book details 16 steps to change your culture, on the premise that evidence does not support most of the common wisdom about organizational culture. An effective culture boosts the bottom line and fosters flexibility, innovation, and learning. Responsibility rests with leaders to engage and retain employees, and an organization's policies must reflect its values. High-engagement workplaces share several crucial characteristics, and experimentation improves the likelihood of success. Remote work presents some challenges, but they are not insurmountable. The risk with good cultures going bad is that change then becomes difficult.

A strong culture does not imply marketplace success and is not necessarily a winning asset. It could even be toxic. But leaders can shift the culture in a matter of months. The steps listed here are useful to everyone involved in managing organizations.

Culture and strategy are complementary. For example, Satya Nadella simultaneously healed Microsoft’s dysfunctional culture and led a major strategic shift from Windows to cloud computing. On the contrary, resisting new ideas assuming what worked in the past will continue to work, is one of the most common pitfalls.

An effective culture boosts the bottom line and fosters flexibility, innovation, and learning. The competitive advantage of an effective culture can outlive that of any strategy. Organizations that put their employees first gained long-term market share and later rewarded their shareholders handsomely, and analysts can predict a company's relative profitability by studying the culture alone. There can even be a virtuous feedback loop between cultural change and impact on profit. For example, Ritz-Carlton vets hires thoroughly, emphasizes attitude and empathy, and empowers almost any employee to spend up to $2,000 to redress a guest's problem.

Leaders must engage and retain employees, and culture can be a tiebreaker in attracting talent. Organizations with effective cultures can win that tiebreak, but they can also become pressure cookers; discontent stems from a lack of training and a lack of acknowledgement.

Companies known for highly engaged employees train their recruiters in employee engagement as a competitive advantage. They seek people with complementary viewpoints and empower them with the necessary skills.  The US Marine Corps, the Mayo Clinic and Harvard Business School all have sustained high engagement beyond their founding generation and leverage a team-based structure to maintain the culture. Similarly, Southwest Airlines views the late departure as a team failure, not an individual one. This results in a top on-time record.

Experimentation is key to success. Booking.com authorizes any staffer to run a test without advance approval. Testing is taught, test evidence overrides executive judgment, and failed tests provide lessons. The author asserts that measurement without action is a great way to scuttle the success of all the effort that precedes it.

Sometimes, a toxic culture has devastating results. After two Boeing 737 MAX planes crashed, a whistleblower said management had rejected an engineer’s request for a safety measure. Employees feared retaliation for bringing problems to management’s attention. Similarly, the O-Ring failure destroyed the Challenger space shuttle, and the case of Volkswagen’s emissions-testing imbroglio is well-known.

Remote work presents cultural challenges and the best that the leaders of increasingly remote workforces can hope for may be hiring advantages and modest increases in productivity.

James Heskett lists the following steps to accomplish culture change:

1.       Leaders acknowledge the need for culture change – Leaders must take note of the metrics and messages emerging from the “shadow culture.”

2.       Use discontent with the status quo as a spur for change – Drastic steps might be needed to crystallize and alleviate the concerns people see with change.

3.       Share the message of change – Communications must be ongoing, clear, and simple. Listen to the reactions. Repeat.

4.       Designate a change team – A team can be tasked with cultural change codifying values, gathering input, meeting deadlines, and maintaining the impetus for change.

5.       Install the best leaders – Bring the right people to the fore; tell the wrong people good-bye. Your goal is alignment around change.

6.       Generate and maintain urgency – Culture change should take six to 12 months. As John Doerr said, “Time is the enemy of transformation.” Build in a sense of drive.

7.       Draft a culture charter – by articulating what must change and how. For example, Microsoft spurred change to empower people “to achieve more.” Compare the current state to the desired future.

8.       Promulgate a change statement that involves the whole organization – Communication is crucial. Gather comments; include or reject them; document the outcome.

9.       Set up a “monitor team” – This team tracks relevant measurements, checks progress, and ensures that communication continues.

10.   Align everything – Changes must align with corporate values. Reward what matters.

11.   Put changes into motion – Leaders must walk the talk. McKinsey found that change is more than five times likelier when leaders act the way they want their employees to act.

12.   Teach people at every level how to implement change – Training must be imparted.

13.   Measure new behaviors – Align your metrics with your new expectations and handle troubles.

14.   Acknowledge progress – Milestones are just as much reason to celebrate as the goal.

15.   Give big changes time to unfold – Long range habits take time to reach the customer.

16.   Keep reminding yourself what culture change requires – This is an ongoing evolution. Frequent check-ins with everyone on the team and recalibrations help.