Cluster computing

Sunday, October 8, 2023

Here are more common errors encountered while authoring and deploying Infrastructure-as-Code (IaC) artifacts, along with their resolutions:

First, one resource may pass its identifier to another because it is created earlier, and in some cases that identifier does not exist at compile time. For example, code that assigns an RBAC role based on the managed identity of another resource will not have that identity at compile time; it becomes available only when the resource is created at execution time. The RBAC IaC requires a principal_id, which comes from the managed identity of the created resource. This might require two passes of execution: one to generate the principal id and another to generate the role assignment with that principal id.

The two-pass approach works for newly created resources, but it is still broken for existing resources that have no associated managed identity, because the RBAC IaC tries to apply an empty principal id. In such cases, the role assignment fails no matter how many times it is applied. The workaround is to check that the principal id exists before applying it.
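
The existence check can be sketched in Python as a guard that runs before any role assignment is requested. This is only an illustration under stated assumptions: the resource is represented as a plain dictionary, and the field names "identity" and "principal_id" as well as the function name are hypothetical, not taken from any particular IaC provider.

```python
from typing import Optional


def plan_role_assignment(resource: dict, role: str, scope: str) -> Optional[dict]:
    """Return a role-assignment request, or None when no principal id exists yet."""
    # "identity" and "principal_id" are assumed field names for this sketch.
    identity = resource.get("identity") or {}
    principal_id = (identity.get("principal_id") or "").strip()
    if not principal_id:
        # Existing resource without a managed identity: skip instead of
        # submitting an assignment that can never succeed.
        return None
    return {"principal_id": principal_id, "role": role, "scope": scope}
```

A caller would apply the assignment only when the function returns a request, and otherwise report that the resource still needs a managed identity.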

A second type of case occurs when the application requires an IP address to be assigned up front, because the elaborate firewall rules are expressed in terms of the IP address value rather than references, and the IP address is provisioned in the portal before the IaC is applied. The IaC must then import the pre-created IP address into the state so that the IaC and the state match.

Third, there may be objects in the Key Vault that were created as part of the prerequisites for the IaC deployment and now their ids need to be reconciled with the IaC. Again, the import of that resource into the state would help the IaC provider to reconcile the actual with the expected resource.
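
Both of the import cases above come down to the same comparison: something is declared in the IaC and already exists, but the state does not know about it. A minimal Python sketch of that comparison follows. The resource addresses and the function name are hypothetical, and the three sets are assumed to have been collected beforehand from the IaC, the state, and the cloud inventory.

```python
from __future__ import annotations


def resources_to_import(
    declared: set[str], in_state: set[str], existing: set[str]
) -> list[str]:
    """List addresses that are declared and already exist but are missing from state."""
    return sorted((declared & existing) - in_state)


# Hypothetical addresses for a pre-created IP address and a Key Vault object.
declared = {"public_ip.firewall", "key_vault_secret.db_password", "app_service.web"}
in_state = {"app_service.web"}
existing = {"public_ip.firewall", "key_vault_secret.db_password", "app_service.web"}

pending = resources_to_import(declared, in_state, existing)
print(pending)
```

Each address in the resulting list is a candidate for an import into the state before the next plan is generated.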

Fourth, friendly names are often references to actual resources that may have long been dereferenced, orphaned, changed, expired, or even deleted. The friendly names, also called keys, are just references and hold value to the author in a particular context, but even the same author cannot guarantee that a moniker is used consistently unless some validation and review are involved.

Fifth, there are three distinct stages between the design and deployment of Infrastructure-as-Code: “init”, “plan”, and “apply”. Success in one stage does not guarantee success in the next, and this holds especially between the plan and apply stages. Another limitation is that the plan can easily be validated on a development machine, but in commercial deployments the apply stage can be performed only as part of pipeline jobs. The workaround is to scope the deployment down or to target a different environment for applying.

Sixth, ordering and sequence can be only partially expressed through attributes that declare dependencies between resources. Even if resources are self-descriptive, a combination of resources must be put together carefully by the system for a deterministic outcome.
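
To illustrate how declared dependencies lead to a deterministic order, here is a small Python sketch that uses the standard library's graphlib module. The resource names and the dependency map are hypothetical, and sorting each batch of ready resources is a choice made for this sketch so that independent resources always come out in the same order; it does not describe how any specific IaC tool schedules its work.

```python
from graphlib import TopologicalSorter


def deployment_order(depends_on: dict) -> list:
    """Order resources so that each one comes after everything it depends on."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    order = []
    while sorter.is_active():
        # Sort each ready batch so that independent resources are emitted
        # in the same order on every run.
        ready = sorted(sorter.get_ready())
        order.extend(ready)
        sorter.done(*ready)
    return order


# Hypothetical resources; each key maps to the resources it depends on.
depends_on = {
    "role_assignment": {"app_service", "key_vault"},
    "app_service": {"subnet"},
    "subnet": {"vnet"},
    "key_vault": set(),
    "vnet": set(),
}
print(deployment_order(depends_on))
```

Dependencies that are not declared never appear in the map, which is why an ordering derived this way is only as complete as the attributes that describe it.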

These are only some of the considerations that call for care when developing and deploying IaC.

Saturday, October 7, 2023

This is a summary of the book “How to say it for First-Time Managers” written by Jack Griffin and published by Prentice Hall Press, 2010. This book teaches winning words and strategies for earning your team’s confidence.

Managers must be able to communicate with their reports. If newly appointed managers can’t communicate their ideas, directions, and instructions, the areas they supervise will fall apart. By paying attention to what needs to be said, and how and when to say it, this book provides invaluable advice to new managers. The author suggests the best words to use, the words to avoid, and even the body language that an inexperienced manager should adopt.

The language of leadership is both verbal and non-verbal. Effective leadership requires effective communication. The best posture is one that imparts a sense of relaxed energy. The eyes must be wide open during direct communications. Fidgeting or yawning must be avoided. Signaling an engagement by nodding or leaning forward is necessary. Eyes, ears, or nose must not be rubbed because they signal doubt. Similarly, scratching your head signals confusion. Smiling is very helpful.

Leadership language fluency helps new managers establish authority and credibility. The language of business concerns money and time. Words explain, motivate, encourage, discourage, inspire, depress, demand, invite, guide, mislead, clarify, confuse, hearten, and terrify. The author mentions ten touchstones for day-to-day communications: 1. Accountability, where someone is responsible for something; 2. Collaboration, where teamwork is essential to business; 3. Decisions, where conflicts are resolved and trade-offs are balanced; 4. Ethics, for guarding against falls; 5. Evaluations, for making value judgements; 6. Excellence, for leading the reports to high-quality work; 7. Learning, which involves distilling knowledge from experience; 8. Mission, for a well-defined sense of purpose; 9. Performance, for continuous improvement; and 10. Quality, for business that can succeed with excellence.

“Every Manager needs a useful, effective, and productive vocabulary.” Part of the vocabulary builds with “active listening” because repeating what the other person says earns cooperation. Negatives can be balanced in several ways: avoid shaking the head, which signals rejection; keep eye contact so that the person feels engaged; never lower the chin, because it signals defensiveness; and avoid or alleviate “rapid breathing”, because it suggests anxiety.

On the first day as a manager, always speak from knowledge, says the author, and if in doubt, say nothing. Plan how to conduct meetings, including the preamble, the body, and the epilogue. Pausing before speaking can convey confidence and self-assuredness. Focusing on what one is going to say helps both the speaker and the listener.

Clarity in written and spoken communication depends on speaking to the point and staying focused. The five W’s approach of delineating who, what, when, where, and why can help in this regard. Using a step-by-step format in chronological order is much better than a long narrative. All rules, policies, and procedures must be written out. Do not delegate work by starting out with a pep talk. Goals must be specified in an order that lays out the intent, explains the benefits, shows the fit within the big picture, discusses the reachability of the goal, calls out the necessary tasks, delegates those tasks, and explains what must be completed and when. Praise is much better than criticism for motivation, but give it with a story. Supportive words include reset, overcome, self-starter, and retry. Negative responses must be provided with an explanation. Meetings must have an agenda; they must never be monologues, and ideas must be both requested and examined.

Friday, October 6, 2023

Azure Application Gateway and App Service resources are created for access from the public internet. When organizations want to take these resources private, they often struggle to maintain business continuity given their own network structures and rules, and the limitations and errors encountered when attempting to wire the resources together. This article explains how these resources can be made private with little or no disruption.

Both these resources are complicated, with many features and possible configurations. Even the networking section provides many choices under its incoming and outgoing sections. Among the most frequently encountered and dreaded errors are HTTP 403 and 502. Code hosted in the app service can connect to a store or event hub if the app service has vnet integration, and it might also need a private dedicated connection with another resource or network, yet these options have requirements that differ from one another. For example, to create a private endpoint, the private endpoint network policies must be disabled, and the subnet must have no delegation and must have available IP addresses. The setting for disabling the private endpoint network policies might be hard to find in the Management Portal user interface. When the endpoints are created, they must be associated with the privatelink.azurewebsites.net DNS zone so that they can be reached from other resources. Certain subnets cannot be used simply because they have a conflicting resource already placed there. The private endpoint and the vnet integration must not share the same subnet.
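
The private endpoint prerequisites listed above can be checked before deployment. The Python sketch below is an illustration only: the subnet is assumed to be described by a plain dictionary, and the keys "private_endpoint_network_policies", "delegations", and "available_ip_count" are hypothetical names chosen for this example rather than the fields of a specific SDK or API response.

```python
from __future__ import annotations


def private_endpoint_blockers(subnet: dict) -> list[str]:
    """Return the reasons a subnet cannot host a private endpoint (empty if none)."""
    problems = []
    # All three keys are assumed names for this sketch.
    if subnet.get("private_endpoint_network_policies", "Enabled") != "Disabled":
        problems.append("private endpoint network policies are not disabled")
    if subnet.get("delegations"):
        problems.append("subnet is delegated")
    if subnet.get("available_ip_count", 0) < 1:
        problems.append("subnet has no available IP addresses")
    return problems


candidate = {
    "private_endpoint_network_policies": "Disabled",
    "delegations": [],
    "available_ip_count": 12,
}
print(private_endpoint_blockers(candidate))
```

An empty list means that none of the three listed requirements is violated; it does not rule out other conflicts, such as a resource already placed in the subnet.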

Consequently, taking a resource private requires the organization to pre-create subnets and even a DNS zone specifically for ‘privatelink.azurewebsites.net’. Then the other resources must be connected to the app service. In the case of the application gateway, a DNS zone group must be created so that the application gateway can resolve the app services by name. This step is often overlooked after the endpoints are created on the app services. Similarly, virtual network links for the private DNS zone must be created.

It is in the interest of the deployment to create a single unified virtual network on which all the resources and their networks are placed. Distinct virtual networks (vnets) often result from independent initiatives, and they require peering or links to be established. Creating too many subnets causes a similar problem, because the subnets exhaust the IP address ranges while often remaining underutilized. Devices connected to a subnet have IP addresses within the subnet’s CIDR, and this information comes in handy for identifying subnets that are unused and can be reused for other purposes. Once the vnet and subnets are created, the options to add network security groups and gateways can be decided. The traffic from the virtual networks and subnets is hard to visualize, but by enumerating the resources and their default routes to the internet, it is possible to place the gateways appropriately. Otherwise, those resources might not have outbound internet connectivity.
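
Finding unused subnets from device addresses can be sketched with Python's standard ipaddress module. The CIDR ranges and device addresses below are made up for illustration, and the function name is hypothetical.

```python
from __future__ import annotations

import ipaddress


def unused_subnets(subnet_cidrs: list[str], device_ips: list[str]) -> list[str]:
    """Return the subnets whose CIDR range contains none of the device addresses."""
    networks = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    addresses = [ipaddress.ip_address(ip) for ip in device_ips]
    return [
        str(network)
        for network in networks
        if not any(address in network for address in addresses)
    ]


# Made-up example data.
subnets = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"]
devices = ["10.0.0.4", "10.0.2.10"]
print(unused_subnets(subnets, devices))
```

A subnet reported here only has no connected devices in the supplied list; delegations or reserved addresses would still need to be checked before it is reused.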

Finally, for the application gateway to be allowed to access the resources and networks that are its backend pool members, its address must be allowed in all the access restrictions of those resources and networks.
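
One way to catch a missing access restriction early is to test the gateway address against the allow rules of every backend. The Python sketch below assumes that the allow rules have already been exported as CIDR strings per backend; the backend names, addresses, and function names are hypothetical.

```python
from __future__ import annotations

import ipaddress


def gateway_is_allowed(gateway_ip: str, allow_rules: list[str]) -> bool:
    """Check whether the gateway address falls inside any allowed CIDR range."""
    address = ipaddress.ip_address(gateway_ip)
    return any(address in ipaddress.ip_network(rule) for rule in allow_rules)


def backends_blocking_gateway(gateway_ip: str, rules_by_backend: dict) -> list[str]:
    """Return the backends whose access restrictions do not admit the gateway."""
    return sorted(
        name
        for name, rules in rules_by_backend.items()
        if not gateway_is_allowed(gateway_ip, rules)
    )


# Made-up example data.
rules_by_backend = {
    "app-orders": ["10.0.3.0/24"],
    "app-billing": ["10.0.9.0/24", "192.168.1.0/24"],
}
print(backends_blocking_gateway("10.0.3.7", rules_by_backend))
```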

A working example of this description is available here: network4apps.zip

Wednesday, October 4, 2023

This is a summary of the book “Think Bigger – How to Innovate” by Sheena Iyengar who is a professor of Business in the Management Division at Columbia Business School and teaches choice and decision-making.

This book builds on decades of research on creativity and human psychology and models the real-life creative process in six specific and actionable steps. It provides a structure for rigorous idea generation and vetting that serves everyone from corporate teams to individual artists and entrepreneurs.

She argues that creativity is not a rare and innate gift. The popular distinction between left-brained and right-brained people is also incorrect. Creativity is also not a particular type of brain activity. When it is broken down, creativity consists of building blocks that are familiar to everyone. It is also a skill that we can learn and practice. Killer applications, groundbreaking artwork, and disruptive business ideas are all the end results of the same process. Creators recycle existing parts to create something novel. “All thinking is an act of memory in some form.”

The Think Bigger process builds on Learning + Memory, the leading neuroscientific model of the brain. This theory places memory at the center of human mental activity. It argues that even solving a math problem is not purely logical but involves recalling memories and recombining them to find the answer. It goes so far as to hold that the quality of an idea is proportionate to the memories stored on the shelves of the brain, and it describes innovation as the use of cognitive tools that we already possess.

Prior research has emphasized the following areas: personal qualities, such as curiosity and persistence; workspace, where an optimal space is free of distractions yet still fosters casual connections with others; structure, for when people face too many options; and going solo, because individuals produce more unique ideas alone than in a group. We can complete each step of Think Bigger on our own before discussing it with others.

Innovation starts by identifying a problem that we are motivated to solve and can feasibly solve. There is a long list of creations that failed because they did not address a problem. If we are struggling to define a problem, then taking daily notes may spark a sense of purpose. Phrasing the problem as a question that begins with “How” is one of the classic ways of getting started.

With a problem defined, we can break it down into parts on which we gather input from experts, potential users, and non-experts. As these generate leads for thinking bigger, we move on to the next step once we have clarity over 80% of the problem space.

A good solution satisfies the requirements of the target audiences, the interests of third-party stakeholders, and the desires of the innovator. These three groups are essential for the solution and might warrant different ways of approaching them. By articulating our own desires in writing and interviewing the target audience and stakeholders, we build a list of three to five key wants for each group.

Next, we structure the solution by using a Choice Map and Big Picture score. “The best way to think outside the box is to literally go into other boxes.” We split the search for solutions to sub-problems in two areas: “in domain” and “out of domain”. When the choice map is filled out, we are ready to start combining tactics to find an overall solution.

Before committing to our idea, we must learn how others react to it. By explaining to others, we change, refine, or expand our idea. There are four feedback exercises.

The first is verbalization. Describing the idea to ourselves by reading and writing may be enough to change the way we see it. Describing it to others almost certainly will.

The second exercise gathers experts’ reactions. After describing the problem, the solution, and its significance, we ask neutral questions, such as how we might improve our idea.

The third exercise gauges whether others’ impressions of our idea align with our own. We ask non-experts to say the idea back to us, but only after some time has passed, to check what they recollect.

The final exercise is to describe the solution again while giving our listeners free rein to reimagine our idea. Their answers will lead to further insights and possibilities.

Software for summarizing text: https://booksonsoftware.com/text/

Tuesday, October 3, 2023

This is a continuation of previous articles on Azure Databricks and Overwatch observability:

A frequent use of Overwatch’s dashboards is to view trends and plots of the collected data. The dashboards that ship with Overwatch provide a detailed set of charts under the Workspace, Clusters, Jobs, and Notebooks categories, but the tables and custom SQL queries make it possible to create new and advanced charts that suit specific business requirements. The following are some dimensions that a comprehensive dashboard for monitoring an organization’s Databricks workspace should show, from a best-practice perspective.

1. Databricks workload types:

- Jobs Compute for data engineers

- Jobs Light Compute for data analysts

- All Purpose Compute (backwards compatible to execute jobs)

2. Consumption based:

- DBUs

- Virtual Machines

- Public IP addresses

- Blob Storage

- Managed Disk

- Bandwidth

3. Pricing plans

- Pay as you go

- Reservations - DBU/DBCU 1/3 years

- DBU SKU

- VM SKU

- DBU count for each VM

- region

- duration

4. Tags based:

- Cluster Tags

- Pool Tags

- Workspace Tags

Tags can propagate with

a. clusters created from pools

- DBU Tag = Workspace Tag + Pool Tag + Cluster Tag

- VM Tag = Workspace Tag + Pool Tag

b. clusters not from pools

- DBU Tag = Workspace Tag + Cluster Tag

- VM Tag = Workspace Tag + Cluster Tag

5. Cost calculation:

Quantity = Number of Virtual Machines x Number of hours x DBU count

Effective Price = DBU price based on the SKU

Cost = Quantity x Effective Price

Effective Cost = Organizational markup factor x Cost
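
The tag propagation rules and the cost formulas above can be written out as a short Python sketch. The function names and the numbers in the example are hypothetical. The "+" in the tag rules is read here as a merge of key-value pairs, and letting the more specific tag win on a key clash is an assumption of this sketch, not something the rules above state.

```python
from __future__ import annotations


def dbu_tags(workspace: dict, cluster: dict, pool: dict | None = None) -> dict:
    """DBU tags: workspace + pool (if the cluster came from a pool) + cluster."""
    merged = dict(workspace)
    if pool is not None:
        merged.update(pool)
    merged.update(cluster)  # assumed precedence: later, more specific tags win
    return merged


def vm_tags(workspace: dict, cluster: dict, pool: dict | None = None) -> dict:
    """VM tags: workspace + pool for pooled clusters, else workspace + cluster."""
    merged = dict(workspace)
    merged.update(pool if pool is not None else cluster)
    return merged


def effective_cost(
    vm_count: int, hours: float, dbu_per_vm: float, dbu_price: float, markup: float = 1.0
) -> float:
    """Quantity x effective price, scaled by the organizational markup factor."""
    quantity = vm_count * hours * dbu_per_vm
    cost = quantity * dbu_price
    return markup * cost


# Made-up numbers: 2 VMs for 10 hours at 0.75 DBU each, priced at 0.30 per DBU.
print(round(effective_cost(2, 10, 0.75, 0.30, markup=1.1), 2))
```

Rounding at the point of display, as in the last line, keeps floating-point residue out of dashboard figures.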

Cost/Usage Dashboard - get started in Azure Portal:

Cost Management + Billing

Cost Management + Cost analysis Tab

Cost/Usage Dashboard – get started in Dashboards on Databricks workspace hosting Overwatch:

Sample query:

select sku, isActive, any_value(contract_price) * count(*) as cost
from overwatch.`dbucostdetails`
group by sku, isActive
having isActive = true;

sku          isActive  cost
jobsLight    True      0.30000000000000004
interactive  True      1.6500000000000001
sqlCompute   True      0.66
automated    True      0.30000000000000004

Monday, October 2, 2023

Sample customized queries for dashboard visualizations from Overwatch schema:

1. select sku, isActive, any_value(contract_price) * count(*) as cost
from overwatch.`dbucostdetails`
group by sku, isActive
having isActive = true;

sku          isActive  cost
jobsLight    True      0.30000000000000004
interactive  True      1.6500000000000001
sqlCompute   True      0.66
automated    True      0.30000000000000004

2. SELECT created_by, count(*)
FROM (SELECT DISTINCT cluster_id, created_by FROM overwatch.`cluster`)
GROUP BY created_by
ORDER BY count(*) desc
limit 1000;

created_by   count(1)
JobsService  20051
User1        13
User2        13
User3        6
User4        3
User5        2
User6        1

3. SELECT cluster_id, SUM(uptime_in_state_S) as uptime
FROM overwatch.clusterstatefact
GROUP BY cluster_id
ORDER BY uptime DESC
limit 1000;

cluster_id            uptime
0822-134022-ssn7p7zy  2656586.3910000008
0909-211040-g7gw6ze   2655716.523000001
0914-142202-nx0u3s1a  2634530.8240000005
0907-170325-qf4ypd19  2611126.8639999996
0109-204324-dba1c5o   2602285.5589999994
0831-160354-2gds4r56  2601205.147000001
0728-171334-wqfvw8lm  2599745.636
1220-150950-1xfqwfeq  2533890.514
0828-204151-rqw3um2a  1986805.3609999998
0302-190420-h8rv9prn  1983515.9470000002
0803-144506-g98h4fl2  1975430.0520000001
0908-095703-w31xe9fb  1842740.3310000005
0917-185549-g4n3dqjl  1052153.248
0918-031805-t3zdjacw  1002694.213

4. SELECT created_by, sum(total_dbu_cost) as sum_dbu_cost
FROM (SELECT distinct cluster_id, job_id, created_by, terminal_state, total_dbu_cost from overwatch.jobruncostpotentialfact where terminal_state = "Succeeded")
GROUP BY created_by
HAVING created_by != 'null'
ORDER BY sum_dbu_cost desc
limit 1000;

created_by  sum_dbu_cost
User1       253.60490000000007
User2       83.07065199999978
User3       80.84025400000019
User4       58.004314
User5       56.34171099999961
User6       49.40466399999997
User7       12.238729
User8       2.528845
User9       1.4531079999999597
User10      0.4258950000000001
User11      0.30644
User12      0.17414799999999972

Sunday, October 1, 2023

Network for applications

Azure Application Gateway and App Service are both complicated resources, with many features and possible configurations. Even the networking section provides many choices under its incoming and outgoing sections. Among the most frequently encountered and dreaded errors are HTTP 403 and 502. Code hosted in the app service can connect to a store or event hub if the app service has vnet integration, and it might also need a private dedicated connection with another resource or network, yet these options have requirements that differ from one another. For example, to create a private endpoint, the private endpoint network policies must be disabled, and the subnet must have no delegation and must have available IP addresses. The setting for disabling the private endpoint network policies might be hard to find in the Management Portal user interface. When the endpoints are created, they must be associated with the privatelink.azurewebsites.net DNS zone so that they can be reached from other resources. Certain subnets cannot be used simply because they have a conflicting resource already placed there. The private endpoint and the vnet integration must not share the same subnet.

Consequently, taking a resource private requires the organization to pre-create subnets and even a DNS zone specifically for ‘privatelink.azurewebsites.net’. Then the other resources must be connected to the app service. In the case of the application gateway, a DNS zone group must be created so that the application gateway can resolve the app services by name. This step is often overlooked after the endpoints are created on the app services. Similarly, virtual network links for the private DNS zone must be created.

It is in the interest of the deployment to create a single unified virtual network on which all the resources and their networks are placed. Distinct virtual networks (vnets) often result from independent initiatives, and they require peering or links to be established. Creating too many subnets causes a similar problem, because the subnets exhaust the IP address ranges while often remaining underutilized. Devices connected to a subnet have IP addresses within the subnet’s CIDR, and this information comes in handy for identifying subnets that are unused and can be reused for other purposes. Once the vnet and subnets are created, the options to add network security groups and gateways can be decided. The traffic from the virtual networks and subnets is hard to visualize, but by enumerating the resources and their default routes to the internet, it is possible to place the gateways appropriately. Otherwise, those resources might not have outbound internet connectivity.

A working example of this description is available here: network4apps.zip