Cluster computing

Monday, August 28, 2023

Apache Cassandra is an open-source NoSQL distributed database that is trusted by thousands of companies for scalability and high availability without compromising performance. Linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data.

Azure Managed Instance for Apache Cassandra provides a distributed database environment. It is a managed service that automates the deployment, management (patching and node health), and scaling of nodes within an Apache Cassandra cluster. It also provides the capability for hybrid clusters, so Apache Cassandra datacenters deployed in Azure can join an existing on-premises or third-party hosted Cassandra ring. The service is deployed using Azure Virtual Machine Scale Sets.

However, Cassandra is not limited to any one form of compute platform. For example, Kubernetes runs distributed applications, and Cassandra and Kubernetes can be run together. One advantage is the use of containers, and another is the interactive management of Cassandra from the command line. By contrast, Azure Managed Instance for Apache Cassandra is known for allowing only limited forms of the connection and interactivity required to manage the Cassandra instance. Most database administration options are limited to the Azure command-line interface, whose invoke-command option passes the actual commands to the Cassandra instance. Commands cannot be invoked natively by reaching a node's IP address, because Azure Managed Instance for Apache Cassandra does not create nodes with public IP addresses. To connect to a newly created Cassandra cluster, one needs to create another resource inside the VNet. This could be an application, or a virtual machine with Apache’s open-source query tool CQLSH installed. The Azure Portal may also provide connection strings that have all the necessary credentials to connect to the instance using this tool. Native support for Cassandra is not limited to the nodetool and sstable commands that are permitted via the Azure CLI command options. CQLSH is a command-line shell for interacting with Cassandra using CQL (Cassandra Query Language). It ships with every Cassandra package and can be found in the bin/ directory. It is implemented with the Python native protocol driver and connects to the single specified node, which greatly reduces the overhead of managing the Cassandra control and data planes.

Containers are a blessing for developers deploying applications in the cloud, and Kubernetes helps with container orchestration. Managed Kubernetes instances in Azure allow a client to populate the kubeconfig file with connection configuration using the az aks get-credentials command and to switch contexts with kubectl config use-context. Azure Managed Instance for Apache Cassandra, by contrast, does not come with the option to use kubectl commands. The use of containers helps with adding nodes to or removing nodes from the Cassandra cluster with the help of the cassandra.yaml file, which can be found in the /etc/cassandra folder within the node. One cannot access the node directly in Azure Managed Instance for Apache Cassandra, so a shell prompt on the node is out of the question. The nodetool option to bootstrap is also not available via invoke-command, but it is possible to edit this file. One of the most important properties in this configuration is seed_provider, which sets the seed nodes for existing datacenters. This option allows a new node to become ready quickly by importing all the necessary information from the existing datacenter. The seed provider must not be set to the new node itself but must point to an existing node.

The Cassandra service on a node must be stopped before some commands are executed and restarted afterward. The database must also be set to read-write for certain commands to execute. These options can be set as command-line parameters to the Azure command-line interface for the managed-cassandra set of commands.

Sunday, August 27, 2023

This completes a set of three book summaries. This one is about the book “Viral Justice: How We Grow the World We Want” by Ruha Benjamin. She is a professor of African American studies at Princeton University and is also the author of “Race After Technology” and “People’s Science”.

The author uses the term “viral justice” in the context of promoting collective healing and unlearning dominant narratives. Systems of oppression such as sexism, classism, racism, ableism, and colonialism operate like viruses. Maintaining the “privilege” of the status quo kills people and robs them of the material and social conditions they need to survive. It’s time to treat these societal “viruses” as signals that the status quo is no longer acceptable. When opportunities to dismantle these oppressive systems are actively sought and a more inclusive, caring world is built, “viral justice” comes into play.

Systems might indeed be intractable, and the wronged person might be the only victim whose heart is broken, with a shattering that is both emotional and physiological, but “viral justice” can be the rallying cry inviting others who desire change to join the individual. The first step in this direction requires us to unlearn patterns of behavior and thought that reinforce dominant narratives. The act of dreaming must be reclaimed and the promotion of collective good must be imagined.

Support networks must be built to weather the stress and physical damage caused by oppressive systems. The term “weathering” here is a public health concept that embodies the stress of living with oppressive systems. If the struggle to make ends meet is one of the principal causes of weathering, then viral justice is about creating social relations that are resuscitating instead of exhausting. Some examples illustrate weathering. Black teenage boys are more likely to die before the age of 65 than teenage boys in Bangladesh. The health of Latinx immigrants deteriorates each generation after their families arrive in the United States. Experiencing traumatic events ages a person prematurely. Protection from the negative impacts of weathering is needed, and this could include cultivating supportive relationships, committing to practices of healing and accountability, and building networks of solidarity.

One of the classic examples is punitive policing, which must be replaced with community-centered harm reduction policies. Police surveillance affects the health of entire communities. Some feel “hunted”, and witnesses report acts of “licensed terror” that range from pepper spraying homeless people’s sleeping bags to shooting unarmed civilians. “Viral justice” can be enacted by growing communities of care, which does not mean police reform but rather everyday people relating to one another in life-affirming ways. Technology also plays a role. Some apps like GhettoTracker and NextDoor perpetuate systems of oppression, and this manifests as 240 million calls placed annually to 911 to report suspicious activity, but this can be undone with a more empathetic approach.

Such examples are clearer with racism. For instance, teachers may fail to recognize Black students as gifted and talented, because their image of successful students is white. Researchers found that schools punish Black girls more often and severely for minor infractions – such as having “too much attitude” – than they punish their white female counterparts. A neutral example can be seen with “zero-tolerance” disciplinary approaches which damage students’ self-esteem and rob them of education and life opportunities. “Viral justice” in the educational system can be embraced by advocating reforms such as:

Replacing punitive actions with “restorative practices” where authorities display calm and loving presence.

Prioritizing recruiting and fostering diversity among teachers which can inspire students.

Updating the curriculum to include ethnic studies and Black history.

Hiring counselors to ensure the well-being of students rather than inviting police to walk the hallways.

Reimagining the place of work in our lives helps workers to thrive. It demands understanding that rest, like healthy foods, clean water, and fresh air, is essential. In a recession or pandemic, the rich could get richer while the poor could become poorer. Imagining a future where the rich no longer devalue labor, and redistributing wealth to ensure everyone has access to the social and economic conditions necessary for living a flourishing life, are ways to embrace “viral justice”.

A similar prospect holds for healthcare institutions. For example, white babies pay the price for anti-Black racism from the time they are born, and Black babies even more so. Institutions must make reparations to victims and their families.

Reimagining a better world as an individual can be broken down into the following steps:

Reflecting on one’s own biases and constantly envisioning a future that embraces all.

Taking micro actions that have a collective bigger impact.

Demonstrating inclusivity by creating spaces where everybody knows they are welcome and safe, and influencing others to do the same across the gamut of areas such as housing, education, and transportation.

Living poetically to transform oppressive systems and embrace creative ways of thinking.

Saturday, August 26, 2023

One of the most common concerns with this resource is how to connect to it. Azure Managed Instance for Apache Cassandra does not create nodes with public IP addresses, so to connect to a newly created Cassandra cluster, one needs to create another resource inside the VNet. This could be an application, or a virtual machine with Apache’s open-source query tool CQLSH installed. The Azure Portal may also provide connection strings that have all the necessary credentials to connect to the instance using this tool.

CQLSH is a command-line shell for interacting with Cassandra using CQL (Cassandra Query Language). It ships with every Cassandra package and can be found in the bin/ directory. It is implemented with the Python native protocol driver and connects to the single specified node.

The configuration options for this tool are in the ~/.cassandra/cqlshrc file. All CQL commands executed are written to a history file. The three essentials for connecting to the Cassandra cluster are the database server’s host name or IP address, the correct connection port, and the username and password if authentication is in use.

This would look something like this:

export SSL_VERSION=TLSv1_2

export SSL_VALIDATE=false

host="<IP>"

initial_admin_password="Password provided when creating the cluster"

cqlsh "$host" 9042 -u cassandra -p "$initial_admin_password" --ssl
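The same connection settings could also be kept in the cqlshrc configuration file instead of being passed on the command line. The following is a minimal sketch in Python that writes such a file's contents using only the standard library. The helper name build_cqlshrc and the sample IP address are hypothetical, and the section and key names ([authentication], [connection], [ssl]) are assumed to follow the cqlshrc sample file shipped with Cassandra, so they should be checked against the installed version.

```python
import configparser
import io


def build_cqlshrc(host, username, password, port=9042):
    """Return cqlshrc-style INI text for an SSL connection (hypothetical helper)."""
    # interpolation=None so that a '%' in the password is stored literally
    config = configparser.ConfigParser(interpolation=None)
    config["authentication"] = {"username": username, "password": password}
    config["connection"] = {"hostname": host, "port": str(port), "ssl": "true"}
    # Assumed to mirror the SSL_VERSION and SSL_VALIDATE environment variables
    config["ssl"] = {"version": "TLSv1_2", "validate": "false"}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# 10.0.0.4 is a placeholder private IP of a node inside the VNet
print(build_cqlshrc("10.0.0.4", "cassandra", "<password>"))
```

Keeping the password in a file rather than on the command line avoids exposing it in the shell history, at the cost of having to protect the file itself.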

The Azure CLI commands for this resource type allow us to manage the cluster and the datacenters for the instance, and most of them start with the az managed-cassandra prefix. They do not help with data plane operations, for which the best bet is CQLSH once connectivity is established.

The management operations in Azure Managed Instance for Apache Cassandra include compaction, patching, and maintenance. Among these, the nodetool utility is frequently used for repairs. The nodetool repair is run automatically by a service called Reaper. Nodetool repairs one or more tables, and performing an anti-entropy node repair on a regular basis helps with maintenance.

The Azure CLI provides a way to invoke nodetool on an instance with the invoke-command option.

Friday, August 25, 2023

Sequences are an excellent source of information that is usually not self-contained in the discrete units of an input stream, such as words in a text, symbols in a language, or images in a video. Yet they are under-utilized in many machine learning scenarios, which have done much more to enhance the information within the unit by means of features, by devising various relative distance metrics, or by finding relative co-occurrence similarities with classifications. This article explores conventional and futuristic usages of sequences.

The inherent benefit of the sequence is that it is captured in the form of state that is independent of the units themselves. This powerful concept allows us to work with all kinds of input units, be they words, symbols, images, or any other code. The conventional way to work with sequences belongs to a family of neural networks that is steeped in shredding data. It encodes a sequence and later decodes it to form a different output sequence. These recurrent neural networks, or RNNs, use this state as the essence of the sequence, which is almost independent of the forms of the units comprising the sequence, and infer the meaning of those units without knowing what they are. The encoder-decoder RNN described by Bahdanau et al. in 2014 could be used with different kinds of decoders that resulted in different outputs, but the sequences remained fixed in size and the state was accrued in a batch manner. In the future, if it becomes possible to build one state in an aggregated manner that continuously evolves by leveraging the growing size of the input stream from start to finish, that state is likely to be a better representation of the overall import than ever before. The difference is between building sequences as records in a table that are distinct from one another and enriching the state in a streaming manner, where the same state is continually updated for each unit, one at a time.

TensorFlow is a convenient library for writing RNNs. As with most machine learning models, a common split uses about 80% of the data for training and the remaining 20% for testing. The model can be developed on high-performance computing servers and later exported to be used on low-resource devices and clients. The model can be tuned with continuous feedback and its releases versioned.

Let us take an example of predicting the next word from a passage. This goal is particularly suited to conventional RNNs because the network can be trained on a sequence of three words at a time with the following word as the labeled symbol, so that it learns to predict the next symbol. The model can only understand real numbers, so one way to convert a symbol to a number is to assign a unique integer to each symbol based on its frequency of occurrence. The frequency table and a reverse dictionary help to articulate the next symbol.

As with any Softmax classifier used with neural networks, each symbol is associated with a vector of probabilities. The highest probability encountered can then be used towards finding the index in the reverse dictionary for determining the prediction.
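The symbol-to-integer mapping and the softmax lookup described above can be sketched in plain Python. This is a minimal illustration, not the network itself: the function names, the sample passage, and the hand-written scores standing in for the network output are all assumptions made for the example.

```python
import math
from collections import Counter


def build_dictionaries(words):
    """Assign integer ids by descending frequency (ties keep first-seen order)."""
    counts = Counter(words)
    dictionary = {word: idx for idx, (word, _) in enumerate(counts.most_common())}
    reverse_dictionary = {idx: word for word, idx in dictionary.items()}
    return dictionary, reverse_dictionary


def softmax(logits):
    """Turn raw scores into a vector of probabilities that sums to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_symbol(logits, reverse_dictionary):
    """Map the index with the highest probability back to its symbol."""
    probabilities = softmax(logits)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return reverse_dictionary[best]


words = "the cat sat on the mat and the cat slept".split()
dictionary, reverse_dictionary = build_dictionaries(words)

# A three-word input window encoded as integers
window = [dictionary[w] for w in ["the", "cat", "sat"]]

# Hypothetical scores standing in for the trained network's output layer
logits = [0.1] * len(dictionary)
logits[dictionary["on"]] = 2.0

print(window, predict_symbol(logits, reverse_dictionary))
```

In a trained model the scores would come from the final layer of the RNN rather than being set by hand. The lookup through the reverse dictionary stays the same.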

Using TensorFlow, this is written as:

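# TensorFlow 1.x API; n_input and n_hidden are defined by the surrounding script
import tensorflow as tf
from tensorflow.contrib import rnn

def RNN(x, weights, biases):
    # Reshape to [batch, n_input], then split into n_input per-step tensors
    x = tf.reshape(x, [-1, n_input])
    x = tf.split(x, n_input, 1)
    rnn_cell = rnn.BasicLSTMCell(n_hidden)
    outputs, states = rnn.static_rnn(rnn_cell, x, dtype=tf.float32)
    # Only the last output feeds the prediction
    return tf.matmul(outputs[-1], weights['out']) + biases['out']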

The streaming form of RNN would use a summation form to continuously update the state.
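One possible reading of the summation form is a state that keeps a running sum over the units seen so far and is updated one unit at a time. The sketch below is only an illustration of that idea under this assumption. The class name StreamingState and the use of fixed-size vectors for the units are hypothetical, and no learned weights are involved.

```python
class StreamingState:
    """Running-sum state updated one unit at a time (illustrative only)."""

    def __init__(self, size):
        self.total = [0.0] * size  # element-wise sum of all units seen so far
        self.count = 0             # number of units folded into the state

    def update(self, unit_vector):
        """Fold one more unit into the state and return the new state value."""
        if len(unit_vector) != len(self.total):
            raise ValueError("unit vector has the wrong size")
        self.total = [t + u for t, u in zip(self.total, unit_vector)]
        self.count += 1
        return self.value()

    def value(self):
        """Return the mean of the units seen so far (zeros before any update)."""
        if self.count == 0:
            return list(self.total)
        return [t / self.count for t in self.total]


state = StreamingState(2)
for unit in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    print(state.update(unit))
```

Unlike the fixed-size batches above, the state here never needs the whole sequence in memory, which is what makes it suitable for an input stream of growing size.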

Thursday, August 24, 2023

This is a summary of the book “The Devil Never Sleeps: Learning to Live in an Age of Disasters” by Juliette Kayyem, which was published in 2022. She is a specialist in crisis management, disaster response and homeland security, serves on the faculty at Harvard’s Kennedy School of Government, and is Faculty Chair of the Homeland Security project. She is also a national security analyst for CNN and the author of “Security Mom: An unclassified guide to protecting our homeland and your home.”

She proposes that disasters aren’t anomalies. Planners should assume disasters will occur, and people need “situational awareness” to respond effectively to disasters, especially those that repeat. As part of the preparation and response process, all leaders must be on the same page, and a plan for “managed retreat” must rank high among the choices for response. Controlling the losses and stopping the hemorrhaging are some of the options that also need to be considered. When conditions don’t remain the same over time, a static plan does not help, and the response must be dynamically modified. People tend to disregard near disasters rather than recognizing them as warning signs. History has valuable lessons, especially when it comes to fatalities, and responses can be better articulated with this kind of insight.

There’s a wide variety of phenomena that the general public is already aware of, and these include natural calamities such as hurricanes, tsunamis and earthquakes as well as man-made ones such as financial meltdowns, pandemics, wars, and cyberattacks. Leaders of all kinds can find universal lessons to their advantage when disruptions occur so that they can “fail safer”. People who study disasters tend to divide their duration into two phases, before and after the disaster, and the moment of the disaster itself is often referred to as the boom. The time before the disaster is an opportunity to adopt measures that will prevent calamities. In the phase after the disaster, people attempt to recuperate from its consequences. Disasters might not be completely avoidable, but disaster managers can focus on what happens after a disaster to respond and rebuild effectively. In fact, the author asserts that disaster will strike, so managers should prepare for all hazards as a worst-case scenario.

For example, two Boeing 737 MAX planes crashed, killing 346 people, and the crashes resulted from a design error that limited the pilots’ ability to control the planes. A whistleblower said that the flaw had been brought to the attention of management, who downplayed it. Even the first disaster was explained away as a rare occurrence, and Boeing executives didn’t seriously consider the possibility of such disasters. The second disaster was waiting to happen. A disaster management response in the public or private sector should include an organizing principle, such as the “Incident Command System”, a hierarchical system that extends from a Public Information Officer and Safety Officer to teams for operations, planning, logistics, and finance. Some might refer to it as the war room.

As the disaster unfolds, the public must be made aware of the rollout status so that they know what is going on and when. A method for gathering real-time information is necessary. Situational awareness involves keeping a record of what happened, indexed to time and place. An SA template includes “perception”, “comprehension”, and “projection”. One of the recurring failures in organizational responses to disasters is that key players find it difficult to understand events as they play out in real time. Gathering a lot of details and furiously removing the noise are key activities for situational awareness. The author cites an example of better awareness in which the San Francisco mayor noticed that members of the city’s Asian and Asian-American community who had strong ties to the epicenter of the pandemic weren’t attending Lunar New Year events in expected numbers, and went ahead to institute social distancing protocols and stay-at-home orders long before most other mayors did.

Disaster response demands a consolidated strategy and purpose. Empowering social workers to perform at the highest possible level is an advantage. Unfortunately, many institutions’ approach to security architecture tends to fragment that architecture into different, specialized silos, which impedes unified action. Poor “governance structures” rather than straightforward ignorance exacerbate disasters. Oversimplifying safety and security to “gates, guards and guns”, and focusing more on buying equipment than on setting up effective processes, doesn’t help. A disaster’s consequences and its negative impact can be better managed when people learn to “fail safer”.

A “managed retreat” is sometimes referred to as a backup plan. When British Petroleum built its Deepwater Horizon oil drilling rig in the Gulf of Mexico, there were assurances that a blowout preventer would shut down the rig in the case of a spill. In April 2010, the rig exploded and the blowout preventer failed, resulting in one of the worst oil spills, and BP did not have a backup plan. The blowout preventer was also the last line of defense.

A more systemic approach enables us to mitigate negative consequences or, in the event of a disaster, render the consequences less awful. There is a literal example of this. The American military began encountering “improvised explosive devices” in Afghanistan and Iraq, and the main cause of death was that the victims bled to death. The response included limiting the damage from amputated limbs, transfers to better hospitals, training every soldier as a field medic, and developing better tourniquets and blood-clotting foam. One Pentagon study found that this response saved the soldiers and their limbs some 90% of the time.

The author asserts that disasters are simply no longer random and rare, and that’s where the adage “the devil never sleeps” comes from. Since conditions deteriorate over time, responses must also be dynamic and not remain locked in static plans. In June 2021, a condominium tower in Surfside in South Florida collapsed suddenly, killing 98 people. A consulting firm had warned about structural conditions several years earlier, but the responses were put off for want of budget. Safety and security systems are designed based on conditions that existed when the structure was built, but conditions don’t remain constant.

Even big companies such as Apple fail to read near misses. When the iPhone 4 was launched in 2010, it dropped calls and interrupted people’s messages. Steve Jobs and Apple absurdly blamed the customers, and no one complained. This kind of “normalizing deviance” can be quite dangerous.

History teaches valuable lessons, and one that stands out is that a perfectly managed crisis is an oxymoron. A dozen hurricanes made landfall in the United States in 2020, and the response to Hurricane Laura in Louisiana resulted in only 28 deaths, most of them from carbon monoxide poisoning caused by unsafe generators when the electrical grid went down.

Crisis managers must pay attention to what happened and prepare for the next disaster accordingly.

Wednesday, August 23, 2023

This is a summary of the book “How the Other Half Eats: The Untold Story of Food and Inequality in America” written by Priya Fielding-Singh, PhD, who is a sociologist at Stanford University. She studies the societal factors that influence people’s health.

There are prevalent assumptions about eating that grossly misunderstand the dietary choices in America. There is societal pressure to be a “good mom” which dictates family dietary choices. The food industry pushes junk food to ease mothers' guilt. Gendered expectations create further frustrations for mothers trying to uphold healthy eating habits. Lack of time and resources often leads to unhealthy dietary compromises. Emotional stress and misguided blame affect diets across the income spectrum.

The author makes recommendations for both mothers without resources who must be prudent to buy the right foods and those who can buy healthful food but who think the choices are not good enough. Her research targets diverse families and shows that Americans’ dietary choices have little to do with personal discipline and, instead, mainly involve family budgets and societal pressures. Personal desires – whether to be a perfect mom or to alleviate the weight of poverty – shape how Americans eat.

The American diet is overwhelmingly unhealthy. The US Department of Agriculture agrees with most nutritionists that a healthy diet is made up of fresh fruits, vegetables, low fat dairy, whole grains, and lean proteins. Most Americans don’t eat this way. The Americans who suffer the most from diets lacking in nutritional value are low-income families of color. They often eat too much sugar and too many processed foods and fatty meats, leading to higher rates of diabetes and heart problems, as well as earlier deaths than more affluent people.

As the disparity between the rich and the poor widens, some political figures, such as Michelle Obama, have sought to mitigate some of the causes behind this issue. However, those efforts operate on two assumptions about why some Americans eat unhealthily. First, low-income families can’t afford healthier foods, and second, low-income families don’t have physical access to grocery stores that sell healthy foods.

The second assumption is false. For example, the Healthy Food Financing Initiative invested more than $650 million in building supermarkets in communities that lacked nearby grocery stores. Yet making healthful food more available brought about little or no dietary change within low-income communities. The author asserts that geographical access was not a contributing factor to dietary choices. Most people have cars and don’t mind traveling to get the food they want.

A mother who struggles to make ends meet lacks the resources to take her kids out for fun activities, such as visiting a water park. Her lack of financial security impedes her ability to provide for her children. She constantly denies her daughters’ requests for new clothes, electronics, or toys. This makes her feel guilty and leaves her wondering if she’s a terrible mother. However, she can say yes to junk food because it’s cheap. Buying her daughters powdered donuts or a bag of Doritos puts smiles on their faces and is often the only thing she can do to ease the hardship of poverty.

On the opposite end of the economic spectrum, an affluent mother often says no to her kids’ junk food requests. However, she can say yes to most of their other requests. She can provide her children with private school, concert tickets, summer camp and consistent, healthy dietary choices.

Intensive mothering dooms moms to feelings of inadequacy and the sense that they never do enough — that they never are enough. This behavior creates a racial and economic inequality gap concerning who gets to be a good mother. Gold standard mothering now means giving your kids every opportunity to grow and learn, buying them whatever they need to thrive and providing them with nutritious food. By those unfair criteria, only the financially secure can afford to be good moms.

The food industry pushes junk food to ease mothers’ guilt. Because many low-income Americans are people of color, food choices may also reflect racial inequalities. Americans often associate childhood obesity with being Black or Hispanic – and often blame mothers instead of scrutinizing the food industry’s practices. The author states that the dads she met did not need to devote themselves to feeding their kids to feel like they were good dads.

Single mothers who work labor-intensive jobs have greater difficulty making healthy choices. Lack of time is an issue for most working parents across economic brackets. They often face long hours and long commutes, leaving them with less time to shop for food, cook or clean. Mothers often feel they must choose between spending quality time with their kids or cooking a healthy meal. This is also true for moms who are somewhat better off, though some wealthier moms can afford to hire household help to compensate for their lack of parenting time.

The author says that “as moms, we deserve to live in a society built of infinitely more empathy, appreciation, and support.” The narrative of blaming mothers will never fix these issues. The government should hold employers and corporations responsible.

Reducing Trials and Errors

Model: When the results of trial and error are scattered, an objective function that measures cost or benefit helps with convergence. If the samples are large, a batch analysis mode is recommended. The objective function can also be minimized or maximized via gradient descent methods, but simulated annealing can escape a local minimum because it will accept a worse (higher-cost) solution with a certain probability. In simulated annealing, the current cost is computed along with the cost of a candidate obtained by changing one value in a random direction. The candidate is accepted if its cost is lower, or otherwise with a probability that shrinks as the temperature drops, and the temperature decreases on every iteration.
Sample implementation follows:

import math
import random

def annealingoptimize(domain, costf, T=10000.0, cool=0.95, step=1):
    # Initialize the values randomly
    vec = [float(random.randint(domain[i][0], domain[i][1]))
           for i in range(len(domain))]
    while T > 0.1:
        # Choose one of the indices
        i = random.randint(0, len(domain) - 1)
        # Choose a direction to change it
        direction = random.randint(-step, step)
        # Create a new list with one of the values changed
        vecb = vec[:]
        vecb[i] += direction
        # Clamp the changed value to its domain
        if vecb[i] < domain[i][0]: vecb[i] = domain[i][0]
        elif vecb[i] > domain[i][1]: vecb[i] = domain[i][1]

        # Calculate the current cost and the new cost
        ea = costf(vec)
        eb = costf(vecb)
        # Is it better, or does it make the probability cutoff?
        # The exponent uses -(eb-ea); it is only evaluated when eb >= ea,
        # so it is never positive and cannot overflow
        if eb < ea or random.random() < math.exp(-(eb - ea) / T):
            vec = vecb
        # Decrease the temperature
        T = T * cool
    return vec
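
The acceptance rule used above can also be examined in isolation. The following is a small sketch, with a hypothetical helper name acceptance_probability, that shows how the chance of accepting a worse solution falls as the temperature cools. The sample costs and temperatures are chosen only for illustration.

```python
import math


def acceptance_probability(current_cost, new_cost, temperature):
    """Probability of moving to the new solution at the given temperature."""
    if new_cost < current_cost:
        return 1.0  # a better solution is always accepted
    # A worse solution is accepted with probability exp(-(increase) / T)
    return math.exp(-(new_cost - current_cost) / temperature)


# The same cost increase of 1.0 evaluated at progressively lower temperatures
for temperature in (10000.0, 100.0, 1.0, 0.1):
    print(temperature, acceptance_probability(10.0, 11.0, temperature))
```

At the starting temperature almost any move is accepted, which lets the search wander out of a local minimum, while near the stopping temperature the search behaves almost like plain hill climbing.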