Cluster computing

Monday, December 18, 2023

This is a summary of the book titled “The Wisdom of the Bullfrog: Leadership Made Simple (But Not Easy)” written by Admiral William H. McRaven (Grand Central, 2023). McRaven is a former Navy SEAL and Commander of U.S. Special Operations Command, and he served as Chancellor of the University of Texas System. In the book, he recounts the personal anecdotes that shaped him into “the Bull Frog,” the Navy’s longest-tenured active-duty frogman and SEAL.

He believes admirable leaders demonstrate essential character traits, attitudes, and habits. Good communication, detailed planning, and a code of honorable conduct are obvious but hard to sustain. He urges aspiring and incumbent leaders to be honest and fair in all their activities. By contrast, he notes that unscrupulous people may achieve great material success, but their moral shortcomings often undermine their accomplishments.

Cultivating trust is a long-term project, but it earns a committed following. Developing solid plans and keeping your promises are some of the ways to earn trust; showing your people that you care about them and value their contributions are others. Sailors in the Navy respect officers who lend a hand in the 120-degree boiler room, acknowledge their efforts, and listen to them.

Good communication is essential up and down the chain of command. He draws on his own experience, in which he spoke often and clearly about his vision for his teams’ values and goals as well as his intent. Communication is not only about the narrative but also about listening to those addressed. By gathering feedback and grievances and simply paying attention to people’s interests, a leader can take the pulse of the team. Walkarounds, facility inspections, and conversations with troops are some of the ways he did this.

Being bold, confident, and proactive enables a leader to meet every challenge with an all-out effort, which inspires everyone on the team to do the same. Equally important is setting stretch goals for the team: setting the bar high and challenging people to clear it builds grit and fortitude.

If a leader takes pride in the little jobs, people will consider him or her worthy of bigger ones. When he first reported to his team leader, he was given tasks that were far from heroic, but doing his best at them helped him become a leader.

A leader rushes to the center of a crisis and takes charge of resolving it. The author cites narratives from history in which the protagonist turned the tide of events by taking the fight to the opponent. Prudence still matters: a good commander has a high tolerance for necessary risk but strives to reduce the risk that accompanies each decision.

When in doubt, overload. Navy frogmen on Underwater Demolition Teams have a guideline for determining the volume of explosives they need to destroy any obstacles that impede an amphibious landing. Their rule is that if they don’t know how much to use, they use more.

Sunday, December 17, 2023

Applying MicrosoftML rxNeuralNet algorithm: 

While logistic regression is used to model binary outcomes, rxNeuralNet is a neural network implementation that supports multiclass classification and regression. It is useful for applications such as signature prediction, OCR, and click prediction. A neural network is a weighted directed graph arranged in layers, where the nodes in one layer are connected by weighted edges to the nodes in the next layer. The algorithm adjusts the weights on these edges based on the training data.
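The layered, weighted graph described above can be illustrated with a forward pass in plain Python. This is a minimal sketch, not rxNeuralNet's implementation: the layer sizes and weights are hypothetical, hidden nodes use a ReLU activation (an assumption), and a softmax turns the output scores into class probabilities for multiclass prediction. Training would then adjust these weights, for example by backpropagation, to fit the training data.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def dense(inputs: Vector, weights: Matrix, biases: Vector) -> Vector:
    """One layer: each output node sums its weighted incoming edges plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]

def softmax(v: Vector) -> Vector:
    """Turn raw output scores into class probabilities that sum to 1."""
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def forward(features: Vector, layers) -> Vector:
    """Propagate features through hidden layers (ReLU) and a softmax output layer."""
    h = features
    for i, (w, b) in enumerate(layers):
        h = dense(h, w, b)
        if i < len(layers) - 1:
            h = relu(h)
    return softmax(h)

# Hypothetical weights: 4 inputs (like the iris measurements) -> 3 hidden nodes -> 3 classes.
layers = [
    ([[0.2, -0.1, 0.4, 0.3], [-0.3, 0.2, 0.1, -0.2], [0.1, 0.1, -0.5, 0.6]], [0.0, 0.1, -0.1]),
    ([[0.5, -0.4, 0.2], [-0.2, 0.3, 0.1], [0.1, 0.2, 0.6]], [0.0, 0.0, 0.0]),
]
probs = forward([5.1, 3.5, 1.4, 0.2], layers)
print([round(p, 3) for p in probs], "predicted class:", probs.index(max(probs)))
```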

Logistic regression helps to detect root causes of payment errors. It relies on statistical measures, is highly flexible, accepts many kinds of input, and supports different analytical tasks. Because the logistic function saturates, it dampens the effect of extreme values, and it evaluates several factors that affect a pair of outcomes. Regression in general is useful for calculating a linear relationship between a dependent and an independent variable and then using that relationship for prediction. Within specific categories, errors form elongated scatter plots, and even errors with different error details in the same category can be plotted and correlated. This makes the technique suitable for specific error categories from an account.

Default detection rates can be boosted, and false positives reduced, by using real-time behavioral profiling together with historical profiling. Big Data, commodity hardware, and historical data going back as far as three years all help with accuracy. This allows a payment default to be detected almost as soon as it occurs. True real-time processing, however, implies stringent response times.
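One simple form of behavioral profiling keeps a running statistical profile per account and flags amounts that stray far from it. The sketch below assumes a z-score rule with a hypothetical threshold of 3 and hypothetical payment amounts; it uses Welford's algorithm so the profile updates in constant time as each payment arrives, which suits real-time processing.

```python
import math
from dataclasses import dataclass

@dataclass
class AccountProfile:
    """Running mean and variance of an account's payment amounts (Welford's algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, amount: float) -> None:
        self.count += 1
        delta = amount - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (amount - self.mean)

    def std(self) -> float:
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0

    def is_anomalous(self, amount: float, threshold: float = 3.0) -> bool:
        """Flag an amount more than `threshold` standard deviations from the profile mean."""
        sd = self.std()
        if sd == 0.0:
            return False  # not enough history to judge
        return abs(amount - self.mean) / sd > threshold

profile = AccountProfile()
for amount in [120.0, 95.0, 130.0, 110.0, 105.0, 125.0]:  # hypothetical history
    profile.update(amount)

print(profile.is_anomalous(115.0))   # False: within the usual range
print(profile.is_anomalous(2500.0))  # True: far outside the usual range
```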

The algorithm for least-squares regression boosting can be written as:

1. Set the initial approximation, for example the mean of the targets.

2. For a series of successive increments or boosts, each based on the preceding iterations, do:

3. Calculate the new residuals.

4. Find the line of search by fitting the residuals and minimizing their squared error.

5. Perform the boost along the line of search.

6. Repeat steps 3 to 5 for each boost in step 2.
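The steps above can be sketched as least-squares boosting of decision stumps on a single input feature. This is a minimal illustration under stated assumptions, not a library implementation: the stump base learner, the 0.5 learning rate (shrinkage), the 50 rounds, and the data are all choices made for the example. For squared loss, a stump fitted to the residuals already has the optimal line-search multiplier of 1, so step 4's line search is folded into the stump fit.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)

def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Least-squares fit of a one-split tree to the residuals."""
    mean_all = sum(residuals) / len(residuals)
    best_sse, best = float("inf"), (xs[0], mean_all, mean_all)  # fallback: no useful split
    for t in sorted(set(xs))[:-1]:  # splitting at the largest x would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if sse < best_sse:
            best_sse, best = sse, (t, lv, rv)
    return best

def stump_value(stump: Stump, x: float) -> float:
    t, lv, rv = stump
    return lv if x <= t else rv

def ls_boost(xs, ys, n_rounds=50, learning_rate=0.5):
    f0 = sum(ys) / len(ys)                                   # step 1: initial approximation
    preds, stumps = [f0] * len(ys), []
    for _ in range(n_rounds):                                # step 2: successive boosts
        residuals = [y - p for y, p in zip(ys, preds)]       # step 3: new residuals
        stump = fit_stump(xs, residuals)                     # step 4: least-squares fit (multiplier 1)
        stumps.append(stump)
        preds = [p + learning_rate * stump_value(stump, x)   # step 5: boost along it
                 for p, x in zip(preds, xs)]
    return f0, learning_rate, stumps

def predict(model, x):
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(stump_value(s, x) for s in stumps)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]
model = ls_boost(xs, ys)
mse_mean = sum((y - model[0]) ** 2 for y in ys) / len(ys)
mse_boost = sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(round(mse_mean, 3), round(mse_boost, 3))
```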

The conjugate gradient method solves Ax = b for a symmetric positive-definite matrix A. Given A, b, a starting value x, a maximum number of iterations i-max, and an error tolerance epsilon < 1, it proceeds as follows:

set i to 0
set the residual r to b - Ax
set the search direction d to r
set delta-new to r-transposed . r
set delta-0 to delta-new
while i < i-max and delta-new > epsilon^2 * delta-0 do:
    q = A . d
    alpha = delta-new / (d-transposed . q)
    x = x + alpha * d
    if i is divisible by 50:
        r = b - Ax        (recompute exactly to remove accumulated floating-point error)
    else:
        r = r - alpha * q
    delta-old = delta-new
    delta-new = r-transposed . r
    beta = delta-new / delta-old
    d = r + beta * d
    i = i + 1
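The pseudocode above translates directly into plain Python. This is a minimal sketch using lists rather than a numerical library, suitable for small dense symmetric positive-definite matrices; the 2x2 system in the usage line is a hypothetical example whose exact solution is x = (1/11, 7/11).

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def matvec(a: Matrix, v: Vector) -> Vector:
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gradient(a: Matrix, b: Vector, x: Vector,
                       i_max: int = 1000, epsilon: float = 1e-10) -> Vector:
    """Solve a x = b for symmetric positive-definite a, following the steps above."""
    x = list(x)
    r = [bi - axi for bi, axi in zip(b, matvec(a, x))]  # residual
    d = list(r)                                         # search direction
    delta_new = dot(r, r)
    delta_0 = delta_new
    i = 0
    while i < i_max and delta_new > epsilon ** 2 * delta_0:
        q = matvec(a, d)
        alpha = delta_new / dot(d, q)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        if i % 50 == 0:
            # Periodically recompute the residual exactly to shed accumulated rounding error.
            r = [bi - axi for bi, axi in zip(b, matvec(a, x))]
        else:
            r = [ri - alpha * qi for ri, qi in zip(r, q)]
        delta_old = delta_new
        delta_new = dot(r, r)
        beta = delta_new / delta_old
        d = [ri + beta * di for ri, di in zip(r, d)]
        i += 1
    return x

x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [2.0, 1.0])
print(x)  # approximately [1/11, 7/11]
```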

Sample application: 

#! /bin/python 
# Train a multiclass neural network on the iris dataset and score a held-out split.
from microsoftml import rx_neural_network, rx_predict
from microsoftml.datasets.datasets import get_dataset
from revoscalepy.etl.RxDataStep import rx_data_step

# train_test_split moved from sklearn.cross_validation to sklearn.model_selection in 0.18.
try:
    from sklearn.model_selection import train_test_split
except ImportError:
    from sklearn.cross_validation import train_test_split

iris = get_dataset("iris")
irisdf = iris.as_df()
# Treat the label as categorical so the model predicts classes rather than numbers.
irisdf["Species"] = irisdf["Species"].astype("category")
data_train, data_test, y_train, y_test = train_test_split(irisdf, irisdf.Species)

model = rx_neural_network(
    formula=" Species ~ Sepal_Length + Sepal_Width + Petal_Length + Petal_Width ",
    method="multiClass",
    data=data_train)

# RuntimeError: The type (RxTextData) for file is not supported.
score_ds = rx_predict(model, data=data_test,
                      extra_vars_to_write=["Species", "Score"])

# Print the first five rows
print(rx_data_step(score_ds, number_rows_read=5))

CodingExercise-12-17-2023.docx

Saturday, December 16, 2023

Problem statement: Data and logic are often kept as close to each other as possible. Take MySQL server, or any relational database server: querying and processing the stored data there is convenient. SQL is not just a language but a standard for structured storage, and data that does not fit in a spreadsheet continues to be saved in tables. Many SQL server instances were created on-premises, usually in a central department or at company headquarters, but with the move to the cloud they became universally reachable and placed greater demands on authentication and authorization. The de facto cloud directory for most organizations is Active Directory (AD), and most SQL logins saved in MySQL server have been replaced by AD users and groups so that people can authenticate as themselves. Yet onboarding each user still involves chores that an administrator must perform as SQL statements, for example for auditing or role assignment. How do we automate this?

Solution: Every runtime, host, and cloud provides authentication and authorization features that determine the principal, role, and permissions, and control must pass these checks before data is accessed. Of these layers, the innermost one holds the data and the logins, so automation can be applied successfully there. MySQL supports triggers that fire on INSERT, UPDATE, and DELETE statements (a REPLACE statement can activate both insert and delete triggers). Some other databases, such as Oracle, also support system-event triggers on STARTUP, SHUTDOWN, SERVERERROR, LOGON, LOGOFF, CREATE, DROP, and ALTER, but MySQL does not. Instead, MySQL provides the init_connect, init_file, and init_slave hooks. For example, the my.cnf file can name a SQL script file that is executed at database startup (init_file). These hooks can be used to emulate LOGON and STARTUP triggers: init_connect runs SQL for each new client connection, and init_file runs a script when the server starts.

Implementing a logon trigger requires writing a stored procedure that executes on each logon event.

-- DROP PROCEDURE IF EXISTS test.logon_trigger;
DELIMITER //
CREATE PROCEDURE test.logon_trigger()
SQL SECURITY DEFINER
BEGIN
    CREATE AADUSER CURRENT_USER();
    FLUSH PRIVILEGES;
    -- Create roles (IF NOT EXISTS lets the procedure run on every logon)
    CREATE ROLE IF NOT EXISTS 'ReadOnlyGroup';
    -- Grant permissions to that role
    GRANT SELECT ON testdb.* TO 'ReadOnlyGroup';
    -- Assign the role to the AD principal
    GRANT 'ReadOnlyGroup' TO CURRENT_USER();
END //
DELIMITER ;

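Inside a SQL SECURITY DEFINER procedure, CURRENT_USER() returns the definer's account rather than the connecting user, so CURRENT_USER() should be replaced with a caller-supplied parameter that names the user being onboarded.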

The procedure can be tested, and the current init hooks inspected, with:

SHOW GLOBAL VARIABLES LIKE 'init%';

CALL test.logon_trigger;

Once the procedure works, connect the init_connect hook to it to build the trigger:

SET GLOBAL init_connect='CALL test.logon_trigger()';

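One caveat: as with any stored procedure, permission to execute the logon trigger must be granted, and because init_connect runs with the connecting user's privileges, every account that connects needs it. If the call fails, MySQL closes the client connection. Note also that init_connect is not executed for accounts that have the CONNECTION_ADMIN (or SUPER) privilege.

MySQL has no wildcard grantee, so the grant is made per account, for example for a placeholder account 'user_name'@'%':

GRANT EXECUTE ON PROCEDURE test.logon_trigger TO 'user_name'@'%';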

The init_connect setting must also be added to the my.cnf file; otherwise, the value set with SET GLOBAL is lost at the next server restart.

The CALL statement can also be placed in a .sql file so that the script can be run remotely.

Finally, if these administrative duties should not run on every logon, the same logic can run as a procedure over a batch of user accounts belonging to the team, as a one-time event that the administrator invokes on an ad hoc or scheduled basis.
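For the batch variant, the statements can be generated for a whole team and then reviewed or scheduled by the administrator. The sketch below is hypothetical: the function names, team members, role, and database are illustrative, the CREATE AADUSER statement mirrors the procedure above, and the string quoting assumes MySQL's default sql_mode.

```python
from typing import Iterable, List

def quote_literal(value: str) -> str:
    """Quote a MySQL string literal: escape backslashes and double any single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"

def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"

def onboarding_statements(users: Iterable[str], role: str = "ReadOnlyGroup",
                          database: str = "testdb") -> List[str]:
    """Build the role setup once, then the per-user statements for every team member."""
    role_q = quote_literal(role)
    statements = [
        f"CREATE ROLE IF NOT EXISTS {role_q};",
        f"GRANT SELECT ON {quote_identifier(database)}.* TO {role_q};",
    ]
    for user in users:
        user_q = quote_literal(user)
        statements.append(f"CREATE AADUSER {user_q};")
        statements.append(f"GRANT {role_q} TO {user_q};")
    return statements

team = ["alice@contoso.com", "bob@contoso.com"]  # hypothetical team members
print("\n".join(onboarding_statements(team)))
```

Saving the output to a .sql file yields a script that the administrator or a scheduled job can run.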

Automation is also possible with a cloud runbook that calls Azure CLI commands against the target MySQL instance. This is convenient for end users, but the SQL statements cannot be avoided: they must be passed to the runbook as a script or file that it runs. Logon triggers and similar stored procedures are therefore a convenient way to encapsulate and run these statements.

Previous articles: https://1drv.ms/w/s!Ashlm-Nw-wnWhNNf6kdqJbe7IHHQEA?e=wqZK1g

Thursday, December 14, 2023

Applying MicrosoftML rxNeuralNet algorithm: 


Wednesday, December 13, 2023

Public-plane and private-plane network connectivity complement each other: the public plane is governed mostly by public IP address assignment and IP access restriction rules, while the private plane consolidates traffic and becomes its default path. Placing resources in a virtual network, with a subnet for each address range, secures inbound and outbound access to those resources. As resources are assigned to subnets, traffic can be contained within the virtual network. When inbound and outbound access is limited to the virtual network, the resources are no longer reachable from the internet. This improves their security, but some resources still need public IP connectivity between them. One way to provide internet access is a NAT gateway, which consolidates outbound traffic behind public IP addresses drawn from a known CIDR prefix that works well with IP ACLs. This is the public-plane equivalent of virtual network integration. For example, a Layer-7 resource such as Azure Application Gateway or Front Door could let its frontend and backend communicate through a NAT gateway while access remains internal to the resource.
Azure Network Address Translation (NAT) Gateway is a fully managed and highly resilient NAT service that provides outbound connectivity to the internet through the deployment of a NAT gateway resource. With a NAT gateway, instances in a private subnet can connect to services outside the virtual network, but external services cannot initiate connections to those instances. A NAT gateway can be attached to multiple subnets within a virtual network to provide outbound connectivity to the internet. Benefits of NAT include reuse of private IP addresses, better security for private networks because internal addressing stays hidden from the external network, and connection of many hosts to the global internet through a smaller number of public (external) IP addresses, which conserves IP address space.
If the virtual network address space has multiple address ranges defined, Azure creates an individual route for each address range. Azure automatically routes traffic between subnets using the routes created for each address range. Gateways don’t need to be defined for Azure to route traffic between subnets. The system default route specifies the 0.0.0.0/0 address prefix. If Azure’s default routes are not overridden, Azure routes traffic for any address not specified by an address range within a virtual network to the Internet. There's one exception to this routing. If the destination address is for one of Azure's services, Azure routes the traffic directly to the service over Azure's backbone network, rather than routing the traffic to the Internet. Traffic between Azure services doesn't traverse the Internet, regardless of which Azure region the virtual network exists in, or which Azure region an instance of the Azure service is deployed in. Azure's default system route for the 0.0.0.0/0 address prefix can be overridden with a custom route.
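Azure selects among routes by longest prefix match, and when several routes share the same prefix it prefers a user-defined route over a BGP route over a system route. The sketch below models that selection with Python's ipaddress module; the address ranges are hypothetical, the source labels are simplified, and the Azure-service exception described above is not modeled.

```python
import ipaddress
from typing import List, Tuple

# When prefixes tie, the lower number wins.
SOURCE_PRIORITY = {"User": 0, "BGP": 1, "System": 2}

Route = Tuple[str, str, str]  # (address prefix, next hop type, route source)

def select_route(routes: List[Route], destination: str) -> Route:
    """Pick the route whose prefix contains the destination, preferring the longest prefix."""
    dest = ipaddress.ip_address(destination)
    matches = [r for r in routes if dest in ipaddress.ip_network(r[0])]
    return min(matches,
               key=lambda r: (-ipaddress.ip_network(r[0]).prefixlen, SOURCE_PRIORITY[r[2]]))

system_routes = [
    ("10.0.0.0/16", "VirtualNetwork", "System"),  # the virtual network's address range
    ("0.0.0.0/0", "Internet", "System"),          # default route
]
print(select_route(system_routes, "10.0.1.4")[1])     # VirtualNetwork
print(select_route(system_routes, "203.0.113.7")[1])  # Internet

# A custom route for 0.0.0.0/0 overrides the system default route.
custom = system_routes + [("0.0.0.0/0", "VirtualAppliance", "User")]
print(select_route(custom, "203.0.113.7")[1])         # VirtualAppliance
```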
An Azure NAT gateway associated with a subnet provides outbound connectivity for all private resources in that subnet. Other subnets in the same virtual network can use the same NAT gateway, but each must be associated with it. A NAT gateway can be assigned up to 16 public IP addresses or a /28 public IP prefix. It takes precedence over a load balancer, with or without outbound rules, and becomes the next hop for all internet-destined traffic. A NAT gateway can't span more than one virtual network. It provides source network address translation (SNAT) for private instances within subnets of an Azure virtual network.
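A /28 prefix contains 2^(32 - 28) = 16 addresses, which matches the 16-address limit above, and because SNAT traffic leaves from addresses in that prefix, a downstream IP ACL needs only a single CIDR entry. A small sketch with Python's ipaddress module; the prefix, the ACL helper, and the source addresses are hypothetical and use documentation ranges.

```python
import ipaddress

# Hypothetical public IP prefix assigned to the NAT gateway.
nat_prefix = ipaddress.ip_network("203.0.113.16/28")
print(nat_prefix.num_addresses)  # 16

def allowed_by_acl(source_ip: str, allowed_prefixes) -> bool:
    """Return True if the source address falls inside any allowed CIDR prefix."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(p) for p in allowed_prefixes)

acl = ["203.0.113.16/28"]  # one entry covers every outbound address of the NAT gateway
print(allowed_by_acl("203.0.113.20", acl))  # True
print(allowed_by_acl("198.51.100.9", acl))  # False
```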

Coding exercise:

https://1drv.ms/w/s!Ashlm-Nw-wnWhNQR3xNXIC3CG665Tg?e=jVlorS

Tuesday, December 12, 2023

This is a continuation of previous articles on IaC shortcomings and their resolutions. One of the favored patterns is the shared-nothing pattern, which provides complete isolation, protection from cross-contamination, and improved scalability and security. A case in point is the mapping of app services to app service plans.

If there is a one-to-one relationship between app services and their plans, then each app service plan provides independent scalability for its app service. Giving the app service plan its own subnet allows customizations, including inbound and outbound connections, for just that app service.

The other is the shared pattern. For example, a gateway might have more than a dozen app services as backend members, each of which might allow public access. If the gateway must consolidate access to all the app services, then the gateway must be changed to route traffic to each app service as the client intends, and the app services must be restricted to allow only private access from the gateway.

When choosing between these patterns, one deciding factor is their purpose and utility. For example, a small subnet with a /28 prefix has a small IP range, and as many such subnets as needed can be created from the same virtual network. On the other hand, consolidating app services onto one app service plan allows a single subnet to carry the outbound traffic from all of them, because the underlying VM is assigned only one NIC. This is favored when the app services belong to the same category and the shared traffic does not impact their performance.
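The arithmetic behind small /28 subnets can be checked with Python's ipaddress module. The virtual network range below is hypothetical; the sketch also accounts for the five addresses Azure reserves in every subnet (the first four and the last), which leaves 11 usable addresses in a /28.

```python
import ipaddress

AZURE_RESERVED_PER_SUBNET = 5  # Azure reserves the first four and the last address of each subnet

vnet = ipaddress.ip_network("10.1.0.0/24")  # hypothetical virtual network range
subnets = list(vnet.subnets(new_prefix=28))

print(len(subnets))                                          # 16
print(subnets[0], subnets[0].num_addresses)                  # 10.1.0.0/28 16
print(subnets[0].num_addresses - AZURE_RESERVED_PER_SUBNET)  # 11
```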

Upgrading all the subnets to the latest IaC definitions, and creating new subnets to replace the non-functional ones, is ongoing maintenance. As the deployment ages, the versions of the toolset and definitions change: new properties are added while old ones are deprecated. Changing the definitions poses a risk to the many resources and resource types hosted on the subnets, so the changes must be made methodically.

The changes can be rolled out progressively and incrementally so that each round of testing is small and covers only the changes made. The first step is to update the toolset before any other change. Next, the existing callers are pinned to a reference for their source before the new definition is used. Without a pinned reference, they would point to the latest version, and updating the source to a new definition would affect more resources than intended. Then the common subnet definition is updated, and a new tag is created so that only the callers that require the new definition reference it. Once the new reference is available, all the resources in the deployment that require the new definition are pointed to it.
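The pin-then-repoint sequence can be sketched in Python, assuming Terraform-style git module sources that select a version with a ?ref= tag. The repository URL, caller names, tags, and helper functions are hypothetical, and the string handling ignores any query parameters other than ref.

```python
from typing import Dict, Iterable

def with_ref(source: str, ref: str) -> str:
    """Pin a module source to `ref`, replacing any existing ?ref= value."""
    base = source.split("?ref=", 1)[0]
    return f"{base}?ref={ref}"

def repoint(callers: Dict[str, str], targets: Iterable[str], new_ref: str) -> Dict[str, str]:
    """Move only the target callers to `new_ref`; every other caller keeps its pinned source."""
    wanted = set(targets)
    return {name: with_ref(src, new_ref) if name in wanted else src
            for name, src in callers.items()}

# Hypothetical shared subnet module; every caller is pinned to the current tag first.
SUBNET_MODULE = "git::https://example.com/infra-modules.git//subnet"
callers = {name: with_ref(SUBNET_MODULE, "v1.3.0") for name in ["app-a", "app-b", "gateway"]}

# After tagging the updated definition as v1.4.0, move only the caller that needs it.
updated = repoint(callers, ["app-b"], "v1.4.0")
for name, source in updated.items():
    print(f"{name}: {source}")
```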

Some app services might start out the same way but end up differently. The initialization of resources as well as their potential growth are important factors for the choice of patterns.

In this way, the choice of pattern must be made on a case-by-case basis.

CodingExercise-12-12b-2023.docx