Sunday, June 30, 2024

 #codingexercise 

Get the max sum level in a binary tree

// Returns the level (0-based) with the maximum sum of node values.
// Assumes Node exposes an integer field 'data' and 'left'/'right' children.
int getMaxLevel(Node root, Node delimiter)
{
    if (root == null) return -1;
    Deque<Node> q = new ArrayDeque<>();
    q.addLast(root);
    q.addLast(delimiter);
    int level = 0;
    int result = 0;
    int levelSum = 0;
    int maxSum = Integer.MIN_VALUE;
    while (!q.isEmpty())
    {
        Node node = q.removeFirst();
        if (node == delimiter) {
            // A level has been fully traversed; compare its sum against the best so far.
            if (levelSum > maxSum) { maxSum = levelSum; result = level; }
            levelSum = 0;
            level = level + 1;
            // Re-enqueue the delimiter to mark the end of the next level.
            if (!q.isEmpty()) { q.addLast(delimiter); }
        } else {
            levelSum += node.data;
            if (node.left != null) { q.addLast(node.left); }
            if (node.right != null) { q.addLast(node.right); }
        }
    }
    return result;
}

Example (nodes listed level by level):

Level 0: 5
Level 1: 1 3 6 8
Level 2: 2 7

Level sums: 5, 18, 9

Result: 1


Saturday, June 29, 2024

 This is a summary of the book titled “Becoming a Changemaker,” written by Alex Budak in 2022. Changemakers look optimistically toward the possibilities of the future and empower themselves to lead change. We must stop waiting for permission to do so in these times when changes are happening all around us at a rapid pace. This book helps us assess our strengths and weaknesses for that journey.

Changemakers are individuals who disrupt the status quo and identify opportunities at the intersection of disciplines. They are not afraid to challenge the status quo, take smart risks, and combine multiple perspectives to find the best solutions. They embrace hope and take action to create a brighter future. Changemakers cultivate "learned optimism," which means navigating challenges with the hope of creating a brighter future. They compartmentalize setbacks and recognize that adversity is temporary. To become a changemaker, one must commit to becoming one themselves, seek and collaborate with other changemakers, and assist others on their journeys. To assess their progress, one can take the Changemaker Index self-assessment annually, which measures five dimensions: "Changemaker awareness," "Changemaker mindset," "Changemaker leadership," "Changemaker action," and "Changemaker effectiveness." By assessing these dimensions, one can measure their progress and contribute to a broader impact on society.

To foster a changemaker's mindset, become a humble, flexible servant leader who is not afraid to fail. Humility is a significant strength: it leads to less employee turnover, greater satisfaction, more diverse managers, and larger profit margins. Humble leaders are also less inclined to believe fake news, better able to deal with uncertainty, and more willing to admit mistakes. To be a changemaker, embrace failure and prioritize the interests of those served above one's own. Develop a broad vision and be patient in achieving change, articulating a compelling picture of the change over the next few decades.

To become a successful changemaker, one must have the courage to take action, even when feeling vulnerable. The "changemaker impact equation" helps determine the necessity of action by multiplying actions by leadership and mindset. Develop the art of agency, which allows for frustration and hopelessness while knowing that even inaction is an action. Take small steps to get your project rolling and view obstacles with a fresh perspective. Channel your courage into action by finding people who believe in your message and act as early champions. Break down your goals into manageable blocks using the Changemaker Canvas.


For example, write a concise, clear vision for your desired change, identify your core problem, and understand the Four S's of change. Test your ideas using the lean start-up model and plan strategies to ensure resilience. Recruit collaborators, such as “doers” and “evangelists,” and embrace the appropriate mindset and leadership approach. Put everything learned into action by leveraging strengths like optimism and the desire to serve.

Previous book summary: BookSummary114.docx

Summarizing Software: SummarizerCodeSnippets.docx    

 


Friday, June 28, 2024

 This is a continuation of IaC shortcomings and resolutions. In Azure, a storage account without private endpoints can be accessed by compute resources that do not have public IP addresses through the use of Azure's internal networking capabilities. Here's how it works:

1. Virtual Network (VNet): Both the storage account and the compute resources reside within an Azure VNet, which is a private network within Azure.

2. Service Endpoints: While private endpoints are not used, we can enable service endpoints for Azure Storage within the VNet. This allows us to secure our storage account so that it can only be accessed from specific subnets within the VNet.

3. Network Security Groups (NSGs): NSGs are used to control inbound and outbound traffic to network interfaces (NIC), VMs, and subnets. We can configure NSGs to allow traffic between the compute resources and the storage account within the VNet.

4. Azure Bastion: For secure, remote access to the compute resources from outside the VNet, we can use Azure Bastion, which provides RDP and SSH connectivity via the Azure portal without the need for public IP addresses.

5. VPN Gateway or ExpressRoute: To connect to the Azure VNet from on-premises networks securely, we can use a VPN Gateway or ExpressRoute with private peering. This allows on-premises compute resources to access the Azure storage account as if they were part of the same local network.

6. DNS Configuration: Proper DNS configuration is necessary for the compute resources within the Azure VNet to resolve the storage account name. Azure provides DNS services that can be used for name resolution within VNets. A compute resource in a different virtual network can also reach the storage account via a private endpoint, provided the necessary DNS configuration is in place and the virtual networks are peered or there is line-of-sight private IP routing between the caller and the callee.

7. Outbound Connectivity: If the compute resources need to access the internet, we can configure outbound connectivity using Azure NAT Gateway or Load Balancer outbound rules, even if the compute resources don't have public IP addresses.

By configuring the VNet, NSGs, and DNS settings correctly, and using service endpoints, we can ensure that compute resources without public IP addresses can securely access an Azure storage account without private endpoints. This setup maintains the security and isolation of our resources within Azure while allowing necessary communication between them.
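For the service-endpoint piece, here is a minimal sketch in Python, assuming the azure-identity and azure-mgmt-storage packages; the subscription, resource group, storage account, and subnet identifiers are placeholders rather than values from any specific deployment.

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import (
    StorageAccountUpdateParameters, NetworkRuleSet, VirtualNetworkRule)

subscription_id = "<subscription-id>"   # placeholder
resource_group = "<resource-group>"     # placeholder
account_name = "<storage-account>"      # placeholder
subnet_id = ("/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
             "/providers/Microsoft.Network/virtualNetworks/<vnet>/subnets/<subnet>")

client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

# Deny public traffic by default and allow only the subnet that has the
# Microsoft.Storage service endpoint enabled.
client.storage_accounts.update(
    resource_group,
    account_name,
    StorageAccountUpdateParameters(
        network_rule_set=NetworkRuleSet(
            default_action="Deny",
            virtual_network_rules=[VirtualNetworkRule(virtual_network_resource_id=subnet_id)],
        )
    ),
)

Compute resources in that subnet can then reach the account over the Azure backbone without any public IP addresses involved.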



Thursday, June 27, 2024

 Even when a vector database seems a straightforward choice for specific use cases involving drone data, the choice of vector database matters. For example, generating vector embeddings and performing vector similarity search are two different use cases. The embedding model is a neural network that transforms raw data into a vector embedding, a vector of numbers that represents the original data. Querying the vector database requires a similarity search between the query vectors and the vectors in the database, and the result of the search is the set of most relevant vectors. The scope of the search can be limited to a subset of the original vectors with the help of metadata filtering. So, the difference between the two is that the first is geared toward storing and retrieving large numbers of high-dimension numerical data vectors, while the latter optimizes for selectivity and heavy computation over a subset of the data. Metadata might include dates, times, genres, categories, names, types, and descriptions, and, depending on the use case, something custom such as tags and labels. Frameworks like LangChain and LlamaIndex offer capabilities to automatically tag incoming queries with metadata. Cloud vector search services like Azure Cognitive Search can automatically index vector data from two primary sources: Azure Blob indexers and Azure Cosmos DB for NoSQL indexers. Azure Cognitive Search also includes scoring algorithms for vector search, which are primarily of two types: exhaustive KNN, which calculates the distance between the query vector and every data point, and Hierarchical Navigable Small World (HNSW), which organizes high-dimensional data points into a hierarchical graph structure. Amazon also offers a wealth of cloud resources for varied purposes, but they are not all tightly integrated into a single platform the way Vertex AI, Databricks, or Snowflake are. A large number of Databricks users in organizations also use Snowflake. Vector databases also span the pure form, such as Pinecone; full-text search databases like Elasticsearch; vector libraries like Faiss, Annoy, and Hnswlib; vector-capable NoSQL databases such as MongoDB, Cosmos DB, and Cassandra; and vector-capable SQL databases like SingleStoreDB and PostgreSQL. Rockset is a leader in this quadrant.
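As an illustration of how metadata filtering narrows the search space before similarity scoring, here is a minimal sketch using the Pinecone Python client; the index handle, the encoder model, and the "category" metadata field are assumptions for the example rather than part of any specific deployment.

# Assumes `index` is an existing Pinecone index handle and `model` is a
# sentence-transformers style encoder; the 'category' field is hypothetical.
query = "what is the node nearest this element?"
xq = model.encode(query).tolist()

results = index.query(
    vector=xq,
    top_k=5,
    include_metadata=True,
    # Only vectors whose metadata matches the filter are scored,
    # which reduces the search space before similarity ranking.
    filter={"category": {"$eq": "drone-telemetry"}},
)
for match in results["matches"]:
    print(match["score"], match["metadata"])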

When functional requirements are met, choices are often prioritized by efficient storage, high-performance storing and retrieval, and the variety of metrics that can be used to perform similarity searches. Pure vector databases provide efficient similarity search with indexing techniques, scalability for large datasets and high query workloads, support for high-dimensional data, HTTP- and JSON-based APIs, and native support for vector operations including dot products. Their main drawback is usually that indexing is time-consuming, especially since there can be several indexing parameters and incorrect values may introduce inefficiencies. Full-text search databases work great for text and pair well with indexing libraries like Apache Lucene and with vector libraries. If we want off-the-shelf vector computations such as fast nearest-neighbor search, recommendation systems, image search, and NLP, vector libraries are useful, and more are being added to open source continually. Their main drawback is that we must bring our own infrastructure.


Wednesday, June 26, 2024

 Gen AI created a new set of applications that require a different data architecture than traditional systems. Traditional databases cannot keep up with innovation in this space, and existing applications are now being enhanced with AI. The most important need of the hour is a data architecture that lets people build applications quickly and efficiently at scale. Even the data structures expected to store records are changing: a search and analytics database must store and index vector embeddings so that value can be extracted from both structured and unstructured data. Observability is needed alongside the databases themselves. The data stores that power these AI-era applications must satisfy multiple requirements. There are structure and data governance requirements surrounding the storage and use of this data, especially with renewed emphasis on building trust by leveraging privacy and data-protection capabilities. There is also a need to unify data, whether it arrives from event streams that bring real-time data into the system or from transactions and change data capture. Performance considerations have also changed, with older benchmarks looking obsolete in the face of what is required to train models.

From 2015 to today, the main emphasis in data architectures has been the separation of compute from storage at cloud scale, which is evident in the way models are trained, tested, and released today, as well as in the success of products like Snowflake and Databricks. This is going to change for AI application data architectures, whose primary use cases resemble Uber-like applications: they are real-time, revolve around people, places, and things, and must unify all the different data sources. There are two important sides to this. One side involves training or tuning with proprietary data sets, which requires infrastructure to aggregate all of this data and build efficient models; the other is the inference side, where these models are used to extract embeddings and a serving tier supports building AI-enhanced applications. Both need to support very fast iterative cycles, and both subsystems must enable real-time data collection and analysis. Temporal and spatial capabilities also matter in this data architecture. Vectors are important for identifying context, but a new kind of behavioral data set is also needed, which comes from metadata filtering that reduces the search space. Applications that empower drones include Retrieval Augmented Generation, pattern matching, anomaly detection, and recommendation systems, just like many other AI applications. Contextual, behavioral, accurate, and personalized data and search characterize this architecture.

Reference: DroneData.docx

Tuesday, June 25, 2024

 Challenges in storing drone data.

Unlike traditional data architectures, which pair an online transaction processing system, where atomicity, consistency, isolation, and durability guarantees enable proper inventory registration and calculations, with an online analytical processing system and its reporting, temporal aggregation, and analysis capabilities, the lines between transactions and analysis blur for near real-time analysis of drone data, because both the inventory and the associated processing must continuously adapt. It could be compared to the data and event pipelines built for large-scale commercial applications such as Airbnb, with the potential to become real-time processing by eliminating the cost of network latency and storage access.

Drone flight paths remain undisturbed, whether stationary or linear, until the next update from the controller or a local flight-path change determination. Clearly, the capabilities of the drone units might vary from fleet to fleet. The individual drone, its degrees of freedom, its motion capabilities, and the variety of non-flight actions the unit can take need to be differentiated so they can be used selectively. When the entire fleet moves in unison, like a single unit in the fleet, there are far fewer updates to the data stored for the drones than otherwise. With updates varying from few to large-scale and from rare to frequent, the data and the events generated must be handled at any volume and rate. At all processing modules, virtualizations that cover the variety possible from the model and type of drone units mandate consistency in the data types used to represent them, so that the interface remains clean with just the right levers and validations for the associated concerns. A unified API is not just an evolutionary step for drone data management but a necessity from the start. It might be customary to build a data pipeline on Spark and Scala that aids bookkeeping and double entries for tallying so that the accounting information can then be segregated. Drone data and events have a lot of similarity to edge computing and to streaming data pipelines and stores. The need for these approaches must be balanced against the hard performance goals for regular routines.
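For the bookkeeping idea, a minimal PySpark sketch follows (the prose mentions Spark and Scala; Python is used here for consistency with the other samples in this blog). The event schema with fleet_id, drone_id, event_time, and delta columns, and the input path, are illustrative assumptions rather than a prescribed pipeline.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("drone-event-tally").getOrCreate()

# Illustrative schema: one row per drone update event.
events = spark.read.json("s3://example-bucket/drone-events/")  # placeholder path

# Double-entry style tally: count updates and sum deltas per fleet per minute,
# so positional changes can later be reconciled against controller commands.
tally = (events
         .groupBy("fleet_id", F.window(F.to_timestamp("event_time"), "1 minute"))
         .agg(F.count("drone_id").alias("updates"),
              F.sum("delta").alias("net_change")))

tally.show(truncate=False)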

Cloud databases such as Azure Cosmos DB provide general-purpose, easily programmable starter stores that can scale while keeping up impressive response times for requests. They remain starter stores because there is little forethought about the specialized routines needed for data access and storage of drone data and metadata. That said, location information that changes rapidly for drones can be continuously updated and maintained in these cloud databases. There is convenience in adding various dimensions to the same store, as warehouses have taught, and such a schema works well for drones too. But as noted earlier, specializations in components may mandate hybrid data architectures that might not all fit nicely within a single product, even if that product were the starting point.
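As a minimal sketch of continuously maintained location data, the azure-cosmos Python SDK can upsert the latest position per drone; the endpoint, key, database, container, and document shape used here are placeholders for illustration.

from azure.cosmos import CosmosClient

client = CosmosClient("<cosmos-endpoint>", credential="<cosmos-key>")  # placeholders
container = client.get_database_client("fleet").get_container_client("positions")

# Upsert keeps exactly one current-position document per drone id;
# the partition key is assumed to be /fleetId for this illustration.
container.upsert_item({
    "id": "drone-42",
    "fleetId": "fleet-7",
    "lat": 47.6205,
    "lon": -122.3493,
    "altitudeMeters": 120.0,
    "updatedAt": "2024-06-25T12:00:00Z",
})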

In conclusion, drone data architectures must be articulated with the full palette of storage options, microservices, shared architectures, events, pipelines, and automation, and they may become just as big as Master Data Management systems are today. The good news is that this needs to be done only once for multi-tenant systems.


Monday, June 24, 2024

 A vector database and search for positioning drones involves the following steps:

The first step would be to install all the required packages and libraries. We use Python in this sample:

import warnings
warnings.filterwarnings('ignore')
from datasets import load_dataset
from pinecone import Pinecone, ServerlessSpec
from DLAIUtils import Utils
import DLAIUtils
import os
import time
import torch
from tqdm.auto import tqdm

We assume the elements are mapped as embeddings in a 384-dimensional dense vector space. 

A sample query would appear like this: 

# 'model' is a sentence-transformers style encoder loaded earlier (not shown).
query = 'what is node nearest this element?'
xq = model.encode(query)
xq.shape   # (384,)

The next step is to set up the Pinecone vector database and upsert embeddings into it. The database indexes the vectors, which makes search and retrieval easy by comparing values and finding those that are most like one another.

utils = Utils()
PINECONE_API_KEY = utils.get_pinecone_api_key()
# Instantiate the client before listing or creating indexes;
# INDEX_NAME is assumed to be set to the desired index name.
pinecone = Pinecone(api_key=PINECONE_API_KEY)
if INDEX_NAME in [index.name for index in pinecone.list_indexes()]:
    pinecone.delete_index(INDEX_NAME)
print(INDEX_NAME)
pinecone.create_index(name=INDEX_NAME,
                      dimension=model.get_sentence_embedding_dimension(),
                      metric='cosine',
                      spec=ServerlessSpec(cloud='aws', region='us-west-2'))
index = pinecone.Index(INDEX_NAME)
print(index)

Then, the next step is to create embeddings for all the elements in the sample space and upsert them to Pinecone. 

batch_size = 200
vector_limit = 10000
# 'elements' is the list of items prepared earlier (not shown); cap it at vector_limit.
elements = elements[:vector_limit]
import json
for i in tqdm(range(0, len(elements), batch_size)):
    i_end = min(i + batch_size, len(elements))
    ids = [str(x) for x in range(i, i_end)]
    metadata = [{'text': text} for text in elements[i:i_end]]
    xc = model.encode(elements[i:i_end])
    records = zip(ids, xc, metadata)
    index.upsert(vectors=records)
index.describe_index_stats()

Then the query can be run on the embeddings and the top matches can be returned. 

def run_query(query):
    embedding = model.encode(query).tolist()
    results = index.query(top_k=10, vector=embedding, include_metadata=True, include_values=False)
    for result in results['matches']:
        # The metadata was upserted with a 'text' key above.
        print(f"{round(result['score'], 2)}: {result['metadata']['text']}")

run_query('what is node nearest this element?')

With this, the embeddings-based search over elements is ready. In Azure, Cosmos DB offers similar semantic search capabilities and can serve as a comparable vector database.

The following code outlines the steps using Azure AI Search:

# configure the vector store settings; the vector field name is part of the search index
# (imports assume the azure-core, azure-search-documents, and langchain packages)
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from langchain_openai import AzureOpenAIEmbeddings
from langchain_community.vectorstores.azuresearch import AzureSearch

endpoint: str = "<AzureSearchEndpoint>"
key: str = "<AzureSearchKey>"
index_name: str = "<VectorName>"
credential = AzureKeyCredential(key)
client = SearchClient(endpoint=endpoint,
                      index_name=index_name,
                      credential=credential)


# create embeddings (azure_deployment, azure_openai_api_version, azure_endpoint,
# and azure_openai_api_key are assumed to be set earlier)

embeddings: AzureOpenAIEmbeddings = AzureOpenAIEmbeddings(

    azure_deployment=azure_deployment,

    openai_api_version=azure_openai_api_version,

    azure_endpoint=azure_endpoint,

    api_key=azure_openai_api_key,

)

# create vector store

vector_store = AzureSearch(

    azure_search_endpoint=endpoint,

    azure_search_key=key,

    index_name=index_name,

    embedding_function=embeddings.embed_query,

)

# create a query (userQuery holds the caller's natural-language question)

docs = vector_store.similarity_search(

    query=userQuery,

    k=3,

    search_type="similarity",

)

# optionally persist the matched documents, e.g., to a Cosmos DB for MongoDB
# collection handle named 'collections' that is assumed to be defined elsewhere
collections.insert_many(docs)

reference: https://github.com/ravibeta/Node-Element-Predictions


Sunday, June 23, 2024

 Some of the fleet management data science algorithms are captured via a comparison of well-known data mining algorithms as follows (each entry gives the algorithm, a description, and a use case):

Classification algorithms: This is useful for finding similar groups based on discrete variables.

It is used for true/false binary classification. Multiple label classifications are also supported. There are many techniques, but the data should have either distinct regions on a scatter plot with their own centroids or if it is hard to tell, scan breadth first for the neighbors within a given radius forming trees or leaves if they fall short.

Useful for categorization of fleet path changes beyond the nomenclature. The primary use case is to see clusters of service requests that match based on features. By translating to a vector space and assessing the quality of a cluster with a sum of squared errors, it is easy to analyze a large number of changes as belonging to specific clusters from a management perspective.

Regression algorithms: This is very useful for calculating a linear relationship between a dependent and an independent variable, and then using that relationship for prediction. Fleet path changes demonstrate elongated scatter plots in specific categories. Even when the path changes demand different formations in the same category, the reorientation times are bounded and can be plotted along the timeline. One of the best advantages of linear regression is prediction with time as an independent variable. When a data point has many factors contributing to its occurrence, a linear regression gives an immediate ability to predict where the next occurrence may happen. This is far easier than coming up with a model that is a good fit for all the data points.

Segmentation algorithms: A segmentation algorithm divides data into groups, or clusters, of items that have similar properties. Segmenting path-change stimuli based on the fleet path-change feature set is a very common application of this algorithm. It helps prioritize the response to certain stimuli.

Association algorithms: This is used for finding correlations between different attributes in a data set. Association data mining allows users to see helpful messages such as “stimuli that caused a path change for this fleet type also caused a path change for this other fleet formation.”

Sequence Analysis algorithms: This is used for finding groups via paths in sequences. A sequence clustering algorithm is like the clustering algorithms mentioned above, but instead of finding groups based on similar attributes, it finds groups based on similar paths in a sequence. A sequence is a series of events; for example, a series of web clicks by a user is a sequence. It can also be compared to the IDs of any sortable data maintained in a separate table. Usually, there is support for a sequence column: the sequence data has a nested table that contains a sequence ID, which can be any sortable data type. This is very useful for finding sequences of fleet path changes opened across customers. Generally, a transit failure could result in a cascading failure across the transport network. This sort of data-driven sequence determination helps find new sequences and target them actively, even suggesting them to the stimuli that cause path changes to the fleet formations so that they can be better prepared for failures across relays.


Sequence Analysis also helps with interactive formation changes as described here.


Outliers mining algorithm: Outliers are the rows that are most dissimilar. Given a relation R(A1, A2, ..., An) and a similarity function between rows of R, find the rows in R that are dissimilar to most points in R. The objective is to maximize the dissimilarity function with a constraint on the number of outliers, or on significant outliers if given.

The choices for similarity measures between rows include distance functions such as Euclidean, Manhattan, string-edit, and graph distance, as well as L2 metrics. The choices for aggregate dissimilarity measures are the distance to the K nearest neighbors, the density of the neighborhood outside the expected range, and the attribute differences with nearby neighbors. The steps to determine outliers are: 1. cluster the regular points via K-means, 2. compute the distance of each tuple in R to the nearest cluster center, and 3. choose the top-K rows, or those with scores outside the expected range (see the sketch after this table). Finding outliers manually is sometimes impossible because the number of path changes can be quite high. Outliers are important for discovering new strategies to encompass them. If there are numerous outliers, they will significantly increase costs; if not, the patterns help identify efficiencies.

Decision tree: This is probably one of the most heavily used and easiest to visualize mining algorithms. The decision tree is both a classification and a regression tree. A function divides the rows into two datasets based on the value of a specific column; the two lists of rows returned are such that one set matches the criteria for the split while the other does not. When the attribute to be chosen is clear, this works well. A decision tree algorithm uses the attributes of the external stimuli to make a prediction, such as the reorientation time on the next path change. The ease of visualizing the split at each level throws light on the importance of those attributes. This information becomes useful to prune the tree and to draw it.

Logistic regression: This is a form of regression that supports binary outcomes. It uses statistical measures, is highly flexible, takes any kind of input, and supports different analytical tasks. This regression folds in the effects of extreme values and evaluates several factors that affect a pair of outcomes. Path changes based on stimulus category can be used to predict the likelihood of a path change from a category of stimuli. It can also be used for finding repetitions in requests.

Neural network: This is a widely used machine learning method involving neurons that have one or more gates for input and output. Each neuron assigns a weight, usually based on probability, for each feature, and the weights are normalized, resulting in a weighted matrix that articulates the underlying model in the training dataset. The model can then be used with a test data set to predict the outcome probability. Neurons are organized in layers; each layer is independent of the others, and layers can be stacked so that the output of one becomes the input to the next. This is widely used for the SoftMax classifier in NLP associated with fleet path changes. Since descriptions of stimuli, fleet formation changes, path adjustments, and adjustment times to the modified path and formation, captured spatially and temporally, conform to narratives with metric-like quantizations, Natural Language Processing could become a significant part of the data mining and ML portfolio.

Naïve Bayes algorithm: This is probably the most straightforward statistical, probability-based data mining algorithm compared to the others. The probability is a mere fraction of interesting cases to total cases. Bayes probability is conditional probability, which adjusts the probability based on the premise. This is widely used for cases where conditions apply, especially binary conditions such as with or without. If the input variables are independent, their states can be calculated as probabilities, and if there is at least one predictable output, this algorithm can be applied. The simplicity of computing states by counting, for each class, each input variable and then displaying those states against those variables for a given value makes this algorithm easy to visualize, debug, and use as a predictor.

Plugin algorithms: Several algorithms get customized to the domain they are applied to, resulting in unconventional or new algorithms. For example, a hybrid approach to association clustering can help determine relevant associations when the matrix is quite large and has a long tail of irrelevant associations from the Cartesian product. In such cases, clustering could be done prior to association to determine the key items before this market-basket analysis. Fleet path changes are notoriously prone to variation even when pertaining to the same category. These path changes do not have pre-populated properties from a template, and spatial and temporal changes can vary drastically along one or both dimensions. Using a hybrid approach, it is possible to preprocess these path changes with clustering before analysis, such as with association clustering.

Simultaneous classifiers and regions-of-interest regressors: Neural net algorithms typically involve a classifier for use with tensors or vectors, but regions-of-interest regressors provide bounding-box localizations. This form of layering allows incremental semantic improvements over the underlying raw data. Fleet path changes are time-series data, and as more and more are applied, specific time ranges become as important as the semantic classification of the origin of path changes and their descriptions. Using this technique, underlying issues can be discovered as tied to internal or external factors. Determining the root cause behind a handful of path changes is valuable information.


Collaborative filtering: Recommendations include suggestions for a knowledge base or for finding model service requests. To make a recommendation, first a group sharing similar taste is found, and then the preferences of the group are used to make a ranked list of suggestions. This technique is called collaborative filtering. A common data structure that helps with keeping track of people and their preferences is a nested dictionary. This dictionary could use a quantitative ranking, say on a scale of 1 to 5, to denote the preferences of the people in the selected group. To find similar people to form a group, we use some form of a similarity score. One way to calculate this score is to plot the items that the people have ranked in common and use them as axes in a chart; then the people who are close together on the chart can form a group. Several approaches mentioned earlier provide a perspective on solving a problem. This one is different in that opinions from multiple participants or sensors in a stimuli-creation or recognition agent are taken to determine the best set of fleet formation or path changes to recommend.

Collaborative filtering via item-based filtering: This filtering is like the previous one, except that it uses an item-based approach rather than a user-based approach. It is significantly faster than the user-based approach but requires storage for an item similarity table. There are certain filtering cases where divulging which stimuli or sensors go with which formation or path change is helpful to the fleet manager or participants. At other times, it is preferable to use item (flight-path) based similarity. Similarity scores are computed in both cases. All other considerations being the same, the item-based approach is better for sparse datasets; both stimulus-based and item-based approaches perform similarly on dense datasets.

Hierarchical clustering: Although classification algorithms vary quite a lot, the hierarchical algorithm stands out and is called out separately in this category. It creates a dendrogram where the nodes are arranged in a hierarchy. Specific domain-based ontology in the form of a dendrogram can be quite helpful to mining algorithms.

NLP algorithms: Popular NLP algorithms like BERT can be used for text mining. NLP models are very useful for processing flight-path commentary and associated artifacts in fleet flight management.
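The outlier-mining recipe above (cluster, measure the distance to the nearest center, take the top-K) can be sketched with scikit-learn; the synthetic feature matrix and the choice of eight clusters are illustrative assumptions rather than values tied to any real fleet dataset.

import numpy as np
from sklearn.cluster import KMeans

def top_k_outliers(X, n_clusters=8, k=10):
    """Return the indices of the k rows farthest from their nearest cluster center."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    # transform() gives each row's distance to every cluster center;
    # the minimum over centers is the distance to the nearest one.
    distance_to_nearest = kmeans.transform(X).min(axis=1)
    return np.argsort(distance_to_nearest)[-k:][::-1]

# Synthetic data standing in for path-change feature vectors.
X = np.random.RandomState(0).normal(size=(500, 6))
print(top_k_outliers(X, n_clusters=8, k=5))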

Algorithm Implementations:

https://jsfiddle.net/g2snw4da/

https://jsfiddle.net/jmd62ap3/

https://jsfiddle.net/hqs4kxrf/ 

https://jsfiddle.net/xdqyt89a/

#codingexercise https://1drv.ms/w/s!Ashlm-Nw-wnWhPAI9qa_UY0gWf8ZPA?e=PRMYxU


Saturday, June 22, 2024

 

 This is a summary of the book titled “Glad We Met – The Art and Science of 1:1 Meetings,” written by Steven G. Rogelberg and published by Oxford UP in 2024. The workplace involves plenty of 1:1 meetings, and almost half of them do not achieve the desired results. Drawing on extensive research, the author provides a framework for setting up, conducting, and following through on one-on-one meetings. Since career advancement depends on a manager's evaluation of his or her reports, the author encourages managers to ask the right questions, foster engagement, and illuminate each person's progress. It works both ways: the manager educates their reports and advances their own leadership journey.

These one-on-one meetings do benefit from a framework, argues the author, and those between the manager and direct reports already come with an agenda. Weekly sessions are helpful to managers, and the meeting locations and questions to ask must be planned. Staying positive, sharing mutual priorities, covering new material, asking for feedback, and saying thank you are all part of it. Regularly conducting these sessions gives more practice to both parties.

One-on-one meetings are crucial for team members and organizations, as they address priorities, goals, problems, productivity, and employee development. With about a billion business meetings daily, 20% to 50% of these sessions could cost $1.25 billion daily. However, participants often report suboptimal results. Managers can improve their one-on-one meetings to gain a better return on time and money. These meetings strengthen ties within teams and organizations, supplement performance appraisals, and fuel communication between direct reports and managers. To maximize the benefits of one-on-one meetings, create an agenda using the "listings" approach, with the employee covering their list first and the team leader going through theirs next. This approach covers immediate work issues as well as long-range topics, such as career growth and development.

One-on-one meetings are crucial for managerial success, team success, and employee learning and engagement. They promote diversity and inclusion, strengthen relationships, and produce better outcomes. Before meetings, provide context for the topics and ask the usual questions. Establish a routine for meetings and explain that they represent the manager's decision to prioritize employees' needs. Stay open-minded and explain shared objectives.

Hold weekly sessions, especially with remote employees, to avoid micromanagement. Choose a schedule that aligns with your needs and preferences, giving your directs some agency. If employees operate from the same office, consider deferring to their preferences.

Plan the location and questions for the meetings, whether in your office, the direct report's office, or an outdoor setting. Involve your employee in planning the setting and directing the conversation. The quality of the questions asked will determine the quality of the dialogue.

Effective one-on-one meetings are essential for team success, fostering better outcomes, strengthening relationships, and promoting diversity and inclusion. Focus on building connections with your employees, their engagement, setting priorities, giving feedback, and fostering career growth and development. Avoid asking personal questions or gossip and maintain a cheerful outlook. Take notes, cover new ground, and ask for feedback. Work through tactical and personal issues, ask for candid feedback, and implement "five key behaviors" to improve your performance. Both parties should feel free to ask for help, and the meeting should end on time. Wrap up the meeting and record important takeaways. Follow up on all commitments made during the meeting.

One-on-one meetings can occur between managers and their direct reports, or with employees meeting individually with their managers' manager or a higher-up executive. Regular one-on-one sessions help ensure your success as a leader, as they provide valuable insights, foster relationships, and help you make better decisions. The Chinese proverb "If you want happiness for an hour, take a nap, go fishing, inherit a fortune, but if you want happiness for a lifetime, help somebody" suggests that fostering relationships and helping others can lead to long-term happiness.

Previous book articles: BookSummary111.docx

Friday, June 21, 2024

 This is a summary of the book titled “The Digital Coaching Revolution – How to Support Employee Development with Coaching Tech,” written by Anna Tavis and Woody Woodward. Technology not only augments professional coaching services but also provides a platform for coaches to embrace the potential of data analysis and digital delivery. The authors provide a comprehensive review of digital coaching in corporate environments, sports, healthcare, and so on. Digital coaching platforms are maturing toward full autonomy. They help deliver tailored interventions. Holistic coaching improves individual well-being and productivity. AIs and chatbots have forced coaches to re-imagine their role. Digital coaching education has the potential to move into universities that grant proper credentials to students. It can also help improve diversity-related outcomes.

Coaching is a collaborative process that helps individuals and organizations achieve their potential through affective, cognitive, skill-based, and results-based growth. It involves assessing a client's initial needs, agreeing to a schedule, writing a contract, conducting coaching sessions, measuring outcomes, and providing feedback. Although academic research on coaching is still in its infancy, a meta-analysis of scientific studies suggests it improves work satisfaction and well-being more than training. Professional coaching has its roots in Socratic questioning and has evolved with the rise of digital coaching, which uses technology to enhance coaching practices. Digital coaching platforms use machine learning to match coaches and coachees, offer group chats and Google Docs for remote communication, and provide dashboard tools for performance tracking. The multi-billion dollar coaching industry is expected to continue growing and disrupt the $360 billion L&D industry. Digital coaching platforms are maturing toward full autonomy, with four distinct phases: emergent, expansion, maturity, and optimization.

Digital coaches and platforms are increasingly using data to deliver personalized, scalable, and accessible approaches for their clients. This data allows providers to collect empirical evidence demonstrating the value of coaching interventions, such as the ROI Calculator released by CoachHub in 2023. Digital coaching platforms are also experimenting with sentiment analysis and biometrics to guide their services. However, coaches must balance the use of data with relational interactions to drive individual and team performance. Holistic coaching can improve both individuals' well-being and business productivity, as it addresses rising rates of pessimism, anxiety, and depression among workers. Digital health and well-being platforms, such as BetterUp, EZRA, and CoachHub, offer coaching services that cover physical health, mental health, work-life balance, and personal development. As AI and chatbots continue to develop, coaches must re-imagine their role, using AI tools to recommend targeted learning resources and streamline feedback. Integrating AI and chatbots into digital coaching platforms makes personalized learning accessible to a broader audience.

Digital coaching tools can enhance diversity-related outcomes by helping companies achieve their DEIB (Diversity, Equity, Inclusion, and Belonging) goals. These tools can be customized and applied to both upper management and junior employees. BetterUp focuses on the belonging aspect of DEIB, ensuring all employees have access to a coach. As coaching education moves from private companies to universities, it becomes more critical to ensure coaches have appropriate education, training, and credentials. Professional coaching associations offer recognized credentials based on their own standards. Universities are stepping up their coaching education opportunities to advance research in coaching technology and train a new generation of practitioners.


Thursday, June 20, 2024

 This is a continuation of articles on IaC shortcomings and resolutions. As a recap of several articles, this section brings to light what has worked and what hasn’t across a diverse and complex set of deployments. 

1. Cloud is our friend in its ability to provide management-free, elastic, performant, and highly available resources that can be reached from around the world. The transition from on-premises raises migration and modernization concerns, but these are not insurmountable, and the final state as a cloud-native deployment is very appealing in the long run.

2. Resources transition from preview to general availability as various cloud services become more mainstream. Since they are developed and made generally available independently of each other, many mature over time and lower users' concerns about adopting them. Leveraging these built-in features over customizations holds us in good stead, as they are maintenance-free.

3. Anything managed is good, including networks, storage, and compute, if the resource or set of resources to be deployed offers that option. Taking the example of an Azure Machine Learning workspace over an Azure Databricks instance, the storage accounts, key vaults, and managed virtual network are tightly integrated with the managed option, which avoids rediscovering them through hands-on configuration.

4. Complex deployments have a significant number of configuration parameters that can quickly get out of hand and require a large test matrix. Capturing them in Infrastructure-as-code with source control is helpful to both repeatable deployments as well as curating best practices.

5. The final state of the resources on deployment must meet all the criteria, so instead of working through the order of various steps and getting lost in the choices, it is better to work backwards from the final state once it has gained deliberation and buy-in from stakeholders.

6. Zonal redundancy and regional disaster recovery are aspects that must be considered as early as design and implementation. They are not an afterthought and must be judiciously chosen to conserve resources and costs without detriment to business continuity.

7. Most workspaces and resources with public IP connectivity grow feet in the form of private endpoints to gain private-plane connectivity through virtual networks, so that connectivity and traffic are secured and there is less attack surface exposed to the ubiquitous internet. Designating subnets to provide addresses for private endpoints from various resources is good practice. Only when private connectivity interferes with the use of resources via workspace notebooks or management views and requires jump hosts or Bastion hosts might both public and private connectivity be required. For public connectivity alone, simple firewall rules that enumerate the sources and destinations can thwart many attacks that might otherwise have required a costly setup like Defender.

8. Cost is always a concern, as costs have a tendency to balloon with usage, and keeping them manageable requires attention to the features that are used and those that aren't; the process must be repeated at setup as well as on an ongoing basis.

9. Just like built-in features, most resources come with diagnostics, troubleshooting and documentation as well as support, so leveraging them avoids lengthy investigations.

10. Naming conventions are important to capture the variations possible across resources, and they do very well with prefixes and suffixes that have well-known meanings (a small illustration follows).
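As a toy illustration of the last point, a small helper that composes well-known segments keeps names consistent across deployments; the segment names, their order, and the example abbreviations are illustrative assumptions, not a prescribed standard.

def resource_name(org: str, workload: str, env: str, region: str, resource_type: str, instance: int = 1) -> str:
    """Compose a resource name from well-known segments, e.g. 'cntx-ml-prod-eastus-kv-001'."""
    # Note: some resource types (e.g., storage accounts) disallow hyphens,
    # so the separator may need to be stripped for those.
    return "-".join([org, workload, env, region, resource_type, f"{instance:03d}"])

print(resource_name("cntx", "ml", "prod", "eastus", "kv"))       # key vault
print(resource_name("cntx", "ml", "prod", "eastus", "st", 2))    # storage account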


Tuesday, June 18, 2024

 This is a summary of the book titled “Leader as Healer” written by Nicholas Janni and published by LID Publishing in 2022. This book is an interesting take on leadership from a person who emphasizes spirituality, mental healing, mindfulness, consciousness and peak performance from a Zen-like state. He is unusual as a business leader and author because he teaches acting and heads his own theatre company. He says that innovative leaders should tap into enlightened consciousness. Today’s problems require a heightened awareness and leaders who are aware and empathetic can be more than just executors. They can become emotional and spiritual healers and the journey begins with self-care and leading a purposeful life.

Technology has transformed the world, and heightened consciousness can transform individuals. CEOs and senior executives must invest their energy in innovative, mindful leadership to address today's challenges. Major disruptions are impacting industries, and society must prioritize communal, intuitive, and metaphysical principles over greed, self-interest, and competition. To fix today's problems, leaders must cultivate higher awareness and become emotional and spiritual healers.

Leaders who practice elevated thinking, being, and acting can become emotional and spiritual healers, inspiring others and transforming fragmented organizations into "coherent wholes." They can reawaken a company's vitality, build internal connections, and imbue an organization with energy. The "Leader-as-healer" model is an advanced leadership construct that replaces the outdated "Leader as executor" model, which prioritizes profits, compelled action, and discipline.

In today's fractious times, innovative leaders and a radical attitude adjustment are needed to address the challenges and promote a more compassionate and sustainable world.

The "Leader as executor" role is outdated and should be replaced by "Leader-as-healer." This model prioritizes profit, discipline, and action, but does not address the human side of business. Leaders-as-healers have practical wisdom and recognize the value of giving focused attention to their leaders, providing a "coherent presence" and showing employees that they are always ready to help.

To cope effectively with today's complex, disruptive world, leaders and employees must practice self-care, paying close attention to their emotional, spiritual, and physical needs, including their health. True leadership sometimes requires excising moral and spiritual tumors from organizations or nations. To tune into one's body and emotions, consider following a somatic mindfulness practice, paying close attention to breathing, and focusing on the body's inner and outer sensations.

Simple awareness exercises can help develop significant personal insights and rewire neural pathways, allowing leaders to better understand and navigate the challenges of today's complex world. By practicing self-care, leaders can become more receptive to new ideas and maintain an open spirit.

Self-care means paying close attention to one's emotional, spiritual, and physical needs, including one's health. When leaders do this, they develop significant personal insights, which helps deepen the scope of their leadership. Mindfulness is practiced by focusing the mind with contemplative exercises such as meditation, which dates back to pre-Columbian civilizations, early Hebrew mystics, Indian yogis, and Native American shamans. Set aside 20 minutes daily for meditation, taking notes on your inner emotional, mental, and physical state.

Lead a life of purpose – the right purpose – to avoid navigating life like a ship in stormy waters without a compass or rudder. A leader who operates unemotionally, based only on rationality, might translate purpose as increased profits and productivity, but their life's purpose should be internally worthwhile. Define your values and consider the contributions you can make to improve the world.

A strong sense of purpose is crucial for navigating life effectively. A leader should focus on internal, worthwhile goals rather than external ones. Emotions are the gateway to deeper humanity and a richer, more heartfelt relationship with life and leadership. Leaders should embrace their emotional side and abandon archaic thinking that has failed in the past. They should practice enlightened leadership that fortifies organizations and employees by combining thinking and feeling facets of themselves.

#codingexercise

Position eight queens on a chess board without conflicts:

    public static void positionEightQueens(int[][] B, int[][] used, int row) throws Exception {

        if (row == 8) {

            if (isAllSafe(B)) {

                printMatrix(B, B.length, B[0].length);

            }

            return;

        }

        for (int k = 0; k < 8; k++) {

            if ( isSafe(B, row, k) && isAllSafe(B)) {

                B[row][k] = 1;

                positionEightQueens(B, used, row + 1);

                B[row][k]  = 0;

            }

        }

    }

    public static boolean isSafe(int[][] B, int p, int q) {

        int row = B.length;

        int col = B[0].length;

        for (int i = 0; i < row; i++) {

            for (int j = 0; j < col; j++) {

                if (i == p && j == q) { continue; }

                if (B[i][j] == 1) {

                    boolean notSafe = isOnDiagonal(B, p, q, i, j) ||

                            isOnVertical(B, p, q, i, j) ||

                            isOnHorizontal(B, p, q, i, j);

                    if(notSafe){

                        return false;

                    }

                }

             }

        }

        return true;

    }
    public static boolean isAllSafe(int[][] B) {

        for (int i = 0; i < B.length; i++) {

            for (int j = 0; j < B[0].length; j++) {

                if (B[i][j]  == 1 && !isSafe(B, i, j)) {

                    return false;

                }

            }

        }

        return true;

    }

    public static boolean isOnDiagonal(int[][] used, int r1, int c1, int r2, int c2) {

        boolean result = false;

        int row = used.length;

        int col = used[0].length;

        for (int k = 0; k < 8; k ++) {

            if (r2 - k >= 0 &&  c2 - k >= 0 && r1 == r2 - k && c1 == c2 - k) {

                return true;

            }

            if (r2 + k < row && c2 + k < col && r1 == r2 + k && c1 == c2 + k) {

                return true;

            }

            if (r2 - k >= 0 && c2 + k < col && r1 == r2 - k && c1 == c2 + k) {

                return true;

            }

            if (r2 + k < row  && c2 - k >= 0 && r1 == r2 + k && c1 == c2 - k) {

                return true;

            }

        }

        return result;

    }

    public static boolean isOnVertical(int[][] used, int r1, int c1, int r2, int c2) {

        boolean result = false;

        int row = used.length;

        int col = used[0].length;

        for (int k = 0; k < 8; k++) {

            if (c2 - k >= 0  && c1 == c2 - k && r1 == r2 ) {

                return true;

            }

            if (c2 + k < col && c1 == c2 + k && r1 == r2) { // bound check should use the column count

                return true;

            }

        }

        return result;

    }

    public static boolean isOnHorizontal(int[][] used, int r1, int c1, int r2, int c2) {

        boolean result = false;

        int row = used.length;

        int col = used[0].length;

        for (int k = 0; k < 8; k++) {

            if (r2 - k >= 0  && r1 == r2 - k && c1 == c2 ) {

                return true;

            }

            if (r2 + k < row && r1 == r2 + k && c1 == c2) {

                return true;

            }

        }

        return result;

    }

 

Sample output:

1 1 2 1 1 1 1 1
1 1 1 1 1 2 1 1
1 1 1 2 1 1 1 1
1 2 1 1 1 1 1 1
1 1 1 1 1 1 1 2
1 1 1 1 2 1 1 1
1 1 1 1 1 1 2 1
2 1 1 1 1 1 1 1


Monday, June 17, 2024

 This is a summary of the book “Never Say Whatever: How Small Decisions Make a Big Difference,” written by Richard A. Moran and published by McGraw-Hill in 2023. The author hosts a CBS syndicated radio program and is a venture capitalist. He contends that indifference is a waste and causes life to lose its essence and purpose. He calls this writing an “anti-whatever workbook” and advocates empathy, saying that we must be social creatures and not just need-based ones. Detachment has a high correlation with losing direction, while caring and making intelligent choices makes us better; even minor decisions are vital. Among the highlights of the book is the wisdom of starting early, even with high school students, who are rarely taught how to make decisions. Smart decision makers are self-aware and trust their guts. Saying “whatever” is not only self-damaging but also damaging to the workplace we are in. Entrepreneurs simply cannot have a “whatever” attitude, and people with such an attitude aggravate everyone. Often, indifferent people come to regret their careless attitude. Good advisors can help us make sound decisions. Our choices shape our future, so we must be decisive.

"Whatever" is a common term used by Americans to express disengagement and dismissal, affecting not only the speaker but also others around them. This attitude is risk-averse and hinders decision-making, as decisions and actions make life meaningful. Intentional people understand that even minor decisions can be vital, and they make choices with forethought, intention, and purpose. Decision-making can be simplified by sticking with the status quo, making minor adjustments, or doing a complete about-face. Intentions always point the way to actions and results, and a clear intention leads to sound choices. For example, if you want to secure a raise or a promotion, you must ask for it, practice it, and establish a deliberative process that makes the most of your strengths. A "whatever" attitude does not ensure the same number of planes take off and land each day.

High school students often lack the skills to make intelligent decisions, which are crucial for making major choices such as college, career, and marriage. To teach students the basics of intelligent decision-making, schools should teach them the dangers of a "whatever" attitude and frame options as "if/then" scenarios. A 10-year study by Bain & Company found that organizations that make timely, intelligent decisions outperform those that appear indecisive. The "two-minute rule" demonstrates the power of quick, effective decision-making for small choices, but the appropriate amount of time and consideration should be given to all major decisions. Smart decision-makers are self-aware and trust their guts, knowing their strengths, weaknesses, and emotions. They recognize the consequences of their decisions and actions and are willing to make "gut" decisions under the right circumstances. A "whatever" attitude can damage careers and organizations, as it can undermine a company's direction and erode employees' sense of purpose.

Entrepreneurs must not have a "whatever" attitude to succeed, as they must be nonstop decision-makers who can make timely and cost-effective choices. They often use pattern recognition to clarify their decision-making and draw analogies between challenges and past ones. "Whatever" people can aggravate everyone, leading to trouble and strife at work and home. Housemates who struggle with making decisions can be troublemakers.

Regret often arises from careless attitudes, as long-term regrets often stem from decisions we did not make. What we choose to do today often affects what is possible in the future. Small but bad choices can snowball. It is important to remember that every choice creates a "ripple effect" that affects other aspects of life in positive and negative ways. Every decision involves some degree of risk, and it is better to make a good decision than to do nothing. Consulting with trusted advisors can help make sound decisions, and creating a "personal board of directors" of wise people can provide numerous expert perspectives. Finally, the author recommends being decisive and avoiding the word "whatever" as a signal that we do not care. This will ensure that our choices shape our future. Sam Alemayhu, a respected businessman and investor, grew up in an impoverished village in Ethiopia and emigrated to America. His accomplishments inspire others to be decisive, care about their choices, and never say "whatever."


#codingexercise

Find minimum in a rotated sorted array:

class Solution {

public int findMin(int[] A) {

if (A == null || A.length == 0) { return Integer.MIN_VALUE; }

int start = 0;

int end = A.length -1;

while (start < end) {

int mid = start + (end - start) / 2; // avoids integer overflow

// check monotonically increasing series

if (A[start] <= A[end] && A[start] <= A[mid] && A[mid] <= A[end]) { return A[start]; }

// check if only [start, end]

if (mid == start || mid == end) { if (A[start] < A[end]) return A[start]; else return A[end];}

// detect rotation point

if (A[start] > A[mid]){

end = mid;

} else {

if (A[mid] > A[mid+1]) return A[mid+1];

start = mid + 1;

}

}

return A[0];

}

}

Works for:

[0 1 4 4 5 6 7]

[7 0 1 4 4 5 6]

[6 7 0 1 4 4 5]

[5 6 7 0 1 4 4]

[4 5 6 7 0 1 4]

[4 4 5 6 7 0 1]

[1 4 4 5 6 7 0]

[1 0 0 0 0 0 1]

Sunday, June 16, 2024

 This is a continuation of a study involving a software application that responds to a chat-like query on the data contained in the ten-year collection of my blog posts from https://ravinote.blogspot.com. Each article on the blog is written as a daily routine and contains mostly unstructured text: explanations of software engineering practices and code samples spanning personal, enterprise, and cloud computing. The earlier part of the study referred to leveraging the Azure OpenAI search service to perform a semantic search based on the chat-like query and create a generated response. This part of the study follows up on taking the data completely private so that the model built to respond to the query can be hosted on any lightweight compute, including handheld devices using mobile browsers. The lessons learned in this section now follow:

First, a brief comparison of the two search methodologies: 

1. Azure AI Toolkit for VS Code:

o Approach: This simplifies generative AI app development by bringing together AI tools and models from Azure AI catalog. We specifically use the Phi-2 small language model. It also helps to fine-tune and deploy models to the cloud.

o Matching: It matches based on similarity between query vectors and content vectors. This enables matching across semantic or conceptual likeness (e.g., “dog” and “canine”). Phi-2 is a 2.7-billion-parameter language model, not on the order of the trillions of parameters in large language models, but sufficiently compact to demonstrate outstanding reasoning and language-understanding capabilities. Phi-2 is a Transformer-based model with a next-word prediction objective that was originally trained on a large mixture of synthetic datasets for NLP and coding. 

o Scenarios Supported: 

Find a supported model from the Model Catalog.

Test model inference in the Model Playground.

Fine-Tune model locally or remotely in Model Fine-tuning

Deploy fine-tuned models to cloud via command-palette for AI Toolkit.

o Integration: Works seamlessly with other Azure services.

2. Azure OpenAI Service-Based Search:

o Approach: Uses Azure OpenAI embedding models to convert queries into vector embeddings, with GPT models for response generation. GPT models are large language models, while Phi-2 is a small language model. The dataset can include the web for the Chat Completions API from the Azure OpenAI service. 

o Matching: Performs vector similarity search using the query vector in the vector database usually based on the top k-matching content based on a defined similarity threshold.

o Scenarios supported: 

Similarity search: Encode text using embedding models (e.g., OpenAI embeddings) and retrieve documents with encoded queries.

Hybrid search: Execute vector and keyword queries in the same request, merging results.

Filtered vector search: Combine vector queries with filter expressions.

o Cost:

increases linearly, even for infrequent use, at a rate of a few hundred dollars per month. The earlier application leveraging the Completions API had to be taken down for this reason.

Both approaches leverage vector embeddings for search, but the toolkit and the Phi-2 model are better for customization, while the Azure OpenAI Completions API is useful for streamlined applications and quick chatbots.
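To make the shared vector-matching step concrete, here is a minimal top-k cosine-similarity sketch. The document names, embedding values, and dimensionality are made up for illustration; in practice the vectors come from the embedding model.

import java.util.*;

public class VectorSearchSketch {
    // Cosine similarity between two embedding vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-12);
    }

    public static void main(String[] args) {
        // Hypothetical 4-dimensional embeddings; real embeddings have hundreds of dimensions.
        Map<String, double[]> corpus = new LinkedHashMap<>();
        corpus.put("post-on-blobfuse", new double[]{0.9, 0.1, 0.0, 0.2});
        corpus.put("post-on-lora",     new double[]{0.1, 0.8, 0.3, 0.0});
        corpus.put("post-on-drones",   new double[]{0.0, 0.2, 0.9, 0.1});

        double[] query = {0.2, 0.7, 0.4, 0.0};
        int k = 2;

        // Rank content vectors by similarity to the query vector and keep the top k.
        corpus.entrySet().stream()
              .sorted((x, y) -> Double.compare(cosine(query, y.getValue()),
                                               cosine(query, x.getValue())))
              .limit(k)
              .forEach(e -> System.out.println(e.getKey() + " -> " + cosine(query, e.getValue())));
    }
}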

And now the learnings follow:

- Fine-Tuning: A pretrained transformer can be put to different uses during fine-tuning, such as question-answering, language generation, sentiment analysis, summarization, and others. Fine-tuning adapts the model to different domains. Phi-2 behaves remarkably well in this regard. Fine-tuning LLMs is so cost-prohibitive that it is best avoided. On the other hand, small language models are susceptible to overfitting, where the model learns specifics of the training data that do not generalize to the query.

- Parameter-Efficient Fine-tuning: This is called out here only for rigor. Costs for tuning LLMs can be reduced by fine-tuning only a subset of the model’s parameters, but this might result in “catastrophic forgetting”. These techniques include LoRA, prefix tuning, and prompt tuning. The Azure Toolkit leverages QLoRA. LoRA stands for Low-Rank Adaptation of Large Language Models and introduces trainable rank-decomposition matrices into each layer of the transformer architecture. It reduces the number of trainable parameters for downstream tasks while keeping the pre-trained weights frozen. QLoRA combines quantization with LoRA, using nf4 (4-bit normal float) or fp4 as the quantization data type and an adjustable batch size for training and evaluation per GPU. The data type compresses weights to 4-bit precision, as opposed to the native 32-bit floating-point precision.
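As a rough, hypothetical illustration of the low-rank idea only (not the actual QLoRA implementation), the frozen weight matrix W is augmented with the product of two small trainable matrices B (d x r) and A (r x d); only B and A receive gradient updates, which is where the parameter savings come from:

public class LoRASketch {
    // Effective weight = frozen W + B (d x r) times A (r x d), with r much smaller than d.
    static double[][] effectiveWeight(double[][] W, double[][] B, double[][] A) {
        int d = W.length, dOut = W[0].length, r = A.length;
        double[][] out = new double[d][dOut];
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < dOut; j++) {
                double delta = 0;
                for (int k = 0; k < r; k++) {
                    delta += B[i][k] * A[k][j];   // low-rank update: the only trainable part
                }
                out[i][j] = W[i][j] + delta;      // pre-trained weights stay frozen
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tiny 3x3 example with rank-1 adapters, purely for illustration.
        double[][] W = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
        double[][] B = {{0.1}, {0.2}, {0.0}};          // 3 x 1
        double[][] A = {{0.5, 0.0, 0.5}};              // 1 x 3
        System.out.println(java.util.Arrays.deepToString(effectiveWeight(W, B, A)));
    }
}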

- Problem with outliers: As with all neural networks that create embeddings, outliers are significant because, while model weights are roughly normally distributed, the handling of outliers directly affects the quality of the quantized model. An iterative study over different ranges of inclusion was too expensive to carry out here.

- Dequantization: this is the process of taking the quantized weights, which are frozen and not trained, and restoring them to 32-bit precision. It is helpful because the quantized values and the quantization constant can be used to recompute full-precision weights when gradients are calculated and backpropagated.
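To see why the quantization constant must travel with the compressed weights, here is a minimal sketch of block-wise absmax quantization and dequantization; the signed 4-bit integer grid is a simplified stand-in for nf4, not the actual data type:

public class QuantizationSketch {
    // Quantize a block of weights to signed 4-bit integers in [-7, 7] using an absmax scale.
    static int[] quantize(double[] block, double[] scaleOut) {
        double absmax = 0;
        for (double w : block) absmax = Math.max(absmax, Math.abs(w));
        double scale = absmax == 0 ? 1 : absmax / 7.0;   // quantization constant for this block
        scaleOut[0] = scale;
        int[] q = new int[block.length];
        for (int i = 0; i < block.length; i++) {
            q[i] = (int) Math.round(block[i] / scale);   // nearest 4-bit level
        }
        return q;
    }

    // Dequantize back to floating point using the stored quantization constant.
    static double[] dequantize(int[] q, double scale) {
        double[] w = new double[q.length];
        for (int i = 0; i < q.length; i++) w[i] = q[i] * scale;
        return w;
    }

    public static void main(String[] args) {
        double[] block = {0.12, -0.05, 0.33, -0.29};
        double[] scale = new double[1];
        int[] q = quantize(block, scale);
        System.out.println(java.util.Arrays.toString(q));
        System.out.println(java.util.Arrays.toString(dequantize(q, scale[0])));
    }
}

Note how a single large outlier in the block would stretch the absmax scale and coarsen every other weight, which is the concern raised in the outlier paragraph above.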

- Paged optimizers are necessary to manage memory usage during training of the language models. Azure NC4AS_T4_v3 family VMs handle this well, but the choice of SKU is an initial decision, not something that can be changed mid-flight.

- Using BlobFuse2 to load all the data stored in private storage accounts as a local filesystem is incredibly slow for reads over this entire dataset. The toolkit is more helpful when run on notebooks and laptops with VS Code, a GPU, and a customized local Windows Subsystem for Linux.


Saturday, June 15, 2024

 

Drone Formation Commercial Software:

This is another addendum to the discussion about Drone Formation Commercial Software as described in this document. In this section, we describe surface detection. Drones can help with surface detection by using an encompassing space around the surface to be detected. Let us assume that an object, say a saucer-like one, needs to be surrounded by drones. The points of reference along the object, the positions around which the drones must organize themselves, can easily be determined by the drones themselves. We further assume each drone can detect its distance to the object by itself, or that this distance is passed to each drone from some central sensor in real time. If the scouts can figure out an overall matrix of X-Y-Z coordinates that is sufficiently large to encompass the object, then the rest of the drones can make use of this initial grid to spread themselves out before each converges to the point of reference closest to it, independently of the others, as long as it is safe and not colliding with other drones.

The scouts do a two-pass flyover of the object while the distance remains within a range. If the range is exceeded, the boundary of the encompassing grid is known. With the initial pass determining the grid, a subsequent pass that converges as close as possible to the surface of the object, while maintaining a distance between drones, helps determine the points of reference on the surface. The full strength of the drone formation then spreads out and distributes itself against the points of reference to cover the surface.

When the number of drones is in excess of those required to cover the surface, they can form ever-increasing layers over the underlying points of reference. These points adjust to be on one layer for the layer above and the logic remains the same for additional drones to work out the new positions to occupy. Wave propagation and Fourier transform can predict how soon the drones can cover the object or the number of layers to form and the rate or duration for full coverage by all the drones.

Distance from each other as well as to a point of reference is sensor-specific and merely translates as an input for the drone to determine the actual position to occupy as the output. This works for all stationary objects in an outward-grid-enclosing-on-to-the-surface-of-the-object manner for the drone formation. Selection of the initial number and identities of the drones for the scouts can be determined by a threshold for the maximum distance allowed for a scout. After a scout reaches this threshold, another scout covers the next interval to the allowed threshold.
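A hedged sketch of the convergence step described above, where each drone claims the nearest unclaimed point of reference: the coordinates are arbitrary placeholders, and collision avoidance is reduced here to the rule of one drone per reference point.

import java.util.*;

public class ReferencePointAssignment {
    // Greedily assign each drone to the closest reference point that is still unclaimed.
    static Map<Integer, Integer> assign(double[][] drones, double[][] refs) {
        Map<Integer, Integer> assignment = new HashMap<>();   // drone index -> reference index
        boolean[] claimed = new boolean[refs.length];
        for (int d = 0; d < drones.length; d++) {
            int best = -1;
            double bestDist = Double.MAX_VALUE;
            for (int r = 0; r < refs.length; r++) {
                if (claimed[r]) continue;                      // one drone per point of reference
                double dist = 0;
                for (int k = 0; k < 3; k++) {                  // squared X-Y-Z distance
                    double diff = drones[d][k] - refs[r][k];
                    dist += diff * diff;
                }
                if (dist < bestDist) { bestDist = dist; best = r; }
            }
            if (best >= 0) { claimed[best] = true; assignment.put(d, best); }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] drones = {{0, 0, 10}, {5, 5, 10}, {10, 0, 10}};   // placeholder positions
        double[][] refs   = {{1, 1, 2}, {9, 1, 2}, {5, 6, 2}};       // placeholder reference points
        System.out.println(assign(drones, refs));
    }
}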

Moving objects have the characteristic that their shape remains constant during the flight, and if the end points of the object can be tracked as distinct positions over time, then the frame of reference can be adjusted to include the initial and final positions for a given time slice.

Friday, June 14, 2024

 This is the summary of the book titled “The Art of Explanation,” written by Ros Atkins and published by Headline Publishing in 2023. He has been serving as the analysis editor for the BBC and contends that the ability to explain oneself well is a necessary art. Being clear and concise at once takes practice, and it is a skill that can be worked on each day. It applies to relationships, job interviews, email writing, teaching children, and engaging any audience, and is thus immeasurably useful and, fortunately, learnable. Crafting a quality explanation starts with the right questions, a certain degree of familiarity with the audience and their needs, preparation for controlled environments, and even training for impromptu speeches. Verbalizing and memorizing ahead of time helps. Anticipating and soliciting feedback helps to refine. Short form is preferable, but the ability to delve in depth on demand is also important. Practice in everyday life can be transformational.

Explanation is a crucial skill that many people lack, as it can significantly impact how they communicate and achieve their goals. It is essential to convey our message clearly and impactfully, as it increases the likelihood of being understood and achieving desired results. For example, entrepreneurs can attract investors by explaining their business models, teachers can enhance students' enjoyment by explaining complex information, patients can stick to diet plans by explaining their benefits, and government agencies can gain interest in their services by explaining how to access them. To craft a quality explanation, we must start with the right questions and master the art of explanation by addressing 10 key features: simplicity, essential detail, complexity, efficiency, precision, context, no distractions, engaging, useful, and clarity of purpose.

To create a memorable and engaging message, it is essential to understand our audience and their needs. This involves understanding demographics, age ranges, and what they already know and want to know. Crafting a bespoke message that resonates with our audience and applying our knowledge to engage them are essential. Lastly, communicating our credibility on the subject matter wraps it up.


Preparation is crucial for effective explanations. In controlled scenarios, these seven steps can be followed:

1. Set-up: Clarify our audience and the purpose of our explanation.

2. Find the information: Gather information, including a summary, questions, research areas, and subject matter.

3. Distill the information: Filter out irrelevant elements and evaluate the data.

4. Organize the information: Divide it into strands or sections, focusing on high-impact elements or stories.

5. Link the information: Write down our first draft to ensure smooth flow and authenticity.

6. Tighten the explanation: Eliminate unnecessary elements and practice thoroughly.

7. Deliver the explanation: Rehearse our explanation thoroughly, using methods like reading from a script, flashcards, or memorizing. Record ourselves running through the explanation and address any gaps.

In uncontrolled situations too, it is essential to prepare and communicate clearly. Following the same steps as for controlled settings, including setting up, finding information, and distilling the information, helps. In dynamic settings, organize the information differently, with no more than five strands. Each strand should have a primary point, three supporting facts, and relevant context. Verbalizing and memorizing our dynamic explanation ahead of time, using bridging phrases to connect different elements, and communicating our desired information seamlessly will have a tremendous impact. Practicing memory techniques, such as creating a "memory palace" to visualize the different strands of our explanation, aids retention. Anticipating questions or feedback others might have regarding our explanation, since people tend to be predictable and pattern-based, helps cover our bases. Preparing for the worst-case scenario by imagining how we might respond if the roles were reversed is good preparation. We could preemptively formulate answers and research the situation before giving our explanation. This will help us prepare for both what we hope will happen and what may surprise us.

We must prepare for short-form explanations in just a couple of minutes by asking ourselves three questions before an interaction that requires an explanation. This will help us include the right details and make the conversation more engaging. Even with short emails, it's good for everyone to understand our message clearly. We must make our first sentence and subject line engaging and use formatting to highlight important details. We must avoid adding recipients who don't need to be included in the conversation. The art of explanation is a multifaceted process that must be adaptable to our unique circumstances. It's like a home chef testing recipes from a cookbook, and we can choose which aspects of our life to transform first with the art of explanation.

Previous book summary: BookSummary107.docx

Summarizing Software: SummarizerCodeSnippets.docx  



Thursday, June 13, 2024

 This is a continuation of articles on IaC shortcomings and resolutions. The following article describes how data scientists leveraging cloud infrastructure tend to think about individual files and archives rather than filesystems. Most data used by data scientists to train their models lives in a remote blob store, a file store, or some other form of data store such as structured and unstructured databases and virtual data warehouses. Distributed file systems in operating systems, and intercompatibility protocols between heterogeneous operating systems such as Linux and Windows, have long addressed the problem of viewing remote file systems as local paths via mounts and mapped drives, yet the diligence to set up and tear down entire filesystems on local compute instances and clusters is often ignored. 

Part of the reason for this file-by-file usage has been the popularity of signed URIs for remote files, which facilitate sharing on a file-by-file basis, as well as the adoption of file formats like parquet and zip archives for convenient data transfer. When changes are made to these files, they often require unpacking and repacking, followed by a one-time update at the remote location. 

With the convenience of BlobFuse2 technology, mounted file systems can persist changes to the remote location near instantaneously, and the technology is available for blob stores just as comparable options are for file stores. BlobFuse is a virtual file system driver for Azure Blob Storage. It can be used to access existing blob data through the Linux file system; page blobs are not supported. It uses the libfuse open-source library to connect to the Linux FUSE kernel module and implements filesystem operations by using Azure Storage REST APIs. Local file caching improves subsequent access times. An Azure blob container on a remote Azure Data Lake Storage Gen2 file system is mounted on Linux, and its activities and resource usage can be monitored. Version 2 provides more management support through its command-line interface.  
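Once mounted, the container behaves like any other directory, so application code needs no storage SDK. A minimal sketch, assuming a hypothetical mount point /mnt/blobfuse that the actual mount configuration would determine:

import java.io.IOException;
import java.nio.file.*;
import java.util.stream.Stream;

public class MountedBlobRead {
    public static void main(String[] args) throws IOException {
        // Hypothetical BlobFuse2 mount point; the real path comes from the mount command.
        Path mount = Paths.get("/mnt/blobfuse");
        try (Stream<Path> files = Files.list(mount)) {
            // Plain file-system calls read through the FUSE driver to the remote container.
            files.limit(5).forEach(p -> System.out.println(p.getFileName()));
        }
    }
}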

On the other hand, Azure File Storage offers file shares in the cloud using the standard SMB protocol. Enterprise applications that rely on file servers can find this transition easier. File shares can be mounted from virtual machines running in Azure as well as from on-premises applications that support SMB 3.0.

To mount the file share from a virtual machine running Linux, an SMB/CIFS client needs to be installed; if the distribution does not have a built-in client, it can be installed with the cifs-utils package. Then a mount command can be issued to create a mount point by giving the type, remote location, options, and local path as parameters. Mounted shares can be persisted across reboots by adding an entry to the /etc/fstab file.

Lastly, as with all cloud resources and operations, all activities can be logged and monitored. They come with role-based access control for one-time setup and control plane operations can be automated with command-line interface, REST API calls, user-interface automations, and Software Development Kits in various languages.

Previous write-up: IaCResolutionsPart135.docx


Wednesday, June 12, 2024

 


Problem:

Make Array Zero by Subtracting Equal Amounts

You are given a non-negative integer array nums. In one operation, you must:

Choose a positive integer x such that x is less than or equal to the smallest non-zero element in nums.

Subtract x from every positive element in nums.

Return the minimum number of operations to make every element in nums equal to 0.

 

Example 1:

Input: nums = [1,5,0,3,5]

Output: 3

Explanation:

In the first operation, choose x = 1. Now, nums = [0,4,0,2,4].

In the second operation, choose x = 2. Now, nums = [0,2,0,0,2].

In the third operation, choose x = 2. Now, nums = [0,0,0,0,0].

Example 2:

Input: nums = [0]

Output: 0

Explanation: Each element in nums is already 0 so no operations are needed.

 

Constraints:

1 <= nums.length <= 100

0 <= nums[i] <= 100


import java.util.*;

import java.util.stream.*;

class Solution {

    public int minimumOperations(int[] nums) {

        List<Integer> list = Arrays.stream(nums).boxed().collect(Collectors.toList());

        var nonZero = list.stream().filter(x -> x > 0).collect(Collectors.toList());

        int count = 0;

        while(nonZero.size() > 0) {

            var min = nonZero.stream().mapToInt(x -> x).min().getAsInt();

            nonZero = nonZero.stream().map(x -> x - min).filter(x -> x > 0).collect(Collectors.toList());

            count++;

        }

        return count;

    }

}


Input

nums =

[1,5,0,3,5]

Output

3

Expected

3


Input

nums =

[0]

Output

0

Expected

0
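Since each operation subtracts the current minimum and thereby zeroes out exactly one distinct positive value, the answer equals the number of distinct non-zero values in nums. A minimal equivalent sketch (the class name is arbitrary):

import java.util.*;

class SolutionDistinct {
    // Minimum operations equals the count of distinct positive values in nums.
    public int minimumOperations(int[] nums) {
        return (int) Arrays.stream(nums).filter(x -> x > 0).distinct().count();
    }
}

For [1,5,0,3,5] the distinct positive values are 1, 3, and 5, giving 3, which matches the outputs above.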




SQL Schema

 

Table: Books

+----------------+---------+

| Column Name    | Type    |

+----------------+---------+

| book_id        | int     |

| name           | varchar |

| available_from | date    |

+----------------+---------+

book_id is the primary key of this table.

 

Table: Orders

+----------------+---------+

| Column Name    | Type    |

+----------------+---------+

| order_id       | int     |

| book_id        | int     |

| quantity       | int     |

| dispatch_date  | date    |

+----------------+---------+

order_id is the primary key of this table.

book_id is a foreign key to the Books table.

 

Write an SQL query that reports the books that have sold less than 10 copies in the last year, excluding books that have been available for less than one month from today. Assume today is 2019-06-23.

Return the result table in any order.

The query result format is in the following example.

 

Example 1:

Input: 

Books table:

+---------+--------------------+----------------+

| book_id | name               | available_from |

+---------+--------------------+----------------+

| 1       | "Kalila And Demna" | 2010-01-01     |

| 2       | "28 Letters"       | 2012-05-12     |

| 3       | "The Hobbit"       | 2019-06-10     |

| 4       | "13 Reasons Why"   | 2019-06-01     |

| 5       | "The Hunger Games" | 2008-09-21     |

+---------+--------------------+----------------+

Orders table:

+----------+---------+----------+---------------+

| order_id | book_id | quantity | dispatch_date |

+----------+---------+----------+---------------+

| 1        | 1       | 2        | 2018-07-26    |

| 2        | 1       | 1        | 2018-11-05    |

| 3        | 3       | 8        | 2019-06-11    |

| 4        | 4       | 6        | 2019-06-05    |

| 5        | 4       | 5        | 2019-06-20    |

| 6        | 5       | 9        | 2009-02-02    |

| 7        | 5       | 8        | 2010-04-13    |

+----------+---------+----------+---------------+

Output: 

+-----------+--------------------+

| book_id   | name               |

+-----------+--------------------+

| 1         | "Kalila And Demna" |

| 2         | "28 Letters"       |

| 5         | "The Hunger Games" |

+-----------+--------------------+



-- Books available for at least one month as of 2019-06-23, with fewer than 10 copies sold in the past year.

SELECT b.book_id, b.name

FROM Books b

LEFT JOIN Orders o

ON b.book_id = o.book_id AND o.dispatch_date >= DATEADD(year, -1, '2019-06-23')

WHERE b.available_from <= DATEADD(month, -1, '2019-06-23')

GROUP BY b.book_id, b.name

HAVING COALESCE(SUM(o.quantity), 0) < 10;



Case 1

Input

Books =

| book_id | name | available_from |

| ------- | ---------------- | -------------- |

| 1 | Kalila And Demna | 2010-01-01 |

| 2 | 28 Letters | 2012-05-12 |

| 3 | The Hobbit | 2019-06-10 |

| 4 | 13 Reasons Why | 2019-06-01 |

| 5 | The Hunger Games | 2008-09-21 |

Orders =

| order_id | book_id | quantity | dispatch_date |

| -------- | ------- | -------- | ------------- |

| 1 | 1 | 2 | 2018-07-26 |

| 2 | 1 | 1 | 2018-11-05 |

| 3 | 3 | 8 | 2019-06-11 |

| 4 | 4 | 6 | 2019-06-05 |

| 5 | 4 | 5 | 2019-06-20 |

| 6 | 5 | 9 | 2009-02-02 |

| 7 | 5 | 8 | 2010-04-13 |

Output

| book_id | name |

| ------- | ---------------- |

| 2 | 28 Letters |

| 1 | Kalila And Demna |

| 5 | The Hunger Games |

Expected

| book_id | name |

| ------- | ---------------- |

| 1 | Kalila And Demna |

| 2 | 28 Letters |

| 5 | The Hunger Games |