Cluster computing

Wednesday, July 24, 2024

The shift from DBMS to catalogs is already underway. Earlier, databases were the veritable access grantors, but with heterogeneous data stores this role has shifted to catalogs such as Unity Catalog for Databricks and the Horizon catalog for Snowflake. This is a deliberate move by these platforms, even as each fights for its own ecosystem. The end-users and the organizations that empower them are rapidly making this shift themselves.

For example, the Databricks Unity Catalog offers centralized access control, auditing, lineage, and data discovery capabilities across multiple Databricks workspaces. It includes user management, the metastore, clusters, SQL warehouses, and a standards-compliant security model based on ANSI SQL. Built-in auditing and lineage provide user-level audit logs and support data discovery. The metastore is the top-level container, while the data catalog has a three-level namespace, namely catalog.schema.table. The Catalog Explorer allows for the creation of tables and views, while volumes provide governance for nontabular data. The catalog is multi-cloud friendly, allowing for federation across multiple cloud vendors and unified access. The idea here is that you can define once and secure anywhere.

Databricks Unity Catalog consists of a metastore and a catalog. The metastore is the top-level logical container for metadata, storing data assets like tables or models and defining the namespace hierarchy. It handles access control policies and auditing. The catalog is the first-level organizational unit within the metastore, grouping related data assets and providing access controls. However, only one metastore is used per deployment region: each Databricks region requires its own Unity Catalog metastore.
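
The three-level namespace can be made concrete with a small sketch. This is plain Python, not a Databricks API: the class name QualifiedName, the defaulting rules, and the sample names (main, sales, orders) are all assumptions for illustration, and the parser deliberately ignores backtick-quoted identifiers that contain dots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualifiedName:
    """Hypothetical holder for a catalog.schema.table name."""
    catalog: str
    schema: str
    table: str

    @classmethod
    def parse(cls, name, default_catalog=None, default_schema=None):
        # Split on dots and fill missing leading levels from the defaults,
        # much as a session's current catalog and schema would.
        parts = name.split(".")
        if len(parts) == 3:
            return cls(*parts)
        if len(parts) == 2 and default_catalog:
            return cls(default_catalog, parts[0], parts[1])
        if len(parts) == 1 and default_catalog and default_schema:
            return cls(default_catalog, default_schema, parts[0])
        raise ValueError(f"cannot resolve {name!r} to catalog.schema.table")

    def __str__(self):
        return f"{self.catalog}.{self.schema}.{self.table}"


full = QualifiedName.parse("main.sales.orders")
short = QualifiedName.parse("orders", default_catalog="main", default_schema="sales")
print(full, short, full == short)
```

Both calls resolve to the same fully qualified name, which is the point of the hierarchy: a table is always addressed through its catalog and schema, even when the leading levels are implied.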

There is a Unity Catalog quick start notebook in Python. The key steps include creating a workspace with the Unity Catalog metastore, creating a catalog, creating a managed schema, managing a table, and using Unity Catalog with the Pandas API on Spark. The code starts by creating a catalog, showing the available catalogs, and then creating a managed schema. The next step involves creating and managing schemas, describing them in extended form, and granting permissions. A table is then created in the schema from the previous step, and that table and all available tables are shown. The final step uses the Pandas API on Spark, which is covered in the official Databricks documentation. This quick start is a good way to get a feel for the process and to toggle back and forth between the key steps and the code.
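
The sequence of steps can be sketched as a list of SQL statements built in Python. This is a minimal sketch and not the notebook itself: the function names, the placeholder object names (demo_catalog, demo_schema, demo_table), the principal data-readers, and the table columns are all assumptions, and the exact statements should be checked against the Databricks documentation. The run_statements helper assumes a Spark session object such as the `spark` variable available in a Databricks notebook, and is not called here.

```python
def quickstart_statements(catalog, schema, table, principal):
    """Return the SQL for the walkthrough, in execution order."""
    fq_schema = f"{catalog}.{schema}"
    fq_table = f"{fq_schema}.{table}"
    return [
        # Create a catalog and list the catalogs that are visible.
        f"CREATE CATALOG IF NOT EXISTS {catalog}",
        "SHOW CATALOGS",
        # Create a managed schema and inspect it.
        f"CREATE SCHEMA IF NOT EXISTS {fq_schema}",
        f"DESCRIBE SCHEMA EXTENDED {fq_schema}",
        # Grant only what a reader needs on each level of the namespace.
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA {fq_schema} TO `{principal}`",
        # Create a managed table (columns are illustrative) and expose it.
        f"CREATE TABLE IF NOT EXISTS {fq_table} (id INT, name STRING)",
        f"GRANT SELECT ON TABLE {fq_table} TO `{principal}`",
        f"SHOW TABLES IN {fq_schema}",
    ]


def run_statements(spark, statements):
    """Run each statement on an existing Spark session (assumed to exist)."""
    for statement in statements:
        spark.sql(statement).show()


for statement in quickstart_statements(
    "demo_catalog", "demo_schema", "demo_table", "data-readers"
):
    print(statement)
```

Keeping the statements as data makes it easy to read the order of operations before running anything against a real workspace.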

The Unity Catalog system employs object security best practices, including access control lists (ACLs) for granting or restricting access to specific users and groups on securable objects. ACLs provide fine-grained control, ensuring that access to sensitive data and objects is restricted. Least privilege is applied, limiting access to the minimum required and avoiding broad groups like All Users unless necessary. Access is revoked once the purpose is served, and policies are reviewed regularly for relevance. This technique enhances data security and compliance, prevents unnecessarily broad access, and limits the blast radius in case of a security breach.
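
A periodic policy review of this kind can be sketched in a few lines. This is an in-memory illustration only, not a Unity Catalog API: the Grant record, the review_grants function, the set of broad principals, and the sample grants are all assumptions.

```python
from dataclasses import dataclass

# Principals treated as "too broad" for this sketch (assumed names).
BROAD_PRINCIPALS = {"All Users", "account users"}


@dataclass(frozen=True)
class Grant:
    principal: str
    privilege: str
    securable: str


def review_grants(grants, still_needed):
    """Split grants into those to narrow and those to revoke.

    too_broad: grants held by a broad group.
    expired:   grants to specific principals whose purpose has been served,
               meaning they are absent from still_needed.
    """
    too_broad = [g for g in grants if g.principal in BROAD_PRINCIPALS]
    expired = [
        g for g in grants
        if g.principal not in BROAD_PRINCIPALS and g not in still_needed
    ]
    return too_broad, expired


grants = [
    Grant("All Users", "SELECT", "hr.payroll.salaries"),
    Grant("hr-analysts", "SELECT", "hr.payroll.salaries"),
    Grant("migration-job", "MODIFY", "hr.payroll.salaries"),
]
still_needed = {Grant("hr-analysts", "SELECT", "hr.payroll.salaries")}

too_broad, expired = review_grants(grants, still_needed)
print("narrow:", [g.principal for g in too_broad])
print("revoke:", [g.principal for g in expired])
```

The two output lists map to the two practices in the paragraph: narrowing grants held by broad groups, and revoking access once its purpose is served.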

The Databricks Unity Catalog system offers best practices for catalogs. First, create separate catalogs for loose coupling, managing access and compliance at the catalog level. Align catalog boundaries with business domains or applications, such as marketing analytics or HR. Customize security policies and governance within the catalog to drill down into specific domains. Create access control groups and roles specific to a catalog, fine-tune read-write privileges, and customize settings like resource quotas and scrum rules. These fine-grained policies provide the best of security and functionality in catalogs.

To secure and manage external connections, limit visibility by granting access only to specific users, groups, and roles, and by setting least privileges. Use granular access control lists (ACLs) to restrict access to the users and groups that need it. Be aware of team activities and avoid giving teams unnecessary access to external resources. Tag connections effectively for discovery using source categories or data classifications, and discover connections by use case for organizational visibility. This approach enhances security, prevents unintended data access, and simplifies external connection discovery and management.

Best practices for business units in Databricks Unity Catalog emphasize providing dedicated sandboxes for each business unit, which allows independent development environments and prevents interference between different workflows. Centralizing shareable data into production catalogs ensures consistency and reduces the need for duplicate data. Discoverability is crucial, with meaningful naming conventions and metadata best practices. Federated queries via the Lakehouse architecture unify data access across silos, governed securely via contracts and permissions. This approach supports autonomy for units, increases productivity through reuse, and maintains consistency with collaborative governance.

In conclusion, Unity Catalog allows centralized data governance and supports best practices for catalogs, connections, and business units.

https://docs.databricks.com/en/data-governance/unity-catalog/enable-workspaces.html#enable-workspace

https://docs.databricks.com/en/data-governance/unity-catalog/create-metastore.html

Tuesday, July 23, 2024

This is a summary of the book titled “Active Listening Techniques – 30 Practical tools to hone your Communication Skills” written by Nixaly Leonardo and published by Callisto in 2020. The author offers insights into active listening, building on a decade of social work. She covers listening skills such as mindfulness, empathy, non-verbal cues, and effective questioning techniques, all of which lead to a deeper understanding of others. Her five-point agenda includes empathizing with others before interacting, being aware of tension so as to respond rather than react, acknowledging one’s negative emotions, involving loved ones in the journey, and writing journal entries about one’s reactions. Being aware of one’s emotional state helps us adjust our communication; persuade others by acknowledging their needs, projecting confidence, and choosing the right words; and deal with stressful situations by validating other people’s emotions, easing tension, and refocusing the conversation.

Active listening is a crucial communication skill that involves paying attention, understanding people's emotions, and giving time for others to talk. It is applied in various situations, including work, personal relationships, and therapy. Active listening helps individuals feel supported and heard, and it demonstrates respect for others. To improve communication skills, seven fundamentals can be applied: paraphrasing, using nonverbal language, emotional labeling, silence, redirection, mirroring, and validating.

Paraphrasing involves restating what someone says to ensure understanding, while nonverbal cues like eye contact, gestures, posture, and facial expressions help convey the message. Emotional labeling involves noticing and naming what others feel, while silence allows time to think and express thoughts without interruption. Redirecting the conversation back to the original topic helps maintain direction and reduce tension. Mirroring involves matching the speaker's body language and tone of voice to create a sense of connection and rapport. Validating others' emotions allows them to experience their emotions and hold their beliefs, making them feel understood and supported.

Active listening involves being present and mindful during conversations, ignoring distractions and staying open-minded. It helps us accept that we all experience negative emotions and stress and understand how our experiences shape our perceptions and interpretations of others' messages. To challenge and move through assumptions, empathize with others, be aware of tension, apologize when you react negatively, involve loved ones, and write journal entries about your reactions.

Be aware of your emotional state during conversations, as strong emotions can interfere with attentive listening. Adjust your communication to ensure others hear and understand you, considering other people's communication styles and preferences. Navigate situations tactfully by asking questions instead of directly challenging your supervisor's idea, describing or praising their vision, and seeking details to address your concerns without undermining their creativity or judgment.

Know your audience, choosing when and where to raise critical issues and choosing the appropriate mode of communication. Electronic communication such as texting and email can, in some situations, be more effective than face-to-face conversations. By following these steps, you can become a better active listener and maintain a productive dialogue.

Persuasion involves acknowledging others' needs, projecting confidence, and choosing the right words. It is a matter of give and take, and understanding why someone might not agree with your viewpoint is crucial. Acknowledging their needs helps build respect and a stronger bond. Using precise language is essential in handling sensitive situations, both to avoid hurting others and to convey your intended message. Confidence is key, so pretending to be confident can help.

To deal with stressful situations, validate others' emotions, ease tension, and refocus the conversation. Addressing emotional concerns fosters stronger connections and genuine conversations. You can calm others and ease tensions by recognizing escalating situations, lowering your tone, seeking clarification, taking responsibility for your contribution, and addressing the speaker's concerns. If tensions continue to rise, repeat the steps or suggest a break. Set boundaries and communicate potential consequences if the conversation escalates.

When a conversation goes awry, refocus on the original subject to avoid defensiveness and to keep working toward resolving the issue. Address communication challenges by rephrasing statements, acknowledging shifts, asking for thoughts, and validating the listener's feelings. This ensures both parties hear and understand each other, preventing a recurrence of arguments. By following these steps, you can ensure effective communication.

Summarizing Software: SummarizerCodeSnippets.docx

Monday, July 22, 2024

The well-known Knuth-Morris-Pratt algorithm.

This algorithm can be explained as a state machine that matches the input text against the pattern, where the state is the number of pattern characters matched so far:

#include <string>
#include <vector>

using std::string;
using std::vector;

// Builds the failure table: for k >= 1, next[k] is the length of the longest
// proper prefix of pattern[0..k-1] that is also a suffix of it.
// Returns an empty table for an empty pattern.
vector<int> PreProcess(const string& pattern) {
    int patternLength = pattern.length();
    if (patternLength == 0) return vector<int>();
    vector<int> next(patternLength + 1);
    next[0] = -1; // set up for loop below; unused by KMP
    int i = 0;
    while (i < patternLength) {
        next[i + 1] = next[i] + 1;
        while (next[i + 1] > 0 &&
               pattern[i] != pattern[next[i + 1] - 1])
            next[i + 1] = next[next[i + 1] - 1] + 1;
        i++;
    }
    return next;
}

// Appends the start index of every occurrence of pattern in text to positions.
void KMP(const string& pattern, const string& text, vector<int>* positions) {
    int patternLength = pattern.length();
    int textLength = text.length();
    vector<int> next = PreProcess(pattern);
    if (next.empty()) return;
    int i = 0; // state: number of pattern characters matched so far
    int j = 0; // index of the current text character
    while (j < textLength)
    {
        while (true)
        {
            if (text[j] == pattern[i]) // matches
            {
                i++; // yes, move on to the next state
                if (i == patternLength) // maybe that was the last state
                {
                    // found a match; record where it starts
                    positions->push_back(j - (i - 1));
                    i = next[i];
                }
                break;
            }
            else if (i == 0) break; // no match in state i = 0, give up
            else i = next[i]; // fall back to a shorter matched prefix
        }
        j++;
    }
}
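
The same state machine can be cross-checked with a compact sketch in Python. The function names build_next and kmp_search are chosen here for illustration; the table layout mirrors PreProcess, with a sentinel of -1 in slot 0.

```python
def build_next(pattern):
    """Failure table: nxt[k] is the longest proper border of pattern[:k]."""
    nxt = [-1] + [0] * len(pattern)
    for i in range(len(pattern)):
        k = nxt[i] + 1
        # Shrink the candidate border until it can be extended by pattern[i].
        while k > 0 and pattern[i] != pattern[k - 1]:
            k = nxt[k - 1] + 1
        nxt[i + 1] = k
    return nxt


def kmp_search(pattern, text):
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    nxt = build_next(pattern)
    positions = []
    i = 0  # number of pattern characters matched so far
    for j, ch in enumerate(text):
        while i > 0 and ch != pattern[i]:
            i = nxt[i]  # fall back to a shorter matched prefix
        if ch == pattern[i]:
            i += 1
            if i == len(pattern):
                positions.append(j - i + 1)
                i = nxt[i]  # keep going so overlapping matches are found
    return positions


print(build_next("abab"))              # [-1, 0, 0, 1, 2]
print(kmp_search("abab", "abababab"))  # [0, 2, 4]
```

Because the state falls back through the table instead of rewinding the text index, each text character is examined once and overlapping matches such as those at 0, 2, and 4 are still reported.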

Usage: DroneDataAddition.docx

Sunday, July 21, 2024

Knuth-Morris-Pratt method of string matching

public class KmpMatcher {

    // Prints the 0-based start index of every occurrence of pattern in text.
    public static void kmpMatcher(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0) return; // nothing to search for
        int[] prefixes = computePrefixFunction(pattern);
        int noOfCharMatched = 0;
        for (int i = 0; i < n; i++) {
            // On a mismatch, fall back to the longest border of the matched prefix.
            while (noOfCharMatched > 0 && pattern.charAt(noOfCharMatched) != text.charAt(i))
                noOfCharMatched = prefixes[noOfCharMatched - 1];
            if (pattern.charAt(noOfCharMatched) == text.charAt(i))
                noOfCharMatched = noOfCharMatched + 1;
            if (noOfCharMatched == m) {
                System.out.println("Pattern occurs at " + (i - m + 1));
                noOfCharMatched = prefixes[noOfCharMatched - 1];
            }
        }
    }

    // prefixes[q] is the length of the longest proper prefix of
    // pattern[0..q] that is also a suffix of pattern[0..q].
    public static int[] computePrefixFunction(String pattern) {
        int m = pattern.length();
        int[] prefixes = new int[m];
        int k = 0;
        for (int q = 1; q < m; q++) {
            while (k > 0 && pattern.charAt(k) != pattern.charAt(q))
                k = prefixes[k - 1];
            if (pattern.charAt(k) == pattern.charAt(q)) {
                k = k + 1;
            }
            prefixes[q] = k;
        }
        return prefixes;
    }
}

Saturday, July 20, 2024

The steps to create a machine learning pipeline in Azure Machine Learning Workspace:

1. Create an Azure Machine Learning Workspace:

○ If you don't have one already, create an Azure Machine Learning workspace. This serves as the central hub for managing your machine learning resources.

2. Set Up Datastores:

○ Datastores allow you to access data needed in your pipeline. By default, each workspace has a default datastore connected to Azure Blob storage. You can register additional datastores if necessary [4].

3. Define Your Pipeline Steps:

○ Break down your ML task into manageable components (steps). Common steps include data preparation, model training, and evaluation.

○ Use the Azure Machine Learning SDK to create these steps. You can define them as PythonScriptStep or other relevant step types.

4. Configure Compute Targets:

○ Set up the compute targets where your pipeline steps will run. Options include Azure Machine Learning Compute, Azure Databricks, or other compute resources.

5. Orchestrate the Pipeline:

○ Use the Azure Machine Learning pipeline service to automatically manage dependencies between steps.

○ Specify the order in which steps should execute and how they interact.

6. Publish the Pipeline:

○ Once your pipeline is ready, publish it. This makes it accessible for later use or sharing with others.

7. Monitor and Track Performance:

○ Monitor your pipeline's performance in real-world scenarios.

○ Detect data drift and adjust your pipeline as needed.
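
The orchestration in step 5 can be pictured as ordering steps by their dependencies and passing each step the outputs of the steps it depends on. The sketch below is plain Python using the standard library graphlib module (Python 3.9 or later), not the Azure Machine Learning SDK; the step names, the toy step functions, and the helper names execution_order and run_pipeline are all assumptions for illustration.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Order step names so that every step runs after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def run_pipeline(step_functions, dependencies):
    """Run each step in order, handing it the results of its dependencies."""
    results = {}
    for name in execution_order(dependencies):
        inputs = {dep: results[dep] for dep in dependencies[name]}
        results[name] = step_functions[name](inputs)
    return results


# Each step maps to the set of steps whose output it needs.
dependencies = {
    "prepare_data": set(),
    "train_model": {"prepare_data"},
    "evaluate_model": {"train_model"},
}

# Toy stand-ins for real scripts: the "model" is just the mean of the data.
step_functions = {
    "prepare_data": lambda inputs: [1.0, 2.0, 3.0],
    "train_model": lambda inputs: sum(inputs["prepare_data"]) / len(inputs["prepare_data"]),
    "evaluate_model": lambda inputs: abs(inputs["train_model"] - 2.0),
}

results = run_pipeline(step_functions, dependencies)
print(execution_order(dependencies))
print(results["evaluate_model"])
```

Declaring the dependencies as data rather than hard-coding the call order is what lets a pipeline service decide scheduling on its own, which is the behavior step 5 describes.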

This workspace provides an environment to create and manage the end-to-end life cycle of machine learning models. Unlike general-purpose software development, machine learning has significantly different requirements, such as the use of a wide variety of technologies, libraries, and frameworks; the separation of training and testing phases before a model is deployed and used; and iterations for model tuning that are independent of model creation and training. Azure Machine Learning’s compatibility with open-source frameworks and platforms like PyTorch and TensorFlow makes it an effective all-in-one platform for integrating and handling data and models, which relieves much of the onus on the business to develop new capabilities. Azure Machine Learning is designed for all skill levels, with advanced MLOps features and simple no-code model creation and deployment.

Friday, July 19, 2024

Sample program to count the number of different triplets (a, b, c) in which a occurs before b and b occurs before c from a given array.

Solution: Generate all combinations in positional lexicographical order for the given array using the getCombinations method below. Select those with size 3. When selecting the elements, save only their indexes, so that we can verify that the positions are strictly increasing.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Solution {

    // Enumerates every subset of positions 0..N-1 with a bitmask and
    // records each subset as a list of indexes in increasing order.
    public static void getCombinations(List<Integer> elements, int N, List<List<Integer>> combinations) {
        for (int i = 0; i < (1 << N); i++) {
            List<Integer> combination = new ArrayList<>();
            for (int j = 0; j < elements.size(); j++) {
                if ((i & (1 << j)) > 0) {
                    combination.add(j);
                }
            }
            List<Integer> copy = new ArrayList<Integer>(combination);
            combinations.add(copy);
        }
    }

    public static void main(String[] args) {
        List<Integer> elements = Arrays.asList(1, 2, 3, 4);
        List<List<Integer>> indices = new ArrayList<>();
        getCombinations(elements, elements.size(), indices);
        // Keep only the index triplets whose positions are strictly increasing.
        indices.stream().filter(x -> x.size() == 3)
               .filter(x -> x.get(0) < x.get(1) && x.get(1) < x.get(2))
               .forEach(x -> printList(elements, x));
    }

    // Prints the elements found at the given indexes, separated by spaces.
    public static void printList(List<Integer> elements, List<Integer> indices) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < indices.size(); i++) {
            sb.append(elements.get(indices.get(i)) + " ");
        }
        System.out.println(sb.toString());
    }
}

/* sample output:
1 2 3
1 2 4
1 3 4
2 3 4
*/
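
Since only positions matter, the number of such triplets for an array of n elements is n choose 3, which gives a quick check on the enumeration. The sketch below uses Python's itertools and math modules; the function names are chosen here for illustration, and the distinct-values variant is an assumption about what "different" could mean when the array contains repeated values.

```python
from itertools import combinations
from math import comb


def count_triplets(values):
    """Count index triplets i < j < k by direct enumeration."""
    return sum(1 for _ in combinations(range(len(values)), 3))


def count_distinct_value_triplets(values):
    """Count triplets that differ in value, keeping positional order."""
    return len(set(combinations(values, 3)))


values = [1, 2, 3, 4]
print(count_triplets(values), comb(len(values), 3))    # 4 4
print(count_distinct_value_triplets([1, 1, 2, 2]))     # 2
```

The bitmask approach visits all 2^n subsets, so for large arrays the closed form or a direct three-element enumeration is the cheaper route.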

Thursday, July 18, 2024

This is a summary of the book titled “The Canary Code: A guide to Neurodiversity, Dignity and Intersectional Belonging at work” written by Ludmila Praslova and published by Berrett-Koehler in 2024. This book is about how to foster an inclusive workplace that celebrates neurodiversity and intersectional dignity and where everyone feels valued and respected. Neurodivergent people, with conditions such as autism spectrum disorder, attention deficit disorder, dyslexia, or obsessive-compulsive disorder that affect the way the brain processes information, have struggled to keep up with the rest of the workforce because they don’t fit the status quo. The flexibility provided during Covid-19 came to their rescue, and more steps can be taken to promote inclusivity, such as understanding their needs when hiring and onboarding them and heeding their input on office space design and workflow flexibility. Leaders should create psychologically safe workplaces by listening, communicating clearly, and aiming for objective performance reviews. When these same employees become leaders, they could pass on the same benefits to others.

Inclusive workplaces promote diverse interaction, communication, and productivity styles by challenging neuronormative standards. Neurodiversity acknowledges the vast variations in human cognition, emotion, and perception, including conditions like ADHD, autism, and dyslexia. Myths about neurodiversity perpetuate exclusion in the workplace, as they lump people together while ignoring neurodivergent needs and preferences. To enable neurodivergent employees to do their best work, organizations must create flexible, inclusive environments that respect individual differences in social, cognitive, emotional, and sensory needs.

The "Canary Code" framework promotes inclusivity for neurodivergent employees. It emphasizes involving marginalized employees in decision-making, focusing on outcomes, ensuring flexibility, promoting organizational justice, maintaining transparency, and using appropriate decision-making tools. By adopting these principles, organizations can create a more productive and inclusive workplace.

Companies like Deloitte, Infinite Flow, Legalite, Call Yachol, Ultranauts, and Dell have implemented these principles to ensure a diverse workforce. These practices have improved onboarding processes, engagement, and overall performance. Some, like Dell, have also implemented neurodiversity programs that allow candidates to showcase their abilities. Overall, these principles promote a more inclusive and productive workplace.

To make hiring and onboarding more inclusive, organizations should understand the needs of neurodivergent employees and conduct thorough analyses to ensure job descriptions accurately reflect the position's requirements. This includes separating essential qualifications from desirable ones, using plain language, and focusing on outcomes rather than methods. Onboarding should integrate new employees into the organization, offering a quality "preboarding" experience, providing clear information, and tailoring training methods.

Inclusive office spaces should accommodate a wide range of sensory, physical, and cognitive needs, with employees' input in the design process. Flexible work arrangements, such as flexible schedules, remote work options, and hybrid models, can enhance productivity for neurodivergent employees. Psychologically safe workspaces should be created by listening, communicating clearly, and aiming for objective performance reviews. A toxic work environment features non-inclusive, disrespectful, unethical, cutthroat, and abusive behaviors, which can negatively impact all employees' well-being and performance. By implementing these strategies, organizations can create a more inclusive and supportive work environment for all employees.

Neurodivergent leaders can create more inclusive workspaces by overcoming myths and stigmas that limit recognition and development of diverse leadership talents. By leveraging individual strengths, creating growth tracks, and fostering a culture that values diverse perspectives, companies can unlock innovation, improve morale, and build more resilient leadership teams. Neurodivergent leaders can overcome biases by embracing unique experiences and fostering empathy and inclusivity within their teams. By remaining authentic, trusting in their unique perspective, and focusing on transparent communication, neurodivergent leaders can inspire others and promote a culture of acceptance and understanding.