Tuesday, September 3, 2024

Container Image Scanning

In a summary of the book titled “Effective Vulnerability Management”, we brought up how container images have become relevant in today’s security assessments. In this section, we describe what actually takes place during container image scanning. Scanning container images is a means to get comprehensive and current information about the security vulnerabilities in a software offering. There is some debate about whether this technology should be used for passive monitoring or active scanning, but its utility is unquestioned in either mode.

While the two represent ends of a spectrum, a vulnerability assessment generally begins with passive monitoring in broad sweeps and then narrows to focused active scanning. Asset information provided by passive monitoring informs the active scans. Passive monitoring uses packet inspection to analyze network traffic and observe inter-asset connections. Active scanning generates its own network traffic and focuses on specific assets or devices on the network.

Unauthenticated scans of network ports are referred to as network scans. They examine each device from the outside in, attempting to communicate with every IP address in a specified range. Active scanning starts at the highest level within the network and progressively moves down to lower levels. This step-down occurs in a graded manner over an evaluation period.
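
As a rough illustration of the outside-in approach, here is a minimal sketch that probes a hypothetical IP range for a few common ports using only the Python standard library; the subnet, port list, and timeout are placeholder assumptions, not features of any particular scanner.

import socket
from ipaddress import ip_network

# Hypothetical range, ports, and timeout; real scans cover far more and require authorization.
SUBNET = "192.168.1.0/30"
PORTS = [22, 80, 443]

def probe(host, port, timeout=0.5):
    # Return True if a TCP connection to host:port succeeds within the timeout.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for addr in ip_network(SUBNET).hosts():
    open_ports = [p for p in PORTS if probe(str(addr), p)]
    if open_ports:
        print(f"{addr}: open ports {open_ports}")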

When a scan is run, a container image is viewed as a stack of layers. Images are typically built from some base image over which third-party packages are applied, and those base images and libraries may contain obsolete or vulnerable code. A lookup of layer hashes against their known vulnerabilities therefore enables quick and effective assessment of a built image. Each additional open-source package added as a layer can be assessed with the tools from the scanning toolset best suited to that layer, and because the layers are evaluated progressively, the image can be scanned completely.
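
A minimal sketch of the hash-lookup idea follows, assuming a locally available mapping from layer digests to known vulnerabilities; the digests and advisory entries are placeholders, and a real scanner would read the digests from the image manifest and query a maintained vulnerability database.

# Hypothetical layer digests as reported by an image manifest.
layer_digests = [
    "sha256:aaaa",  # base image layer (placeholder digest)
    "sha256:bbbb",  # application dependency layer (placeholder digest)
]

# Hypothetical advisory data; a real scanner queries a maintained vulnerability database.
known_vulnerabilities = {
    "sha256:bbbb": ["CVE-XXXX-NNNN (placeholder identifier)"],
}

for digest in layer_digests:
    findings = known_vulnerabilities.get(digest, [])
    status = ", ".join(findings) if findings else "no known issues"
    print(f"{digest}: {status}")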

Some Docker images come with benchmarks that cover configuration and hardening guidelines. Under these benchmarks, non-essential services are removed and the surface area is reduced so that potential risks are mitigated. Images tagged with an Alpine suffix are usually the minimal baseline for their category of images.

As with all asset management, images themselves can be classified as assets. Consequently, they need to be secured with role-based access control so that the image repository and registry are not compromised.

These salient features can be enumerated as the following steps:

1. Know the source and content of the images.

2. Minimize risks from the containers by removing or analyzing layers.

3. Reduce the surface area in images, containers, and hosts.

4. Leverage build integration tools to scan on every image generation.

5. Enforce role segregation and access control for your Docker environment.

6. Automate detection and enforcement actions, such as failing a build (see the sketch after this list).

7. Routinely examine the registries and repositories to prevent sprawl.
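
As a sketch of steps 4 and 6, the snippet below shells out to a scanner from a build script and fails the build when findings are reported. The choice of Trivy and of the severity threshold is an assumption for illustration; any scanner that returns a non-zero exit code on findings fits the same pattern.

import subprocess
import sys

IMAGE = "registry.example.com/myapp:latest"  # hypothetical image reference

# Assumes the Trivy CLI is installed; --exit-code 1 asks it to return a
# non-zero status when findings at or above the severity threshold exist.
result = subprocess.run(
    ["trivy", "image", "--exit-code", "1", "--severity", "HIGH,CRITICAL", IMAGE]
)

if result.returncode != 0:
    print("Vulnerabilities found; failing the build.")
    sys.exit(result.returncode)

print("Image passed the scan.")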

The only caveat with image scanning is that it is often tied to the image repository or registry, so the scanning options become limited to what the repository or registry vendor supports.

#codingexercise 

Problem: Count the number of ways to climb a staircase of n steps when each move can cover either 1 or 2 steps.

Solution: int getCount(int n)
{
    // dp[k] holds the number of ways to reach step k using moves of 1 or 2.
    if (n <= 2) {
        return Math.max(n, 0);
    }
    int[] dp = new int[n + 1];
    dp[1] = 1;
    dp[2] = 2;
    for (int k = 3; k <= n; k++) {
        dp[k] = dp[k - 1] + dp[k - 2];
    }
    return dp[n];
}
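
For example, getCount(4) returns 5, corresponding to the step sequences 1+1+1+1, 1+1+2, 1+2+1, 2+1+1, and 2+2.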



This is a summary of the book titled “Effective vulnerability management” written by Chris Hughes and Nikki Robinson and published by Wiley in 2024. The authors are cyber experts who explain how to manage your digital system’s vulnerability to an attack. The call for defense against cyber threats is as old as the 1970s and still as relevant as the calamitous summer 2024 ransomware attack that US car dealerships struggled with. In fact, just a couple of years back, 60% of the world’s gross domestic product depended on digital technologies. Asset management is crucial in protecting against digital vulnerability. Companies need a continuous, automated patch management protocol. Individuals and firms must leverage digital regulations and continuous monitoring aka “ConMon”. Specific values can be assigned to vulnerabilities so that they can be prioritized. Attackers generally exploit multiple vulnerabilities at once. Continuous vigilance requires human involvement. Open-source information can be used to determine threats.

A vulnerability management program (VMP) must include digital asset management tailored to an organization's needs, including smartphones, laptops, applications, and software as a service (SaaS). Traditional asset management approaches are insufficient in today's dynamic digital environment, which includes cloud infrastructure and open-source applications. Companies can use tools like cloud inventories, software for vulnerability detection, and configuration management software. Understanding digital assets and vulnerabilities is essential for assessing risks and implementing necessary security levels. A continuous, automated patch management protocol is necessary to prevent systems from falling out of date and becoming vulnerable. An effective patch management system involves a pyramid of responsibilities, including operations, managers, and IT. Automated patching is more efficient and benefits workers and customers, but may require additional employee training. 

Digital regulations are essential for individuals and firms to protect against vulnerabilities in software and cloud services. Misconfigurations, errors, or inadequacy within information systems can lead to significant data breaches. Companies must adopt professionally designed guidelines to ensure the best security practices. Vulnerability management requires continuous monitoring and vigilance, as assets and configurations change over time. Malicious actors continuously seek to identify vulnerabilities, exploit weaknesses, and compromise vulnerable systems, software, and products.

Ongoing vulnerability management involves setting up a vulnerability management process, automating patch management, and performing vulnerability scans at regular intervals. Vulnerability scoring helps prioritize responses to potential harm. Most firms use the Common Vulnerability Scoring System (CVSS), which divides vulnerabilities into four categories: Base, Threat, Environmental, and Supplemental. The Exploit Prediction Scoring System (EPSS) enhances CVSS by providing information on the likelihood of a cybercriminal exploiting a particular vulnerability. However, bad actors exploit only 2% to 7% of vulnerabilities.

Cybersystem attackers exploit numerous vulnerabilities, with over half of corporate vulnerabilities dating back to 2016 or earlier. They can use older vulnerabilities to launch critical vulnerability chaining attacks, which can be direct or indirect. Cybersecurity professionals use open-source information to assess threat levels and generate alerts to identify and block attacks. There are four types of threat intelligence: technical, tactical, strategic, and operational.

Human involvement is crucial in managing vulnerabilities, as it helps organizations understand how users and IT practitioners interact with systems. Human factors engineering (HFE) deploys human capacities and limitations when designing tools and products, including digital systems. Cybersecurity professionals should be educated about human psychology to gain insights into cybercrime perpetrators and avoid fatigue and burnout.

Leaders should construct their organizations with security in mind, and firms must incorporate security into their initial development of systems and software. Engineers often develop software and digital systems without incorporating security measures in the development stage.


Monday, September 2, 2024

With the surge of data science and analytics projects, many data scientists are required to build a chatbot application over their data. This article covers some of the ways to do that. We assume that these data scientists use a workspace to bring their compute and data together. Let us say that this is a Databricks workspace, the data is available via the catalog and delta lake, and a compute cluster has been provisioned as dedicated to this effort. The example/tutorial we refer to is published in the official Databricks documentation and is compared here with the ease of exporting the user interface to an app service.

Part 1.

The example for Databricks separates the model and the user interface in this way:

Step 1. Set up the environment:

%pip install transformers sentence-transformers faiss-cpu

Step 2. Load the data into a Delta table:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Chatbot").getOrCreate()

# Load your data

data = [

    {"id": 1, "text": "What is Databricks?"},

    {"id": 2, "text": "How to create a Delta table?"}

]

df = spark.createDataFrame(data)

df.write.format("delta").save("/mnt/delta/chatbot_data")


Step 3.  Generate embeddings using a pre-trained model:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

texts = [row['text'] for row in data]

embeddings = model.encode(texts)

# Save embeddings

import numpy as np

np.save("/dbfs/mnt/delta/embeddings.npy", embeddings)


Step 4. Use FAISS to perform vector search over the embeddings.

import faiss

# Load embeddings

embeddings = np.load("/dbfs/mnt/delta/embeddings.npy")

# Create FAISS index

index = faiss.IndexFlatL2(embeddings.shape[1])

index.add(embeddings)

# Save the index

faiss.write_index(index, "/dbfs/mnt/delta/faiss_index")


Step 5. Create a function to handle user queries and return relevant responses.

def chatbot(query):

    query_embedding = model.encode([query])

    D, I = index.search(query_embedding, k=1)

    response_id = I[0][0]

    response_text = texts[response_id]

    return response_text


# Test the chatbot

print(chatbot("Tell me about Databricks"))


Step 6. Deploy the chatbot as either:

Option a) Databricks widget

dbutils.widgets.text("query", "", "Enter your query")

query = dbutils.widgets.get("query")


if query:

    response = chatbot(query)

    print(f"Response: {response}")

else:

    print("Please enter a query.")


Option b) a REST API

from flask import Flask, request, jsonify


app = Flask(__name__)


@app.route('/chatbot', methods=['POST'])

def chatbot_endpoint():

    query = request.json['query']

    response = chatbot(query)

    return jsonify({"response": response})


if __name__ == '__main__':

    app.run(host='0.0.0.0', port=5000)


Step 7. Test the chatbot:

For option a) use the widgets to interact with the notebook:

# Display the widgets

dbutils.widgets.text("query", "", "Enter your query")

query = dbutils.widgets.get("query")


if query:

    response = chatbot(query)

    displayHTML(f"<h3>Response:</h3><p>{response}</p>")

else:

    displayHTML("<p>Please enter a query.</p>")


For option b) make a web request:

curl -X POST http://<your-databricks-url>:5000/chatbot -H "Content-Type: application/json" -d '{"query": "Tell me about Databricks"}'


Part 2. 

The example for the app service combines the query backend and the user interface in this way:

The code hosting the model and completing the query results comprises the following:

import openai, os, requests 

 

openai.api_type = "azure" 

# Azure OpenAI on your own data is only supported by the 2023-08-01-preview API version 

openai.api_version = "2023-08-01-preview" 

 

# Azure OpenAI setup 

openai.api_base = "https://azai-open-1.openai.azure.com/" # Add your endpoint here 

openai.api_key = os.getenv("OPENAI_API_KEY") # Add your OpenAI API key here 

deployment_id = "mdl-gpt-35-turbo" # Add your deployment ID here 

 

# Azure AI Search setup 

search_endpoint = "https://searchrgopenaisadocs.search.windows.net" # Add your Azure AI Search endpoint here

search_key = os.getenv("SEARCH_KEY") # Add your Azure AI Search admin key here

search_index_name = "undefined" # Add your Azure AI Search index name here

 

def setup_byod(deployment_id: str) -> None: 

    """Sets up the OpenAI Python SDK to use your own data for the chat endpoint. 

 

    :param deployment_id: The deployment ID for the model to use with your own data. 

 

    To remove this configuration, simply set openai.requestssession to None. 

    """ 

 

    class BringYourOwnDataAdapter(requests.adapters.HTTPAdapter): 

 

        def send(self, request, **kwargs): 

            request.url = f"{openai.api_base}/openai/deployments/{deployment_id}/extensions/chat/completions?api-version={openai.api_version}" 

            return super().send(request, **kwargs) 

 

    session = requests.Session() 

 

    # Mount a custom adapter which will use the extensions endpoint for any call using the given `deployment_id` 

    session.mount( 

        prefix=f"{openai.api_base}/openai/deployments/{deployment_id}", 

        adapter=BringYourOwnDataAdapter() 

    ) 

 

    openai.requestssession = session 

 

setup_byod(deployment_id) 

 

 

message_text = [{"role": "user", "content": "What are the differences between Azure Machine Learning and Azure AI services?"}] 

 

completion = openai.ChatCompletion.create( 

    messages=message_text, 

    deployment_id=deployment_id, 

    dataSources=[  # camelCase is intentional, as this is the format the API expects 

        { 

            "type": "AzureCognitiveSearch", 

            "parameters": { 

                "endpoint": search_endpoint, 

                "key": search_key, 

                "indexName": search_index_name, 

            } 

        } 

    ]
)

print(completion)


The user interface is simpler, with code to host the app service as a React web app:

npm install @typebot.io/js @typebot.io/react 

import { Standard } from "@typebot.io/react"; 

 

const App = () => { 

  return ( 

    <Standard 

      typebot="basic-chat-gpt-civ35om" 

      style={{ width: "100%", height: "600px" }} 

    /> 

  ); 

}; 


This concludes the creation of a chatbot function using the workspace.


Sunday, September 1, 2024

This is a summary of the book titled “Small Data: The Tiny Clues That Uncover Huge Trends”, written by Martin Lindstrom and published by St. Martin’s Press in 2017. What Sherlock Holmes was to clues in solving a mystery, Martin Lindstrom strives to be as an investigator interpreting the buying preferences of individuals. As a marketing expert, he uses this to help individuals be more objective about their own preferences while empowering brands to understand customers’ unfulfilled and unmet desires. While data privacy advocates may balk at the data being scrutinized, the author teaches how small data can uncover insights in a set of 7 steps. Paying attention to cultural imbalances in people’s lives, the freedom to be oneself, embodying one’s perspectives, and owning universal moments helps customers articulate their desires and demands. Small data helps to understand people’s desire-motivated “twin selves”. Then the narrative can be tuned to help customers connect with brands.

Small data researchers can uncover insights into consumer desires that big data misses. As an adviser for Lego, Martin used ethnographic insights from an 11-year-old German boy to inform its strategy, reducing the size of its building bricks and increasing the demands of Lego construction challenges. By 2014, Lego had become the largest global toy maker, surpassing Mattel. Small data can include habits, preferences, gestures, hesitations, speech patterns, decor, and online activity.

Small data can also reveal cultural imbalances that indicate what is missing in people's lives. For example, in the Russian Far East, colorful magnets covered refrigerator doors, symbolizing foreign travel, escape, and freedom. This led to the concept for Mamagazin – Mum’s Store, an e-commerce platform built for and by Russian mothers.

Freedom to be yourself is the greatest untapped American desire. Lindstrom helped Lowes Foods conceive a new strategy for stores in North Carolina and South Carolina, revealing that Americans value security and are often fearful. He concluded that freedom was not prevalent in everyday US culture, making it an untapped desire.

Lindstrom's marketing strategies have been successful in connecting with customers and addressing their unique needs. He helped a global cereal company understand why young women were not buying its top-selling breakfast brand by observing the tense relationships between Indian women and their mothers-in-law. He created a cereal box with two different color palettes, featuring earth tones for taller women and bright colors for mothers-in-law. Lindstrom also appealed to people's tribal need to belong during transformational times, using the Asian custom of passing items of worth to customers. This strategy increased customer retention rates. Lindstrom also tapped into the tribal need of tween and teenage girls by revising the strategy of Switzerland-based fashion brand Tally Weijl. He created a Wi-Fi-enabled smart mirror for young shoppers to share outfit photos on Facebook, allowing others to virtually vote on their choices.

Lindstrom leveraged the concept of "entry points" to boost customer retention rates in various industries. He used the concept of weight loss as a transformational moment to present free charm bracelets to dieters, symbolizing success, experience, and tribal belonging. He also tapped into the desire-motivated "Twin Selves" of consumers, which are those who desire things they once dreamed of but lost or never had. These contexts or experiences influence behavior by prompting individuals to become someone or something else. For example, he created a live-streamed event on a floating island to embody happier, sexier, and freer versions of himself. He also used the “Twin-Self” concept to create a brand image for a Chinese car, focusing on the driver's Twin Selves and creating a powerful, fast, and male car.

The power of narrative can help consumers connect with brands, as demonstrated by Steve Jobs' redesign of Tally Weijl and Devassa's use of brand ambassadors. By creating cohesive narratives, brands can resonate with consumers' stories about themselves, allowing them to resonate with their target audience. To conduct subtext research, follow the "7C's": collect baseline perspectives, focus on clues, connecting, cause, correlation, compensation, and concept. By understanding the emotions and shifts in consumer behavior, brands can better understand their target audience and develop strategies to compensate for what they feel their lives lack. By cultivating a more objective understanding of their inner motivations and desires, brands can better assess those of others, ultimately fostering a stronger connection with their customers.


Saturday, August 31, 2024

A self-organizing map algorithm for scheduling meeting times expressed as availabilities and bookings. A map is a low-dimensional representation of a training sample comprising elements e, and it is represented by nodes n. The map is transformed by a regression operation that modifies the node positions one element of the model (e) at a time. With preferences translating to nodes and availabilities to elements, the map comes to match the sample space more closely with each epoch/iteration.

from sys import argv


import numpy as np


from io_helper import read_xyz, normalize

from neuron import generate_network, get_neighborhood, get_boundary

from distance import select_closest, euclidean_distance, boundary_distance

from plot import plot_network, plot_boundary


def main():

    if len(argv) != 2:

        print("Correct use: python src/main.py <filename>.xyz")

        return -1


    problem = read_xyz(argv[1])


    boundary = som(problem, 100000)


    problem = problem.reindex(boundary)


    distance = boundary_distance(problem)


    print('Boundary found of length {}'.format(distance))



def som(problem, iterations, learning_rate=0.8):

    """Solve the xyz using a Self-Organizing Map."""


    # Obtain the normalized set of timeslots (w/ coord in [0,1])

    timeslots = problem.copy()

    # print(timeslots)

    #timeslots[['X', 'Y', 'Z']] = normalize(timeslots[['X', 'Y', 'Z']])


    # The population size is 8 times the number of timeslots

    n = timeslots.shape[0] * 8


    # Generate an adequate network of neurons:

    network = generate_network(n)

    print('Network of {} neurons created. Starting the iterations:'.format(n))


    for i in range(iterations):

        if not i % 100:

            print('\t> Iteration {}/{}'.format(i, iterations), end="\r")

        # Choose a random timeslot

        timeslot = timeslots.sample(1)[['X', 'Y', 'Z']].values

        winner_idx = select_closest(network, timeslot)

        # Generate a filter that applies changes to the winner's gaussian

        gaussian = get_neighborhood(winner_idx, n//10, network.shape[0])

        # Update the network's weights (closer to the timeslot)

        network += gaussian[:,np.newaxis] * learning_rate * (timeslot - network)

        # Decay the variables

        learning_rate = learning_rate * 0.99997

        n = n * 0.9997


        # Check for plotting interval

        if not i % 1000:

            plot_network(timeslots, network, name='diagrams/{:05d}.png'.format(i))


        # Check if any parameter has completely decayed.

        if n < 1:

            print('Radius has completely decayed, finishing execution',

            'at {} iterations'.format(i))

            break

        if learning_rate < 0.001:

            print('Learning rate has completely decayed, finishing execution',

            'at {} iterations'.format(i))

            break

    else:

        print('Completed {} iterations.'.format(iterations))


    # plot_network(timeslots, network, name='diagrams/final.png')


    boundary = get_boundary(timeslots, network)

    plot_boundary(timeslots, boundary, 'diagrams/boundary.png')

    return boundary


if __name__ == '__main__':

    main()


Reference: 

https://github.com/raja0034/som4drones


#codingexercise

https://1drv.ms/w/s!Ashlm-Nw-wnWhPBaE87l8j0YBv5OFQ?e=uCIAp9


This is a summary of the book titled “ESG Mindset”, written by Matthew Sekol and published by Kogan Page in 2024. The author evaluates “Environmental, Social and Governance” aka ESG practices for the long-term sustainability of corporations and their challenge to corporate culture. The author finds that deployments can raise issues which might affect transformation and growth, and most companies interpret these practices to suit their needs. This poses a challenge even to a standard definition and the acceptance of associated norms. Leaders are also quick to get at the intangible behind these practices by cutting them to the simplest form, which risks diluting their relevance. The author concludes that to realize the ESG mindset fully, companies must be committed to going all the way. He asserts that these practices are not merely data, and that technology is the invisible “fourth” pillar in ESG. There is demonstrated success in the campaigns of companies that have embraced ESG, but the mindset goes beyond operations. As with most practices in the modern world, ESG must remain flexible.

Environmental, Social, and Governance (ESG) practices are rooted in Corporate Social Responsibility (CSR) and Socially Responsible Investing (SRI) but differentiated themselves by 2004 with a broader definition of "material value" and a willingness to deal with intangibles. ESG is difficult to define as it links intangible values with material results. Companies must align their ESG mindset to manage crises in an increasingly complex world. ESG is not merely data, but requires companies to prioritize, interpret, and communicate their data to stakeholders. Companies must inventory their "data estate" by reviewing internal and external data sets to ensure transparency and sustainability. Challenges faced by companies include global emissions increasing by 70% between 1970 and 2004, climate change, and public pressure from stakeholders. Publicly traded companies can provide guidelines on how their boards make decisions, including those involving ESG or affecting stakeholders.

Globalization has led to systemic issues such as child welfare, climate change, forced labor, equity, and justice, resulting in crises. Boards must shift their decision-making practices from short-term to long-term to pursue their material goals. Technology, such as blockchain, the metaverse, and generative AI, can support ESG transformation by solving problems and facilitating goals. However, companies must modernize legacy technology, break down internal silos, and solve complex cultural fears of change. Technology also produces data that is integral to ESG analysis and decision-making, but it exposes companies to cybersecurity risks. Critics and controversy can hinder ESG, especially in the United States, where polarization and activism from both the left and right complicate the issues ESG already faces. Companies must collaborate to ensure ESG's relevance and address the accuracy and fairness of ESG scores.

ESG pillars interconnect and can be analyzed to uncover new issues and improve resilience in a crisis. Companies must recognize that long-term interconnected crises will become material to every company over time. Changes addressing systemic problems can influence both internal workings and external stakeholders. Companies like PepsiCo, Lego, and Target have successfully leveraged their investment in ESG goals in various ways. PepsiCo founded the Beverage Industry Environmental Roundtable (BIER) to address systemic industry issues, particularly around water use. Lego committed to switching to sustainable materials by 2030, while Target leveraged the social pillar of ESG by hiring a diverse workforce and practicing community outreach. Paramount aligned stakeholder engagement with its core product, storytelling, demonstrating its commitment to addressing systemic issues with an ESG mindset. The ESG mindset goes beyond operations, as large-scale disruptions in the Environmental and Social dimensions may leave businesses struggling to react. Companies can leverage their ESG goals while remaining profitable through B Corps, value chain improvements, and industry collaboration.

ESG must adapt to a complex and volatile world, addressing systemic issues, intangible value, and global economic development. Companies must move from following the data to promoting measurable change. Technology can help address complexity but requires stakeholder buy-in and coordination. Companies face pressure to standardize ESG goals, define the ESG mindset, and demonstrate how to implement it, especially in the face of political agendas and pushback against DEI programs.

It is interesting that there are so many parallels to draw between organizations and data science projects from an ESG perspective. The same sets of benefits and challenges apply to the long-term sustainability of these projects and charters. It is not just about the analysis, operations, and predictions but also about how they are presented to stakeholders.


Friday, August 30, 2024

 DevOps for IaC

As with any DevOps practice, the principles on which it is founded must always include a focus on people, process, and technology. With the help of Infrastructure-as-Code and blueprints, resources, policies, and access controls can be packaged together and become a unit for provisioning the environment. 

The DevOps adoption roadmap has evolved over time. What used to be Feature Driven Development around 1999 gave way to Lean thinking and Lean software development around 2003, which was followed by Product development flows in 2009 and Continuous Integration/Delivery in 2010. The DevOps Handbook and the DevOps Adoption Playbook are recent as of the last 5-6 years. The principles that inform practices, and the practices that resolve challenges, align accordingly. For example, risk is eliminated with automated testing and deployments, which resolves the problems of manual testing, processes, deployments, and releases. 

The people involved in bringing builds and deployments to the cloud, and in making use of them instead of outdated and cumbersome enterprise systems, must be given roles with a clear separation of responsibility. For example, developers can initiate the promotion of a code package to the next environment, but only a set of people other than the developers should allow it to propagate to production systems, and only with signoffs. Fortunately, this is well understood and there is existing software such as ITSM, ITBM, ITOM and CMDB. These are fancy acronyms for situations such as:  

1. If you have a desired state you want to transition to, use a workflow. 

2. If you have a problem, open a service ticket. 

3. If you want orchestration and subscribe to events, use events monitoring and alerts. 

4. If you want a logical model of the inventory, use a configuration management database. 

Almost all IT businesses are concerned with ITOM (alerts and events), ITSM (incidents and service requests), and intelligence in operations. The only difference is that these have not been used or made available for our stated purposes, but they are still a good start. 

The process that needs to be streamlined is unprecedented at this scale and sensitivity. The unnecessary control points, waste and overhead must be removed, and usability must be one of the foremost considerations for improving adoption. 

The technology is inherently different between cloud and enterprise. While they have a lot in common when it comes to principles of storage, computing and networking, the division and organization in the cloud has many more knobs and levers that require due diligence. 

These concerns around people, process, and technology are what distinguish this landscape and make it so fertile for improvement.