Saturday, March 16, 2024

 

Resource access control using row level security 

Introduction

How it works

References


Introduction 

Some of the shortcomings in Infrastructure-as-Code (IaC) come directly from the public cloud resources it manages. For example, a resource like a key vault allows access control via both role-based access control and access policies, and the IaC can specify either, but there is no built-in feature for selective access to individual secrets among users who have all been granted access. Compare this with file-level security, where additional checks can be placed on individual files to enable per-file access control. This is a write-up on how to enable fine-grained resource access control using row-level security as borrowed from databases. It helps to think of otherwise undifferentiated resources as having path-qualified file names as their naming convention. Next, consider that we want to apply access control lists to them in storage much the same way as we do for files and folders on Windows, with each file-system entry corresponding to a key in the key vault. Specifically, we allow different permissions, grantees, and even inheritance on resources and sub-resources that have no built-in access control of their own. 

 

How it works 

Row-level security in databases is described in the online documentation and comes with a ready-made schema for this use - please see the reference. In our case, we want to add a column/attribute/tag to our resource that captures the type of control we want to apply to it. It could be a user name, a label, or a foreign key into a set of markings that represent the sensitivity or the permissions of the data. 
Keeping track of usernames, for example, is helpful when all accesses are isolated per user. In other words, users do not share ownership of folders, and their accesses only manage the lifetime of their own folders. 
We can also create a label for each resource if we know the labels are distinct and symbolic. In most cases where the operations are predefined but their actors are not, we can use different labels. For example, a label can say whether the resource is read-only or read-write. 
However, users and their accesses are both dynamic, and they can be hierarchical. Here, hierarchical means that entries can inherit top-down or gain incremental privileges bottom-up. In addition, a user may grant or revoke permissions, take ownership, and so on. 
In such a case, it is better to view the labels as an output resulting from a schema of categories and markings. 
The categories and markings are predefined in the system. A category represents a domain of markings. It can be a classification, which is hierarchical, or a compartment, whose values are distinct. Categories can be one-to-one or zero-to-many in how many of their values can be applied to a resource, and they may carry a comparison rule such as any or all. The markings are the compartment or classification values into which we would like to divide the resources. A resource can belong to one or more of them, and each marking has a name. 
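To make the schema concrete, here is a minimal sketch in Java of the categories-and-markings model described above. The type and value names (Classification, Compartment, Secret, ProjectX) are illustrative, not drawn from any product schema:

```java
import java.util.Set;

public class MarkingSchema {
    // How many markings from a category may apply to one resource.
    enum Cardinality { ONE_TO_ONE, ZERO_TO_MANY }
    // How multiple markings are compared when evaluating access.
    enum ComparisonRule { ANY, ALL }

    // A category is a domain of markings: hierarchical (classification)
    // or mutually distinct (compartment).
    record Category(String name, boolean hierarchical,
                    Cardinality cardinality, ComparisonRule rule) {}

    // A marking is one value drawn from a category's domain.
    record Marking(String name, Category category) {}

    public static void main(String[] args) {
        Category classification = new Category(
            "Classification", true, Cardinality.ONE_TO_ONE, ComparisonRule.ALL);
        Category compartment = new Category(
            "Compartment", false, Cardinality.ZERO_TO_MANY, ComparisonRule.ANY);

        // A resource carries a set of markings drawn from these categories.
        Set<Marking> resourceMarkings = Set.of(
            new Marking("Secret", classification),
            new Marking("ProjectX", compartment));
        System.out.println(resourceMarkings.size()); // prints 2
    }
}
```

In a database these two types would simply be the category and marking tables, with the cardinality and comparison rule stored as columns on the category row.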

We may have only one category and one compartment. Categories can be hierarchical, as in our case, but compartments are mutually exclusive. Note that the default or guest low-privileged access corresponding to a public marking may not be sufficient to secure all out-of-the-box features, so it may need to be split or refined into more classifications. The classification hierarchy is expressed in a marking-hierarchy table rather than the marking table. Next, we have the unique-label table, which assigns a unique label to a combination of markings and roles. 
There will be at least one database role for each value of an any-or-all comparison rule of a non-hierarchical category. For hierarchical categories, again there will be one role per value, but the roles will also be nested. Some examples of roles are guest, dev, test, production support, reporting, owner, administrator, and security administrator. 
When using a label-based security model, it is important to note that the labels are assigned directly to each row of the base table. The labels are small, often a short integer, and can carry a non-clustered index. Sometimes the tags are kept not in the base table but in a junction table of the base identifier and the label/marking identifier. This does not scale well because it duplicates the identifier column. Moreover, if the identifier column is a GUID, an index on it is large and effectively random, and performance can suffer from scanning and full comparisons between GUIDs. Therefore, it is advisable to spend the one-time effort to add labels at the row level directly to the base table. 
A particular combination of markings translates to a unique label for the resource. Usually, this table is populated on demand, say by a stored procedure: if no row corresponds to the combination of markings, a new row is added, and its unique ID is returned as the label. 
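The on-demand population of the unique-label table can be sketched as a get-or-create lookup keyed by the canonical combination of markings. In a database this would be the stored procedure just mentioned; the in-memory map below merely illustrates the contract:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class LabelTable {
    // Maps a canonical marking combination to its unique label id.
    private final Map<Set<String>, Integer> labels = new HashMap<>();
    private int nextId = 1;

    // Returns the existing label for this combination, or mints a new one.
    public synchronized int getOrCreateLabel(Set<String> markings) {
        return labels.computeIfAbsent(Set.copyOf(markings), k -> nextId++);
    }

    public static void main(String[] args) {
        LabelTable table = new LabelTable();
        int a = table.getOrCreateLabel(Set.of("Secret", "ProjectX"));
        int b = table.getOrCreateLabel(Set.of("ProjectX", "Secret")); // same combination
        int c = table.getOrCreateLabel(Set.of("Public"));
        System.out.println(a == b); // prints true: same label for the same markings
        System.out.println(a == c); // prints false: distinct combination, distinct label
    }
}
```

Using a set as the key makes the lookup order-insensitive, which is exactly the property the unique-label table needs.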
We join the label and marking in a LabelMarking table to facilitate the translation of labels. 
A combination of markings can make up a permission. 

We continue a little further by enumerating the access permissions as Full Control, Modify, Read & Execute, List Folder Contents, Read, Write, and Special Permissions, and by defining the built-in users and groups. Note that these permissions translate directly to the classifications mentioned earlier and hence to the markings. Permissions can be granted or denied, so the markings must allow for either state. Also note that the classification can be considered hierarchical: a write may include a read, a read-and-execute may include a read, and full control may include everything. Ownership and inheritance of resources are not covered in this section because they pertain more to users and groups. Only one security principal or group can be treated as the owner while others may have administrative access, so there is only one possible selection for owner, but that does not belong in this schema because here we talk about permissions. 
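The hierarchical inclusion among permissions (write includes read, full control includes everything) can be sketched with an enum that expands each permission into the set it implies. The exact inclusions below are illustrative assumptions, loosely modeled on Windows folder permissions:

```java
import java.util.EnumSet;

public class Permissions {
    enum Permission { READ, WRITE, EXECUTE, LIST, MODIFY, FULL_CONTROL }

    // Expands a granted permission into everything it implies.
    public static EnumSet<Permission> implied(Permission p) {
        switch (p) {
            case WRITE:        return EnumSet.of(Permission.WRITE, Permission.READ);
            case EXECUTE:      return EnumSet.of(Permission.EXECUTE, Permission.READ);
            case MODIFY:       return EnumSet.of(Permission.MODIFY, Permission.WRITE,
                                                 Permission.READ, Permission.EXECUTE,
                                                 Permission.LIST);
            case FULL_CONTROL: return EnumSet.allOf(Permission.class);
            default:           return EnumSet.of(p);
        }
    }

    public static void main(String[] args) {
        System.out.println(implied(Permission.WRITE).contains(Permission.READ));  // prints true
        System.out.println(implied(Permission.READ).contains(Permission.WRITE));  // prints false
    }
}
```

A marking for a higher permission can then be checked against a lower one simply by set containment.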

Another point is that grant and deny are mutually exclusive, which means our markings will differ depending on whether a permission is granted or denied. Moreover, distinct sets of markings can be attributed to different users and groups. This mapping between markings and user/group may need to be stored as well. The types of markings possible are finite, but the mapping between user/group and markings can be arbitrary, and therefore exists as rows in a mapping table. In addition to having an owner, a resource can also be shared. Sharing means that other users and groups may have access to it, and a default permission may come with sharing - something like 'write' for 'Everyone' may be considered implicit.
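Treating grant and deny as mutually exclusive states suggests an access check in which any applicable deny overrides a grant. The precedence rule and the principal names below are assumptions for illustration, mirroring common file-system ACL conventions:

```java
import java.util.List;
import java.util.Set;

public class AccessCheck {
    // One row of the user/group-to-marking mapping table: a principal,
    // a permission, and whether the marking is a grant or a deny.
    record Entry(String principal, String permission, boolean deny) {}

    // Deny entries take precedence over grant entries for the same permission.
    public static boolean isAllowed(List<Entry> acl, Set<String> principals, String permission) {
        boolean granted = false;
        for (Entry e : acl) {
            if (!principals.contains(e.principal()) || !e.permission().equals(permission)) continue;
            if (e.deny()) return false; // explicit deny wins immediately
            granted = true;
        }
        return granted;
    }

    public static void main(String[] args) {
        List<Entry> acl = List.of(
            new Entry("devs", "read", false),
            new Entry("contractors", "read", true));
        System.out.println(isAllowed(acl, Set.of("alice", "devs"), "read"));              // prints true
        System.out.println(isAllowed(acl, Set.of("bob", "devs", "contractors"), "read")); // prints false
    }
}
```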

We concern ourselves only with the resources and not with users and groups, which are beyond the scope of this document. This is for two reasons: first, users, groups, and roles are part of role-based access security, which is out of scope; second, we focus on resources because we are concerned with the granularity of the labels we can assign to them. Similarly, when we consider resources and permissions, we do not consider the attributes of the resources. If we take a resource as a path qualifier, certain other attributes are possible. For example, a folder can be read-only or hidden. It may be archivable, compressed, or encrypted. It may be a candidate for having its contents indexed.

References 

MSDN: http://technet.microsoft.com/library/Cc966395 

Previous articles: IaCResolutionsPart93.docx

#codingexercise: CodingExercise-03-16-2024.docx

Friday, March 15, 2024

 

This is a continuation of previous articles on IaC shortcomings and their resolutions. While the previous articles focused on the Azure Machine Learning Workspace as a resource to train models, this one lists some of the choices data scientists make between models. Classification, regression, recommendation, and clustering are some common machine learning tasks, but the use case determines the model, so the list here is organized by purpose.

If the purpose is to predict between two categories, two-class classification models are appropriate. Simple yes-or-no answers fall in this category. With 100 features or fewer, a linear model using a two-class support vector machine works well. If a fast-training linear model is needed, a two-class averaged perceptron is suitable. Similarly, a two-class decision forest suits accurate training, a two-class logistic regression fast training, a two-class boosted decision tree accurate and fast training with a large memory footprint, and a two-class neural network accuracy at the cost of long training times.

On that note, multiclass classifications are used when there are multiple possible answers. Multiclass logistic regression is for fast training times, a multiclass neural network for accuracy with long training times, a multiclass decision forest for accuracy with fast training times, one-vs-all multiclass when building on a two-class classifier, and one-vs-one multiclass when the use case is less sensitive to an imbalanced dataset and can bear larger complexity. A multiclass boosted decision tree is for non-parametric needs with fast training times and scalability.

Regression models are used to make forecasts by estimating the relationships between values. Predicting a distribution can be done by a Fast-Forest Quantile Regression, predicting event counts by Poisson regression, fast training with a Linear Regression, small data sets with Bayesian Linear Regression, accurate and fast training with Decision Forest, accurate but long training with Neural Network, accurate, fast and with large memory footprint with Boosted Decision Tree Regression.

Recommenders are used when the use case involves what might be interesting to someone. A hybrid recommender with both collaborative filtering and content-based approach would require one that trains wide and deep. Collaborative filtering would justify an SVD recommender.

Clustering separates similar data points into intuitive groups for organization. Discovering structure through unsupervised learning can be done with a K-Means clusterer.

Finding unusual occurrences such as rare data points or outliers can be done by anomaly detection models: a One-Class SVM when there is an aggressive boundary, or PCA-based anomaly detection for fast training times.

Image classification models interpret images by using deep neural networks. ResNet and DenseNet are some examples in this category.

Text analytics models interpret text. Words can be converted to values with a Word2Vec model for use in NLP tasks; cleaning operations on text, like removal of stop-words and case normalization, can be done with Preprocess Text; converting text to features using the Vowpal Wabbit library can be done with Feature Hashing; a dictionary of n-grams can be extracted with Extract N-Gram Features; and topic modeling can be done with Latent Dirichlet Allocation. Since text analytics is often part of a pipeline transforming text to vectors and discovering embeddings, the above tasks are often used together. Cloud services provide a great endpoint for these models. Azure Cognitive Services provides a rich text analytics API, and Azure Text Analytics V3 supports multiple languages.

The best machine learning model for your predictive analytics solution is driven both by the nature of the data and the purpose at hand.
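The catalogue above is essentially a decision table, so a model choice can be sketched as a lookup from task and constraint to a candidate model. The keys below merely restate a few of the pairings listed above; the string encoding is an illustrative assumption:

```java
import java.util.Map;
import java.util.Optional;

public class ModelChooser {
    // A small excerpt of the task/constraint-to-model pairings described above.
    private static final Map<String, String> CHOICES = Map.of(
        "two-class|fast training", "Two-Class Averaged Perceptron",
        "two-class|accurate, long training", "Two-Class Neural Network",
        "multiclass|fast training", "Multiclass Logistic Regression",
        "regression|event counts", "Poisson Regression",
        "clustering|unsupervised structure", "K-Means",
        "anomaly|fast training", "PCA-Based Anomaly Detection");

    // Looks up a candidate model for a (task, constraint) pair, if one is listed.
    public static Optional<String> choose(String task, String constraint) {
        return Optional.ofNullable(CHOICES.get(task + "|" + constraint));
    }

    public static void main(String[] args) {
        System.out.println(choose("regression", "event counts").orElse("no match"));
        // prints Poisson Regression
    }
}
```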

Previous articles: IaCResolutionsPart92.docx

 

Thursday, March 14, 2024

 

This is a continuation of a series of articles on IaC shortcomings and resolutions. The resource type discussed is the Azure Machine Learning Workspace, a cloud-based environment provided by Microsoft Azure that enables data scientists and machine learning engineers to build, train, deploy, and manage machine learning models. It provides a collaborative space where teams can work together on machine learning projects, and it offers a range of tools and services to support the end-to-end machine learning lifecycle.

Within an Azure Machine Learning Workspace, you can access and manage data, create and run experiments, track and version models, and deploy models as web services or containers. It also integrates with other Azure services, such as Azure Databricks, Azure Notebooks, and Azure DevOps, to provide a comprehensive ecosystem for machine learning development and deployment.

The Azure Machine Learning Workspace simplifies many common tasks in the machine learning workflow, such as data preparation, feature engineering, model training, and model deployment. It also provides scalability and flexibility, allowing you to leverage the power of the cloud to handle large datasets and complex machine learning scenarios.

Many workspaces are deployed with public IP networking enabled because it grants connectivity to all other resources directly from the code in the notebook. On the other hand, enterprises like to lock down access to restricted resources and targets by allowing only select source-destination traffic. When the IP networking is enabled on the workspace, all the compute instances created in the workspace get public IP connectivity by virtue of the workspace’s public IP address even if these instances do not get a public IP address themselves. The notebook can access dependencies of the workspace such as storage account, key vault, container registry and application insights over the public network. When a virtual network is provided for the instances to be associated with a subnet, then that subnet must be allowed on the dependencies networking. Additionally, private or service endpoints must be added for these dependencies. The IaC for deployment of these dependencies can order them before the main workspace but role-based access control must be specifically added afterwards.

One of the least anticipated problems is when the workspace is converted from public to private networking. In this case, three error types are encountered for the nominal change in the IaC that turns off the public-network-access boolean: 1. additional policy compliance, 2. allowlisting of the workspace's identity on the control and data planes of the dependencies, and 3. enabling a designated image-build compute on the workspace. While the original deployment IaC does not call these out, these changes must also be made in the code before taking the workspace private.

Previous article: IaCResolutionsPart91.docx

 

Tuesday, March 12, 2024

 

Question: Find the maximum rectangle enclosed by the barchart whose bar heights are 4, 6, 2, 4, 12, 7, 4, 2, 2, 2.

Solution:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {
    public static void main(String[] args) throws Exception {
        List<Integer> A = Arrays.asList(4, 6, 2, 4, 12, 7, 4, 2, 2, 2);
        System.out.println(getMaxRectangleStreaming(A));
    }

    public static int getMaxRectangleStreaming(List<Integer> A) {
        int maxArea = Integer.MIN_VALUE;
        int maxHeight = Integer.MIN_VALUE;
        // heights[j] holds the unit height (j+1) while a bar at least that tall is current.
        List<Integer> heights = new ArrayList<>();
        // parallel sums of unit-areas of bounding boxes with top-left at
        // incremental heights of the current bar in the barchart
        List<Integer> areas = new ArrayList<>();
        int prev = 0;
        for (int i = 0; i < A.size(); i++) {
            if (A.get(i) > maxHeight) {
                maxHeight = A.get(i);
            }
            if (prev < A.get(i)) {
                // the bar grew: extend the running strips and open new ones
                for (int j = 0; j < A.get(i); j++) {
                    if (heights.size() < j + 1) heights.add(0);
                    if (areas.size() < j + 1) areas.add(0);
                    heights.set(j, j + 1);
                    int newArea = areas.get(j) + (j + 1);
                    areas.set(j, newArea);
                    if (newArea > maxArea) {
                        maxArea = newArea;
                    }
                }
            } else {
                // the bar shrank or stayed: extend the surviving strips ...
                for (int j = 0; j < A.get(i); j++) {
                    heights.set(j, j + 1);
                    int newArea = areas.get(j) + (j + 1);
                    areas.set(j, newArea);
                    if (newArea > maxArea) {
                        maxArea = newArea;
                    }
                }
                // ... and close the strips above the current bar height
                for (int j = A.get(i); j < prev; j++) {
                    heights.set(j, 0);
                    if (areas.get(j) > maxArea) {
                        maxArea = areas.get(j);
                    }
                    areas.set(j, 0);
                }
            }
            prev = A.get(i);
            System.out.println("heights:" + print(heights));
            System.out.println("areas:" + print(areas));
        }
        return maxArea;
    }

    public static String print(List<Integer> A) {
        StringBuilder sb = new StringBuilder();
        for (Integer a : A) {
            sb.append(a + " ");
        }
        return sb.toString();
    }
}

 

//Output:
heights:1 2 3 4
areas:1 2 3 4
heights:1 2 3 4 5 6
areas:2 4 6 8 5 6
heights:1 2 0 0 0 0
areas:3 6 0 0 0 0
heights:1 2 3 4 0 0
areas:4 8 3 4 0 0
heights:1 2 3 4 5 6 7 8 9 10 11 12
areas:5 10 6 8 5 6 7 8 9 10 11 12
heights:1 2 3 4 5 6 7 0 0 0 0 0
areas:6 12 9 12 10 12 14 0 0 0 0 0
heights:1 2 3 4 0 0 0 0 0 0 0 0
areas:7 14 12 16 0 0 0 0 0 0 0 0
heights:1 2 0 0 0 0 0 0 0 0 0 0
areas:8 16 0 0 0 0 0 0 0 0 0 0
heights:1 2 0 0 0 0 0 0 0 0 0 0
areas:9 18 0 0 0 0 0 0 0 0 0 0
heights:1 2 0 0 0 0 0 0 0 0 0 0
areas:10 20 0 0 0 0 0 0 0 0 0 0
20

 

 

Monday, March 11, 2024

 

Using Neural networks with sound patterns:

Azure Cognitive Services is a collection of cloud-based APIs and services provided by Microsoft, which allows developers to add various intelligent features into their applications without having to build and train their own AI models.

 

One of the services offered by Azure Cognitive Services is the Computer Vision API, which provides powerful image processing capabilities. By analogy, the following are some hypotheses for sound processing via devices utilizing radar technologies:

 

Sound Analysis: The Computer Acoustics API can analyze audio and provide rich insights about its patterns. It can extract information such as instruments, speakers, songs, clips, and beats from audio files.

 

Speaker Detection: The API can identify and locate multiple speakers within an audio clip. It can detect sounds from common sources such as people, animals, vehicles, and household items, and provide bounding-box coordinates for each detected sound source.

 

Audio Detection and Recognition: The Computer Acoustics API can detect and analyze speakers within a clip based on pronunciation. From accents it can identify characteristics such as age, gender, emotion, and community features. It can also perform sound verification and identification tasks.

 

Note Recognition (NR): The API can extract notes from music, both instrumental and vocal. It can recognize and extract notes from various audio sources, not limited to songs, making it useful for tasks such as music catalogue generation.

 

Audio Moderation: The Computer Acoustics API can also assist in content moderation by analyzing audio clips for potential noise. It can detect outliers and uncharacteristic patterns to the given clip and suppress them.

 

Custom Sounds: With these services, one can also create one's own custom sound classification models. The Custom Acoustics service allows one to train and deploy models specific to sound source types and quality, enabling one to classify sounds into custom categories or tags.

 

Integration: Acoustics Services provides easy-to-use APIs and SDKs that developers can use to integrate sound processing capabilities into their applications. These services can be seamlessly integrated with other services and applications, making it convenient to build intelligent sound processing solutions.

 

It is important to note that Acoustics Services, when made available, will require an Azure subscription, and usage is billed based on the number of API calls and the amount of data processed.

Previous articles: ChatbotOps.docx

Sunday, March 10, 2024

 

This is a summary of the book “Beyond Coding: How Children Learn Human Values through Programming,” written by Marina Umaschi Bers and published by MIT Press in 2022. The author leads an interdisciplinary research group called Developmental Technologies (DevTech). She asserts that programming is a new way of thinking and problem solving that fills a gap left when education systems fail to foster curiosity and learning or omit moral and social education from the curriculum. She offers programming as a powerful teaching tool, connecting it to ethics and values so that it promotes moral and developmental virtues. Coding, in that sense, is the new literacy and is as fun as explorative play.

Early childhood is a crucial time for children's learning, and educators should focus on fostering their curiosity and creativity through play. Play is essential for developing language, socio-emotional, and physical capabilities, as well as acquiring spirituality and morality. Educators should adopt a "coding playground" approach, allowing children to learn to code through creative exploration. This approach encourages imagination, creativity, social interaction, and teamwork. A positive technological development (PTD) framework can be used to nurture psychosocial behaviors within a technological context.

Educators can adopt either a constructionist or an instructionist perspective on education. Constructionists believe that the educational system inhibits learning by forcing students to learn everything in the same way and at the same pace. Both camps view coding as an important skill to teach children, as it prepares them for the workforce, but they differ in teaching methods. Instructionists often use computer puzzles and games, while constructionists prefer open-ended learning environments that allow children to create meaningful projects.

STEM approaches to coding, which focus on preparing children for the workforce and economic growth, limit its potential. Teachers should rethink their methods and practices to teach coding as they would teach literacy, focusing on developing critical thinking skills. Coding literacy is no longer the domain of a select status quo group, but a critical skill that supports novel problem-solving and thinking. It is the literacy of the future, and anyone hoping to secure financial health and status will need to learn it. The "Coding as Another Language" (CAL) approach promotes a culture of learning, fostering virtues and character. CAL connects powerful ideas from literacy and computer science, focusing on algorithms, design processes, control structures, representation, software and hardware, debugging, audience awareness, and modularity. By combining these ideas, CAL can improve cognitive development and career opportunities for future generations.

Learning programming offers unique character-building opportunities for children, helping them develop civic virtues and good habits for harmonious work within groups and communities. Teachers can engage in storytelling, transform classrooms into "just communities," and provide opportunities for experiential learning through volunteering. Coding can be used to help children build character through narrative, experiential learning, and ethical reasoning. Teachers can create technology circles, encourage journaling, and share completed projects at open houses.

Children should learn ten virtues through learning programming: curiosity, open-mindedness, fairness, generosity, honesty, optimism, patience, perseverance, gratitude, and forgiveness. Teachers should model curiosity, encourage open-mindedness, foster fairness, and guide students in valuing kindness, honesty, integrity, optimism, patience, perseverance, gratitude, and forgiveness. By fostering these virtues, children can develop a sense of purpose and grit, leading to successful outcomes in their lives.

Those who teach future programmers must impart values that promote global citizenship and facilitate meaningful connections. Despite debates about moral education and character development, most teachers teach values in the coding playground. Teachers can choose universal values that educate children to become committed to building a better world. Future programmers should view coding as building bridges, facilitating meaningful dialogues across cultures and contexts, rather than using technology to restrict connections. Educators have the responsibility to teach children to leverage technology for global citizenship.

Previous book summary: BookSummary58.docx

Summarizing Software: SummarizerCodeSnippets.docx.