Sunday, June 30, 2019

Today we discuss a data structure for a thesaurus. We have referred to a thesaurus as a source of information for natural language processing. The thesaurus is used heavily when words from the incoming text are looked up for possible similarity, so we need a data structure that makes these lookups fast. A B+ tree holding the words and their similarities is helpful because it makes lookups logarithmic in time while allowing the data to scale.
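As a minimal sketch of the lookup idea, assuming an in-memory stand-in: Java's TreeMap is a balanced search tree rather than a true B+ tree, but it gives the same logarithmic lookup behavior for a word-to-synonyms index. The class and sample words below are hypothetical.

import java.util.*;

public class ThesaurusIndex {
    // Sorted map keyed by word; a database B+ tree index plays the same role at scale.
    private final NavigableMap<String, List<String>> index = new TreeMap<>();

    public void add(String word, List<String> synonyms) {
        index.put(word.toLowerCase(), synonyms);
    }

    // Logarithmic-time lookup of the synonym list for a word from the incoming text.
    public List<String> lookup(String word) {
        return index.getOrDefault(word.toLowerCase(), Collections.emptyList());
    }

    public static void main(String[] args) {
        ThesaurusIndex thesaurus = new ThesaurusIndex();
        thesaurus.add("happy", Arrays.asList("glad", "joyful", "content"));
        System.out.println(thesaurus.lookup("happy")); // prints [glad, joyful, content]
    }
}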
A thesaurus can be viewed as a table of words with references to other words. We form a hierarchical representation with a group id, where the id points back to a word. Such a hierarchy is easy to explore with a recursive common table expression (CTE). By relating words to their groups as the next level, we can define the recursion with a level increment for each group and a terminal condition that ends the recursion. Although this table of words is standalone in this case, we can view it as a relational table and use a recursive CTE to query the words in it.
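A minimal sketch of such a recursive CTE, assuming a hypothetical words table with columns id, word and group_id, where group_id points back to the id of the parent word. The WITH RECURSIVE syntax shown is the PostgreSQL/MySQL form; SQL Server omits the RECURSIVE keyword.

import java.sql.*;

public class ThesaurusCte {
    // Hypothetical schema: words(id INT PRIMARY KEY, word VARCHAR, group_id INT NULL),
    // where group_id points back to the id of the parent (group) word.
    private static final String HIERARCHY_SQL =
        "WITH RECURSIVE hierarchy (id, word, group_id, level) AS ( " +
        "  SELECT id, word, group_id, 0 AS level " +
        "  FROM words WHERE group_id IS NULL " +              // roots anchor the recursion
        "  UNION ALL " +
        "  SELECT w.id, w.word, w.group_id, h.level + 1 " +   // level increment per group
        "  FROM words w JOIN hierarchy h ON w.group_id = h.id " +
        ") " +
        "SELECT id, word, level FROM hierarchy ORDER BY level, word";

    public static void printHierarchy(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(HIERARCHY_SQL)) {
            while (rs.next()) {
                System.out.printf("%d %s (level %d)%n",
                        rs.getInt("id"), rs.getString("word"), rs.getInt("level"));
            }
        }
    }
}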
The number of synonyms varies from word to word. A list of synonyms for each word can be easier to maintain, but an arbitrarily wide table with a large number of columns would also suffice, because a word usually has a finite number of synonyms and their order does not matter; Synonym1, Synonym2, Synonym3, … can be the names of the columns in that case. Each synonym is a word, and each word is represented by the identifier associated with it in the table. A word and its list of synonyms can also be represented with one column each, where the second column stores a varying number of word identifiers depending on how many synonyms the word has. This might be efficient for storage purposes, but it is easier to work with the words in a regular table. Since the rows in the table of synonyms will be listed in alphabetical order of the words, a clustered index could prove helpful to make lookups faster.
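A short sketch of one possible relational layout under those assumptions; the table and index names are hypothetical, the synonyms table here uses a row per (word, synonym) pair instead of Synonym1..SynonymN columns, and the clustered index syntax shown is SQL Server's.

public class ThesaurusSchema {
    // Hypothetical DDL: each word gets an identifier, and each (word, synonym) pair is a row.
    // Clustering on word_id keeps a word's synonym rows physically together for fast lookups.
    static final String[] DDL = {
        "CREATE TABLE words (id INT PRIMARY KEY, word VARCHAR(64) NOT NULL UNIQUE)",
        "CREATE TABLE synonyms (word_id INT NOT NULL REFERENCES words(id), " +
        "                       synonym_id INT NOT NULL REFERENCES words(id))",
        "CREATE CLUSTERED INDEX ix_synonyms_word ON synonyms (word_id, synonym_id)"
    };
}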
A table for the synonyms facilitates SQL queries over the entries. This helps application development over the thesaurus because the built-in operators and features of a well-known query language become available. It also helps with the object-to-relational mapping of the entities, which makes it easier to develop services using language-native features. The separation of querying from storage also keeps create-update-delete operations independent from read-only operations.
The synonyms do not have to be kept in a table. Alternative forms of persistence, such as a graph with words as nodes and weighted edges as synonym relationships, can also be used. However, the query language for graphs may not be the same as that for tables. Other than that, both forms of storage can hold the thesaurus.

Saturday, June 29, 2019

We now discuss some of the mitigations to the vulnerabilities detected by this tool. Perhaps the best defense is to know what is contained in the images. If we are proactive about not including packages with vulnerabilities, we reduce the risk tremendously. Next, we can be consistent and explicit about the locations from which the images are built and the packages or jars are included. Third, we can update our dependencies to the latest software as and when it is released. Fourth, we can run our scans continuously on our builds so we can evaluate the changes within. Finally, we can maintain a manifest of, and documentation for, all our jars and containers.
Some dependencies incur a lot of vulnerabilities, and their replacements are well-advertised. Even popular source code and containers do not always get due maintenance. The approach of starting minimal and making incremental progress wins in all these cases.
Source code analyzers work differently from binary scanners. Although both are static analyzers, source code analysis comes with benefits such as scanning code fragments, supporting cloud-compiled languages, and compiler-agnostic and platform-agnostic processing. All scanners will evaluate only those rules that are specified to them from the public Common Vulnerabilities and Exposures (CVE) database.
There are a few rules of thumb to defend against vulnerabilities:
Keep the base images in the Dockerfile up to date.
Prefer public and signed base images to private images.
Keep the dependencies up to date.
Make sure there are no indirect references or downloaded code.
Ensure that HTTP redirects chain to the correct dependencies.
Actively trim the dependencies.
Get the latest CVE definitions and mitigations for the product.
Validate the build integrations to be free from adding unnecessary artifacts to the product.
Ensure that the product images have a manifest that is worthy of publication to public registries.
Ensure that the product and manifests are scanned for viruses.
A deployment could be used for network vulnerability scanning and web application testing.
Keep the images and product signed for distribution.
Enforce role-based access control on all images, binaries and artifacts.
Keep all the releases properly annotated, documented and versioned.

int getDistance(int[][] A, int rows, int cols, int X, int Y) {
    // getPosition is assumed to return (-1, -1) when the value is not found in the matrix.
    Pair<int, int> posx = getPosition(A, rows, cols, X);
    Pair<int, int> posy = getPosition(A, rows, cols, Y);
    if ((posx.first == -1 || posx.second == -1) && (posy.first == -1 || posy.second == -1)) {
        return 0; // neither X nor Y is present
    }
    if (posx.first == -1 || posx.second == -1) {
        return getDistanceFromTopLeft(posy); // only Y was found
    }
    return getDistanceFromTopLeft(posx); // X was found
}

int getDistanceFromTopLeft(Pair<int, int> position)
{
    if (position.first == -1 || position.second == -1) return 0;
    // Manhattan distance from the top-left cell; we don't take the cartesian distance
    // as the sqrt of the sum of squares of displacements along the x and y axes.
    return position.first + position.second;
}


Friday, June 28, 2019

We continued discussing the static scanners - both source code and binary scanners. The scanner doesn't really advertise all the activities it performs. However, we can take for granted that the scanner will not flag whether a container is running as root. It will also not flag an insecure Kubernetes configuration. Inappropriate use of shared resources, such as persistent volumes, is also not flagged by the scanner. Even security vulnerabilities that are not already known to the scanner can escape detection. The scanner does look for package names. Many vulnerabilities are directly tied to packages and their versions, so a registry of packages and vulnerabilities makes it easy to detect those in the container image. However, not all vulnerabilities are associated with package names. The same goes for open source that is not referenced from its public location but is indirectly included in the container image from a tarball, another form of download, or a local inclusion. Source analysis is also different from binary analysis, so we cannot expect overlap there. If source code has been included in the container image and built locally with build-and-install utilities like ‘make’, it will likely escape detection. Image scanning is a foundational part of container security.
Open source is not limited to containers, and the security threat modeling for open source differs from that for products that use containers. Open source is popular for the functionality it offers along with access to the source code for customizations. Most companies use open-source code. The Open Web Application Security Project (OWASP) was founded to draft guidelines for companies that use open source for web applications. A top-ten list of frequently seen application vulnerabilities was included in its publications. This list cited: 1. injection of code that can be executed within trusted boundaries; 2. broken authentication that lets attackers compromise the system; 3. disclosure of sensitive data such as Personally Identifiable Information (PII); 4. XML external entities that break or exploit XML parsers; 5. broken access control, where an administrative panel can become available to a low-privilege user; 6. security misconfigurations, where omissions or mistakes in the enforcement of security policies let hackers in; 7. cross-site scripting, where a victim's browser can execute malicious code from an attacker; 8. insecure deserialization that lets attackers manipulate messages or state to gain remote code execution; 9. using components with known security issues, which lowers the common bar for security; and 10. insufficient logging and monitoring of the product, where authentication, authorization and auditing events escape detection or known patterns of exploitation are not detected. These kinds of attack vectors are very common and must be guarded against when using open source in web applications.
These are some of the limitations of the scanner; such checks are best handled by tools other than the scanner.

Thursday, June 27, 2019

Programmability to automate container image scanning
The article on Container Image Scanning written earlier was an introduction. Different vendors provide the ability to scan container images for security vulnerabilities that can be patched with software updates to improve the baseline. Most vendors try to tie the scanning to their repository or registry. For example, the binary repository that organizations use to store builds provides its own x-ray option, and the cloud registry of container images from a popular container framework vendor provides its own scanning solution that works only with its hub of images.
Organizations have to choose between on-premises image storage and uploading to image registries, and this drives the requirement to automate the scanning of images produced from every build. The automation usually proceeds with requests made to the application programming interface of the service hosted for scanning images at the repository or the registry. The requests can only be made by accounts registered for use with the service.
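A sketch of what such an automated request might look like, assuming a hypothetical scan service: the endpoint URL, token variable and JSON payload below are placeholders, since the actual API paths and payloads vary by vendor.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ScanTrigger {
    public static void main(String[] args) throws Exception {
        // Hypothetical values; a real service defines its own endpoint, auth and payload.
        String endpoint = "https://scan.example.com/api/v1/scans";
        String token = System.getenv("SCAN_SERVICE_TOKEN");
        String body = "{\"registry\":\"registry.example.com\",\"image\":\"myapp\",\"tag\":\"1.2.3\"}";

        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
                .header("Authorization", "Bearer " + token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // Submit the scan request for the newly built image and print the service's response.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}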
There are third-party products that try to break the vendor lock-in. They even offer to scan images that are built locally. Usually a linking key is required to do so; it links the scanner on the product instance to the service that automates and monitors the process. Therefore, the scanner can be remote while the service consolidates the requests and responses.
A scanner cannot remain in the field without acquiring the latest knowledge about security vulnerabilities. New vulnerabilities keep cropping up, and there needs to be feedback to the scanner so that it can include detection of these new vulnerabilities in its routine. This is facilitated with the help of programs named plugins that can be fed to the scanner to do its job.
In order for the scanner to scan an image, the image must first be imported from the repository or registry. This is done with the help of a connector which imports images from a specific repository or registry. Connectors vary by the type of the target from which they import the image.
A scanner by itself and a connector can serve any on-premise need to scan an image. However, they are of little use without a set of plugins, where each plugin detects one or more vulnerabilities and takes steps to eradicate them. These definitions are usually available from the third-party service that makes the scanner and connector available. A subscription is required to import the plugins that track the well-known public Common Vulnerabilities and Exposures (CVE) database of cybersecurity vulnerabilities.
For example:
docker run \
  -e SERVICE_ACCESS_KEY=<variable> \
  -e SERVICE_SECRET_KEY=<variable> \
  -e SERVICE_REPO_NAME=<variable> \
  -i thirdparty-docker-consec-local.jfrog.io/cs-scanner:latest inspect-image <Image name>

Wednesday, June 26, 2019

Container Image Scanning:
In our earlier article, we described how container images have become relevant in today's security assessments. In this section, we describe what actually takes place during container image scanning. Container image scanning is a means to get comprehensive and current information on the security vulnerabilities in software offerings. There is some debate about whether the approach to using this technology should be passive monitoring or active scanning, but the utility is unquestioned in both cases.
While they represent two ends of a spectrum, vulnerability assessment generally begins with passive monitoring in broad sweeps and narrows to focused active scanning. Asset information provided by passive monitoring informs active scanning. Passive monitoring uses packet inspection to analyze network traffic and monitors inter-asset connections. Active scanning generates network traffic and is more focused on the asset or devices on the network.
Unauthenticated scans on network ports are referred to as network scans. They examine devices from the outside in. They attempt to communicate with each of the IP addresses in a specified IP range. Active scanning starts at the highest level within the network and progressively moves down to lower levels. This step-down occurs in a graded manner and over an evaluation period.
When a scan is run, a container image is seen as a stack of layers. Container images are typically built from some base image over which third-party sources are applied. These images and libraries may contain obsolete or vulnerable code. Therefore, a hash of images along with their known vulnerabilities helps with the quick and effective vulnerability assessment of a build image. Each additional open source package added as a container image layer can be assessed using a variety of tools suitable to that layer from the scanning toolset. Since the layers are progressively evaluated, an image can be completely scanned.
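A minimal sketch of the layer-hashing idea, assuming the image has already been exported so that each layer is a tarball in a local directory (for example, via docker save followed by extraction); the known-bad digest list is hypothetical and stands in for the registry of known vulnerabilities mentioned above.

import java.io.InputStream;
import java.nio.file.*;
import java.security.MessageDigest;
import java.util.*;

public class LayerCheck {
    // Hypothetical set of layer digests already known to contain vulnerable packages.
    static final Set<String> KNOWN_BAD = Set.of("3f4a9c...", "b81d02...");

    static String sha256(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) hex.append(String.format("%02x", b & 0xff));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Walk the exported image directory layer by layer and flag known-bad digests.
        try (DirectoryStream<Path> layers = Files.newDirectoryStream(Paths.get(args[0]), "*.tar")) {
            for (Path layer : layers) {
                String digest = sha256(layer);
                System.out.println(layer.getFileName() + " " + digest
                        + (KNOWN_BAD.contains(digest) ? "  <-- known vulnerable layer" : ""));
            }
        }
    }
}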
Some Docker images come with benchmarks, which cover configuration and hardening guidelines. In these benchmarks, non-essential services are removed and the surface area is reduced so that potential risks are mitigated. Images tagged with an alpine suffix are usually the baseline for their category of images.
As with all asset management, images can also be classified as assets. Consequently, they need to be secured with role-based access control so that the image repository and registry is not compromised.
These salient features can be enumerated as steps with the following list:
1. Know the source and content of the images.
2. Minimize risks from the containers by removing or analyzing layers.
3. Reduce the surface area in images, containers and hosts
4. Leverage the build integration tools to do it on every image generation
5. Enforce the role segregation and access control for your Docker environment
6. Automate the detection actions and enforcement, such as failing a build (a minimal severity gate of this kind is sketched after this list).
7. Routinely examine the registries and repositories to prevent sprawl.
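As a minimal sketch of step 6, assuming a plain-text scan report and placeholder severity keywords (the real report format depends on the scanner), a build step could fail when blocking findings are present:

import java.nio.file.*;

public class FailBuildOnFindings {
    // Hypothetical gate: exit non-zero if the scan report mentions any CRITICAL or HIGH finding.
    public static void main(String[] args) throws Exception {
        String report = Files.readString(Paths.get(args.length > 0 ? args[0] : "scan-report.txt"));
        long blockers = report.lines()
                .filter(line -> line.contains("CRITICAL") || line.contains("HIGH"))
                .count();
        if (blockers > 0) {
            System.err.println("Scan found " + blockers + " blocking vulnerabilities; failing the build.");
            System.exit(1);
        }
        System.out.println("Scan passed the severity gate.");
    }
}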
The only caveat with image scanning is that it is often tied to the image repository or registry, so the scanning options become tied to what is supported by the image repository or registry vendor.

Monday, June 24, 2019

Analyzing Docker images for security vulnerabilities:
This is an explanation of an emerging trend in vulnerability assessment tools and the security industry in general. The earlier article emphasized binaries in the form of executables. This has been the traditional model of packaging and publishing a product. It usually comes with an installer or a mere layout of the files produced by the compiler of the language in which the product was written.
With the move to the cloud, containers became widely adopted. A container framework defines a proprietary format which includes not just the executable but also the libraries that enable the executable to run in its own hosting environment, referred to as the runtime. This format includes a snapshot of all the executables and the runtime requirements. It is referred to as an image.
The container framework added requirements to the image so that it can work with the framework's abstractions of deployment components, often referred to as pods in a cluster. A pod is an encapsulation of the resources required to run an image. While a container provides a runtime for the image, the pod hosts one or more containers.
There were no longer the concepts of Mac and PC as the computers on which a program runs. The container framework was a veritable computer in itself and required the programs to be made available as images. This had the nice side benefit that the images could involve a variety of tools, technologies, languages, products and their settings. Some images became popular for distribution. Perhaps one of the most important benefits is that this packaging could run not only on a computer but also in the public and private clouds.
As such, the security tools that worked on homogeneous technologies, including language-specific introspection of objects, now had a significant challenge in analyzing a basket full of heterogeneous programs. This mixed notion of an application was difficult to scan without an almost equally hybrid toolset to cover all the programs. Along with these emerging trends, a lightweight language that was highly performant and geared for modular execution, named Go, also became popular. Toolsets to scan the binaries of these images were somewhat lacking, not only because tooling for the language was still maturing but also because the language moved away from the erstwhile unsafe usages of C programs and their functions.
As images began to be authored, collected and shared, they spread almost as fast as the internet and required public and private registries to be maintained so that they could be looked up, uploaded and downloaded. This proliferation of images posed a new challenge for the digital signing of images, one that the security around the registries did not always address.

Sunday, June 23, 2019

Today we discuss event storage:
Event storage gained popularity because a lot of IoT devices started producing events. Reads and writes were very different from conventional data because they were time-based, sequential and progressive. Although stream storage is best for events, any time-series database could also work. However, they are not web-accessible unless they are in an object store. Their need for storage is not very different from applications requiring object storage that facilitates store and access. However, as object storage makes inroads into vectorized execution, the data transfers become increasingly fragmented and continuous. At this juncture it is important to facilitate data transfer between objects and events, and it is in this space that events and the object store find suitability. Search, browse and query operations are facilitated in a web service using a web-accessible store.
File systems have long been the destination for storing artifacts on disk, and while the file system has evolved to stretch over clusters and not just remote servers, it remains inadequate as blob storage. Data writers have to self-organize and interpret their files while frequently relying on metadata stored separately from the files. Files also tend to become binaries with proprietary interpretations. Files can only be bundled in an archive, and there is no object-oriented design over the data. If the storage were to support organizational units in terms of objects, without requiring hierarchical declarations and while supporting is-a or has-a relationships, it would tend to become more usable than files.
Since event storage overlays Tier 2 storage on top of blocks, files, streams and blobs, it is already transferring data to object storage. However, the reverse is not that frequent, although objects in a storage class can continue to be serialized to events in a continuous manner. This is also symbiotic for the audiences of both forms of storage.
As compute, network and storage overlap to expand the possibilities in each frontier at cloud scale, message passing has become a ubiquitous functionality. While libraries like protocol buffers and solutions like RabbitMQ are becoming popular, flows and their queues can be given native support in unstructured storage. Messages are also time-stamped and can be treated as events.