Wednesday, November 18, 2020

Network engineering continued ...

This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

    1. Application optimization is probably the only layer that truly remains with the user, even in a full-service stack built on a storage product. Scaling, availability, backup, patching, installation, host maintenance, and rack maintenance remain with the storage provider. 


    2. The use of HTTP headers, attributes, protocol-specific syntax and semantics, REST conventions, OAuth, and other such standards is well known and covered in their respective RFCs. A content delivery network (CDN) can be provisioned straight from the object storage. Application optimization is about using both of these judiciously. 


    3. An out-of-box service can apply administrator-defined rules that enable the types of optimization to perform. Moreover, rules need not be written as declarative configuration; they can be dynamic, in the form of a module.  


    4. The application optimization layer also acts as a gateway when appropriate. Any implementation of a gateway must maintain a registry of destination addresses. As HTTP-accessible objects proliferate along with their geo-replications, this registry becomes granular at the object level, while rules determine the site from which each object should be accessed. Finally, the gateway gathers access statistics and metrics that are very useful for understanding the HTTP accesses of specific content within the object storage (a minimal sketch follows). 
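A minimal sketch of such a gateway, combining the administrator-defined rule module from point 3 with the object-level registry and access statistics from point 4. The site names, object keys, and the nearest-site rule are hypothetical illustrations, not part of these posts:

```python
from collections import Counter

# hypothetical registry: object key -> geo-replicated sites hosting it
registry = {
    "images/logo.png": ["us-east", "eu-west"],
    "videos/demo.mp4": ["us-east", "ap-south"],
}
access_stats = Counter()  # per (object, site) access counts

# an administrator-defined rule supplied as a dynamic module rather than
# declarative configuration: pick the serving site for a caller's region
def nearest_site_rule(caller_region: str, sites: list) -> str:
    return caller_region if caller_region in sites else sites[0]

def resolve(object_key: str, caller_region: str) -> str:
    """Gateway lookup: choose a destination site and record the access."""
    sites = registry[object_key]
    site = nearest_site_rule(caller_region, sites)
    access_stats[(object_key, site)] += 1
    return site

print(resolve("images/logo.png", "eu-west"))  # -> eu-west
print(access_stats)                           # metrics per object and site
```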

Tuesday, November 17, 2020

Network engineering continued ...

This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

  1. To find similar networking resources to form a group, we use some form of similarity score. One way to calculate this score is to take the resources that have been ranked in common and use those rankings as axes in a chart. These scores can then be used with tags and other entities. 


  2. To determine closeness, a few mathematical formulas help. For example, we could use the Euclidean distance or the Pearson coefficient. The Euclidean distance finds the distance between two points in a multidimensional space by taking the sum of the squares of the differences between the coordinates of the points and then calculating the square root of the result. 


  3. The Pearson correlation coefficient is a measure of how highly correlated two variables are. It is generally a value between -1 and 1, where -1 means a perfect inverse correlation, 1 means a perfect correlation, and 0 means no correlation. It is computed with a numerator equal to the sum of the products of the two variables minus the product of their individual sums divided by the number of samples; this is divided by the square root of the product of the two quantities obtained by substituting the same variable for both factors in the numerator, that is, each variable's sum of squares minus the square of its sum divided by the number of samples. A worked sketch follows this list. 


  4. Tags can be used to make recommendations against the data to be searched. Tags point to groups, and the preferences of a group are used to make a ranked list of suggestions. This technique is called collaborative filtering. A common data structure for keeping track of preferences is a nested dictionary. This dictionary could use a quantitative ranking, say on a scale of 1 to 5, to denote the preferences of the participants in the selected group.   
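Below is a minimal sketch of both similarity measures and of collaborative filtering over a nested dictionary of 1-to-5 rankings. The participant names, resource names, and ratings are hypothetical illustrations:

```python
from math import sqrt

# nested dictionary of preferences: participant -> resource -> rating (1-5)
prefs = {
    "alice": {"vpn-gw": 5, "lb-edge": 3, "dns-cache": 4},
    "bob":   {"vpn-gw": 4, "lb-edge": 2, "dns-cache": 5},
    "carol": {"vpn-gw": 4, "lb-edge": 2},
}

def euclidean_similarity(a, b):
    """Similarity derived from Euclidean distance over commonly ranked resources."""
    shared = [k for k in prefs[a] if k in prefs[b]]
    if not shared:
        return 0.0
    dist = sqrt(sum((prefs[a][k] - prefs[b][k]) ** 2 for k in shared))
    return 1 / (1 + dist)  # map distance into a similarity in (0, 1]

def pearson_similarity(a, b):
    """Pearson correlation coefficient over commonly ranked resources."""
    shared = [k for k in prefs[a] if k in prefs[b]]
    n = len(shared)
    if n == 0:
        return 0.0
    sa = sum(prefs[a][k] for k in shared)
    sb = sum(prefs[b][k] for k in shared)
    saa = sum(prefs[a][k] ** 2 for k in shared)
    sbb = sum(prefs[b][k] ** 2 for k in shared)
    sab = sum(prefs[a][k] * prefs[b][k] for k in shared)
    num = sab - sa * sb / n
    den = sqrt((saa - sa ** 2 / n) * (sbb - sb ** 2 / n))
    return num / den if den else 0.0

def recommend(target):
    """Collaborative filtering: rank unseen resources by similarity-weighted ratings."""
    totals, weights = {}, {}
    for other in prefs:
        if other == target:
            continue
        sim = pearson_similarity(target, other)
        if sim <= 0:
            continue
        for item, rating in prefs[other].items():
            if item not in prefs[target]:
                totals[item] = totals.get(item, 0) + rating * sim
                weights[item] = weights.get(item, 0) + sim
    return sorted(((totals[i] / weights[i], i) for i in totals), reverse=True)

print(recommend("carol"))  # suggests "dns-cache" based on similar participants
```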


Monday, November 16, 2020

Network engineering continued ...

 This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

  1. Web-accessible storage is touted as best for static content, while data that changes often is said to be better kept in NoSQL or other unstructured storage. With object versioning, APIs, and SDKs, this is no longer the case. Also, when services are stateless, the data pertaining to a request can be saved in object storage, which gives it request-level granularity and HTTP-based access. This request scoping of storage makes it easy to retrieve with isolation (see the sketch after this list).

  2. Data transfers have never been considered a form of virtual storage, since the data belongs to the source. Yet data in transit can live in queues, caches, and object storage, which is good for vectorized execution. 

  3. The nature of the query language determines the kind of resolving that the data virtualization needs to do. In addition, the types of storage that the virtualization layer spans also depend on the query language.  

  4. To explain the difference between data virtualization over structured and unstructured storage types, consider the metadata in structured storage: all data types used are registered. Whether they are system built-in types or user-defined types, the catalog helps with the resolution. 

  5. A query describing the selection of entries with the help of predicates does not necessarily have to be bound to structured or unstructured query languages. Yet the convenience and universal appeal of one language may dominate another. Therefore, whether the query language is agnostic or predominantly biased, it can be modified or rewritten to suit the needs of the software stack. Networking products do not have to reinvent querying. 
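As a minimal sketch of request-scoped storage from point 1, assuming an S3-compatible object store reachable through boto3 with credentials configured in the environment; the bucket name and key layout are hypothetical:

```python
import json
import uuid

import boto3  # assumes boto3 is installed and credentials are configured

s3 = boto3.client("s3")
BUCKET = "request-scoped-data"  # hypothetical bucket name

def save_request_state(request_payload: dict) -> str:
    """Persist the data for one request as its own object, giving it
    request-level granularity and HTTP-based access with isolation."""
    request_id = str(uuid.uuid4())
    s3.put_object(
        Bucket=BUCKET,
        Key=f"requests/{request_id}.json",
        Body=json.dumps(request_payload).encode("utf-8"),
        ContentType="application/json",
    )
    return request_id

def load_request_state(request_id: str) -> dict:
    """Retrieve the saved state for a single request by its id."""
    obj = s3.get_object(Bucket=BUCKET, Key=f"requests/{request_id}.json")
    return json.loads(obj["Body"].read())
```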

Sunday, November 15, 2020

Network engineering continued ...

  This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

  1. Cluster-based topology testing differs significantly from peer-to-peer networking-based topology testing. One represents capability, and the other represents distribution. The tests must articulate different loads for each. 

  2. The testing of software layers is achieved with simulation of the lower layers. However, integration testing is closer to real-life scenarios. Specifically, the testing of data corruption, unavailability, or loss is critical to a networking product. 

  3. The use of injectors, proxies, and man-in-the-middle setups tests the topology aspect, but networking is more concerned with throughput and latency: the specific numbers associated with minimum and maximum limits, and the inefficiencies when those limits are exceeded (a measurement sketch follows this list). 

  4. Most storage products have a networking aspect. Testing covers networking separately from the other aspects. This means timeouts, downtime, resolutions, and traversals up and down the networking layers on a host. It also includes location information. 

  5. The control and data path traces and execution-cycle statistics are critical to capture and query with a tool, so that testing can determine whether the compute is at fault. Most such tools provide data over HTTP. 
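A minimal sketch of a latency-focused test from point 3, assuming a hypothetical HTTP endpoint exposed by the system under test and a hypothetical maximum-latency limit:

```python
import statistics
import time
import urllib.request

URL = "http://localhost:8080/health"  # hypothetical endpoint under test
LATENCY_LIMIT_MS = 50                 # hypothetical maximum latency limit

def measure_latencies(samples: int = 100) -> list:
    """Issue repeated requests and record per-request latency in milliseconds."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        with urllib.request.urlopen(URL) as resp:
            resp.read()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

lat = measure_latencies()
print(f"min={min(lat):.2f}ms max={max(lat):.2f}ms "
      f"p50={statistics.median(lat):.2f}ms")
# fail the test when the maximum limit is exceeded
assert max(lat) <= LATENCY_LIMIT_MS, "latency limit exceeded"
```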

Saturday, November 14, 2020

Network engineering continued ...

 This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

  1. The virtual machines for individual customers are sticky. Customers don't usually release their resources, and they even identify them by name or IP address for their daily work. They host applications, services, and automations on their virtual machines and often cannot let go of a virtual machine unless its files and programs have a migration path to another compute resource. Typically, they do not take the steps of creating regular backups and moving between resources.  


  2. While container platforms for Platform-as-a-Service (PaaS) have enabled software to be deployed without any recognition of the host, and to be frequently rotated from one host to another, end users' adoption of a PaaS platform depends on the production readiness of their applications and services. The push for PaaS adoption has made little or no change to the use and proliferation of virtual machines by individual users.


  3. With the move towards serverless computing, networking has become somewhat more ubiquitous and taken for granted, as hosts and resources disappear behind business functions. There are no more leases and IT resources to request, because compute is set up and torn down on demand automatically. The shift from deeply partitioned, modular cloud applications to serverless computing is a sliding scale, and applications tend to leverage what's best for their business logic rather than be constrained by the available networks, even in a private cloud.

    Software-defined networking has made it easy to set up and tear down networks at scale. IP addresses and port numbers that were once sticky no longer need to be. When deployments are on PaaS or on Kubernetes, networking best practice is already leveraged, enabling applications to shed their concerns about resources and do more for the business.

Friday, November 13, 2020

Network engineering continued ...

This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

  1. Early warning notifications, a running rules engine, and trend detection are features that not only enhance popular use cases by providing feedback on deployed software but also increase customer satisfaction, as changes are incremental. 

  2. File systems continue to be a good storage choice for networking systems. File attributes include name, type, location, size, protection, and the time, date, and user identification. Operations supported are creating a file, writing a file, reading a file, repositioning within a file, deleting a file, and truncating a file. 

  3. Data structures include two levels of internal tables. There is a per-process table of all the files that each process has opened; it points to the location inside a file where data is to be read or written, is arranged by file handles, and has the name, permissions, access dates, and a pointer to the disk block. The other table is a system-wide table with the open count, the file pointer, and the disk location of the file. 

  4. Sections of a file can be locked for multi-process access, and sections of a file can even be mapped into virtual memory. The latter is called memory mapping, and it enables multiple processes to share the data: each sharing process's virtual memory map points to the same page of physical memory, the page that holds a copy of the disk block (a sketch follows this list). 

  5. File structure is dependent on the file type. Internal file structure is operating-system dependent, and disk access is done in units of blocks. Since logical records vary in size, several of them can be packed into a single physical block, for example when the logical record is a byte. The logical record size, the physical block size, and the packing technique determine how many logical records fit in each physical block. There are three major allocation methods: contiguous, linked, and indexed. Internal fragmentation is a common occurrence due to the bytes wasted in rounding up to the block size. 

  6. Access methods are either sequential or direct. The block number is relative to the beginning of the file. The use of relative block numbers helps the operating system determine where the file should be placed and prevents users from accessing portions of the file system that may not be part of their file. 
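A minimal sketch of memory mapping from point 4, using Python's mmap module; the file name is hypothetical and must already exist with at least a few bytes of content:

```python
import mmap

# map a file into virtual memory so reads and writes go through the OS
# page cache; other processes mapping the same file share those pages
with open("data.bin", "r+b") as f:  # "data.bin" is a hypothetical file
    with mmap.mmap(f.fileno(), 0) as mm:
        first_block = mm[:4096]     # read up to the first 4 KB block
        mm[0:4] = b"HEAD"           # in-place write to the shared pages
        mm.flush()                  # persist dirty pages back to disk
```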

Thursday, November 12, 2020

Network engineering continued ...

This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

Event monitoring software can accelerate software development and test cycles. Event monitoring data is usually machine data generated by IT systems. Such data can enable real-time searches to gain insights into the user experience, and dashboards with charts can then help analyze it. This data can be accessed over TCP, UDP, and HTTP, and it can also be warehoused for analysis. Issues that frequently recur can be documented and searched more quickly with the availability of such data, leading to faster debugging and problem solving.
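As a minimal sketch of receiving such machine data over the network, the snippet below listens for events over UDP; the port number and the line-per-event format are hypothetical assumptions:

```python
import socket

# bind a UDP socket and collect machine-generated events as they arrive
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9514))  # hypothetical collector port

while True:
    data, addr = sock.recvfrom(65535)
    # each datagram is assumed to carry one UTF-8 event line
    print(f"{addr[0]}: {data.decode('utf-8', errors='replace').strip()}")
```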

Data is available to be collected, indexed, searched, and reported. Applications can target specific interests, such as security, or correlations for building rules and alerts. The data is also varied: from the network, from applications, and from enterprise infrastructure. Powerful querying increases the usability of such data. 

Queries over such key-valued data can be written using Pig Latin commands such as load/read, store/write, foreach/iterate, filter/predicate, group/cogroup, collect, join, order, distinct, union, split, stream, dump, and limit.
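A minimal sketch of the same load, filter, group, order, and limit pipeline expressed in Python rather than Pig Latin; the event records and their fields are hypothetical:

```python
from collections import defaultdict

# "load": key-valued event records as they might arrive from collectors
events = [
    {"host": "web-1", "level": "ERROR", "latency_ms": 420},
    {"host": "web-2", "level": "INFO",  "latency_ms": 35},
    {"host": "web-1", "level": "ERROR", "latency_ms": 510},
]

errors = [e for e in events if e["level"] == "ERROR"]  # filter/predicate

by_host = defaultdict(list)                            # group
for e in errors:
    by_host[e["host"]].append(e["latency_ms"])

ranked = sorted(by_host.items(),                       # order
                key=lambda kv: max(kv[1]), reverse=True)

print(ranked[:10])                                     # limit, then dump
```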

Some of the differentiators of such software include having one platform, fast return on investment, the ability to use different data collectors, the use of non-traditional flat-file data stores, the ability to create new and modify existing reports, the ability to create baselines and study changes, programmability to retrieve information as appropriate, and the ability to include compliance, security, fraud detection, etc.