Wednesday, September 30, 2020

Network engineering continued...

This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

Performance counter: Subsystems and components frequently take a long time to execute, and diagnostic queries alone cannot pinpoint the scope that takes the most time. The code, on the other hand, is perfectly clear about call sequences, so such code blocks are easy to identify in the source. Performance counters help measure the elapsed time for the execution of these code blocks.
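
A minimal sketch of such a counter in Python, wrapping a named code block in a timing context manager (the block name and the registry are illustrative, not taken from any particular server):

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# elapsed-time totals per named code block (illustrative registry)
_elapsed = defaultdict(float)

@contextmanager
def perf_block(name):
    """Accumulate wall-clock time spent inside a named code block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        _elapsed[name] += time.perf_counter() - start

with perf_block("handle_request"):
    time.sleep(0.01)  # stand-in for the real work

print(_elapsed["handle_request"])
```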


Statistics counter: In addition to the above-mentioned diagnostic tools, we need to perform aggregation over the execution of certain code blocks. While performance counters measure elapsed time, these counters help with aggregations such as count, max, sum, and so on.
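
A statistics counter along these lines might aggregate count, sum, and max as values are recorded; this is a sketch, not any specific server's implementation:

```python
class StatsCounter:
    """Aggregates count, sum, and max over recorded values."""
    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.maximum = float("-inf")

    def record(self, value):
        self.count += 1
        self.total += value
        self.maximum = max(self.maximum, value)

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0

latency = StatsCounter()
for sample in (3.2, 1.7, 4.9):
    latency.record(sample)
print(latency.count, latency.maximum, round(latency.mean, 2))
```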


Locks: These primitives are often used for thread synchronization. When their use cannot be avoided, it is best to take as few locks as possible overall. Partitioning and coordination solve this in many cases. The networking server relies on the latter approach together with versioning.
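
Partitioning reduces contention by giving each partition its own lock instead of holding one global lock; the counter below is an illustrative sketch of the idea:

```python
import threading

class PartitionedCounter:
    """Avoids a single global lock by sharding state across partitions,
    each guarded by its own lock."""
    def __init__(self, partitions=8):
        self.locks = [threading.Lock() for _ in range(partitions)]
        self.values = [0] * partitions

    def increment(self, key):
        i = hash(key) % len(self.values)
        with self.locks[i]:          # contend only within one partition
            self.values[i] += 1

    def total(self):
        # approximate snapshot; avoids taking every lock at once
        return sum(self.values)

c = PartitionedCounter()
c.increment("flow-42")
print(c.total())
```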


Parallelization: Generally, no limit is enforced on the number of parallel workers in the network server or on the number of partitions that each worker operates on. However, the scheduler that interleaves workers works best when there is one active task to perform in any time slice. Therefore, the ideal number of active tasks is one more than the number of processors. A queue holds the tasks until their execution. This judicious use of task distribution improves performance in every layer.
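
A sketch of the pattern: a queue feeding a pool of workers sized to the processor count plus one, per the heuristic above (the tasks here are placeholders):

```python
import os
import queue
import threading

tasks = queue.Queue()

def worker():
    while True:
        item = tasks.get()
        if item is None:          # sentinel: no more work
            break
        item()                    # execute the task
        tasks.task_done()

# one more worker than processors, per the heuristic above
n = (os.cpu_count() or 1) + 1
threads = [threading.Thread(target=worker) for _ in range(n)]
for t in threads:
    t.start()

for i in range(10):
    tasks.put(lambda i=i: print("task", i))

tasks.join()
for _ in threads:
    tasks.put(None)               # stop the workers
```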


Serialization: There is nothing simpler than bytes and offsets for packing and persisting any data structure. The same holds true in network engineering. We have referred to messages as a necessity for communication between layers and components. When these messages are written out, it is irrelevant whether the destination is local or remote; serialization is useful in both cases. Consequently, serialization and deserialization are required for most entities.
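
Packing a message with bytes and offsets might look like the following; the message layout (type, length, payload) is hypothetical:

```python
import struct

def serialize(msg_type: int, payload: bytes) -> bytes:
    """Pack a hypothetical message as: 1-byte type, 4-byte length, payload."""
    return struct.pack("!BI", msg_type, len(payload)) + payload

def deserialize(data: bytes):
    msg_type, length = struct.unpack_from("!BI", data)
    payload = data[5:5 + length]   # fixed 5-byte header, then the payload
    return msg_type, payload

wire = serialize(1, b"hello")
print(deserialize(wire))  # (1, b'hello')
```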

Tuesday, September 29, 2020

Network engineering continued

 This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

47) Key Management: We have emphasized that keys are needed for encryption purposes. This calls for keys to be kept secure. With the help of standardized key management interfaces, we can use external key managers such as KeySecure. Keys should be rotated periodically.
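
The rotation bookkeeping can be sketched as a versioned keyring, where each ciphertext records the key version used; the keyring below is illustrative and stands in for an external key manager:

```python
import secrets

class Keyring:
    """Versioned keys; rotate() adds a new current key while old
    versions stay available for decrypting existing data."""
    def __init__(self):
        self.keys = {}
        self.current = 0
        self.rotate()

    def rotate(self):
        self.current += 1
        self.keys[self.current] = secrets.token_bytes(32)

    def current_key(self):
        return self.current, self.keys[self.current]

    def key_for(self, version):
        return self.keys[version]

ring = Keyring()
version, key = ring.current_key()    # tag ciphertext with this version
ring.rotate()                        # new writes use the new key
assert ring.key_for(version) == key  # old data remains decryptable
```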

48) API security: APIs are nearly indispensable for any network service, and every request made over the web must be secured. While there are many authentication protocols, including OAuth, each request is sufficiently secured if it carries authorization and a digital signature. API keys are not always required.
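
A digital signature over a request is commonly an HMAC of its canonical form; this sketch uses Python's standard library, and the canonical form shown is an assumption rather than any specific protocol:

```python
import hashlib
import hmac

SECRET = b"shared-api-secret"  # illustrative; provisioned out of band

def sign_request(method: str, path: str, body: bytes) -> str:
    """Sign the canonical request form with HMAC-SHA256."""
    canonical = method.encode() + b"\n" + path.encode() + b"\n" + body
    return hmac.new(SECRET, canonical, hashlib.sha256).hexdigest()

def verify(method, path, body, signature) -> bool:
    expected = sign_request(method, path, body)
    return hmac.compare_digest(expected, signature)  # constant-time compare

sig = sign_request("POST", "/v1/routes", b'{"cidr": "10.0.0.0/8"}')
print(verify("POST", "/v1/routes", b'{"cidr": "10.0.0.0/8"}', sig))
```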

49) Integration with authentication provider: The IPsec protocol has been integrated with Active Directory, which enables organizations to take advantage of authorizing domain users. Identity and Access Management for cloud services can serve the same purpose.

50) Auditing: Auditing serves to detect unwanted access and to maintain compliance with regulatory agencies. Most network services enable auditing in each and every component on the control path, much like per-component logging. In addition, the application exposes a way to retrieve the audits.
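
Per-component auditing on the control path can be as simple as appending structured records that can later be retrieved; a minimal sketch with an in-memory log (durable storage in practice):

```python
import json
import time

audit_log = []  # in practice, durable storage

def audit(component, action, principal, outcome):
    """Append a structured audit record for a control-path operation."""
    audit_log.append({
        "ts": time.time(),
        "component": component,
        "action": action,
        "principal": principal,
        "outcome": outcome,
    })

def query_audits(component=None):
    """The retrieval interface the application exposes, sketched as a filter."""
    return [r for r in audit_log if component in (None, r["component"])]

audit("router-config", "update-acl", "admin@example.com", "allowed")
print(json.dumps(query_audits("router-config"), indent=2))
```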

51) Offloading: Every bookkeeping, auxiliary, and routine activity that takes up system resources is a candidate for hardware offloading, so long as it does not have significant conditional logic and is fairly isolated. This improves performance in the data path, especially when the activities can be consolidated globally.


Monday, September 28, 2020

Network engineering continued

 This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

41) Catalogs: Physical organization does not always have to correlate directly with the way users save their content. A catalog is a great example of utilizing the existing organization to serve the various ways in which content is looked up or correlated. Moreover, custom tags can increase the ways in which files can be managed and maintained. While lookups have translated to queries, content indexers have provided an alternate way to look up data. Here we refer to the organization of metadata so that the network architecture can be separated from the logical organization and lookups. SNMP is credited with forming a comprehensive body of metadata for the network, but its use with products like Clearwire failed because there was no consensus on using it.

42) System metadata – Metadata is not specific only to the network artifacts from the user. Every layer maintains entities and bookkeeping for the immediately lower layer, and these are often just as useful to query as the data of the overall system. This metadata is internal and for system purposes only. Consequently, it is the source of truth for the rules in the system, whether for routing or for interfaces.

43) User metadata – We referred to metadata for user objects. However, such metadata usually takes the form of predetermined fields that the system exposes. In some cases, users can add more labels and tags, and this customization is referred to as user metadata. User metadata helps in cases outside the system where users want to group their content, which can then be used in classification and data mining. The network does not need to remain hidden: a transparent peer-to-peer network allows customers to choose their own peers.

44) Connection pooling – A pool of open connections is efficient: the connections are already open, so there is no setup overhead associated with using one. Each such pre-opened connection is ready for data transfer and is not considered dirty from previous use. Connection pooling can reduce these overheads by as much as 50%.
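
A connection pool can be sketched as a queue of pre-opened connections that are checked out and returned; the connect function below is a placeholder for whatever actually opens a connection:

```python
import queue

class ConnectionPool:
    """Keeps pre-opened connections ready so requests skip setup cost."""
    def __init__(self, connect, size=4):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(connect())   # open connections up front

    def acquire(self):
        return self._pool.get()         # blocks until one is free

    def release(self, conn):
        self._pool.put(conn)            # returned clean for the next user

# illustrative connect function; a real one might call socket.create_connection
pool = ConnectionPool(connect=lambda: object(), size=2)
conn = pool.acquire()
pool.release(conn)
```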

45) Static content – is always preferable to dynamic data because a "learning mechanism" can allow clients to track the objects requested from a specific site and accelerate future requests by using the learned information and pre-fetching associated content. In addition, the learned information can be sent in parallel for normally sequential data, creating further optimization benefits.

46) Packet capture, route tracers, and other tools – Every protocol has a sequence of requests and responses, and capturing them can be useful for troubleshooting at lower levels. Using the appropriate tool for the job helps reduce the cost of investigations. At higher levels, packets and log entries can be collected and analyzed as machine data with tools that provide a query-driven user interface.


Sunday, September 27, 2020

Network engineering continued

 This post is a continuation of the earlier posts starting from: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html


36) Compression – Probably the hallmark of any efficient network is the packing of the data. Most data files and directories can be archived; for example, a tar ball is a convenient way to make web sites and installables portable. When the data is viewed as binary, a long run of 0s or 1s can be packed efficiently. When the bit sequence flips too often, it becomes cheaper not to encode it and to leave it as is. That said, there are many efficient compression techniques available.
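
The intuition that long runs pack well while rapidly flipping bits do not can be seen with zlib from Python's standard library:

```python
import os
import zlib

runs = b"\x00" * 4096             # a long run of identical bits
noisy = os.urandom(4096)          # bits that flip constantly

print(len(zlib.compress(runs)))   # a few dozen bytes: runs pack well
print(len(zlib.compress(noisy)))  # ~4096 or more: barely worth encoding
```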

37) Acceleration – Network acceleration with the help of direct TCP connections is no longer a consideration only for specific networks. In fact, organizations around the world are installing network appliances that enable acceleration over WANs to relieve pain points for application traffic that spans geographically distributed regions.

38) Datacenters and data stores: The choice of locations for datacenters and data stores plays heavily into the consolidation of network technologies. When virtual machines are spun up on organizational assets, this is often done in private datacenters. Many web services use the private cloud network and internal names to reach one host or another, but external connectivity to the internet is mandatory even for these VMs. Therefore, network offerings have to be mindful of both consolidation and connection to the internet.

39) Distributed hash table – In order to scale horizontally over commodity compute, the network tier uses a distributed hash table to assign and delegate resources and tasks. This facilitates a large peer-to-peer network that works well for large-scale processing, including high-volume workloads.
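
A distributed hash table assigns resources to nodes by hashing; the consistent-hashing ring below is one common way to do this and is shown as an illustrative sketch:

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing: a key maps to the nearest node clockwise,
    so adding or removing a node moves only a fraction of the keys."""
    def __init__(self, nodes):
        self.ring = sorted((self._h(n), n) for n in nodes)
        self.points = [p for p, _ in self.ring]

    @staticmethod
    def _h(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        i = bisect.bisect(self.points, self._h(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("task-1234"))
```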

40) Cluster – This is another form of deployment, as opposed to a single-server deployment of network servers. The advantages of using a cluster include horizontal scalability, fault tolerance, and high availability. Cluster technology is now common practice and is widely adopted for any server deployment.


Saturday, September 26, 2020

Network Engineering continued ...

    This is a continuation of the article at http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

  1. Gateway – Traditionally, gateways have been used to bridge across different network providers, between on-premise and cloud, or even between two similar network stacks of different origins. Gateways also help with load balancing, routing, and proxy duties. Some network providers are savvy enough to include this technology within their offering so that gateways need not be customized everywhere.


  2. Cache – A cache enables requests to be handled by providing the resource without looking it up in deeper layers. The technology can span across networks or serve many levels deep in the stack. A cache not only improves performance but also saves costs.
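
A cache at any level follows the same shape: answer from the cache when possible, otherwise fall through to the deeper layer and remember the result. A small LRU sketch, where the fetch function is a stand-in for the deeper layer:

```python
from collections import OrderedDict

class LRUCache:
    """Serve requests without hitting deeper layers when possible."""
    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch            # fallback to the deeper layer
        self.data = OrderedDict()

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)     # mark as recently used
            return self.data[key]
        value = self.fetch(key)            # cache miss: go deeper
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
        return value

cache = LRUCache(2, fetch=lambda k: f"resource:{k}")
print(cache.get("/index.html"))   # miss, fetched from the deeper layer
print(cache.get("/index.html"))   # hit, served from the cache
```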


  3. Checksum – This is a simple way to check data integrity, and it suffices where encryption may not be easy, especially when the keys required to encrypt and decrypt cannot be secured. This simple technique is no match for the advantages of encryption, but it is often put to use in low-level message transfers and for data at rest.
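
Checksums for this purpose are one standard-library call away; CRC32 is typical for low-level transfers:

```python
import zlib

payload = b"low-level message body"
checksum = zlib.crc32(payload)           # sender computes and appends

received = payload                       # what arrives on the other side
if zlib.crc32(received) != checksum:     # receiver re-computes and compares
    raise ValueError("integrity check failed; request retransmission")
print("checksum ok:", hex(checksum))
```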


  4. Containers – Data is organized according to the units of organization of the network device or appliance. These containers, however, do not necessarily remain the same size, because a user dictates what is packed in any container. Therefore, when it comes to data transfer, we can transfer a large container at a time or smaller ones. Also, users often have to specify attributes of the container, and sometimes these can go wrong. Instead of correcting a container that is beyond salvage, it might be easier to create another and transfer the data. The network favors chunked data in transit, so it is up to the network and the layer above to agree on where the storage unit of data will be divided up into packets, with assembly at the receiving end.
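
Dividing a container's data into packet-sized pieces and reassembling them can be sketched as fixed-size chunking; the chunk size here is arbitrary:

```python
CHUNK = 1400  # illustrative; roughly an Ethernet-friendly payload size

def chunks(data: bytes, size: int = CHUNK):
    """Split a container's bytes into packet-sized pieces."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]

container = bytes(5000)                  # stand-in for container contents
packets = list(chunks(container))
reassembled = b"".join(packets)          # assembly at the receiving end
assert reassembled == container
print(len(packets), "packets")
```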


  5. Aging – Generally, the older the data, the less the activity on it. The age of data progresses along the timeline, so it is easy to label data as hot, warm, and cold and to apply cut-offs for age-related treatment. Cost savings on cheaper networks were touted as the primary motivation earlier, but this has recently been challenged. It is easier to de-duplicate when data does not change and is poised for retirement.
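
Labeling data hot, warm, or cold by age reduces to a threshold check; the cut-offs below are arbitrary examples:

```python
import time

DAY = 86400

def age_label(last_access_ts, now=None):
    """Illustrative cut-offs; tune to the actual age-related treatment."""
    age = (now or time.time()) - last_access_ts
    if age < 7 * DAY:
        return "hot"
    if age < 90 * DAY:
        return "warm"
    return "cold"   # candidate for dedup and cheaper placement

now = time.time()
print(age_label(now - 2 * DAY, now))    # hot
print(age_label(now - 200 * DAY, now))  # cold
```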


Friday, September 25, 2020

Network engineering continued ...

   This is a continuation of the article at http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

  1. Network organization – We referred to the hierarchical organization earlier that allows maximum flexibility to the user in terms of networking rules and policies. Here the organization includes networks, NAT traversal, IP security, and policies.   


  2. Background tasks – Routine and periodic tasks can be delegated to background workers instead of being executed inline with data in and out. These can be added to a background task scheduler that invokes them as specified. Some networking tasks, such as journaling, are improved by such background operations.
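
A background scheduler that invokes periodic tasks off the data path can be sketched with daemon threads; the intervals and chores here are illustrative:

```python
import threading
import time

def every(interval, task):
    """Run task periodically on a background thread, off the data path."""
    def loop():
        while True:
            time.sleep(interval)
            task()
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t

every(60, lambda: print("journal flush"))   # illustrative chore
every(300, lambda: print("stats rollup"))   # another background task
# a long-running server process keeps these alive while it serves data
```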


  3. Relays – Most interactions between components are in the form of requests and responses. These may have to traverse multiple layers before they are authoritatively handled by a router or host. Relays help translate requests and responses between layers. They are necessary for making the request processing logic modular and chained; think of firewalls, proxies, and gateways as primary citizens of the network.
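
The modular, chained request processing can be sketched as a list of relays that each translate or filter the request before the authoritative handler; the firewall, proxy, and host below are stand-ins for the citizens named above:

```python
def firewall(request, next_handler):
    if request.get("blocked"):
        return {"status": 403}             # drop at the edge
    return next_handler(request)

def proxy(request, next_handler):
    request = {**request, "via": "proxy"}  # translate on the way through
    return next_handler(request)

def host(request):
    return {"status": 200, "via": request.get("via")}  # authoritative

def chain(relays, final):
    """Compose relays so each hands off to the next layer."""
    def run(request):
        def dispatch(i, req):
            if i == len(relays):
                return final(req)
            return relays[i](req, lambda r: dispatch(i + 1, r))
        return dispatch(0, request)
    return run

handle = chain([firewall, proxy], host)
print(handle({"path": "/"}))   # {'status': 200, 'via': 'proxy'}
```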


  4. Maintenance – Every network offering comes with a responsibility for administrators. Some excel at reducing this maintenance with the help of auto-tuning and automation of maintenance chores, while others present comprehensive dashboards and charts for detailed, interactive, and involved maintenance. The managed service that moved technologies and stacks from the on-premise Network Operations Center (NOC) to the cloud came with a reduction in Total Cost of Ownership by centralizing and automating the tasks that provide scalability, high availability, backups, software updates and patches, host and server maintenance, rack and stack, power and network redress, and so on.


  5. Data transfer – The performance considerations of IO devices include throughput and latency in one form or another. A network offering may be robust and large but will remain inadequate if the data transfer speed is low. In addition, data transfer may need to span large geographical distances, and repeatedly so. Provisioning a dedicated network connection may not be feasible in all cases, so the baseline itself must be reasonable.

Thursday, September 24, 2020

Network engineering continued ...

  This is a continuation of the article at http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

  1. Virtualization – Cloud computing has taught us the benefit of virtualization at all levels where different entities may be spanned with a universal access pattern. The network is no exception and every network product tends to take advantage of this strategy. 


  2. Security and compliance – Every regulatory agency around the globe looks for some kind of certification. Most network providers have to demonstrate compliance with one or more of the following: PCI-DSS, HIPAA/HITECH, FedRAMP, the EU Data Protection Directive, FISMA, and others. Security is provided with the help of identity and access management, which comes in useful for securing individual network artifacts.


  3. Management – The network is very much a resource: it can be created, updated, and deleted. With software-defined technologies the resource merely takes a gigantic form; otherwise it is the equivalent of a single data record for the user. Every such resource also has significant metadata. Consequently, we manage the network in just the same way as we manage other resources.


  4. Networking is often considered a veritable alternative to storage, as referenced in Jim Gray's paper "Queues Are Databases." But it helps to keep the bare-metal considerations separate from the overall product perspective.


  5. Monitoring – Large virtual networks may be stretched across members in one form or another, and the physical resources, such as edge and core members, often suffer failures and faults. Therefore, monitoring becomes a crucial aspect.

Wednesday, September 23, 2020

Network engineering continued ...

 This is a continuation of the article at http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

    1. Deduplication – As data ages, there is very little need to access it regularly, and it can be packed and saved in a format that reduces space. Networking makes efficient use of bits. In addition, if data is repeated across packets, it can be viewed as segments; these delineations facilitate the study of redundancy in the data. Redundant segments can then simply be skipped when storing, which allows a more manageable form of accumulated raw data. Deduplication lightens the load for networking.
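
Segment-level deduplication stores each unique segment once and keeps references for the repeats; a fixed-size-segment sketch (content-defined chunking is a common alternative):

```python
import hashlib

SEGMENT = 4096  # fixed-size segments, chosen arbitrarily here

def dedup_store(data: bytes, store: dict):
    """Store unique segments by hash; return the list of references."""
    refs = []
    for i in range(0, len(data), SEGMENT):
        seg = data[i:i + SEGMENT]
        digest = hashlib.sha256(seg).hexdigest()
        store.setdefault(digest, seg)   # keep only the first copy
        refs.append(digest)
    return refs

store = {}
data = b"A" * 8192 + b"B" * 4096        # two identical segments plus one
refs = dedup_store(data, store)
print(len(refs), "segments referenced,", len(store), "stored")  # 3, 2
```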


    2. Encryption – Encryption is probably the only technique to truly protect data when there can be unwanted or undesirable access. The scope of encryption may be limited to sensitive data if the rest of the raw data can be tolerated unencrypted.


    3. Data flow – Data flows into stores, and stores grow in size. Businesses and applications that generate data often find the data to be sticky once it accumulates. Consequently, a lot of attention is paid to early estimation of size and of the kind of treatment to apply. Determining the flows helps determine the network.


    4. Protocols – Nothing facilitates communication between peers, or between master and slave, better than a protocol. Even a description of the payload and the generic operations of create, update, list, and delete becomes sufficient to handle network-relevant operations at all levels.


    5. Layering – Finally, network solutions have taught us that appliances can be stacked, services can be hierarchical, and data may be tiered. A problem solved in one domain with a particular solution may be equally applicable to a similar problem in a different domain. This means that we can use layers for the overall solution.