Tuesday, October 27, 2020

Network engineering continued

This is a continuation of the earlier posts starting with this one:  http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

Shared-memory systems have been popular. They include SMPs, multi-core systems, and combinations of both. The simplest way to use shared memory is to create threads within the same process. Shared-memory parallelism is widely used with big data.
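As a minimal sketch of shared-memory parallelism within a single process (the counter and thread count here are made up for illustration), several Python threads update one structure they all see, guarded by a lock:

```python
import threading

# Shared state visible to every thread in the process
counts = {"requests": 0}
lock = threading.Lock()

def worker(iterations):
    for _ in range(iterations):
        # The lock serializes updates to the shared dictionary
        with lock:
            counts["requests"] += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counts["requests"])  # 40000: every thread wrote to the same memory
```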

The shared-nothing model supports shared-nothing parallelism. Each node is independent and self-sufficient, so there is no single point of contention; none of the nodes share memory or disk storage. Such systems generally compete well against any model that has a single point of contention in the form of shared memory or disk.

Shared-disk: this model is used where a large, common storage space is needed. Some products implement shared-disk and some implement shared-nothing, but the two do not go together in the same code base.

The implementation of a content-distribution network, such as one for images or videos, generally translates to random disk reads, which means caching may not always help. Therefore, the RAIDed disks are tuned. Content used to be served from a monolithic RAID 10 volume behind a single master with multiple slaves; nowadays a sharded approach is taken instead, preferably served from object storage.


Image and video libraries will constantly run into cache misses, especially with slow replication. Replication and caching come into the picture to handle the load, but it is better to separate traffic into different cluster pools. By distributing requests across those pools, we spread the load and avoid many of the misses.
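As a hedged sketch of that routing idea (the pool names and node lists are invented for illustration, not part of any particular product), requests can be split by content type into separate cluster pools and then mapped to a node within the pool by hashing the content id, so hot image traffic does not evict video entries and vice versa:

```python
import hashlib

# Hypothetical pools; a real deployment would read these from configuration.
POOLS = {
    "image": ["img-cache-0", "img-cache-1", "img-cache-2"],
    "video": ["vid-cache-0", "vid-cache-1"],
}

def route(content_type: str, content_id: str) -> str:
    """Pick a cache node: separate pools per content type, hash within the pool."""
    nodes = POOLS[content_type]
    digest = hashlib.sha1(content_id.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

print(route("image", "cat-42.jpg"))   # the same id always lands on the same node
print(route("video", "intro.mp4"))
```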


File systems may implement byte-range locking to enable concurrent access to different parts of a file. Byte-range locks are typically not honored by file-mapping operations. Poor use of file locks can result in performance issues or deadlocks.
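For example, on POSIX systems Python's fcntl module exposes advisory byte-range locks; this sketch (file name illustrative) locks only a 4 KB region so that other processes can still lock and write other ranges of the same file concurrently:

```python
import fcntl
import os

fd = os.open("/tmp/shared.dat", os.O_RDWR | os.O_CREAT, 0o644)
try:
    # Exclusive advisory lock on bytes 0..4095 only; other ranges remain available.
    fcntl.lockf(fd, fcntl.LOCK_EX, 4096, 0, os.SEEK_SET)
    os.pwrite(fd, b"header", 0)
finally:
    # Always release the range; holding locks too long or in the wrong order
    # is the classic source of the performance issues and deadlocks noted above.
    fcntl.lockf(fd, fcntl.LOCK_UN, 4096, 0, os.SEEK_SET)
    os.close(fd)
```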


Monday, October 26, 2020

Network engineering continued ..

This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

  1. Serialization of objects enables their reconstruction on the remote destination. It is more than a protocol for packing and unpacking data on the wire: it can include constraints that enable data validation and help prevent failures down the line. If the serialized payload is also signed or encrypted, it becomes tamper-resistant.


  2. Serializability, on the other hand, is the notion of correctness when simultaneous updates happen to a resource. When multiple transactions commit their actions, the combined result corresponds to some serial execution of those transactions. This is very helpful for eliminating inconsistencies across transactions. It differs from isolation only in that the latter approaches the same goal from the point of view of a single transaction.


  3. Databases have long been veritable storage systems that guarantee transactions. Two-phase locking was introduced for transactions: a shared lock is acquired before a read and an exclusive lock before a write. The two phases refer to acquiring locks (growing) and releasing them (shrinking). With transactions blocking on a wait queue, this was a way to enforce serializability.


  4. Transaction locking and logging proved onerous and complicated. Multi-version concurrency control (MVCC) was brought in so that readers do not have to acquire locks. With a consistent view of the data as of some point in time in the past, we no longer need to keep track of every change made since that point.


  5. Optimistic concurrency control was introduced to let each transaction maintain histories of its reads and writes so that transactions causing isolation conflicts can be rolled back at commit time; a minimal sketch of that validation step follows.
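The sketch below is a toy version of that optimistic validation, not any particular engine's implementation: each transaction records the versions it read, and at commit time it rolls back if any of those versions changed underneath it.

```python
class Store:
    def __init__(self):
        self.data = {}      # key -> value
        self.versions = {}  # key -> monotonically increasing version number

    def read(self, key):
        return self.data.get(key), self.versions.get(key, 0)

class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_set = {}   # key -> version observed when read
        self.write_set = {}  # key -> new value to apply at commit

    def read(self, key):
        value, version = self.store.read(key)
        self.read_set[key] = version
        return value

    def write(self, key, value):
        self.write_set[key] = value

    def commit(self):
        # Validation: if anything we read has since changed, abort (roll back).
        for key, seen in self.read_set.items():
            if self.store.versions.get(key, 0) != seen:
                return False
        for key, value in self.write_set.items():
            self.store.data[key] = value
            self.store.versions[key] = self.store.versions.get(key, 0) + 1
        return True

s = Store()
t1, t2 = Transaction(s), Transaction(s)
t1.read("x"); t2.read("x")
t1.write("x", 1); t2.write("x", 2)
print(t1.commit())  # True
print(t2.commit())  # False: "x" changed after t2 read it, so t2 is rolled back
```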

Sunday, October 25, 2020

Network engineering continued ...

This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html


  1. Hardware techniques for replication are helpful when the inventory is something we can control along with the deployment of the storage product. Even so, there has been a shift to software-defined stacks, and replication per se no longer needs to be implemented in hardware. Offloading it to hardware increases the total cost of ownership, so that cost must be offset by some gains.


  2. Physical replication, when implemented in the software stack, is perhaps the simplest of all. If the data is large, the time to replicate grows with the size of the data and is limited by the available bandwidth. Then there are costs to reinstall the storage container and make sure it is consistent. This is an option for end-users and typically a client-side workaround.


  3. Trigger-based replication captures incremental changes as and when they happen so that only those changes are propagated to the destination. The incremental changes are captured and shipped to the remote site, and the modifications are replayed there.


  4. Log-based replication is probably the most performant scheme: the log is actively watched for data changes, which are intercepted and sent to the remote system. Either the log is read and the data changes are extracted and passed to the destination, or the log is read and the captured log records themselves are passed to the destination. The technique is performant because it has low overhead.


  5. Most log-based replication methods are proprietary. A standard is hard to enforce, and supporting every proprietary format is difficult to maintain.


  6. Statistics gathering: every accounting operation within the storage product uses some form of statistics, from summations to building histograms, and these inevitably take up memory, especially if they cannot be done in one pass. Some of these operations were written as synchronous aggregations, but when the size of the data is very large they were translated to batch, micro-batch, or stream operations. With SQL-style queries using PARTITION BY and OVER, smaller chunks can be processed in an online manner. Most such operations, however, can be delegated to the background.
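To make the last point concrete, here is a hedged sketch of one-pass (streaming) statistics: a running count, sum, and coarse histogram are maintained incrementally so the aggregation never holds the whole dataset in memory. The bucket boundaries and sample values are arbitrary.

```python
from collections import Counter

class StreamingStats:
    """Single-pass aggregation suitable for a background or micro-batch job."""
    BUCKETS = [0, 10, 100, 1_000, 10_000]  # illustrative histogram boundaries

    def __init__(self):
        self.count = 0
        self.total = 0
        self.histogram = Counter()

    def add(self, value):
        self.count += 1
        self.total += value
        # Coarse bucketing: the largest boundary that is <= value
        self.histogram[max(b for b in self.BUCKETS if b <= value)] += 1

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0

stats = StreamingStats()
for latency_ms in (3, 12, 250, 7, 4_500):
    stats.add(latency_ms)
print(stats.count, stats.mean, dict(stats.histogram))
```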

Saturday, October 24, 2020

The identity solution provider for the cloud has an identity cloud that hosts a single system capable of authenticating and authorizing credentials from virtual private networks, on-premise applications, and AD/LDAP. It can connect multiple untrusted Active Directory domains/forests to a single tenant of Office 365. This enables large enterprises, or companies going through mergers and acquisitions, to easily add all users without changing their directory architecture.

One of the primary benefits of cloud computing is the concept of a shared, common infrastructure across numerous customers simultaneously, leading to economies of scale. This concept is called multi-tenancy. Microsoft Office 365 and Okta both provide an identity cloud that supports enterprise-level security, confidentiality, privacy, integrity, and availability standards. Microsoft Office 365 is hardened with Trustworthy Computing and Security Development Lifecycle principles, where the tenants are assumed to be hostile to one another and the actions of one must not affect another.

This isolation is provided on the basis of public-cloud AD-based authorization and role-based access control, storage-level data isolation such as in SharePoint Online, rigorous physical security, background screening, and a multi-layered encryption strategy to protect the confidentiality and integrity of customer content, with server-side technologies that encrypt customer content at rest and in transit, including BitLocker, per-file encryption, TLS, and IPsec. These protections provide robust logical isolation controls with threat protection and mitigation on par with physical isolation.

In addition, Microsoft monitors and tests for weaknesses across tenant boundaries, including intrusion, permission-violation attempts, and resource starvation. Self-healing processes are built into the system.

Okta's tenant isolation structure is driven by several variables such as customer data access, data separation, and user experience. Each Okta tenant is separated by its own data, network performance, and feature set. Okta use cases treat workforce, customer, and partner identity as separate. Workforce identity is supported by product features such as Universal Directory, single sign-on, lifecycle management, and adaptive multi-factor authentication. With workforce identity, IT gets one central place for policy-based management and employees get single sign-on.

Customer Identity products deliver customer user experience using Okta APIs and widgets, identity integration using APIs, scripts to modify user data, and APIs that handle authentication, authorization and user management.

Okta's approach to security comprises two parts: Okta manages the security of the cloud, and partners manage the security in their cloud. Okta provides the identity and access control lists. Partners provide the tenant and service settings and the customer application and content. Partners are therefore responsible for leveraging the features of the identity cloud to grant the correct permissions to their users, disabling inactive accounts, properly configuring and monitoring the policies required to protect the data, reviewing activity data in the system log, and monitoring Okta tenants for attacks such as password spraying and phishing.

Tenant keys are stored in a tenant-exclusive keystore comprising 256-bit AES symmetric keys and 2048-bit RSA asymmetric keys. Okta uses asymmetric encryption to sign and encrypt SAML and WS-Fed single sign-on assertions and to sign OpenID Connect and OAuth API tokens. Okta uses symmetric encryption to encrypt the tenant's confidential data in the database. With asymmetric encryption, the SSO risk to a tenant is minimized when a single org is compromised, and tenants can rotate their keys.

Symmetric keys ensure data segregation and confidentiality for the tenant. Both types of keys are stored in a tenant-exclusive keystore that can be accessed only with a tenant-exclusive master key. Since each keystore is unique to a tenant and keystores are stored in different databases, the damage is contained when a single tenant is compromised. Tenant keys are cached in memory only for a short time and are never stored on disk. No single person can decrypt customer data without a detailed audit trail and security response.
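As an illustration only of the general per-tenant envelope-encryption pattern described above (this is not Okta's actual implementation, and the class and its methods are hypothetical), a 256-bit AES key can encrypt a tenant's data at rest while a 2048-bit RSA key signs that tenant's tokens; both would live in a tenant-exclusive keystore. The sketch uses the third-party cryptography package.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

class TenantKeys:
    """Per-tenant keys: AES-256 for data at rest, RSA-2048 for signing assertions."""
    def __init__(self):
        self.aes_key = AESGCM.generate_key(bit_length=256)
        self.rsa_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

    def encrypt(self, plaintext: bytes) -> bytes:
        nonce = os.urandom(12)
        return nonce + AESGCM(self.aes_key).encrypt(nonce, plaintext, None)

    def decrypt(self, blob: bytes) -> bytes:
        return AESGCM(self.aes_key).decrypt(blob[:12], blob[12:], None)

    def sign(self, token: bytes) -> bytes:
        return self.rsa_key.sign(
            token,
            padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                        salt_length=padding.PSS.MAX_LENGTH),
            hashes.SHA256(),
        )

# Each tenant gets its own keys, so compromising one tenant's keystore
# does not expose another tenant's data or tokens.
tenant_a, tenant_b = TenantKeys(), TenantKeys()
secret = tenant_a.encrypt(b"tenant A profile data")
print(tenant_a.decrypt(secret))
```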

Friday, October 23, 2020

Network engineering continued ...

This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

  1. Automatic load balancing can now be built on a range-based partitioning approach and account-based throttling.  This improves multi-tenancy in the environment as well as the handling of peaks in traffic patterns.  


  2. The load-balancing algorithm can even be adaptive, based on choosing appropriate metrics for well-known traffic patterns. We start with a single number to quantify the load on each partition and each server, and then use the product of request latency and request rate to represent load; see the sketch after this list.


  3. Memory allocation is a very common operation in the data path. Correct management of memory is required for correctness as well as performance. A context-based memory allocator is often used. It involves the following steps: a context is created with a given name or type, chunks are allocated within the context, chunks are freed within the context after use, and the context is then reset or deleted. Alternatively, some systems are implemented in languages with a universal runtime and garbage collection, so they can rely on the built-in collection and finalization.


  4. Overheads may be reduced by not making as many calls into the kernel as there are user requests. It is often better to consolidate them into bulk operations so that the majority of calls return quickly.


  5. Administrators have often dealt with provisioning sufficient memory on all computing resources associated with a networking product. With T-shirt-sized commodity virtual machines this is only partially addressed, because the size only specifies the physical memory. Virtual memory and its usage must be made easier to query so that corrective measures can be taken.
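The sketch referenced in the second item above: a single number per partition, taken as the product of request latency and request rate, summed per server to decide which server is overloaded and which partition to move. Server and partition names are invented for illustration.

```python
# Per-partition samples: (server, partition, average latency in ms, requests/sec)
samples = [
    ("srv-1", "p-range-a", 12.0, 400),
    ("srv-1", "p-range-b", 45.0, 900),
    ("srv-2", "p-range-c", 8.0, 150),
]

def load(latency_ms, rate):
    # Single number quantifying load: request latency x request rate
    return latency_ms * rate

server_load, partition_load = {}, {}
for server, partition, latency, rate in samples:
    partition_load[partition] = load(latency, rate)
    server_load[server] = server_load.get(server, 0) + partition_load[partition]

busiest = max(server_load, key=server_load.get)
# Candidate to shed from the busiest server: its hottest partition
candidate = max((p for s, p, *_ in samples if s == busiest), key=partition_load.get)
print(busiest, candidate)  # srv-1 p-range-b
```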

Thursday, October 22, 2020

Network engineering continued ...

  This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

  1. The process-per-disk-worker model is still in use today; it was used by early DBMS implementations. I/O scheduling manages the time-sharing of the disk workers, and the operating system provides protection between them. This model has been helpful for debuggers and memory checkers.


  2. The process-pool-per-disk-worker model removes the need to fork and tear down a process for every request, and every process in the pool is capable of executing any of the reads and writes from any of the clients. The pool size is generally finite if not fixed. This model has all the advantages of the process-per-disk-worker model above, with the added possibility of differentiated processes in the pool and quotas for them.


  3. When compute and storage are consolidated, they have to be treated as commodities, and scalability is achieved only by scaling out. On the other hand, they are inherently different, so nodes dedicated to computation may be separated from nodes dedicated to storage. This lets them scale and load-balance independently.


  4. Range-based partitioning/indexing is much more beneficial for sequential access, such as with streams, because the locality of a set of ranges makes enumeration easier and faster. This helps with performance. Hash-based indexing is better when we have to fan out processing across partitions, because keys with the same hash land in the same bucket while the overall load spreads evenly. This helps with load balancing.


  5. Throttling, or isolation, is very useful when accounts are not well behaved. The statistics are collected by the partition server, which keeps track of request rates for accounts and partitions. The same request rates may also be used for load balancing; a minimal throttling sketch follows the list.
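A minimal sketch of the account-based throttling described in the last item (the window length and limit are arbitrary): the partition server counts requests per account in a sliding window, rejects accounts that exceed their quota, and can report the same rates to the load balancer.

```python
import time
from collections import defaultdict, deque

class AccountThrottle:
    """Tracks per-account request rates over a sliding window."""
    def __init__(self, limit_per_window, window_seconds=1.0):
        self.limit = limit_per_window
        self.window = window_seconds
        self.events = defaultdict(deque)  # account -> timestamps of recent requests

    def allow(self, account: str) -> bool:
        now = time.monotonic()
        q = self.events[account]
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # the account is not well behaved; throttle it
        q.append(now)
        return True

    def request_rate(self, account: str) -> float:
        # The same statistic can feed load-balancing decisions.
        return len(self.events[account]) / self.window

throttle = AccountThrottle(limit_per_window=100)
print(throttle.allow("tenant-42"))        # True until the quota is exhausted
print(throttle.request_rate("tenant-42"))
```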

Wednesday, October 21, 2020

Network engineering continued ...

 This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

Sampling: sometimes it is difficult to estimate cost without actually visiting each and every value. If it were feasible to analyze and summarize the distribution of values with the help of histograms, it would be easier to make a call. Instead, we can use sampling techniques to get an estimate without having to exhaust the scan.
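A hedged sketch of that sampling idea (the dataset and sample size are made up): estimate an aggregate and a selectivity from a random sample and scale up, instead of scanning every row.

```python
import random

random.seed(7)
# Stand-in for a large column we would rather not scan in full.
values = [random.randint(0, 1_000) for _ in range(1_000_000)]

sample = random.sample(values, 10_000)              # visit only 1% of the rows
estimated_sum = sum(sample) * (len(values) / len(sample))
estimated_selectivity = sum(v > 900 for v in sample) / len(sample)

print(round(estimated_sum), round(estimated_selectivity, 3))
# Close enough to the true values for costing a plan, without the full scan.
```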

Full iterations are sometimes the only way to exhaust the search space. In a top-down approach, an early use of the Cartesian product, for instance, can be helpful. This has been acknowledged even in the determination of plan space, where base tables are nested as right-hand inputs only after the Cartesian product has been estimated.

Strategies never remain the same if the data and the business change. Consequently, even the longest running strategy is constantly re-evaluated to see if it can still perform as well. This has in fact been demonstrated in commercial database systems with the use of query compilation and recompilation and holds equally true for classifiers and other kinds of analysis.

Since a strategy is best described by logic, it is very helpful to export it as a module so that it can run anywhere after being written once. This has been demonstrated by many machine-learning packages and data-mining algorithms, regardless of the domain in which the data exists. At a low level, the same applies to strategies within individual components, because even if they are not immediately reused, it is helpful to have version control over them.

Optimization of an execution does not depend merely on the data and the strategy. It involves hints from users, environmental factors, and parameters. All of these play a role in driving down costs, and some are easier to tweak than others.