Saturday, May 14, 2022

 

This is a continuation of a series of articles on a crowdsourcing application, following up on the most recent article. The original problem statement is included again for context.

Social engineering applications provide a wealth of information to the end-user, but the questions and answers received on them are always limited to just that – the social circle. Advice solicited for personal circumstances is never appropriate for forums that remain in public view. It is also difficult to find the right forums or audience from which responses can be obtained in a short time. When we want more opinions in a discreet manner, without the knowledge of those who surround us, the options become fewer and fewer. In addition, crowdsourcing opinions on a personal topic is not easily available via applications. This document tries to envision an application to meet this requirement.

The previous article continued the elaboration on the use of public cloud services for provisioning the queue, document store, and compute. It talked a bit about the messaging platform required to support this social-engineering application. The problems encountered with social engineering are well-defined and have precedent in various commercial applications. They are primarily about the feed for each user and the propagation of solicitations to the crowd. The previous article described selective fan-out. When the clients wake up, they can request that their state be refreshed. This lightens the write path because the data does not need to be pushed out. If the queue sends messages back to the clients, it is a fan-out process. The devices can choose to check in at selective times, and the server can be selective about which clients to update. Both methods work well in certain situations. The fan-out happens during writing as well as loading, and it can be made selective in either case; that is, it can be limited during both pull and push. Disabling writes to all devices can significantly reduce cost, since devices can load these updates only when reading. It is also helpful to keep track of which clients are active over a period so that only those clients get preference.
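
The selective fan-out above can be sketched in a few lines. This is only an illustrative sketch in Python; the class and method names are hypothetical and stand in for the queue and device registry described earlier.

```python
import time

class SelectiveFanOut:
    """Push updates only to recently active clients; others pull on demand."""

    def __init__(self, active_window_seconds=300):
        self.active_window = active_window_seconds
        self.last_seen = {}   # client_id -> last check-in timestamp
        self.pending = {}     # client_id -> updates deferred until the next pull
        self.pushed = {}      # client_id -> updates pushed eagerly

    def check_in(self, client_id, now=None):
        """A client wakes up and requests its state to be refreshed (pull)."""
        now = time.time() if now is None else now
        self.last_seen[client_id] = now
        # Deliver anything accumulated while the client was inactive.
        updates = self.pending.pop(client_id, [])
        self.pushed.setdefault(client_id, []).extend(updates)
        return updates

    def publish(self, client_ids, update, now=None):
        """Fan out a write: push to active clients, defer for inactive ones."""
        now = time.time() if now is None else now
        for cid in client_ids:
            if now - self.last_seen.get(cid, 0) <= self.active_window:
                self.pushed.setdefault(cid, []).append(update)   # eager push
            else:
                self.pending.setdefault(cid, []).append(update)  # lazy, loaded on read
```

An inactive device pays nothing at write time; it simply loads its pending updates the next time it checks in.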

In this section, we talk about the noisy neighbor antipattern. It occurs when one client holds up a disproportionate share of critical resources from a shared pool meant for all clients, starving the others. Some common examples of resource-intensive operations include retrieving or persisting data to a database, sending a request to a web service, posting a message to or retrieving a message from a queue, and writing to or reading from a file in a blocking manner. There are advantages to running dedicated calls, especially for debugging and troubleshooting purposes, because the calls do not interfere with one another, but a shared platform enables reuse of the same components. Overuse of sharing can hurt performance when some clients consume resources in a way that starves other clients. It appears notably when there are components that require synchronous I/O, for example when the application uses a library that only provides synchronous methods. The base tier may have finite capacity to scale up. Compute resources are better suited to scale-out than scale-up, and one of the primary advantages of a clean separation of layers with asynchronous processing is that they can be hosted independently. Container orchestration frameworks facilitate this very well. As an example, the frontend can issue a request and wait for a response without delaying the user experience. It can use the model-view-controller paradigm so that views are not only fast but can also be hosted such that clients using one view model do not affect the others.

This antipattern can be fixed in one of several ways. First, the processing can be moved out of the application tier into an Azure Function or some background API layer. Clients are given promises and are actively monitored. If the application frontend is confined to data input and output display operations, using only the capabilities that the frontend is optimized for, then it will not manifest this antipattern. APIs and queries can articulate the business-layer interactions such that clients find the system responsive while it retains control over how the work is performed. Many libraries and components provide both synchronous and asynchronous interfaces; these can be used judiciously, with the asynchronous pattern working for most API calls. Finally, limits and throttling can be applied. Application gateway and firewall rules can handle restrictions on specific clients.
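
The throttling remedy can be illustrated with a per-client token bucket. This is a hedged sketch, not the mechanism any gateway actually uses; in practice Application Gateway or firewall rules would enforce the limits, and the names below are hypothetical.

```python
import time

class TokenBucket:
    """Per-client token bucket: a noisy client exhausts its own tokens and is
    throttled, while other clients keep their share of the pool."""

    def __init__(self, capacity, refill_rate, now=None):
        self.capacity = capacity          # maximum burst size
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = capacity
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.refill_rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per client keeps a noisy neighbor from draining the shared pool.
buckets = {}

def admit(client_id, now=None):
    bucket = buckets.setdefault(client_id,
                                TokenBucket(capacity=5, refill_rate=1, now=now))
    return bucket.allow(now)
```

A rejected request can be answered with HTTP 429 so that well-behaved clients back off while everyone else proceeds unaffected.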

The introduction of long-running queries and stored procedures, blocking I/O, and network waits often goes against the benefits of a responsive multi-client service. If the processing is already under the control of the service, then it can be optimized further.

There are several ways to fix this antipattern, involving both detection and remedy. The remedies include capping the number of client attempts and preventing retries over a long period of time. The client calls could include an exponential backoff strategy that increases the duration between successive calls exponentially, handle errors gracefully, and use the circuit breaker pattern, which is specifically designed to break a retry storm. Official SDKs for communicating with Azure services already include sample implementations of retry logic. When there are many I/O requests, they can be batched into coarse-grained requests. The database can be read with one query substituting for many queries, which also gives the database an opportunity to execute the work better and faster. Web APIs can be designed with REST best practices. Instead of separate GET methods for different properties, there can be a single GET method for the resource representing the object. Even if the response body is large, it will likely be a single request. File I/O can be improved with buffering and a cache. Files need not be opened and closed repeatedly, which also helps reduce fragmentation of the file on disk.
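
The exponential backoff and circuit breaker remedies can be sketched as follows. This is an illustrative outline under assumed thresholds, not the retry logic shipped in any official SDK.

```python
def backoff_delays(base=0.5, factor=2.0, retries=5, cap=30.0):
    """Exponentially increasing delays between successive retry attempts."""
    return [min(cap, base * factor ** n) for n in range(retries)]

class CircuitBreaker:
    """Open the circuit after consecutive failures to break a retry storm."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.failures = 0
        self.state = "closed"

    def call(self, operation):
        if self.state == "open":
            # Fail fast instead of piling more load onto a struggling service.
            raise RuntimeError("circuit open; not calling the service")
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.state = "open"
            raise
        self.failures = 0    # any success resets the failure count
        return result
```

A production version would also add jitter to the delays and a half-open state that probes the service before closing the circuit again.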


Friday, May 13, 2022

 

This is a continuation of a series of articles on a crowdsourcing application, following up on the most recent article. The original problem statement is included again for context.

Social engineering applications provide a wealth of information to the end-user, but the questions and answers received on them are always limited to just that – the social circle. Advice solicited for personal circumstances is never appropriate for forums that remain in public view. It is also difficult to find the right forums or audience from which responses can be obtained in a short time. When we want more opinions in a discreet manner, without the knowledge of those who surround us, the options become fewer and fewer. In addition, crowdsourcing opinions on a personal topic is not easily available via applications. This document tries to envision an application to meet this requirement.

The previous article continued the elaboration on the use of public cloud services for provisioning the queue, document store, and compute. It talked a bit about the messaging platform required to support this social-engineering application. The problems encountered with social engineering are well-defined and have precedent in various commercial applications. They are primarily about the feed for each user and the propagation of solicitations to the crowd. The previous article described selective fan-out. When the clients wake up, they can request that their state be refreshed. This lightens the write path because the data does not need to be pushed out. If the queue sends messages back to the clients, it is a fan-out process. The devices can choose to check in at selective times, and the server can be selective about which clients to update. Both methods work well in certain situations. The fan-out happens during writing as well as loading, and it can be made selective in either case; that is, it can be limited during both pull and push. Disabling writes to all devices can significantly reduce cost, since devices can load these updates only when reading. It is also helpful to keep track of which clients are active over a period so that only those clients get preference.

In this section, we talk about the busy frontend antipattern. This condition occurs when many background threads starve foreground tasks of resources, which decreases response times to unacceptable levels. There are advantages to running background jobs, which keep processing out of the interactive path and can be scheduled asynchronously. But overuse of this feature can hurt performance when the tasks consume resources that foreground workers need for interactivity with the user, leading to spinning waits and frustration for the user. It appears notably when the frontend is monolithic, compressing the business tier into the application frontend. Runtime costs might shoot up if this tier is metered. An application tier may have finite capacity to scale up. Compute resources are better suited to scale-out than scale-up, and one of the primary advantages of a clean separation of layers and components is that they can be hosted independently. Container orchestration frameworks facilitate this very well. The frontend can be as lightweight as possible and built on model-view-controller or similar paradigms so that it is not only fast but also hosted on separate containers that can scale out.

This antipattern can be fixed in one of several ways. First, the processing can be moved out of the application tier into an Azure Function or some background API layer. If the application frontend is confined to data input and output display operations, using only the capabilities that the frontend is optimized for, then it will not manifest this antipattern. APIs and queries can articulate the business-layer interactions. The application then uses the .NET Framework APIs to run standard query operators on the data for display purposes.
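
Moving work out of the frontend can be sketched with a queue between a fast submit path and a background worker. The function names and the sum() stand-in for heavy processing are hypothetical; in the envisioned deployment the worker would be an Azure Function or a background API layer rather than an in-process queue.

```python
from queue import Queue

work_queue = Queue()
results = {}

def frontend_submit(ticket_id, payload):
    """Fast path: accept the request, enqueue the heavy work, and return a
    ticket (a promise) immediately without doing the processing inline."""
    work_queue.put((ticket_id, payload))
    return {"ticket": ticket_id, "status": "accepted"}

def background_worker_step():
    """One step of the background worker: process a single queued item."""
    ticket_id, payload = work_queue.get()
    results[ticket_id] = sum(payload)   # stand-in for heavy processing
    work_queue.task_done()

def frontend_poll(ticket_id):
    """Clients poll (or receive a callback) for the completed result."""
    return results.get(ticket_id)
```

The frontend stays confined to input and display; the foreground thread never blocks on the expensive computation.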

The UI is designed for purposes specific to the application. The introduction of long-running queries and stored procedures often goes against the benefits of a responsive application. If the processing is already under the control of application-side techniques, then it should not be moved. If the frontend activity reveals significant processing and very low data emission, it is likely that this antipattern is manifesting.


Thursday, May 12, 2022

This is a continuation of a series of articles on a crowdsourcing application, following up on the most recent article. The original problem statement is included again for context.

Social engineering applications provide a wealth of information to the end-user, but the questions and answers received on them are always limited to just that – the social circle. Advice solicited for personal circumstances is never appropriate for forums that remain in public view. It is also difficult to find the right forums or audience from which responses can be obtained in a short time. When we want more opinions in a discreet manner, without the knowledge of those who surround us, the options become fewer and fewer. In addition, crowdsourcing opinions on a personal topic is not easily available via applications. This document tries to envision an application to meet this requirement.

The previous article continued the elaboration on the use of public cloud services for provisioning the queue, document store, and compute. It talked a bit about the messaging platform required to support this social-engineering application. The problems encountered with social engineering are well-defined and have precedent in various commercial applications. They are primarily about the feed for each user and the propagation of solicitations to the crowd. The previous article described selective fan-out. When the clients wake up, they can request that their state be refreshed. This lightens the write path because the data does not need to be pushed out. If the queue sends messages back to the clients, it is a fan-out process. The devices can choose to check in at selective times, and the server can be selective about which clients to update. Both methods work well in certain situations. The fan-out happens during writing as well as loading, and it can be made selective in either case; that is, it can be limited during both pull and push. Disabling writes to all devices can significantly reduce cost, since devices can load these updates only when reading. It is also helpful to keep track of which clients are active over a period so that only those clients get preference.

In this section, we talk about the busy database antipattern. There are advantages to running code local to the data, since it avoids transmission to a client application for processing. But overuse of this feature can hurt performance because the server spends more time processing and less time accepting new client requests and fetching data. A database is also a shared resource, and it might deny resources to other requests when one of them uses a large share for computations. Runtime costs might shoot up if the database is metered. A database may have finite capacity to scale up. Compute resources are better suited to hosting complicated logic, while storage products are customized for large disk space. The busy database antipattern occurs when the database is used to host a service rather than a repository, or when it is used to format data, manipulate data, or perform complex calculations. Developers trying to overcompensate for the extraneous fetching antipattern often write complex queries that take significantly longer to run but produce a small amount of data.

This can be fixed in one of several ways. First, the processing can be moved out of the database into an Azure Function or some application tier. As long as the database is confined to data access operations, using only the capabilities that it is optimized for, it will not manifest this antipattern. Queries can be simplified to a proper select statement that merely retrieves the data, with the help of joins where necessary. The application then uses the .NET Framework APIs to run standard query operators.
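
The same idea in miniature: a plain select retrieves the rows, and the aggregation that might otherwise live in a stored procedure runs in the application tier. The sketch below uses an in-memory SQLite table as a stand-in, Python operations in place of the .NET query operators mentioned above, and hypothetical table and column names.

```python
import sqlite3

# A simple schema standing in for the accumulated responses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (campaign TEXT, score INTEGER)")
conn.executemany("INSERT INTO responses VALUES (?, ?)",
                 [("c1", 4), ("c1", 5), ("c2", 3)])

# A plain select retrieves the data; no computation is pushed into the database.
rows = conn.execute("SELECT campaign, score FROM responses").fetchall()

def average_by_campaign(rows):
    """Standard query operators in the application tier, not the database."""
    totals = {}
    for campaign, score in rows:
        count, total = totals.get(campaign, (0, 0))
        totals[campaign] = (count + 1, total + score)
    return {c: total / count for c, (count, total) in totals.items()}

averages = average_by_campaign(rows)
```

Because the aggregation runs in the application tier, it scales out with that tier instead of forcing the database to scale up.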

Database tuning is an important routine for many organizations. The introduction of long-running queries and stored procedures often goes against the benefits of a tuned database. If the processing is already under the control of database tuning techniques, then it should not be moved.

Avoiding unnecessary data transfer solves both this antipattern as well as chatty I/O antipattern. When the processing is moved to the application tier, it provides the opportunity to scale out rather than require the database to scale up. 


 

Wednesday, May 11, 2022

 

This is a continuation of a series of articles on a crowdsourcing application, following up on the most recent article. The original problem statement is included again for context.

Social engineering applications provide a wealth of information to the end-user, but the questions and answers received on them are always limited to just that – the social circle. Advice solicited for personal circumstances is never appropriate for forums that remain in public view. It is also difficult to find the right forums or audience from which responses can be obtained in a short time. When we want more opinions in a discreet manner, without the knowledge of those who surround us, the options become fewer and fewer. In addition, crowdsourcing opinions on a personal topic is not easily available via applications. This document tries to envision an application to meet this requirement.

The previous article continued the elaboration on the use of public cloud services for provisioning the queue, document store, and compute. It talked a bit about the messaging platform required to support this social-engineering application. The problems encountered with social engineering are well-defined and have precedent in various commercial applications. They are primarily about the feed for each user and the propagation of solicitations to the crowd. The previous article described selective fan-out. When the clients wake up, they can request that their state be refreshed. This lightens the write path because the data does not need to be pushed out. If the queue sends messages back to the clients, it is a fan-out process. The devices can choose to check in at selective times, and the server can be selective about which clients to update. Both methods work well in certain situations. The fan-out happens during writing as well as loading, and it can be made selective in either case; that is, it can be limited during both pull and push. Disabling writes to all devices can significantly reduce cost, since devices can load these updates only when reading. It is also helpful to keep track of which clients are active over a period so that only those clients get preference.

In this section, we talk about synchronous I/O. When many background threads enter a wait state, performing no work while holding up critical resources, they can starve other threads; this hampers the crowdsourced application, which must stay up to date despite a volume of traffic matching the levels of other social engineering applications. Some common examples of I/O include retrieving or persisting data to a database, sending a request to a web service, posting a message to or retrieving a message from a queue, and writing to or reading from a file in a blocking manner. There are advantages to running calls synchronously, especially for debugging and troubleshooting purposes, because the call sequences are pre-established. But overuse of this feature can hurt performance, because tasks consuming resources in a spinning wait can starve other threads. It appears notably when there are components that require synchronous I/O, for example when the application uses a library that only provides synchronous methods. The base tier may have finite capacity to scale up. Compute resources are better suited to scale-out than scale-up, and one of the primary advantages of a clean separation of layers with asynchronous processing is that they can be hosted independently. Container orchestration frameworks facilitate this very well. As an example, the frontend can issue a request and await a response without delaying the user experience. It can use the model-view-controller paradigm so that views are not only fast but can also be hosted on separate containers that scale out.

It can be fixed in one of several ways. First, the processing can be moved out of the application tier into an Azure Function or some background API layer. If the application frontend is confined to data input and output display operations, using only the capabilities that the frontend is optimized for, then it will not manifest this antipattern. APIs and queries can articulate the business-layer interactions. Many libraries and components provide both synchronous and asynchronous interfaces; these can be used judiciously, with the asynchronous pattern working for most API calls.
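
A minimal sketch of the asynchronous alternative, assuming Python's asyncio in place of whatever framework the service actually uses: two I/O calls are issued concurrently and awaited, so no thread spins in a blocking wait. The fetch function and resource names are hypothetical.

```python
import asyncio

async def fetch(resource, delay):
    """Stands in for a non-blocking I/O call (database, web service, queue)."""
    await asyncio.sleep(delay)
    return f"{resource}-data"

async def handle_request():
    # Issue both I/O calls concurrently; the total wait is roughly the
    # slowest call, not the sum of all calls.
    profile, feed = await asyncio.gather(
        fetch("profile", 0.01),
        fetch("feed", 0.01),
    )
    return [profile, feed]

result = asyncio.run(handle_request())
```

While the awaited I/O is in flight, the event loop is free to serve other requests instead of holding a thread in a wait state.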


Tuesday, May 10, 2022

This is a continuation of a series of articles on a crowdsourcing application, following up on the most recent article. The original problem statement is included again for context.

Social engineering applications provide a wealth of information to the end-user, but the questions and answers received on them are always limited to just that – the social circle. Advice solicited for personal circumstances is never appropriate for forums that remain in public view. It is also difficult to find the right forums or audience from which responses can be obtained in a short time. When we want more opinions in a discreet manner, without the knowledge of those who surround us, the options become fewer and fewer. In addition, crowdsourcing opinions on a personal topic is not easily available via applications. This document tries to envision an application to meet this requirement.

The previous article continued the elaboration on the use of public cloud services for provisioning the queue, document store, and compute. It talked a bit about the messaging platform required to support this social-engineering application. The problems encountered with social engineering are well-defined and have precedent in various commercial applications. They are primarily about the feed for each user and the propagation of solicitations to the crowd. The previous article described selective fan-out. When the clients wake up, they can request that their state be refreshed. This lightens the write path because the data does not need to be pushed out. If the queue sends messages back to the clients, it is a fan-out process. The devices can choose to check in at selective times, and the server can be selective about which clients to update. Both methods work well in certain situations. The fan-out happens during writing as well as loading, and it can be made selective in either case; that is, it can be limited during both pull and push. Disabling writes to all devices can significantly reduce cost, since devices can load these updates only when reading. It is also helpful to keep track of which clients are active over a period so that only those clients get preference.

In this section, we talk about caching. The no-caching antipattern occurs when a crowdsourced application handles many concurrent requests that fetch the same data. Since there is contention for data access, it can reduce performance and scalability. When the data is not cached, several symptoms appear: degraded response times, increased contention, and poor scalability are common examples.

Caching is sometimes out of scope of the architecture design, or listed as an option for operations to include via standalone independent products. Other times, the introduction of a cache might increase latency, maintenance, and ownership costs, and decrease overall availability. It might also interfere with existing caching strategies and expiration policies of the underlying systems. Some might prefer not to add an external cache to a database, and use one only as a sidecar for the web services. It is true that databases can cache even materialized views for a connection, but a cache lookup is cheap in all cases, whereas the compute in the deeper systems can be costly and is best avoided.

There are two strategies to fix the problem. The first is the on-demand, or cache-aside, strategy. The application tries to read the data from the cache and, if it isn’t there, retrieves the data from the source and puts it in the cache. When the application writes a change, it writes directly to the data source and removes the old value from the cache; the cache is refilled the next time the value is required.
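
A minimal cache-aside sketch, with plain dictionaries standing in for the cache and the data source; in a real deployment the cache would be an external store such as Redis.

```python
database = {"q1": "original answer"}   # stand-in for the data source
cache = {}                             # stand-in for the cache

def read(key):
    if key in cache:            # cache hit: serve without touching the source
        return cache[key]
    value = database[key]       # cache miss: go to the source...
    cache[key] = value          # ...and populate the cache for next time
    return value

def write(key, value):
    database[key] = value       # write directly to the data source
    cache.pop(key, None)        # invalidate; refilled on the next read
```

Invalidating rather than updating the cached entry on write keeps the cache and the source from drifting apart under concurrent writers.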

Another strategy might be to always keep static resources in the cache with no expiration date. This is equivalent to CDN usage although CDNs are for distribution.  Applications that cache dynamic data should be designed to support eventual consistency. 

No matter how the cache is implemented, it must support falling back to the deeper data access path when the data is not available in the cache. The circuit breaker pattern merely avoids overwhelming the data source during such fallbacks.


Monday, May 9, 2022

 

This is a continuation of a series of articles on a crowdsourcing application, following up on the most recent article. The original problem statement is included again for context.

 

Social engineering applications provide a wealth of information to the end-user, but the questions and answers received on them are always limited to just that – the social circle. Advice solicited for personal circumstances is never appropriate for forums that remain in public view. It is also difficult to find the right forums or audience from which responses can be obtained in a short time. When we want more opinions in a discreet manner, without the knowledge of those who surround us, the options become fewer and fewer. In addition, crowdsourcing opinions on a personal topic is not easily available via applications. This document tries to envision an application to meet this requirement.

 

The previous article continued the elaboration on the use of public cloud services for provisioning the queue, document store, and compute. It talked a bit about the messaging platform required to support this social-engineering application. The problems encountered with social engineering are well-defined and have precedent in various commercial applications. They are primarily about the feed for each user and the propagation of solicitations to the crowd. The previous article described selective fan-out. When the clients wake up, they can request that their state be refreshed. This lightens the write path because the data does not need to be pushed out. If the queue sends messages back to the clients, it is a fan-out process. The devices can choose to check in at selective times, and the server can be selective about which clients to update. Both methods work well in certain situations. The fan-out happens during writing as well as loading, and it can be made selective in either case; that is, it can be limited during both pull and push. Disabling writes to all devices can significantly reduce cost, since devices can load these updates only when reading. It is also helpful to keep track of which clients are active over a period so that only those clients get preference.

 

In this section, we talk about the content delivery network (CDN) on Azure. This is a distributed network of servers that delivers web content for the crowdsourced application to users. It includes resources for web pages such as JavaScript, stylesheets, and HTML. Edge servers closest to the application or its clients are used so that there is little or no latency. Azure CDN can also accelerate dynamic content, which cannot be cached, by leveraging networking optimizations such as Point-of-Presence (POP) locations and route optimization via the Border Gateway Protocol. Benefits of using Azure CDN include better performance, large-scale handling, and distribution of user requests.

 

Azure CDN performs geo-replication and automatic synchronization between virtual datacenters, a term used to denote a shared-nothing collection of servers or clusters. It leverages some form of synchronization with the help of, say, a message-based consensus protocol. Web-accessible storage is provided by Azure Storage, but the CDN is hosted as its own service and comes with its own ARM resource. As with all Azure services, the CDN service provisions an Azure resource backed by an Azure Resource Manager template. Azure CDN can be used for enabling faster access to public resources from Azure CDN POP locations, improving the experience for users who are farther away from data centers, supporting the Internet of Things by scaling to a huge number of devices that can access content, and handling traffic surges without requiring the application to scale.

 

Some of the challenges involved when planning a CDN concern deployment considerations, such as where to deploy the CDN, and a few others. For example, these include versioning and cache control of the content, testing of the resources independent of publication, search engine optimization, and content security. In addition, the CDN service must provide disaster recovery and backup options so that the data is not lost and remains highly available. Some system engineering designs look down on CDNs because of the costs involved; for example, it can be easier to scale the servers without planning a content delivery network, which saves costs because the resources are co-located and there are easier options to scale. The customer would integrate the publication of their content, which can be done with the help of the CDN.

 

Sunday, May 8, 2022

 

This is a continuation of a series of articles on a crowdsourcing application, following up on the most recent article. The original problem statement is included again for context.

 

Social engineering applications provide a wealth of information to the end-user, but the questions and answers received on them are always limited to just that – the social circle. Advice solicited for personal circumstances is never appropriate for forums that remain in public view. It is also difficult to find the right forums or audience from which responses can be obtained in a short time. When we want more opinions in a discreet manner, without the knowledge of those who surround us, the options become fewer and fewer. In addition, crowdsourcing opinions on a personal topic is not easily available via applications. This document tries to envision an application to meet this requirement.

 

The previous article continued the elaboration on the use of public cloud services for provisioning the queue, document store, and compute. It talked a bit about the messaging platform required to support this social-engineering application. The problems encountered with social engineering are well-defined and have precedent in various commercial applications. They are primarily about the feed for each user and the propagation of solicitations to the crowd. The previous article described selective fan-out. When the clients wake up, they can request that their state be refreshed. This lightens the write path because the data does not need to be pushed out. If the queue sends messages back to the clients, it is a fan-out process. The devices can choose to check in at selective times, and the server can be selective about which clients to update. Both methods work well in certain situations. The fan-out happens during writing as well as loading, and it can be made selective in either case; that is, it can be limited during both pull and push. Disabling writes to all devices can significantly reduce cost, since devices can load these updates only when reading. It is also helpful to keep track of which clients are active over a period so that only those clients get preference.

 

We talk about databases to meet the transactional aspects of the processing on both the campaign generation side and the response accumulation side. The relational data from both sides will need a warehouse where analytical queries can be run for reporting stacks. Separation of the read-only store from the read-write store helps with both performance and security.

 

The choice of relational/cloud databases is left outside this discussion. Instead, we focus on the choice of warehouse. There are five major players – Azure, BigQuery, Presto, RedShift, and Snowflake. The accumulation of responses is inherently tied to users, and the warehouse can expect many users to be differentiated based on their campaigns and responses. The type of query invoked on the data is relevant only after its accumulation, not in the stream of responses. One response is just like another, and the queries gain little or no benefit from processing them in a stream-like manner as opposed to processing them after their accumulation, from both the individual’s point of view and the administrator’s. The warehouse is also able to reconcile campaign and response activities, so it remains a source of truth and maintains an accurate tally. It provides the ability to write queries in plain SQL and comes free from maintenance when hosted in the cloud, regardless of the size of the data accrued. Picking one warehouse or another will enable separation of the reporting stack and foster the other microservices that may be envisioned for future offerings. For example, a campaign based on response accumulation could be forked as its own campaign-management microservice utilizing only the database and a message broker. The microservice model is also best suited to separation of concerns, promoting offerings from this one-stop shop for responses while the data layer remains the same. All the microservices are expected to be slim because there is only a connection facilitated between producers and consumers of responses. A virtual elastic warehouse is the right choice to make this connection because it facilitates all kinds of workflows associated with the data, most of which are independent of the transactional processing. Even message brokers work well with warehouses when the warehouse accepts JSON.
The archiving of response accumulation mentioned earlier can now be automated and redirected to the virtual data warehouse using an automated ingestion capability.
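
The ingestion step might be sketched as follows, assuming the warehouse accepts newline-delimited JSON via a staged file; the record fields are hypothetical.

```python
import io
import json

# Accumulated responses to be archived; field names are illustrative only.
responses = [
    {"campaign": "c1", "user": "u1", "answer": "yes"},
    {"campaign": "c1", "user": "u2", "answer": "no"},
]

def archive_to_jsonl(records, stream):
    """Write one JSON document per line, the common denominator accepted by
    warehouse auto-ingestion features that read staged files."""
    for record in records:
        stream.write(json.dumps(record) + "\n")

buffer = io.StringIO()         # stand-in for the staged file or blob
archive_to_jsonl(responses, buffer)
lines = buffer.getvalue().splitlines()
```

The same serialization works whether the records arrive from the archival job or straight from a message broker, which is what makes the warehouse the natural point of convergence.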