Tuesday, September 20, 2022

 

Some rules, practices, and guidelines for multitenant applications:

When multitenant applications are developed, there are a few best practices that can be called out. This article covers some of them.

When naming objects pertaining to a tenant, such as tables, pages, and artifacts, using a prefix/suffix helps to reduce name collisions with objects defined in other extensions. The prefix/suffix must be at least 3 characters, and the object/field name must start or end with it. If a conflict arises, the party who registered the prefix always wins. Setting the prefix/suffix at the top level of objects is sufficient for domain-specific objects. A tool can be used to detect missing prefixes or suffixes, as sketched below.
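As an illustration, here is a minimal sketch of such a checker in Python. The object names and the "CRB" affix are hypothetical examples, not registered values.

    REQUIRED_AFFIX = "CRB"     # a registered prefix/suffix, at least 3 characters
    MIN_AFFIX_LENGTH = 3

    def has_affix(name: str, affix: str = REQUIRED_AFFIX) -> bool:
        """An object name passes if it starts or ends with the affix."""
        if len(affix) < MIN_AFFIX_LENGTH:
            raise ValueError("prefix/suffix must be at least 3 characters")
        return name.startswith(affix) or name.endswith(affix)

    def find_violations(objects):
        """Return the (type, name) pairs that are missing the required affix."""
        return [(t, n) for t, n in objects if not has_affix(n)]

    if __name__ == "__main__":
        objects = [("table", "CRBCustomerRating"),
                   ("page", "CustomerRatingCardCRB"),
                   ("codeunit", "RatingCalculator")]   # missing the affix
        for obj_type, name in find_violations(objects):
            print(f"missing prefix/suffix: {obj_type} {name}")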

Instrumenting the application for telemetry is another practice worth calling out. The collected data can be visualized for analyzing the application against the desired business goals, for troubleshooting, and more.

One aspect of event logging is collecting data about the runtime and deployment infrastructure of an application to diagnose conditions and troubleshoot problems that affect its operation and performance. Both the application and the database can emit events. Metrics and logs can flow into a time-series database and be used with read-only reporting stacks that render rich visualizations. When these charts and graphs make it to a dashboard, the overall health of the system can be monitored.

Telemetry can be divided into different categories that include those for engineering purposes, those for business and those for customers. Custom telemetry signals can also be emitted. By default, the signals emitted by the system can be sent to multiple destinations such as event logs and application insights. Custom signals can enable sending data from anywhere in the application code to one of these destinations.
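A minimal sketch of this fan-out in Python, using the standard logging module: the file handler stands in for an event log, and the remote handler is a hypothetical stand-in for a sink such as Application Insights (a real implementation would wrap the vendor SDK in a logging.Handler).

    import logging

    telemetry = logging.getLogger("app.telemetry")
    telemetry.setLevel(logging.INFO)

    # Destination 1: an event-log-style local file.
    telemetry.addHandler(logging.FileHandler("events.log"))

    # Destination 2: a stand-in for a remote sink such as Application Insights.
    class RemoteSinkHandler(logging.Handler):
        def emit(self, record):
            payload = self.format(record)
            print("remote>", payload)   # a real handler would POST this payload

    telemetry.addHandler(RemoteSinkHandler())

    # A custom signal emitted from anywhere in the application code.
    telemetry.info("tenant=%s operation=%s duration_ms=%d",
                   "contoso", "PostSalesOrder", 184)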

Testing the tenant-specific extensions allows us to catch some of the basic errors that would otherwise be discovered only later. Install, uninstall, publish, and unpublish are some of the actions that should be exercised to verify the extension in isolation. Sometimes a feature flag is sufficient, and other times leveraging the application store workflow is preferable. Testing with the least required privileges helps to limit the impact to the system.

Documentation is easily overlooked even for key scenarios, but it should include all the necessary information: details aimed at inexperienced users, screenshots, prerequisites and setup steps if any, functionality walkthroughs, and the like.

When writing code for web services, it is best to separate out the user interactivity. The web service must run independently of a user interface, and descriptive error messages can be propagated to the caller for display in the user interface.
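A minimal sketch of this separation, assuming a Flask endpoint (the route and data are hypothetical): the service returns machine-readable results and descriptive errors, and leaves all interactivity to whichever UI consumes it.

    from flask import Flask, jsonify

    app = Flask(__name__)
    ORDERS = {"1001": {"status": "shipped"}}   # hypothetical data

    @app.route("/api/orders/<order_id>")
    def get_order(order_id):
        order = ORDERS.get(order_id)
        if order is None:
            # No dialogs or prompts: a descriptive error the UI can render.
            return jsonify(error=f"Order {order_id} was not found"), 404
        return jsonify(order)

    if __name__ == "__main__":
        app.run()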

Leveraging declarations as much as possible also reduces changes to the code. Configurations and settings help in this regard.

Reference: https://1drv.ms/w/s!Ashlm-Nw-wnWhLMfc6pdJbQZ6XiPWA?e=fBoKcN      

  

 

Monday, September 19, 2022

Supportability features of databases:

This is a follow-up to the previous article on the supportability features of databases.

There are dynamic management views available to query information on the following:

These dynamic management views include Always On availability groups, Change Data Capture, Change Tracking Related, Common Language Runtime, Database Mirroring, Database related, Execution related,  Extended events, Filestream and Filetable, Full-text search and semantic search, Geo replication, Index related, I/O related, PolyBase, Memory-Optimized, Object Related, Query Notifications related, Replication related, Resource Governor related, Security-related, Server-Related, Service Broker, Spatial Data, Azure Synapse analytics and Parallel data warehouse, SQL Server operating system related, stretch database and transaction related.

The DMVs for Always On availability groups describe threads, page repair, automatic seeding, availability group states, availability replica cluster nodes, availability replica cluster states, cluster members and networks, replica cluster states, node maps, and listener states.

The DMV for change data capture contains one row for each error encountered during the change data capture log scan session.

The DMVs for the common language runtime provide information on CLR application domains, CLR properties, loaded assemblies, and active tasks.

The DMV for mirroring related auto page repair returns a row for every automatic page-repair attempt on any mirrored database on the server instance.

The DMVs for execution-related information describe background queues, cached plan dependent objects, compute node errors, cursors, first result set for objects, distributed requests, function stats, query memory grants, the query optimizer memory gateway, query parallel workers, query resource semaphores, query stats, requests, sessions, cached plans, DMS workers and input buffers, query plans, and query statistics.
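For example, the execution-related views can be queried directly. Here is a sketch in Python with pyodbc that lists the top CPU consumers from sys.dm_exec_query_stats; the connection string is a placeholder, and VIEW SERVER STATE permission is required.

    import pyodbc

    conn = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};"
                          "SERVER=myserver;DATABASE=master;"
                          "Trusted_Connection=yes;")       # placeholder
    sql = """
    SELECT TOP 10 qs.total_worker_time, qs.execution_count,
           SUBSTRING(st.text, 1, 100) AS query_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_worker_time DESC;
    """
    for row in conn.cursor().execute(sql):
        print(row.total_worker_time, row.execution_count, row.query_text)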

The extended event DMVs provide information on map values, objects, packages, session event actions, targets and sessions.

The object-related dynamic management views display stats properties, db stats histograms, SQL referencing entities, and SQL referenced entities.

The server related DMVs display information on memory dumps, server services, and registry.

The spatial-data-related DMVs provide information on the indexes and constraints containing UDT columns of spatial data types that will be disabled as a result of changing the compatibility level in SQL Server.

The transaction related DMVs provide information on current snapshots, database transactions, session transactions, current and active transactions and version stores.

 

Saturday, September 17, 2022

 

The language of templates 

Introduction: Many infrastructure providers, including the Azure public cloud, favor templates that manifest infrastructure as code. Azure offers a control plane for all resources that can be deployed to the cloud, and services take advantage of it both for themselves and for their customers. While Azure Functions allow extensions via new resources, the Azure Resource provider and ARM APIs provide extensions via existing resources. This eliminates the need to introduce new processes around new resources and is a significant win for reusability and user convenience. Resources and their extensions can be written only in Bicep and ARM templates. Bicep provides more concise syntax and improved type safety, but it compiles to ARM templates, which remain the de facto standard for declaring and using Azure resources and are supported by the unified Azure Resource Manager. Bicep is a domain-specific language that was recently developed for authoring ARM templates with an easier syntax and is typically used for resource deployments to Azure. Either or both of JSON and Bicep can be used to author ARM templates; while JSON is ubiquitous, Bicep can only be used with Resource Manager templates. In fact, Bicep tooling converts Bicep templates into standard JSON templates for ARM resources by a process called transpilation. This conversion happens automatically, but it can also be invoked manually. Bicep is succinct, which is a further incentive to use it. The use of built-in functions, conditions, and loops for repetitive resources infuses logic into the ARM templates.
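As a sketch of how such a template flows through the control plane, the snippet below deploys a transpiled ARM template with the Azure SDK for Python. It assumes the azure-identity and azure-mgmt-resource packages; the subscription, resource group, and file names are placeholders.

    import json
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.resource import ResourceManagementClient

    client = ResourceManagementClient(DefaultAzureCredential(),
                                      "<subscription-id>")     # placeholder

    # main.json could be produced from Bicep, e.g. `bicep build main.bicep`.
    with open("main.json") as f:
        template = json.load(f)

    poller = client.deployments.begin_create_or_update(
        "my-resource-group",          # placeholder
        "sample-deployment",
        {"properties": {"mode": "Incremental", "template": template}},
    )
    print(poller.result().properties.provisioning_state)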

With the standardization of the template, it can bring consistency across services and their resources, with added benefits like policy as code and repeated deployments across clouds and regions. The need for region-agnostic deployments cannot be overemphasized for foundational services that struggle with limitations. There are many clouds and regions to support, and deployment can carry a significant cost when services lack suitable ARM templates.

Other infrastructure providers like Kubernetes have a language that articulates state so that the control loop can reconcile resources. Resources can be generated and infused with specific configurations and secrets using a configMap generator and a secret generator, respectively. For example, an existing application.properties file can be turned into a configMap that is applied to new resources. Kustomization also allows us to override the registry for all images used in the containers for an application. There are two advantages to using it. First, it allows us to configure the individual components of the application without requiring changes in them. Second, it allows us to combine components from different sources and overlay them or even override certain configurations. The kustomize tool provides this feature. Kustomize can add configmaps and secrets to deployments using their specific generators. Kustomize is a static declaration, and it allows adding labels across components. We can choose groups of Kubernetes resources dynamically using selectors, but they must be declared as YAML. This kustomization YAML is usually stored as manifests and applied on existing components, so it refers to other YAMLs. Arguably, YAML is the most succinct format of templates.
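A small sketch of such an overlay, generated and applied from Python; the paths, configMap contents, and registry override are hypothetical.

    import pathlib, subprocess, textwrap

    overlay = pathlib.Path("overlays/dev")
    overlay.mkdir(parents=True, exist_ok=True)
    (overlay / "kustomization.yaml").write_text(textwrap.dedent("""\
        resources:
          - ../../base
        configMapGenerator:
          - name: app-config
            files:
              - application.properties
        images:
          - name: myapp
            newName: registry.example.com/myapp   # override the registry
    """))
    subprocess.run(["kubectl", "apply", "-k", str(overlay)], check=True)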

Azure Blueprints can be leveraged to allow an engineer or architect to sketch a project’s design parameters and define a repeatable set of resources that implements and adheres to an organization’s standards, patterns, and requirements. It is a declarative way to orchestrate the deployment of various resource templates and other artifacts such as role assignments, policy assignments, ARM templates, and resource groups. Blueprint objects are stored in Cosmos DB and replicated to multiple Azure regions. Since it is designed to set up the environment, it is different from resource provisioning. This package fits nicely into a CI/CD pipeline and handles both what should be deployed and the assignment of what was deployed.

Yet none of the formats discussed so far makes use of the technique popularized by Dockerfiles that represent container images. An image is a collection of layers where each layer represents a set of file-system differences over a base but is itself stored as folders and files. This makes the building of an image not only incremental, programmatic, and reusable, but also makes the layers interchangeable rather than merely composite. Perhaps that is the next evolutionary step for templates.


Friday, September 16, 2022

 A note about supportability and tuning of databases 

 

Introduction: This is a continuation of the article titled tuning of databases and linked here. 

 

There is a lot of documentation on performance tuning of a relational database, but this article continues to cover some of those and supportability for a multitenant application.  

 

Monitoring is a critical aspect for determining how a server is performing. Effective monitoring involves taking periodic snapshots to isolate runaway processes. Optimal performance can be achieved by minimizing response times and maximizing throughput. Efficient network traffic, disk I/O, and CPU usage are key to peak performance. 

 

There is a performance dashboard available to determine whether a database is experiencing a performance bottleneck. If a bottleneck is found, additional diagnostic data can be easily captured to resolve the problem. The dashboard can help with common problems pertaining to

  • CPU bottlenecks

  • I/O bottlenecks

  • Missing indexes

  • Blocking

  • Resource contention

The performance dashboard also helps to identify expensive queries that may have been executed before, along with several metrics that include CPU, logical writes, logical reads, duration, physical reads, and CLR time.

The dashboard is divided into the following sections and sub-reports: System CPU utilization, Current waiting requests, Current activity, Historical information, and miscellaneous information. 

The database also provides dynamic management views and functions. 

There are server-scoped dynamic management views and functions, which require VIEW SERVER STATE permission on the server, and there are database-scoped dynamic management views and functions, which require VIEW DATABASE STATE permission on the database.
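A short sketch of granting these two permissions with pyodbc; the server, database, login, and user names are placeholders.

    import pyodbc

    conn = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};"
                          "SERVER=myserver;DATABASE=master;"
                          "Trusted_Connection=yes;", autocommit=True)  # placeholder
    cur = conn.cursor()
    cur.execute("GRANT VIEW SERVER STATE TO [monitor_login];")             # server-scoped DMVs
    cur.execute("USE mydb; GRANT VIEW DATABASE STATE TO [monitor_user];")  # database-scoped DMVs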

These dynamic management views include Always On availability groups, Change Data Capture, Change Tracking Related, Common Language Runtime, Database Mirroring, Database related, Execution related, Extended events, Filestream and Filetable, Full-text search and semantic search, Geo replication, Index related, I/O related, PolyBase, Memory-Optimized, Object Related, Query Notifications related, Replication related, Resource Governor related, Security-related, Server-Related, Service Broker, Spatial Data, Azure Synapse analytics and Parallel data warehouse, SQL Server operating system related, stretch database and transaction related.    

Thursday, September 15, 2022

 

A note about performance tuning the database server:

 

There is a lot of documentation on performance tuning of a relational database, but this article covers some of those for a multitenant application.

 

Multitenant databases tend to grow rapidly in number of instances. The purchasing model matters significantly for cloud SQL Databases.

There are two purchasing models: 1) the virtual core (vCore) based purchasing model - this model provides a choice between a provisioned compute tier and a serverless compute tier. With the provisioned compute tier, a fixed amount of compute resources is always provisioned for the workload. With the serverless compute tier, autoscaling of the compute resources is specified instead. With autoscaling, databases are paused and resumed, with charges only for storage during periods of inactivity.

2) the database transaction unit (DTU) based purchasing model – this model provides bundled compute and storage packages balanced for common workloads. The compute sizes are declared in DTUs for single databases and elastic DTUs for elastic pools. The vCore-based model allows us to independently choose compute and storage resources. Customers prefer the DTU-based model for simple, preconfigured resource options, and the vCore-based model for flexibility, control, and transparency.

 

Storage costs are calculated differently based on each purchasing model. Storage is included in the price of the DTU. It is possible to add extra storage in the standard and premium tiers of a cloud SQL database.

 

Application characteristics determine both the purchasing model and the performance tuning of existing instances.  Chatty applications make excessive data access operations that are sensitive to network latency. Batching ad-hoc queries or moving queries to stored procedures is an effective remediation in this case. These will reduce the number of data access operations to the database.
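A minimal sketch of this remediation in Python with pyodbc: instead of issuing one insert per row over the network, the rows are sent as a batch. The table and connection string are placeholders.

    import pyodbc

    conn = pyodbc.connect("...placeholder connection string...")
    rows = [(i, f"item-{i}") for i in range(1000)]

    cur = conn.cursor()
    cur.fast_executemany = True    # send rows as parameter arrays, not one by one
    cur.executemany("INSERT INTO dbo.Items (Id, Name) VALUES (?, ?)", rows)
    conn.commit()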

Another case where remediation is justified is when databases have an intensive workload that can’t be supported by an entire single machine. Scaling out of the workload is a suitable remediation in this case.

Applications that have suboptimal queries might not respond well to increased compute size. Such queries are easy to spot in the data access layer and show symptoms such as missing ‘where’ clauses, missing indexes, or outdated statistics. These applications benefit from standard query performance tuning. Besides queries, applications may have inherent data access concurrency issues such as deadlocking and starvation. Reducing round trips between the database and the application by caching data on the client side is an effective remediation in this case.

 

Query tuning and usage of hints are standard query optimizer techniques. These apply to databases as well. Tuning queries might reduce aggregate resource demands. The query optimizer sniffs parameters and evaluates the current value of a parameter to determine whether it can generate a more optimal query plan. When this does not work sufficiently, query hints or directives can be specified and the default behavior via parameter sniffing can be overridden.
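For example, a query hint can be stated inline to override the plan that the sniffed parameter value would produce. A sketch with a hypothetical table and pyodbc, where the connection string is a placeholder:

    import pyodbc

    conn = pyodbc.connect("...placeholder connection string...")
    sql = """
    SELECT OrderId, Total
    FROM dbo.Orders
    WHERE CustomerId = ?
    OPTION (OPTIMIZE FOR UNKNOWN);  -- ignore the sniffed value, use average density
    """
    rows = conn.cursor().execute(sql, 42).fetchall()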

 

Customers generally decide this on a case-by-case basis. In general, they hit capacity limits for individual databases more often. These capacity limits still exist for pooled databases and instance databases, but there are two options to solve problems with large databases.

Cross-database sharding is a technique to spread database operations over multiple databases. The data is split on a single dimension into multiple databases. Splitting different functions of the application into different databases is another such technique. As the application becomes busier, the compute sizes for each function can vary because the load is split across multiple machines.
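A minimal sketch of cross-database sharding: tenants are mapped to one of several databases by hashing the tenant key, so every operation for a tenant lands on the same shard. The connection strings are placeholders; production systems typically use a shard map rather than a bare hash.

    import hashlib

    SHARDS = ["Server=sql0;Database=tenants0;...",
              "Server=sql1;Database=tenants1;...",
              "Server=sql2;Database=tenants2;..."]   # placeholder strings

    def shard_for(tenant_id: str) -> str:
        digest = hashlib.sha1(tenant_id.encode()).digest()
        return SHARDS[digest[0] % len(SHARDS)]

    print(shard_for("contoso"))   # always the same shard for this tenant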

 

Reference: https://1drv.ms/w/s!Ashlm-Nw-wnWhLMfc6pdJbQZ6XiPWA?e=fBoKcN    

 

 

 

Wednesday, September 14, 2022

 

There is a lot of documentation on performance tuning of a relational database, but this article covers some of those for a multitenant application.

Setting the SQL compatibility level to optimize database performance is just one of those.  This will equip the database with the latest optimization features of the Azure SQL database or SQL server. It is particularly relevant for demonstration databases because the default compatibility level is usually old.
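For instance, the compatibility level can be raised with a single T-SQL statement; a sketch via pyodbc, where the database name, level, and connection string are placeholders (use the latest level the server supports).

    import pyodbc

    conn = pyodbc.connect("...placeholder connection string...", autocommit=True)
    conn.cursor().execute(
        "ALTER DATABASE [MyDemoDatabase] SET COMPATIBILITY_LEVEL = 150;")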

Keys are another consideration. When we define keys to improve performance, we must note the time it takes to complete a loop through a subset of records. Different-sized subsets of records will take different times, but performance deteriorates if the subset of records cannot be located or read efficiently. The best selection of keys is one that supports the code that needs to run on the data; proper selection of such keys maximizes performance.

Data access is another. Data that the client needs goes from the database server through the multitenant application to the client. If the data is cached, it is returned from the cache instead of the database and some time and cost can be saved. If the data isn’t cached, it is fetched from the database over the network and then saved in the cache. There might be more than one cache – say a global and a local where the global cache is for all users connected to the multitenant solution and a local cache is tenant-specific.
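A minimal sketch of the two cache layers described above: a global cache shared by all users of the multitenant solution and a per-tenant local cache. The fetch function is a hypothetical stand-in for a real database call.

    global_cache: dict = {}
    tenant_caches: dict[str, dict] = {}

    def fetch_from_database(key):            # hypothetical expensive call
        return f"value-of-{key}"

    def get(key, tenant_id=None):
        # Tenant-specific data lives in that tenant's local cache.
        cache = tenant_caches.setdefault(tenant_id, {}) if tenant_id else global_cache
        if key not in cache:
            cache[key] = fetch_from_database(key)   # miss: go to the database
        return cache[key]                           # hit: skip the round trip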

Connections to the database are yet another performance consideration. The multitenant solution provider should maintain a connection pool, which can dramatically decrease memory consumption by the multitenant application. Database connection pooling also simplifies deployment of the multitenant application as a three-tier architecture where the tiers are installed independently. Administrators are no longer required to manually create the SPNs and set up the delegation when the client, application, and database server are hosted independently.
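A sketch of connection pooling with SQLAlchemy, where the URL is a placeholder: the pool keeps a small number of open connections that requests share, instead of each request opening its own.

    from sqlalchemy import create_engine, text

    engine = create_engine("mssql+pyodbc://user:pass@mydsn",   # placeholder URL
                           pool_size=10, max_overflow=5, pool_recycle=1800)

    with engine.connect() as conn:     # borrows a pooled connection
        conn.execute(text("SELECT 1"))
    # leaving the block returns the connection to the pool instead of closing it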

Data read/write performance requires significant investment. Record identifiers and SQL variant columns don’t prevent the use of BULK inserts. Filtering can usually be done with a single predicate. The exceptions in the multitenant application, in which the filtering does not involve a single predicate, are when the filtering is based on a computed column that requires external logic or when a second predicate is provided.

Adding constraints on a SQL column can also be useful for performance. The name of the constraint is not important as long as it is not already used elsewhere in the database.

Clients, and not just servers, can also be made performant. The hardware of the computer running the client, the choice of the browser, and the network bandwidth and latency between the client and the server are all important to boost the performance of the clients.

Tuesday, September 13, 2022

 

Performance for multitenant application

This is a continuation of the articles on multitenancy with the most recent one linked https://1drv.ms/w/s!Ashlm-Nw-wnWhLZnYUBoDUNcjAHNwQ?e=NAo7vM. This article focuses on performance.

 

The multitenant application discussed so far has an application server and a database. Performance is improved by 1) writing efficient pages, 2) efficient web services, 3) efficient reports, 4) performance patterns, 5) efficient data access, 6) testing and validating performance, 7) tuning the development environment, and 8) using the profiler to analyze performance.

 

Efficient pages are written by using patterns that get a page to load faster. These include 1) avoiding unnecessary recalculation, 2) doing less work, and 3) offloading the UI thread. Caching the data and refreshing the cache regularly avoids recalculation and saves time each time the page is loaded. Querying objects are notorious for recalculation since they reach the database each time; caching the results from an API works significantly better.

 

Reducing the amount of work also speeds things up. A simple page with few UI elements also improves ease of use and navigation. Removing calculated fields from lists if they aren’t needed, and removing the field definition or page extension definition, improves loading of pages that list data.

 

Creating dedicated lookup pages instead of the normal pages when dropdown-like logic is involved, and removing triggers and fact boxes, will help because a default page renders all controls.

Offloading the UI thread with, say, page background tasks can produce a more responsive and faster UI. Custom controls that require heavy-duty logic can also be avoided.

 

Avoiding exposure of calculated fields, avoiding heavy-duty logic in the pre- and post-handlers for getting records, and refactoring the page and its code so that values are persisted can reduce performance hits. It is not recommended to use temp tables if there are many records. Fetching and inserting each record in a temp table without caching data can be detrimental to performance. If the number of records exceeds a hundred, this antipattern is easy to detect.

 

Parent and child records need not be inserted in parallel. This condition causes locks on parent and integration record tables because parallel calls try to update the same parent record. It is best to do it incrementally by allowing one to finish before another or by putting them in a transaction batch.

 

Deprecated protocols should be avoided. OData version 4 and APIs have the best performance. API queries and pages are faster with newer technology stacks.

 

API pages and API queries are better than exposing UI pages as web service endpoints. If the latter must be implemented, then triggers need to run for all the records returned from the server. If we want OData endpoints that work as data readers, we can use API queries. OData has a few performance callouts, such as limiting the set with $filter and $top if there’s an expensive $expand, using a transaction batch, and using a read-only data access intent.
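A sketch of these callouts as an HTTP call with the requests library; the endpoint, entity names, and the read-only intent header are hypothetical.

    import requests

    resp = requests.get(
        "https://api.example.com/odata/v4/salesOrders",
        params={
            "$filter": "status eq 'Open'",   # limit the set before expanding
            "$top": "50",
            "$expand": "salesOrderLines",    # expensive: keep the set small
        },
        headers={"Data-Access-Intent": "ReadOnly"},   # hypothetical header name
        timeout=30,
    )
    resp.raise_for_status()
    print(len(resp.json().get("value", [])), "orders")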

 

Large volumes of web service calls can cause stability and performance issues. It is important to understand the operational limits and to scale such that the load always falls under the limit. External applications can handle the HTTP Status codes 429 for too many requests and 504 for gateway timeout.

Handling status code 429 requires the client to adopt retry logic while providing a cool-off period. Retry strategies include regular interval, incremental interval, exponential backoff, and randomization. Status code 504 requires the client to refactor the long-running request to execute within the time limit by splitting it into multiple requests, after which any resulting 429 codes can be handled by a backoff strategy. A common pattern is to implement a queue in the external application to flatten spikes in the traffic. If a request gets a 429, it is put back in the queue and one of the retry strategies is applied.
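A minimal sketch of the exponential backoff with randomization described above; the URL is a placeholder, and splitting the work on a 504 is left to the caller.

    import random, time, requests

    def call_with_backoff(url, max_retries=5, base_delay=1.0):
        for attempt in range(max_retries):
            resp = requests.get(url, timeout=30)
            if resp.status_code == 429:                    # too many requests
                delay = base_delay * (2 ** attempt)        # exponential backoff
                time.sleep(delay + random.uniform(0, 1))   # plus randomization
                continue
            if resp.status_code == 504:                    # gateway timeout
                raise RuntimeError("split this request into smaller requests")
            resp.raise_for_status()
            return resp
        raise RuntimeError("retry budget exhausted")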

Testing and validating the multitenant solution for performance is necessary prior to its planned deployment to production environments. Performance unit testing is helpful in this regard and can include unit tests that track the number of SQL statements or rows read; these counters can be sampled before and after the code under test, and assert statements can check for normal behavior. Performance toolkits are available to simulate the number of resources that tenants use in realistic scenarios, to compare performance between builds of a solution. Such a toolkit might not measure throughput, but given the time to execute key scenarios and the guidance provided by operational limits, it is possible to determine, say, how many orders are processed per hour. If there are queues involved, advanced analysis can be enabled. Performance telemetry includes database locks, long running operations, long running queries, page views, reports, sessions started, and web service requests.
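A minimal sketch of such a performance unit test in Python; the counter is a hypothetical stand-in for whatever hook the data access layer exposes.

    class SqlCounter:
        def __init__(self):
            self.statements = 0
        def execute(self, sql):        # stand-in for the real data access layer
            self.statements += 1

    def test_post_order_stays_within_query_budget():
        db = SqlCounter()
        # ... the code under test would run here, using db.execute(...) ...
        db.execute("SELECT ...")
        db.execute("UPDATE ...")
        assert db.statements <= 5, "query budget exceeded"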