Monday, September 12, 2022

This article, a continuation of the series on multitenancy (the most recent installment is linked here: https://1drv.ms/w/s!Ashlm-Nw-wnWhLZnYUBoDUNcjAHNwQ?e=NAo7vM), focuses on performance.

 

Reports can be written efficiently. They can pertain to a single instance of an entity, such as an invoice, or they can be more analytical in nature, performing joins across multiple instances. Faster reports can be enabled by performing read scale-out to read data from a read-only copy of the database, by using partial records to reduce the data loaded from the database, and by using AL queries to optimize the way data is read from the database. For document reports, prefer Word layouts over RDL layouts, since RDL layouts can result in slower performance, especially for actions related to the user interface.
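As a minimal sketch of the first two techniques, the hypothetical AL objects below (object numbers and names are made up) use the DataAccessIntent property to route a report's reads to a read-only replica, and SetLoadFields to fetch only the column a procedure actually uses:

report 50100 "Customer List (Read-Only)"
{
    // Read scale-out: run this report against a read-only database replica.
    UsageCategory = ReportsAndAnalysis;
    DataAccessIntent = ReadOnly;

    dataset
    {
        dataitem(Customer; Customer)
        {
            column(No_; "No.") { }
            column(Name; Name) { }
        }
    }
}

codeunit 50100 "Customer City Lookup"
{
    procedure GetCity(CustomerNo: Code[20]): Text
    var
        Customer: Record Customer;
    begin
        // Partial records: only the City field is loaded from the database.
        Customer.SetLoadFields(City);
        Customer.Get(CustomerNo);
        exit(Customer.City);
    end;
}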

 

If a data lake or a data warehouse is involved, there are typically two types of data extractions: 1. a historical load with all the data from a given point in time and 2. delta loads covering what has changed since the historical load. The fastest and least disruptive way to get a historical load is usually a SQL BACPAC file that can be restored on a private database server. The fastest and least disruptive way to get delta loads is to set up API queries configured with read scale-out and filter on the LastModifiedOn data audit field.
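A sketch of such a delta-load endpoint is below: a hypothetical API query with DataAccessIntent = ReadOnly that exposes the system audit timestamp backing the last-modified field, so the consumer can pick up only rows changed since the previous extraction (the publisher, group, and entity names are illustrative):

query 50101 "Delta Customers"
{
    QueryType = API;
    APIPublisher = 'contoso';
    APIGroup = 'extract';
    APIVersion = 'v1.0';
    EntityName = 'deltaCustomer';
    EntitySetName = 'deltaCustomers';
    // Read scale-out keeps extraction load off the primary database.
    DataAccessIntent = ReadOnly;

    elements
    {
        dataitem(Customer; Customer)
        {
            column(systemId; SystemId) { }
            column(name; Name) { }
            // Audit timestamp used to filter delta loads.
            column(lastModified; SystemModifiedAt) { }
        }
    }
}

The consumer would then request something like ...?$filter=lastModified gt 2022-09-01T00:00:00Z to fetch only the changed rows.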

 

Efficient data access is at the heart of performance improvements. Many parallels can be drawn with data access in a SQL database, even though a multitenant application introduces some differences.

 

Many performance issues can be traced back to missing indexes, but index design does not always fall on the tenant or on the multitenant solution developer. Indexes must be designed to conform to the way data is accessed.

 

Keys identify rows in a table and can combine one or more columns. Indexes speed up data retrieval; the two serve different purposes. When the primary key backs a clustered index, records are stored sequentially in ascending order, sorted by the primary key columns. Records are kept sorted dynamically as they are inserted, so the data is always structurally ordered, and that ordering allows faster data access via binary search.
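For illustration, the hypothetical table below declares a primary key marked as clustered, so rows are physically ordered by Entry No.:

table 50102 "Demo Ledger Line"
{
    fields
    {
        field(1; "Entry No."; Integer) { }
        field(2; "Posting Date"; Date) { }
        field(3; Amount; Decimal) { }
    }

    keys
    {
        // Primary key backing the clustered index: rows are stored in
        // ascending Entry No. order, so lookups can use a binary search.
        key(PK; "Entry No.") { Clustered = true; }
    }
}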

 

Non-clustered columnstore indexes can also be defined on tables. This index type can achieve significant performance gains over traditional row-oriented storage for analytical scans, and it also achieves substantial data compression compared to the uncompressed data size of normal tables.
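In AL this can be declared with the ColumnStoreIndex table property, as in this hypothetical analytics table (the property lists the fields covered by the columnstore index):

table 50103 "Demo Sales Fact"
{
    // Non-clustered columnstore index over the measure columns, stored in
    // compressed, column-oriented form for analytical scans.
    ColumnStoreIndex = Quantity, Amount;

    fields
    {
        field(1; "Entry No."; Integer) { }
        field(2; Quantity; Decimal) { }
        field(3; Amount; Decimal) { }
    }

    keys
    {
        key(PK; "Entry No.") { Clustered = true; }
    }
}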

 

If the number of secondary keys that are marked as active is increased, performance can improve when data is retrieved in several different sorting sequences, because the data is already sorted. But this slows down insertion and deletion, because the index for each active secondary key must be maintained.

If the number of keys is small, insert performance improves because fewer indexes must be maintained, but retrieval slows when a secondary key matching the required sort order is missing. The trade-off is sketched below.
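The keys section of this hypothetical table shows both sides: each enabled secondary key gives a ready-made sort order but adds index maintenance, while a key marked Enabled = false costs nothing until it is needed:

table 50104 "Demo Document Line"
{
    fields
    {
        field(1; "Entry No."; Integer) { }
        field(2; "Document No."; Code[20]) { }
        field(3; "Posting Date"; Date) { }
    }

    keys
    {
        key(PK; "Entry No.") { Clustered = true; }
        // Secondary key: fast retrieval sorted by date, but it must be
        // maintained on every insert and delete.
        key(ByDate; "Posting Date", "Document No.") { }
        // Rarely used sort order: disabled, so no index is maintained.
        key(ByDocument; "Document No.") { Enabled = false; }
    }
}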

 

Schema synchronization errors can be minimized by not deleting primary keys, not changing the order of primary key fields, and not adding additional unique or clustered keys.

SumIndexField Technology, or SIFT for short, helps quickly calculate the sums of numeric data type columns in tables, even those with thousands of records. Any field that has a numeric data type can be associated with a key as a SumIndexField. Each SIFT key is backed by a materialized view, which speeds up retrieval of the aggregated data.
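A minimal sketch: the hypothetical key below declares Amount as a SumIndexField per customer, and CalcSums then reads the pre-aggregated total instead of scanning the rows:

table 50105 "Demo Cust. Ledger"
{
    fields
    {
        field(1; "Entry No."; Integer) { }
        field(2; "Customer No."; Code[20]) { }
        field(3; Amount; Decimal) { }
    }

    keys
    {
        key(PK; "Entry No.") { Clustered = true; }
        // SIFT key: sums of Amount are maintained per Customer No.
        key(BySift; "Customer No.") { SumIndexFields = Amount; }
    }
}

codeunit 50101 "Demo Balance"
{
    procedure CustomerBalance(CustomerNo: Code[20]): Decimal
    var
        Ledger: Record "Demo Cust. Ledger";
    begin
        Ledger.SetRange("Customer No.", CustomerNo);
        // Served from the SIFT aggregate, not a row-by-row scan.
        Ledger.CalcSums(Amount);
        exit(Ledger.Amount);
    end;
}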

 

Databases are big on locking and logging. Performance issues that are not due to resource starvation can often be attributed to workers waiting on other workers to release locks on shared objects. Some tips to avoid locking contention include: reading information before starting write operations, limiting the time locks are held, limiting transaction size by dividing work into smaller batches of operations, and making sure indexes exist on the ranges that are updated. Multitenant databases generally have less locking contention because tenants are isolated from one another.
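Two of these tips, keeping locks brief and splitting a large transaction into chunks, are sketched below against the hypothetical table from an earlier example; the chunk size of 100 is an arbitrary assumption:

codeunit 50102 "Demo Chunked Update"
{
    procedure ApplyRounding()
    var
        Line: Record "Demo Ledger Line";
        Counter: Integer;
    begin
        // Establish the work set first; FindSet(true) then takes update
        // locks only on the rows that will actually be written.
        Line.SetRange("Posting Date", WorkDate());
        if Line.FindSet(true) then
            repeat
                Line.Amount := Round(Line.Amount, 0.01);
                Line.Modify();
                Counter += 1;
                // Commit per chunk of 100 rows so locks are released early
                // and the transaction stays small.
                if Counter mod 100 = 0 then
                    Commit();
            until Line.Next() = 0;
    end;
}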

 

Testing and validating the multitenant solution for performance is necessary prior to its planned deployment to production environments. Performance unit testing is helpful in this regard and can include unit tests that track the number of SQL statements executed or rows read. These counters can be sampled before and after the code under test, and assert statements can check for normal behavior. Performance toolkits are available to simulate realistic tenant workloads and compare performance between builds of the solution. Such a toolkit might not measure throughput directly, but given the time it takes to execute key scenarios and the guidance provided by operational limits, it is possible to determine, say, how many orders can be processed per hour. If queues are involved, more advanced analysis can be applied. Performance telemetry includes database locks, long-running operations, long-running queries, page views, reports, sessions started, and web service requests.
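A minimal performance unit-test sketch, assuming the AL test framework and the built-in SessionInformation counters (the row budget of 500 is an arbitrary assumption):

codeunit 50103 "Demo Perf Test"
{
    Subtype = Test;

    [Test]
    procedure ScenarioStaysWithinRowBudget()
    var
        RowsBefore: BigInteger;
        RowsRead: BigInteger;
    begin
        // Sample the counter before the code under test.
        RowsBefore := SessionInformation.SqlRowsRead;

        // ... execute the key scenario here ...

        // Assert normal behavior: fail if the scenario reads more rows
        // than its budget allows.
        RowsRead := SessionInformation.SqlRowsRead - RowsBefore;
        if RowsRead > 500 then
            Error('Scenario read %1 rows; the budget is 500.', RowsRead);
    end;
}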
