NoSQL Databases and scalability.
Traditional databases such as SQL Server have a rich history
in scalability and performance considerations for user data storage such as
horizontal and vertical table and index partitioning, data compression,
resource governor, IO resource governance, partition table parallelism,
multiple file stream containers, NUMA aware large page memory and buffer array
allocation, buffer pool extension, In-memory OLTP, delayed durability etc. In
NoSQL databases, the unit of storage is the key-value collection and each row
can have different number of columns from a column family. The option for
performance and scalability has been to use sharding and partitions. Key Value
pairs are organized according to the key. Keys in turn are assigned to a
partition. Once a key is assigned to a partition, it cannot be moved to a
different partition. Since it is not configurable, the number of partitions in
a store is decided upfront. The number of storage nodes in use by the store can
however be changed. When this happens, the store undergoes reconfiguration, the
partitions are balanced between new and old shards, redistribution of partition
between one shard and another takes place.
The more the number of partitions the more granularities there are for
the reconfiguration. It is typical to have ten to twenty partitions per shard.
Since the number of partitions cannot be changed, this is often a design
consideration.
The number of nodes belonging to a shard called its
replication factor improves the throughput. The higher the replication factor,
the faster the read because of the availability but the same is not true for
writes since there is more copying involved. Once the replication factor is
set, the database takes care of creating the appropriate number of replication
nodes for each shard.
A topology is a collection of storage nodes, replication
nodes, and the associated services. At any point of time, a deployed store has
one topology. The initial topology is such that it minimizes the possibility of
a single point of failure for any given shard. If the storage node hosts more
than one replication node, those replication nodes will not be from the same
shard. If the host machine goes down, the shard can continue for reads and
writes.
Performance tuning on the NoSQL databases is an iterative
process. Topologies, replication factor, shards and partitions can be changed
to improve performance or to adjust for the storage nodes.
Courtesy:
Microsoft and Oracle documentations.
No comments:
Post a Comment