Now back to the post on in-memory applications. Let's look at some of the data structures that lend themselves to concurrent programming better than traditional data-store structures. Skip lists and balanced search trees are suitable because the locks are per node and are acquired in a well-defined order. Let's take skip lists first and see why they fare better than, say, B+-trees. First, a skip list has a lock on each node, so updates stay localized. Second, an insert or delete needs to hold the locks of only two or three nodes at a time. Third, the locks are acquired in traversal order, so readers and writers moving through the skip list never pass each other, and locks taken in a fixed order cannot deadlock. In a balanced search tree such as a red-black tree, each node also has a lock, but an insert or delete does not acquire locks all the way up to the root, as that would throttle concurrency. The rotations and recoloring needed to rebalance a red-black tree touch only three or four nodes at a time, but this is still more involved than the skip-list case.
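To make the ordered per-node locking concrete, here is a minimal sketch in Java of the hand-over-hand (lock coupling) discipline on a single sorted level. A real skip list would repeat the same discipline for the predecessor found at each level; the class and field names below are purely illustrative.

import java.util.concurrent.locks.ReentrantLock;

// Sketch of ordered, per-node locking on one sorted level. At most two
// locks are held at any moment, and they are always taken in list order,
// so concurrent writers cannot deadlock and cannot pass each other.
public class SortedLevel {
    private static final class Node {
        final int key;
        Node next;
        final ReentrantLock lock = new ReentrantLock();
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // Head and tail sentinels so every insert has a predecessor and successor to lock.
    private final Node tail = new Node(Integer.MAX_VALUE, null);
    private final Node head = new Node(Integer.MIN_VALUE, tail);

    public void insert(int key) {
        Node pred = head;
        pred.lock.lock();                     // locks are taken in traversal order
        Node curr = pred.next;
        curr.lock.lock();
        try {
            while (curr.key < key) {
                pred.lock.unlock();           // release the node behind us...
                pred = curr;                  // ...before locking the one ahead
                curr = curr.next;
                curr.lock.lock();
            }
            pred.next = new Node(key, curr);  // splice in under the predecessor's lock
        } finally {
            curr.lock.unlock();
            pred.lock.unlock();
        }
    }
}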
As a practical example, take search over text where all the words have been stored as tuples of the word and the list of its occurrences. A skip list lets us search over a range of words by walking adjacent nodes, while still giving fast search, insert, and delete operations. Data parallelism can be used on top of this for further performance gains. Concurrency is as much about the hardware as it is about the code. If the architecture requires memory synchronization between threads so that changes made by one are visible to another, there is a cost involved; in a shared-nothing model that cost goes away. Also, in a shared-nothing model, nodes can be added to or removed from a cluster without affecting other operations. And in a model where the concurrency is distributed over shared-nothing servers, the system can scale out to several thousands of nodes. Concurrency, however, is not the only way to improve performance.
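A small sketch of that word-and-occurrences index, using the JDK's ConcurrentSkipListMap and assuming integer offsets for the occurrence positions (the positions and method names are my assumptions, not details from the example above):

import java.util.List;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Word -> list of occurrence positions, kept sorted by word in a skip list.
public class WordIndex {
    private final ConcurrentSkipListMap<String, List<Integer>> index =
            new ConcurrentSkipListMap<>();

    // Writers insert concurrently; the skip list keeps the words in sorted order.
    public void record(String word, int position) {
        index.computeIfAbsent(word, w -> new CopyOnWriteArrayList<>()).add(position);
    }

    // Range search over a span of words: a live view over the adjacent nodes, no copying.
    public NavigableMap<String, List<Integer>> range(String from, String to) {
        return index.subMap(from, true, to, true);
    }

    // Data parallelism over the range, e.g. counting total occurrences in the span.
    public long occurrencesIn(String from, String to) {
        return range(from, to).values().parallelStream().mapToLong(List::size).sum();
    }
}

Because subMap returns a view over the adjacent nodes rather than a copy, the range scan stays cheap, and parallelStream gives a simple form of data parallelism over the words in that range.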