Monday, November 25, 2013

we mentioned external sorting and dictionaries in the previous post for sorting from very large data. Dictionaries are very large files too and they are implemented as an index to the actual file and contains the key and record address of data. A dictionary can be implemented with a red-black tree replacing pointers with offsets from the beginning of the index file, and use random access to reference nodes of the tree. However, every transition on a link could imply a disk access and would be prohibitively expensive. Since disk access is by sectors, its btter to group several keys together in each node to minimize the number of I/O operations This is what B-Trees do.
B-Trees arrange keys surrounded by pointers or offsets that are less than or greater than the key value.For example, all keys less than the key value are to the left and all keys greater than the key value are to the right. We can locate any key in this tree with fewer disk access than a regular tree because if we were to group 100 keys / node, we would search over 1,000,000 keys in with just these three nodes accesses. To ensure this property holds, we must maintain a balanced tree during insertion and deletion.
B-Trees come in many flavors. For example, we have B-Tree, B*-Tree and B+-Tree. The standard B-Tree stores keys and data in both internal and leaf nodes.B* trees are similar only that the nodes are kept 2/3 full for better utilization of space in the tree and better performance.
In a B+ tree, all keys are stored at the leaf level with their associated data values. Duplicates of the keys appear in internal parent nodes to guide the search. Also, the pointers have a slightly different meaning. The left pointer designates all keys less than the value, while the right pointer designates all keys greater than or equal to the value. The author also mentions a  B++-Tree that is similar to the B+ trees except for the split/join strategy i.e the keys are scattered to and gathered from other nodes for insertion and deletion.

No comments:

Post a Comment