2-3 Tree Files: tree23.c, tree23.h A B-tree maintains its balance by ensuring that the paths from the root to leaves are all equal in length. The 2-3 tree is one of several variations of B-trees. A node in a 2-3 tree has either two or three children, except for the special cases where the tree is empty or contains only one node. All data items in the tree are stored as leaf items of the tree. From the way the tree is maintained, leaf items are all stored at the same level in the tree. The internal nodes of the tree are used for guiding the search when trying to locate a data item in the tree. For this purpose, each internal node of the tree has two keys. A node of a 2-3 tree is illustrated below: ----------------------- | | | | key_item1 | key_item2 | | | | |-----------------------| | | | | | left | middle | right | | | | | --/--------|--------\-- / | \ ... ... ... Pointers to the left, middle, and right child nodes are maintained. Note that if the nodes children are leaves items of the tree, then these pointers will point to data items, rather than nodes. The keys of a node are determined as follows: key1 - holds the value of the smallest (or left-most) data item in the subtree rooted at the middle child. key2 - holds the value of the smallest (or left-most) data item in the subtree rooted at the right child. For the purpose of flexibility, the 2-3 tree implementation described here uses pointers to the corresponding data item for each key: key_item1 - points to the minimum data item in the subtree rooted at the middle child. key_item2 - points to the minimum data item in the subtree rooted at the right child. If a node in the 2-3 tree has three children, then all pointers will be used. If a node in the 2-3 tree has two children, then the right child pointer, and key_item2 will be NULL. For the special case where there is only one data item in the tree, the root node has all pointers NULL, except for the left child pointer, which points to the data item. For the case of an empty tree, the root node still exists, but has all child pointers and key items set to NULL. In the source files tree23.c and tree23.h, a union data type tree23_link_t is used which allows child pointers to point to either a node, or a data item. In this way, p->left.node specifies to the left child node of a node, p. Similarly, p->left.item specifies the left child data item of a node, p. In order to know what kind of child pointers a node has, the additional enumerated type field link_kind is used. This uses the value LEAF_LINK when child pointers specify data items, and the value INTERNAL_LINK when the child pointer specify nodes. The Find Operation ------------------ The find operation locates a data item stored in the 2-3 tree. It requires a pointer to a data item, key_item, and searches for a data item in the tree which has the same key value. The search begins from the root node. At each internal node in the search, we determine which child pointer needs to be followed: - if key_item2 is not NULL and key(key_item) >= key(key_item2) then follow the right child pointer. - else if key(key_item) >= key(key_item1) then follow the middle child pointer. - else follow the left child pointer. The search finishes once a leaf node is reached. Special cases, where there are only 0 or 1 items in the tree, are handled separately. If a matching data item is found, find returns a pointer to the item, otherwise find returns NULL. The Find-Min Operation ---------------------- Find-min returns the minimum data item in the 2-3 tree. It is possible to implement find-min efficiently by maintaining a pointer to the minimum data item in the tree. This is the method used in the source file tree23.c. If a minimum pointer is not maintained, find-min can be implemented by searching for the left-most data item in the tree. The Insert Operation -------------------- The insert operation inserts a new data item, u, into the 2-3 tree. In order to insert u into the tree, the correct insertion position must first be located. This is done using the same process as the find operation, eventually arriving at some data item at leaf level in the tree. Let p be the parent node of this data item and let x be the left child of p, y the middle child, and z the right child (if any). One of these children is the item that was located for the insertion position. If this item has the same key as u, then insertion fails, since duplication of keys in the tree is not allowed, otherwise insertion can continue. If p only has two children, then insertion of x is simple since the number of nodes under p is easily increased to three. The following insertion cases are possible: (y,_) ---> (y,u) / | / | \ insert beside y / | / | \ x y ^ x y u u (y,_) ---> (u,y) / | / | \ insert beside x / | / | \ x ^ y x u y u (y,_) ---> (x,y) (special case) / | / | \ / | / | \ ^ x y u x y u In the above illustration, the node is shown as (k1,k2) where k1 and k2 are the key items. Insertion of u will always occur to the left of the item that was located for the insertion position, except for the special case where u becomes the new minimum (or left-most) item in the entire tree. If u is not the minimum node, then key(u) > key(x) since this comparison was performed during the search when u was a key item of some node on the path to u. If p has three children then the insertion of u is more complicated. The following leaf-level insertion cases are possible: p (y,z) ---> p (y,_) q (u,_) insert new node, q: / | \ / | / | umin <- z insert beside z / | \ / | / | u <- q x y z ^ x y z u u p (y,z) ---> p (y,_) q (z,_) insert new node, q: / | \ / | / | umin <- u insert beside y / | \ / | / | u <- q x y ^ z x y u z u p (y,z) ---> p (u,_) q (z,_) insert new node, q: / | \ / | / | umin <- y insert beside x / | \ / | / | u <- q x ^ y z x u y z u p (y,z) ---> p(y,z) ---> p (x,_) q (z,_) / | \ / | \ / | / | / | \ / | \ / | / | ^ x y z u ^ y z u x y z u x (special case) insert new node, q: umin <- y u <- q Since a node cannot have 4 children a new internal node, q, is created which takes two of the children, so that both nodes p and q have two children. Node q must then be inserted beside node p at the next level up in the 2-3 tree. Insertion of internal nodes has slightly more complexity since the value of keys in sub-trees must be kept track of. For this purpose, umin is use for keeping track of the minimum data item in sub-tree, u. We use ymin and zmin to indicate the keys corresponding to the minimum data items in subtrees y and z respectively. At the start of the insertion, these are available as the key items of node p. The different cases for inserting an internal node are shown below: p (ymin,zmin) ---> p (ymin,_) q (umin,_) insert q: / | \ / | / | umin <- zmin insert beside z / | \ / | / | u <- q x y z ^ x y z u u p (ymin,zmin) ---> p (ymin,_) q (zmin,_) insert q: / | \ / | / | u <- q insert beside y / | \ / | / | x y ^ z x y u z u p (ymin,_) ---> p (ymin,umin) STOP / | / | \ / | / | \ x y ^ x y u u p (ymin,zmin) ---> p (umin,_) q (zmin,_) insert q: / | \ / | / | umin <- ymin insert beside x / | \ / | / | u <- q x ^ y z x u y z u p (ymin,_) ---> p (umin,ymin) STOP / | / | \ / | / | \ x ^ y x u y u Note that there is no special case when inserting an internal node. The special case only arises for inserting a leaf data item that is smaller than any other item in the tree. When insertion occurs at the root level of the tree, a new root node, q, is created, with the inserted node, u, and the old root node, r, as its children. The pointer to the root node of the tree is then updated to q. r (k,_) ^ ---> q (umin,_) STOP u / | / | r (k,_) u The Delete Operation -------------------- The delete operation deletes a data item, u, from the 2-3 tree. Item u is located using the same process as the find operation. As for insertion: p denotes the parent of u; x, y, and z denote the left, middle and right children of p respectively. Note that u is one of x, y or z. If the tree is empty then delete fails. There are also these special cases: Only one item in the tree (p is the root node): p(_,_) ---> p(_,_) STOP / / x ^ u Only two items in the tree (p is the root node): p(y,z) ---> p(_,_) STOP / | / | x y x ^ u p(y,z) ---> p(_,_) UPDATE_KEY and MERGE y / | / | x y y ^ u For all other cases, if p has three children, then deletion is easy since two children are left over. However if p has only two children, deletion becomes complicated since the 2-3 tree cannot allow a single child to be left over after deletion. The different cases when deleting a data item from the 2-3 tree are shown below: u = z: p(y,z) ---> p(y,_) STOP / | \ / | / | \ / | x y z x y ^ u u = y: p(y,z) ---> p(z,_) STOP / | \ / | / | \ / | x y z x z ^ u p(y,z) ---> p(_,_) MERGE x / | / | x y x ^ u u = x: p(y,z) ---> p(z,_) UPDATE_KEY and STOP / | \ / | / | \ / | x y z y z ^ u p(y,z) ---> p(_,_) UPDATE_KEY and MERGE y / | / | x y y ^ u Note that when u is a left child, there may be a key higher up in the tree which needs to be updated. The key corresponding to the most recent non-left branch traversed on the path to p is the correct key to update. The find process can be extended to remember this key. If u is the minimum item in the tree, then no such key will exist, and there is no need to update a key. In this case, if a pointer to the minimum data item in the tree is being maintained, it will need to be updated. Consider the cases where p has just a single child left over after deletion. Let this left over child be denoted by u, and its parent denoted by p. For now, assume that node p also has a parent, m. Data item u must be merged with child data items from a sibling of p. Let this sibling be denoted by q, and its children be denoted by x, y and z. The different cases for the MERGE u operation at leaf level are illustrated below: p is a right child: m(x,u) m(x,z) / | \ / | \ . | \ . | \ . | \ . | \ . | \ . | \ | \ | \ | \ | \ q(y,z) p(_,_) ---> q(y,_) p(u,_) STOP / | \ / | / | / | \ / | / | x y z u x y z u m(x,u) m(x,_) / | \ / | . | \ . | . | \ . | . | \ . | | \ | | \ | q(y,_) p(_,_) ---> q(y,u) STOP (node p is deleted) / | / | \ / | / | \ x y u x y u p is a middle child: m(u,?) m(z,?) / | \ / | \ / | ? / | ? / | / | / | / | / | / | / | / | q(y,z) p(_,_) ---> q(y,_) p(u,_) STOP / | \ / | / | / | \ / | / | x y z u x y z u m(u,k) m(k,_) / | \ / | / | . / . / | . / . / | . / . / | / / | / q(y,_) p(_,_) ---> q(y,u) STOP (node p is deleted) / | / | \ / | / | \ x y u x y u m(u,_) m(_,_) / | / | / | / | / | / | q(y,_) p(_,_) ---> q(y,u) MERGE q (node p is deleted) / | / | \ / | / | \ x y u x y u p is a left child: m(x,?) m(y,?) / | \ / | \ / | ? / | ? / | / | / | / | / | / | / | / | p(_,_) q(y,z) ---> p(x,_) q(z,_) STOP / | \ / | / | / | \ / | / | u x y z u x y z m(x,k) m(k,_) / | \ / | / | . / . / | . / . / | . / . / | / / | / p(_,_) q(y,_) ---> q(x,y) STOP / | / | \ (node p is deleted) / | / | \ u x y u x y m(x,_) m(_,_) / | / | / | / | / | / | p(_,_) q(y,_) ---> q(x,y) MERGE q (node p is deleted) / | / | \ / | / | \ u x y u x y For two of the merging cases shown above, the parent of node p is left with just one child, and merging must be carried out at the next level up in the tree. Maintaining keys in the tree requires slightly more insight when merging nodes, as opposed to merging leaf data items. Except for the maintenance of keys, the merging of nodes, is exactly the same as for leaf level data items. Labels x, y, z, and u, now denote nodes, rather than data items. The labelling xmin, ymin, umin, and zmin is used for labelling key data items to show where keys are moved around in the tree. p is a right child: m(xmin,umin) m(xmin,zmin) / | \ / | \ . | \ . | \ . | \ . | \ . | \ . | \ | \ | \ | \ | \ q(ymin,zmin) p(_,_) ---> q(ymin,_) p(umin,_) STOP / | \ / | / | / | \ / | / | x y z u x y z u m(xmin,umin) m(xmin,_) / | \ / | . | \ . | . | \ . | . | \ . | | \ | | \ | q(ymin,_) p(_,_) ---> q(ymin,umin) STOP / | / | \ / | / | \ x y u x y u p is a middle child: m(umin,?) m(zmin,?) / | \ / | \ / | ? / | ? / | / | / | / | / | / | / | / | q(ymin,zmin) p(_,_) ---> q(ymin,_) p(umin,_) STOP / | \ / | / | / | \ / | / | x y z u x y z u m(umin,k) m(k,_) / | \ / | / | . / . / | . / . / | . / . / | / / | / q(ymin,_) p(_,_) ---> q(ymin,umin) STOP / | / | \ / | / | \ x y u x y u m(umin,_) m(_,_) / | / | / | / | / | / | q(ymin,_) p(_,_) ---> q(ymin,umin) MERGE q / | / | \ / | / | \ x y u x y u p is a left child: m(xmin,?) m(ymin,?) / | \ / | \ / | ? / | ? / | / | / | / | / | / | / | / | p(_,_) q(ymin,zmin) ---> p(xmin,_) q(zmin,_) STOP / | \ / | / | / | \ / | / | u x y z u x y z m(xmin,k) m(k,_) / | \ / | / | . / . / | . / . / | . / . / | / / | / p(_,_) q(ymin,_) ---> q(xmin,ymin) STOP / | / | \ / | / | \ u x y u x y m(xmin,_) m(_,_) / | / | / | / | / | / | p(_,_) q(ymin,_) ---> q(xmin,ymin) MERGE q / | / | \ / | / | \ u x y u x y All of the merge cases illustrated above assumed that node p had a parent node, m. If node p is the root node, then m will not exist. For the case where node p is the root node of the tree, node p is deleted, and node u becomes the new root node in the tree. This is illustrated below: p(_,_) ---> u STOP u The Delete-Min Operation ------------------------ For the delete-min operation, the minimum data item in the tree is located and deleted from the tree. The source code for delete-min is simpler than that for delete, since only merge cases where p is a left child can occur. Unlike find-min, maintaining a minimum pointer will not help improve the efficiency of delete-min since the rearrangement of the 2-3 tree, which occurs after the delete, takes O(log n) worst case time. Note also that the minimum node must be located in order to obtain the path to the minimum node. The only way to avoid performing the search for the minimum node would be to maintain parent pointers for each node, or path information for the path to the minimum node.