Digital Search Tree

Files:  dst.c, dst.h

The digital search tree can be an alternative to the binary search tree if the
keys of items stored in the tree can be represented by unsigned integers.  Like
the binary search tree, the digital search tree has a binary tree structure.
The structure type for nodes used in the digital search tree is illustrated
below.

  A node in a digital search tree:

       -------------
      |    item     |
      |-------------|
      |    key      |
      |-------------|
      | a[0] | a[1] |
       --/-------\--
        .         .
       .           .


   item          - Points to the item that the node corresponds to.

   key           - The key value of the item indexed by the node.  Normally,
                   the key can be obtained by accessing the nodes item, but
                   since keys must be integers, a key field is provided in the
                   node for faster access.

   a[0] and a[1] - An array is used for the left and right pointers.  During
                   searching, this array is indexed using individual bits in 
                   the search key to traverse to the next node.


find()
------

Pseudo-code for the find() operation is given below.  The search for an item
with a key k, starts at the root node of the digital search tree, t.  The
pointer variable, p, points to the current node in the search, and max_b is the
key size in bits.


   p = t->root;
   i = max_b;

   loop {
       if p is null then the search has failed, exit loop;

       if p->key == k then a matching item has been found, exit loop;

       i = i - 1;

       /* Traverse left or right branch, depending on the current bit. */
       let j be the value of the (i)th bit of k;
       p = p->a[j];
   }


When the loop exits, if p is null then no matching item was found, otherwise
p->item points to the matching item.


find_min()
----------

Pseudo-code for the find_min() operation is given below.  As with the find()
operation the search for the node with the minimum key begins at the root node.


   if t->root is null then there are no items in the tree, return null.

   p = t->root;
   min_key = infinity;

   loop {
	if p->key <= min_key then {
            min_node = p;
	    min_key = p->key;
	}

	if p->a[0] is not null then p = p->a[0] else p = p->a[1];
             
	if p is null then return min_node->item;
   }


This code will return a pointer to the minimum item.  Otherwise if there are no
nodes in the tree it returns null.

To see how this code correctly locates the minimum node, consider the following
properties of the digital search tree:

  1. If node p is a right child and has a sibling node q, then all nodes in 
     tree(q) have a key which is greater than the key of node p.  Here tree(q)
     denotes the subtree rooted at node q.

  2. The key of node p is not necessarily minimum among the keys of ancestors
     and descendant nodes of p.

Because of property 2, during the search the code keeps track of the minimum
key seen so far, and its corresponding node.  Now consider this property at the
start of the first line of code in the search loop.  Upon reaching node p, we
must update the minimum key seen so far and the pointer to its corresponding
node.  Due to property 2 the search must consider descendants of node p.  If
node p has a right child, then the search continues to the right child, and the
left child need not be considered, because of property 1.  Otherwise if there
is no right child, the left child must be considered.  The search ends after
traversing a node with no children.


insert()
--------

To insert an item, with a key, k, we begin a search from the root node to
locate the insertion position for the item.


   if t->root is null then {
       t->root = new node for the item with key k;
       return null;
   }

   p = t->root;
   i = max_b;

   loop {
       if p->key == k then a matching item has been found, return p->item;

       i = i - 1;

       /* Traverse left or right branch, depending on the current bit. */
       let j be the value of the (i)th bit of k;
       if p->a[j] is null then {
           p->a[j] = new node for the item with key k;
           return null;
       }

       p = p->a[j];
   }


In the above pseudo-code, insertion fails if there is already an item with key
k in the tree, and a pointer to the matching item will be returned.  Otherwise,
when insertion is successful, a null pointer is returned.

When the new node, x, is created, its fields are initialised as follows.

   x->key = k;
   x->item = item;
   x->a[0] = x->a[1] = NULL;


delete()
--------

The delete() operation deletes an item with key k from the digital search tree.
Similar to the find() and insert() operations, a search obtains a pointer, p,
to the node holding the item to be deleted.

Since node p will no longer be needed in the tree it is deleted, but this means
that some other node must replace p.  This requires some rearrangement of nodes
in the tree.  Here we describe two possible rearrangement methods.

A property of digital search trees is that any node that is a descendant of p
can be used to replace it.  To see this, consider the leading bits of node p's
key which are used to locate p.  Descendant nodes of p also have these same
leading bits in their keys.

   Rearrangement Method 1

     1.  q = p;
     2.  If q has no child nodes then rearrangement can stop.
     2.  Otherwise let c be a child of node q.
     3.  Node c is moved up to fill the space left by node q.
     4.  The process continues from 1 to fill the space left from moving
         node c.

   Rearrangement Method 2

     1. If p has no child nodes then rearrangement is not needed.
     2. Traverse down the descendant nodes from p, until a leaf node, c, is
        reached.  (A leaf node has no children.)
     3. Node c is moved up to replace node p.


In rearrangement method 1, when a node, q, has two children, one must be chosen
to be moved up.  This could be done randomly, but random number generation
would take additional time.  Rather than generating a random number, the
`untraversed' bits of k can be used to choose the child, depending on whether
the current bit is 1 or 0.  Alternatively, the least significant bit of the key
of node q could be used.


delete_min()
------------

The delete_min() operation locates the minimum node, min_node, using the same
process as in the find_min() operation.  After locating the minimum node it is
removed.  If the minimum node is not a leaf, then it needs to be replaced with
some other node.  The search for the minimum node will have ended at some leaf
node, x.  Relocating node x to replace the minimum node is the simplest
rearrangement method, since it will avoid having to perform any additional
traversal of the tree.

The delete-min operation in a digital search tree requires a key comparison for
each node traversed on the search path in order to determine which node is
minimum.  In comparison, the binary search tree, avl tree, 2-3 tree and
red-black tree, only have to follow the left-most path in order to locate the
minimum node.