By Julienne Walker
License: Public Domain

New programmers who are introduced to binary search trees quickly learn that if items are inserted in certain orders, the performance of the tree degenerates into that of a glorified linked list. Many brain cells have been devoted to the task of finding efficient ways to avoid these worst cases. Many exceedingly clever solutions have been developed, but only a handful have made it into public knowledge and even fewer into common use. Of those solutions, height balanced trees are the most common, and the AVL tree is probably the oldest of the height balanced trees.

This tutorial will cover, often in painful detail, the concept behind AVL trees and implementation issues with a practical eye. If you are a graduate student looking for lemmas and proofs, or detailed mathematical forumlae proving the performance claims made, this tutorial is not for you. I am neither qualified nor compelled to present such things because AVL trees have been around for a very long time and their performance has been exhaustively studied. I see no point in restating what others have written better than I could. For the graduate students or anyone else interested, an excellent analysis of AVL trees is made in Donald Knuth's “The Art of Computer Programming”, volume 3.

If you are a struggling professional, or an amateur looking to broaden your skills, look no further. This tutorial is written for you, the average programmer who wants to learn what these mysterious “balanced trees” are without having to work through the impenetrable code of a research paper or textbook. Even if you don't care about the theory or the implementation and just want code to solve a problem, this tutorial provides several working versions of insertion and deletion in AVL trees. The code is all public domain, so do with it whatever you want as my gift to you.

I talk about the impenetrable code of papers and textbooks, but there are many who would call my own approach impenetrable as well. I don't use conventional solutions, so to minimize the shock of my implementations, let's cover some of the key ideas behind why my code is the way it is. Most noticeable is my use of an array of two links rather than two separate links in the node structure:

1 struct jsw_node { 2 int data; 3 struct jsw_node *link[2]; 4 };

The whole point of this is to avoid symmetric cases. Balanced tree algorithms can be very verbose because code needs to be duplicated for both the left and right subtree cases. Rather than suffer these symmetric cases, I prefer to merge them into a single case through the use of an easily computed array index. While a classic recursive binary search tree insertion would look like this:

1 struct jsw_node { 2 int data; 3 struct jsw_node *left; 4 struct jsw_node *right; 5 }; 6 7 struct jsw_node *make_node ( int data ) 8 { 9 struct jsw_node *rn = malloc ( sizeof *rn ); 10 11 if ( rn != NULL ) { 12 rn->data = data; 13 rn->left = rn->right = NULL; 14 } 15 16 return rn; 17 } 18 19 struct jsw_node *jsw_insert ( struct jsw_node *tree, int data ) 20 { 21 if ( tree == NULL ) 22 tree = make_node ( data ); 23 else if ( data < tree->data ) 24 tree->left = jsw_insert ( tree->left, data ); 25 else 26 tree->right = jsw_insert ( tree->right, data ); 27 28 return tree; 29 }

The same function with my idiom would merge the left and right cases into one through the use of a direction index:

1 struct jsw_node { 2 int data; 3 struct jsw_node *link[2]; 4 }; 5 6 struct jsw_node *make_node ( int data ) 7 { 8 struct jsw_node *rn = malloc ( sizeof *rn ); 9 10 if ( rn != NULL ) { 11 rn->data = data; 12 rn->link[0] = rn->link[1] = NULL; 13 } 14 15 return rn; 16 } 17 18 struct jsw_node *jsw_insert ( struct jsw_node *tree, int data ) 19 { 20 if ( tree == NULL ) 21 tree = make_node ( data ); 22 else { 23 int dir = tree->data < data; 24 tree->link[dir] = jsw_insert ( tree->link[dir], data ); 25 } 26 27 return tree; 28 }

Now, at this level there don't seem to be any savings, and if you're unfamiliar with the idiom, it might take a moment to verify the correctness of the algorithm. However, consider a balanced tree algorithm that does more than simply walk down the tree in each case:

1 struct jsw_node *jsw_insert ( struct jsw_node *tree, int data ) 2 { 3 if ( tree == NULL ) 4 tree = make_node ( data ); 5 else if ( data < tree->data ) { 6 tree->left = jsw_insert ( tree->left, data ); 7 /* Fifty lines of left balancing code */ 8 9 } 10 else { 11 tree->right = jsw_insert ( tree->right, data ); 12 /* Fifty lines of right balancing code */ 13 } 14 15 return tree; 16 }

That's 50 lines of extra code if you believe the comments, which is very hard for even a talented programmer to keep in his or her head all at once. In this case, my merging idiom is far more beneficial because it eliminates half of that code:

1 struct jsw_node *jsw_insert ( struct jsw_node *tree, int data ) 2 { 3 if ( tree == NULL ) 4 tree = make_node ( data ); 5 else { 6 int dir = tree->data < data; 7 tree->link[dir] = jsw_insert ( tree->link[dir], data ); 8 /* Fifty lines of dir balancing code */ 9 } 10 11 return tree; 12 }

Shorter functions are often easier to understand, and more importantly, easier to check for correctness. In a traditional balanced tree implementation, the author has to be very careful to make the left and right cases perfectly symmetric. If this is not done correctly, the code may work fine except in uncommon situations where it would fail mysteriously. By merging the cases together, this repetition is avoided and more effort can be devoted to making sure that a single case is correct, because if the single case is correct, the symmetric case is also correct. This is a basic tenet of good programming practice: avoid code repetition. Once this idiom is understood, everything will fall into place.

How does the comparison for calculating a direction index work though? Put simply, if you want to move down the right link, ensure that the comparison results in 1, otherwise the comparison will result in 0 and you will move down the left link. In the classic algorithm, the comparison “data < tree->data” moves you down the left link, but if data is less than tree->data, the result is 1 and that's the wrong direction. So you reverse the test by saying “tree->data < data”. If this test is true then data is greater than or equal to tree->data, and you want to move down the right link, which will happen because the result of the test is 1. Alternatively, you could use “data >= tree->data” with the same results. Consider a case where data is 5 and tree->data is 11:

1 5 < 11 == 1 /* Wrong! We want to go left */ 2 11 < 5 == 0 /* Correct! We want to go left */ 3 5 >= 11 == 0 /* Correct! */

Of course, this assumes that if duplicate items are allowed, they are placed in the right subtree, which is by far the most common method if duplicates are allowed in a binary search tree implementation. Now let's consider the opposite case, just for completeness. If data is 27 and tree->data is 19, the comparisons look like this:

1 27 < 19 == 0 /* Wrong! We want to go right */ 2 19 < 27 == 1 /* Correct! 1 is right */ 3 27 >= 19 == 1 /* Correct! */

 

Concept

Basic binary search trees can degenerate into linear data structures in three cases. First, the data could be inserted in ascending or descending sorted order. This results in trees with nothing but left or right links. The third case is far less common, but if items are inserted in alternating order from the outside in, the same degenerate case arises because there is only one choice at each node and the effect is a linear data structure:

 0                                  3          0

   \                              /              \

     1                          2                  3

       \          or          /          or      /

         2                  1                  1

           \              /                      \

             3          0                          2

This is basically an inefficient linked list, because the binary search tree algorithms expect each node to have two paths rather than one, whereas linked list algorithms are written with a linear structure in mind. The result is a linear data structure that has expensive algorithms. What we want in a binary search tree is a broad and flat structure, where each node has two links and any path is logarithmic to the number of nodes in the tree:

           3

       /       \

     1           5

   /   \       /   \

 0       2   4       6

Unfortunately, forcing such perfect balance is very expensive because it requires a global restructuring that visits almost every node in the tree and guarantees balance at each subtree. Alternatives for global balancing use an auxiliary array, which isn't much better. So we have to settle for less than perfect by only making local changes that meet a well chosen invariant. However, with the AVL invariant, these local changes result in a structure that is still very close to perfect.

A binary search tree that meets the AVL invariant is balanced if the height of a node's subtrees differ by no more than 1. In the following diagram, the first two trees are AVL trees but the third is not because the left subtree of 5 has a height of 2 while the right subtree is a null link and has a height of 0:

           3                        5                  5

       /       \                 /     \             /

     1           5             3         7         3

   /   \       /   \             \                   \

 0       2   4       6             4                   4

To maintain this invariant, certain structural changes must be made to force subtrees into balance. Of course, these structural changes need to be carefully considered because the AVL invariant is not the only rule that cannot be violated. AVL trees inherit the rule that all items in a node's left subtree must be lower in value and all items in the right subtree must be greater or equal in value. So to maintain the AVL invariant, we can't just splice nodes to our heart's delight and expect a valid binary search tree. Rotations must be used to ensure that both the binary search tree and AVL invariants are not violated.

It's easy to see that the AVL invariant can only be violated in two cases when the symmetric cases are merged. First, if a subtree is too long in the same direction, a single rotation can bring the structure back into balance without breaking the binary search tree invariant. In the next diagram, a left rotation at 3 is made to bring the subtree into balance:

 3                     4

   \                 /   \

     4       -->   3       5

       \

         5

The code to perform a single rotation is simple. Care must be taken to both rotate in the right direction and to correctly transfer the children of the affected nodes. In this case, 4's left child must be reassigned to 3's right child because 4's left child will be 3 and 4's right child is free. This function uses dir as the direction of the rotation, so a left rotation (such as above) will find dir as 0 when the function is called. In the above diagram, root would be 3, and save would be 4. The function returns the new root for reassignment into the tree.

1 struct jsw_node *jsw_single ( struct jsw_node *root, int dir ) 2 { 3 struct jsw_node *save = root->link[!dir]; 4 5 root->link[!dir] = save->link[dir]; 6 save->link[dir] = root; 7 8 return save; 9 }

The second case that would violate the AVL invariant is if a subtree is too long in the opposite direction, where two rotations are required to return the structure to balance. The first rotation is at the subtree, then the second is at the root. Notice how the binary search tree invariant is maintained throughout the entire operation, including the intermediate step of the first rotation:

 3             3                     4

   \             \                 /   \

     5   -->       4       -->   3       5

   /                 \

 4                     5

The code for a double rotation is only slightly longer than a single rotation and simply combines two rotations in opposite directions. Another option that demonstrates this is to call jsw_single twice in the body of jsw_double instead of doing the double rotation manually. However, to better show how a double rotation works, I've hardcoded the operations for now:

1 struct jsw_node *jsw_double ( struct jsw_node *root, int dir ) 2 { 3 struct jsw_node *save = root->link[!dir]->link[dir]; 4 5 root->link[!dir]->link[dir] = save->link[!dir]; 6 save->link[!dir] = root->link[!dir]; 7 root->link[!dir] = save; 8 9 save = root->link[!dir]; 10 root->link[!dir] = save->link[dir]; 11 save->link[dir] = root; 12 13 return save; 14 }

This is all well and good, but how does one determine when a subtree is out of balance in one of these ways? By storing extra balance information in each node. For AVL trees, the minimum extra space required is two bits. However, in C and C++ it's hard to get a portable and easy to use implementation that uses this minimum, so a variety of other methods are used in practice. The first method is the one that I will be using in this tutorial; a simple integer:

1 struct jsw_node { 2 int data; 3 int balance; 4 struct jsw_node *link[2]; 5 }; 6 7 struct jsw_tree { 8 struct jsw_node *root; 9 };

Variations of even this solution exist, with varying success and reasons. Most commonly, a signed or unsigned char is used with space efficiency in mind, other times a signed or unsigned short with similar reasoning. However, for certain implementations (of which one will be explored in this tutorial) a signed type is easier to understand because the balance information consists of the values -1, 0, and +1 for a taller left subtree, two subtrees of equal height, and a taller right subtree, respectively. These balance factors represent the difference in height between two subtrees. While an unsigned type will work under this scheme, it would confuse too many people for my tastes. Another method seen occasionally is a two bit bitfield.

The most important decision in setting up these balance factors is whether they should be bounded or unbounded. An AVL tree with bounded balance factors guarantees that the values will only be within a specified range. By using only two bits, the range is -1, 0, and +1, which is a little too restrictive for a simple implementation. Our bounded range will be -2, -1, 0, +1, and +2, with -2 and +2 being temporary values that signify a need to rebalance. Any range could be used, but balance factors in this range can be updated easily with minimal effort. Here are the trees given previously with bounded balance factors:

               3,0                         5,-1                 5,-2

           /         \                   /      \             /

       1,0             5,0          3,-1          7,0    3,+1

     /     \         /     \             \                    \

 0,0         2,0  4,0        6,0           4,0                  4,0

Another option is an unbounded balance factor. While this sounds like a limitation, in our case it really isn't. While one option is to use the number of nodes in a subtree as the balance factor, this could easily overflow a small integer. In this tutorial, we will be using an unbounded balance factor of the longest path. Under such a scheme the root of the tree will tell you how tall the tree is, which is useful information in and of itself. Unlike the number of nodes in a subtree, the longest path will not overflow an integer in the forseeable future, so there is no hidden limitation. Here are the same three trees with unbounded balance factors. Notice that the balance of each node is the longest path in its subtrees:

               3,2                        5,2                5,2

           /         \                  /     \            /

       1,1             5,1          3,1         7,0    3,1

     /     \         /     \            \                  \

 0,0         2,0  4,0        6,0          4,0                4,0

 

Bounded Insertion

The biggest problem with bounded insertion is that the code to correctly update balance factors after a rotation is somewhat intricate. All in all it's not difficult, but you have to be careful not to break the range or give the wrong balance to nodes. Fortunately, it all falls into a few simple cases and the code to implement those cases is relatively short and easy to follow. There are four cases for insertion into an AVL tree, one case requiring a single rotation and three cases requiring a double rotation. Put simply, if a single rotation is needed, the tree becomes more balanced and both the old parent and new parent are given a value of 0. The three cases for deletion depend on the initial balance of the lowest node that will become the new parent. The following diagram shows these four cases (note that I'm using the right path for this example, so the direction index would be 1. Also notice that an asterisk stands for balance factors that don't matter during intermediate rotations):

Single case:


     x,+2                         y,0

   /      \                     /     \

 A          y,+1     ->     x,0         C

          /      \        /     \

        B          C    A         B


Double case 1 (z is balanced):


     x,+2                   x,*                               z,0

   /      \               /     \                          /       \

 A          y,-1        A         z,*                  x,0           y,0

          /      \   ->         /     \         ->   /     \       /     \

      z,0          D          B         y,*        A         B   C         D

    /     \                           /     \

  B         C                       C         D


Double case 2 (z's right subtree is taller):


     x,+2                   x,*                               z,0

   /      \               /     \                          /       \

 A          y,-1        A         z,*                  x,-1          y,0

          /      \   ->         /     \         ->   /      \      /     \

      z,+1         D          B         y,*        A          B  C         D

    /      \                          /     \

  B          C                      C         D


Double case 3 (z's left subtree is taller):


     x,+2                   x,*                               z,0

   /      \               /     \                          /       \

 A          y,-1        A         z,*                  x,0          y,+1

          /      \   ->         /     \         ->   /     \      /      \

      z,-1         D          B         y,*        A         B  C          D

    /      \                          /     \

  B          C                      C         D

The code to implement this is simple, and can be modularized into a function. The best part is that the function can be reused for deletion as well because deletion has three similar cases for double rotations. There is a trick to this code, by the way. Instead of using -1 and +1 directly, I use the value of the direction index to determine which would be best suited for the current direction, thus removing an unnecessary symmetric case. This balance information varies with insertion and deletion, just like the direction index, so they are both arguments to the function. Follow the code closely and walk through each of the cases above to see how the diagrams translate into code:

1 void jsw_adjust_balance ( struct jsw_node *root, int dir, int bal ) 2 { 3 struct jsw_node *n = root->link[dir]; 4 struct jsw_node *nn = n->link[!dir]; 5 6 if ( nn->balance == 0 ) 7 root->balance = n->balance = 0; 8 else if ( nn->balance == bal ) { 9 root->balance = -bal; 10 n->balance = 0; 11 } 12 else { /* nn->balance == -bal */ 13 root->balance = 0; 14 n->balance = bal; 15 } 16 17 nn->balance = 0; 18 }

Of course, just knowing how to change the balance factors for a double rotation isn't enough. Another helper function that handles rebalancing for insertion must be used that actually performs the rotations and also fixes the balance factors for a single rotation. Remember that there are only two rotation cases, and jsw_adjust_balance fixes the balance factors for the double rotation case. So the code for jsw_insert_balance is short and sweet. This is the function that determines the value of bal and calls the rotation functions as necessary. Once again, follow through this code and see how it compares to each diagram above:

1 struct jsw_node *jsw_insert_balance ( struct jsw_node *root, int dir ) 2 { 3 struct jsw_node *n = root->link[dir]; 4 int bal = dir == 0 ? -1 : +1; 5 6 if ( n->balance == bal ) { 7 root->balance = n->balance = 0; 8 root = jsw_single ( root, !dir ); 9 } 10 else { /* n->balance == -bal */ 11 jsw_adjust_balance ( root, dir, bal ); 12 root = jsw_double ( root, !dir ); 13 } 14 15 return root; 16 }

These two functions are all that's needed to rebalance a violation of the AVL invariant during insertion. Now all we need to do is come up with the framework to insert a new node and update the balance factors. The good news is that from the parent of the newly inserted node, updating a balance factor is a simple matter of incremending the balance factor if we went down the right subtree or decremending the balance factor if we went down the left subtree. Beyond that it's just a basic insertion. However, during AVL insertion, only one call of jsw_insert_balance will ever be needed, and changing the balance factors above that part of the tree will break the data structure. So we need a way to stop updating after a rebalance has been performed. In an iterative version, this is a simple matter of breaking from a loop or returning from a function, but in the following recursive version a status flag is needed:

1 struct jsw_node *jsw_insert_r ( struct jsw_node *root, int data, int *done ) 2 { 3 if ( root == NULL ) 4 root = make_node ( data ); 5 else { 6 int dir = root->data < data; 7 8 root->link[dir] = jsw_insert_r ( root->link[dir], data, done ); 9 10 if ( !*done ) { 11 /* Update balance factors */ 12 root->balance += dir == 0 ? -1 : +1; 13 14 /* Rebalance as necessary and terminate */ 15 if ( root->balance == 0 ) 16 *done = 1; 17 else if ( abs ( root->balance ) > 1 ) { 18 root = jsw_insert_balance ( root, dir ); 19 *done = 1; 20 } 21 } 22 } 23 24 return root; 25 } 26 27 int jsw_insert ( struct jsw_tree *tree, int data ) 28 { 29 int done = 0; 30 tree->root = jsw_insert_r ( tree->root, data, &done ); 31 return 1; 32 }

There are only three cases for updating balance factors, since there are only three values in the range. If the balance factor was 0, then it is incremented or decremented and the updating continues up the tree. If the balance factor was -1 or +1 and becomes 0, the tree has become more balanced and no more updating or rebalancing is required. If the balance factor was -1 or +1 and becomes -2 or +2, rebalancing is required and jsw_insert_balance is called. After jsw_insert_balance, the tree is once again balanced and no more updating is performed. The code is simple, but I still recommend drawing out several tree insertions while following the algorithm closely to understand what is going on, or at least to get bragging rights for the bug report if my algorithm is wrong. ;-) Let's walk through a degenerate case to see that the AVL tree results in a well balanced tree. The degenerate case is an ascending sequence of integers, 0 through 9. Keep in mind that newly inserted nodes have subtrees of equal height (ie. two empty subtrees), so the balance factor of a node returned by make_node is 0 to show that it is balanced:

Insert 0:

   0,0

Insert 1:

   0,1

       \

         1,0

Insert 2:

       1,0

     /     \

 0,0         2,0

Insert 3:

       1,1

     /     \

 0,0         2,1

                 \

                   3,0

Insert 4:

       1,1

     /     \

 0,0         3,0

           /     \

       2,0         4,0

Insert 5:

             3,0

           /     \

       1,0         4,1

     /     \           \

 0,0         2,0         5,0

Insert 6:

                 3,0

           /            \

       1,0                5,0

     /     \            /     \

 0,0         2,0    4,0         6,0

Insert 7:

                 3,1

           /            \

       1,0                5,1

     /     \            /     \

 0,0         2,0    4,0         6,1

                                    \

                                      7,0

Insert 8:

                 3,1

           /            \

       1,0                5,1

     /     \            /     \

 0,0         2,0    4,0         7,0

                              /     \

                          6,0         8,0

Insert 9:

                   3,1

           /                \

       1,0                    7,0

     /     \                /     \

 0,0         2,0        5,0         8,1

                      /     \           \

                  4,0         6,0         9,0

Moving onward and upward, a lot of people flinch at recursive implementations because recursion has a bad reputation of being slow and limited by an unknown “stack” size. As such, it makes sense to consider the non-recursive implementation of an AVL tree. There are two, the first of which is a direct conversion of the recursive algorithm using a stack and bottom-up modifications. The code is longer, but the logic is identical once you get past the apparent complexity of using two stacks to save the traversal path. If you find it hard to understand, use the recursive code as an example and compare the similar parts, then you can more easily figure out how the rest of the code does what it does. If you've gone through Binary Search Trees I, you should have no trouble with the logic:

1 int jsw_insert ( struct jsw_tree *tree, int data ) 2 { 3 /* Empty tree case */ 4 if ( tree->root == NULL ) { 5 tree->root = make_node ( data ); 6 if ( tree->root == NULL ) 7 return 0; 8 } 9 else { 10 struct jsw_node *it, *up[50]; 11 int upd[50], top = 0; 12 13 it = tree->root; 14 15 /* Search for an empty link, save the path */ 16 for ( ; ; ) { 17 /* Push direction and node onto stack */ 18 upd[top] = it->data < data; 19 up[top++] = it; 20 21 if ( it->link[upd[top - 1]] == NULL ) 22 break; 23 24 it = it->link[upd[top - 1]]; 25 } 26 27 /* Insert a new node at the bottom of the tree */ 28 it->link[upd[top - 1]] = make_node ( data ); 29 if ( it->link[upd[top - 1]] == NULL ) 30 return 0; 31 32 /* Walk back up the search path */ 33 while ( --top >= 0 ) { 34 /* Update balance factors */ 35 up[top]->balance += upd[top] == 0 ? -1 : +1; 36 37 /* Terminate or rebalance as necessary */ 38 if ( up[top]->balance == 0 ) 39 break; 40 else if ( abs ( up[top]->balance ) > 1 ) { 41 up[top] = jsw_insert_balance ( up[top], upd[top] ); 42 43 /* Fix the parent */ 44 if ( top != 0 ) 45 up[top - 1]->link[upd[top - 1]] = up[top]; 46 else 47 tree->root = up[0]; 48 49 break; 50 } 51 } 52 } 53 54 return 1; 55 }

Instead of an implicit stack through recursion, this non-recursive version of AVL insertion caches only the required information using an explicit stack in the form of two arrays that hold the nodes along a search path and the directions taken at each node. Then the same rebalancing logic is used as the top of each stack is popped off in an upward traversal back to the root. Because a simple break can be used when one of the stopping cases is encountered, a status flag is no longer needed.

 

Bounded Deletion

Deletion from an AVL tree is only marginally more complicated than insertion. This is shocking to most people because deletion is always left as an exercise to the reader, but not before scaring the wits out of the reader by talking about how long and complicated it is and that the algorithms are beyond the scope of a textbook or paper devoted to the topic. In reality, there is only one extra case for rebalancing in AVL deletion, and that is a simple single rotation case. The extra complexity comes from the task of removing a node from the tree rather than any rebalancing effort. The five cases are as follows:

Single case 1 (y is not balanced):


     x,+1                         y,0

   /      \                     /     \

 A          y,+1     ->     x,0         C

          /      \        /     \

        B          C    A         B


Single case 2 (y is balanced):


     x,+1                         y,-1

   /      \                     /      \

 A          y,0     ->     x,+1          C

          /     \        /      \

        B         C    A          B


Double case 1 (z is balanced):


     x,+1                   x,*                               z,0

   /      \               /     \                          /       \

 A          y,-1        A         z,*                  x,0           y,0

          /      \   ->         /     \         ->   /     \       /     \

      z,0          D          B         y,*        A         B   C         D

    /     \                           /     \

  B         C                       C         D


Double case 2 (z's right subtree is taller):


     x,+1                   x,*                                z,0

   /      \               /     \                           /       \

 A          y,-1        A         z,*                  x,-1           y,0

          /      \   ->         /     \         ->   /      \       /     \

      z,+1         D          B         y,*        A          B   C         D

    /      \                          /     \

  B          C                      C         D


Double case 2 (z's left subtree is taller):


     x,+1                   x,*                               z,0

   /      \               /     \                          /       \

 A          y,-1        A         z,*                  x,0           y,+1

          /      \   ->         /     \         ->   /     \       /      \

      z,-1         D          B         y,*        A         B   C          D

    /      \                          /     \

  B          C                      C         D

Amazingly enough, all but one of those cases are solved problems from insertion. So the code to add the extra case and rebalance after a deletion is trivial:

1 struct jsw_node *jsw_remove_balance ( struct jsw_node *root, int dir, int *done ) 2 { 3 struct jsw_node *n = root->link[!dir]; 4 int bal = dir == 0 ? -1 : +1; 5 6 if ( n->balance == -bal ) { 7 root->balance = n->balance = 0; 8 root = jsw_single ( root, dir ); 9 } 10 else if ( n->balance == bal ) { 11 jsw_adjust_balance ( root, !dir, -bal ); 12 root = jsw_double ( root, dir ); 13 } 14 else { /* n->balance == 0 */ 15 root->balance = -bal; 16 n->balance = bal; 17 root = jsw_single ( root, dir ); 18 *done = 1; 19 } 20 21 return root; 22 }

Wait, why does the new case set the status flag to show that rebalancing is finished? Unlike insertion, an AVL deletion could potentially require rebalancing all of the way back up the tree, so there are fewer cases where the updating terminates early and this is one of them. In the new case, the the height of the tree doesn't change, so there is no need to continue rebalancing up the tree. An important key to how deletion works is to remember that instead of rebalancing the subtree that was increased after an insertion, we need to rebalance the opposite subtree that the node is deleted from. So look very closely at the values of dir and bal when jsw_remove_balance is called from the following AVL deletion algorithm:

1 struct jsw_node *jsw_remove_r ( struct jsw_node *root, int data, int *done ) 2 { 3 if ( root != NULL ) { 4 int dir; 5 6 /* Remove node */ 7 if ( root->data == data ) { 8 /* Unlink and fix parent */ 9 if ( root->link[0] == NULL || root->link[1] == NULL ) { 10 struct jsw_node *save; 11 12 dir = root->link[0] == NULL; 13 save = root->link[dir]; 14 free ( root ); 15 16 return save; 17 } 18 else { 19 /* Find inorder predecessor */ 20 struct jsw_node *heir = root->link[0]; 21 22 while ( heir->link[1] != NULL ) 23 heir = heir->link[1]; 24 25 /* Copy and set new search data */ 26 root->data = heir->data; 27 data = heir->data; 28 } 29 } 30 31 dir = root->data < data; 32 root->link[dir] = jsw_remove_r ( root->link[dir], data, done ); 33 34 if ( !*done ) { 35 /* Update balance factors */ 36 root->balance += dir != 0 ? -1 : +1; 37 38 /* Terminate or rebalance as necessary */ 39 if ( abs ( root->balance ) == 1 ) 40 *done = 1; 41 else if ( abs ( root->balance ) > 1 ) 42 root = jsw_remove_balance ( root, dir, done ); 43 } 44 } 45 46 return root; 47 } 48 49 int jsw_remove ( struct jsw_tree *tree, int data ) 50 { 51 int done = 0; 52 tree->root = jsw_remove_r ( tree->root, data, &done ); 53 return 1; 54 }

The code to rebalance is actually shorter than insertion because of fewer cases where the updating would terminate early. What makes this code longer is the extra work of deleting a node. The idea behind the recursive deletion is to search for the node to delete. Then once this node is found, if it has only one child or no children, simply replace the node with its child. If the node has two children, the inorder predecessor is found and its data is copied into the node to be deleted. This is where the tricky part comes in. Instead of stopping the recursion and bending over backward to remove the inorder predecessor, the recursion continues by replacing the search key with with the value of the inorder predecessor. This is somewhat confusing because the search key changes in the middle of the recursive path. But this tutorial is about AVL trees, not the variations of binary search tree deletion, so I highly recommend that you work through this function to get a feel for how it works.

Because the recursive code is potentially confusing, we'll look at a non-recursive version. But first, let's walk through the deletion of items from an existing AVL tree. This execution will continue what was started with insertion by tracing through the deletion of a degenerate case, 0 through 7:

Remove 0:

             3,1

     /                \

 1,1                    7,0

     \                /     \

       2,0        5,0         8,1

                /     \           \

            4,0         6,0         9,0

Remove 1:

             7,-1

           /      \

       3,1          8,1

     /     \            \

 2,0         5,0          9,0

           /     \

       4,0         6,0

Remove 2:

              7,-1

            /      \

       5,-1          8,1

     /      \            \

 3,1          6,0          9,0

     \

       4,0

Remove 3:

             7,0

           /     \

       5,0         8,1

     /     \           \

 4,0         6,0         9,0

Remove 4:

       7,0

     /     \

 5,1         8,1

     \           \

       6,0         9,0

Remove 5:

       7,1

     /     \

 6,0         8,1

                 \

                   9,0

Remove 6:

       8,0

     /     \

 7,0         9,0

Remove 7:

 8,1

     \

       9,0

The non-recursive code to remove from an AVL tree is identical to the recursive code in logic, and equivalent to the solution used for non-recursive insertion through the use of two stacks to save the search path. Rather than any sneaky tricks for handling the case where the node to be deleted has two children, a direct approach from Binary Search Trees I will be used:

1 int jsw_remove ( struct jsw_tree *tree, int data ) 2 { 3 if ( tree->root != NULL ) { 4 struct jsw_node *it, *up[32]; 5 int upd[32], top = 0; 6 int done = 0; 7 8 it = tree->root; 9 10 for ( ; ; ) { 11 /* Terminate if not found */ 12 if ( it == NULL ) 13 return 0; 14 else if ( it->data == data ) 15 break; 16 17 /* Push direction and node onto stack */ 18 upd[top] = it->data < data; 19 up[top++] = it; 20 21 it = it->link[upd[top - 1]]; 22 } 23 24 /* Remove the node */ 25 if ( it->link[0] == NULL || it->link[1] == NULL ) { 26 /* Which child is not null? */ 27 int dir = it->link[0] == NULL; 28 29 /* Fix parent */ 30 if ( top != 0 ) 31 up[top - 1]->link[upd[top - 1]] = it->link[dir]; 32 else 33 tree->root = it->link[dir]; 34 35 free ( it ); 36 } 37 else { 38 /* Find the ino