By Julienne Walker
License: Public Domain
A binary search tree is a data structure designed
for efficient search, insertion, and deletion in the presence of a large number
of items. While these operations are easy to implement for a basic binary search
tree, less trivial operations such as flexible non-recursive traversal are not as
simple to create. Also, basic binary search trees have uncomfortably common worst
case behavior that makes them impractical for many applications.
The initial solution to the problem of degenerate
trees is to maintain a balancing invariant, whereby the tree is restructured to
enforce a certain measure of balance when a new insertion or deletion causes the
invariant to be broken. Unfortunately, the one benefit of the basic binary search
tree, simplicity, is destroyed when a balanced tree is used. The algorithms for
balanced binary search trees are very complicated, very fragile, and few programmers
are willing to attempt an implementation.
Worse yet, balanced trees are very inflexible when
it comes to optimization. The structure is well defined, efficient algorithms are
known, and changes to those algorithms will typically result in subtle bugs. The
basic rule of thumb with balanced trees is to use a known recursive approach if
you do not need blazing speed, and a known non-recursive approach if you do. As
such, balanced trees do not encourage experimentation.
An alternative to balanced trees is to use randomization
to guarantee, with high probability, that a tree will not be so unbalanced as to
lose its O(log N) performance properties. These randomized trees are much simpler
that their balanced siblings and far safer than basic binary search trees, but non-recursive
implementations are often not immediately obvious. If an absolute guarantee of good
performance is not necessary, a recursive implementation and a slim chance of poor
performance is acceptable, randomized trees are a good solution.
However, this still does not solve the problem of
non-trivial operations. For example, an iterator based tree traversal is neither
simple in concept, nor efficient in practice. One tries to implement a tree traversal
such that clients can write code like this with a guarantee that each item is only
visited once, thus ensuring an efficient traversal:
reset ( tree )
while ( item := next ( tree ) ) != END do
# Do something with item
loop
Unfortunately, such a guarantee is not possible without
caching items in a linear data structure or saving extranenous traversal information
in the tree. The former is expensive when it comes to space, and the latter is complicated
to implement and in the end, has little real use. The reset operation would be forced
to start a search at the smallest item, thus incurring a logarithmic cost to find
the smallest item:
function reset: tree
begin
tree.walk := tree.root
if tree.walk = null then
return
endif
while tree.walk.left <> null do
tree.walk := tree.walk.left
loop
end
This alone would not be much of a concern, but for
each call to next, a stack must be maintained to keep track of the search path,
basically turning an iterative inorder traversal into a step-by-step iterative inorder
traversal, where the state is saved after each step so that it can be continued
with the next call to the function:
function next: tree
begin
if walk = null then
return END
endif
if walk.right <> null then
stack.push ( walk )
walk := walk.right
while walk.left <> null do
stack.push ( walk )
walk := walk.left
loop
else
while true do
last := walk
walk := stack.pop()
if last <> walk.right then
break
endif
loop
endif
return walk.item
end
One might argue that this is an elegant and efficient
algorithm, but it is not as efficient as we would like, and without extra help,
this is the best that binary trees can do.
Fortunately, there is a data structure that is often
simpler both in concept and in implementation. A skip list is a data structure that
conceptually uses parallel sorted linked lists. For example, consider a basic sorted
linked list:
1---2---3---4---5---6---7---8---*
This is clearly a linear structure. Searching, insertion,
and deletion will all take O(N) time. However, if we add a second list that skips
over every other node, we can make searches faster by searching the second list
and then moving down into the first list when a node that is larger than the search
item is found:
1-------3-------5-------7-------*
1---2---3---4---5---6---7---8---*
A search for 8 would result in four
nodes being traversed rather than seven. This is the basic idea behind a skip list,
but a skip list goes even further and uses M sorted linked lists to achieve O(log
N) performance. For example, by adding a third list that skips over every other
node in the second list, we achieve a structure that looks strikingly like a balanced
binary search tree:
1---------------5---------------*
1-------3-------5-------7-------*
1---2---3---4---5---6---7---8---*
Indeed, a skip list is really just a simulation of
a binary search tree using multiple linked lists running in parallel. Of course,
trying to maintain a perfectly balanced binary search tree using this simulation
is still expensive, and it is still complicated. However, by viewing a skip list
as a single sorted linked list, where each node has a column of links to other nodes
in the list, it is easy to see how randomly choosing a height for each new column
could potentially produce the same good behavior as a balanced skip list.
The original design of the skip list was as a randomized
data structure, much like randomized binary search trees. By eschewing a logarithmic
bound guarantee in favor of a logarithmic bound with high probability, a skip list
can enjoy the performance properties of a well balanced binary search tree (on average)
while avoiding many disadvantages of tree structures.
For example, while an iterator based traversal in
a binary search tree is a non-trivial problem, the same problem becomes a couple
of one-liners when using a skip list, and the best case of traversal performance
is achieved (each unique node is visited only once):
function reset: skip
begin
walk := skip[0]
end
function next: skip
begin
return ( walk := walk.next ) <> null
end
Notice that the above functions assume that a skip
list is implemented as an array of linked lists, with index 0 being the lowest level.
This is by no means the only way to implement skip lists, as we shall see shortly,
but it is by far the most common.
Random height
Selecting a random height for each node is not as
simple as just picking any old random number. An important aspect of a skip list
is the maximum height, because if more time is spent moving down to the next level
from a ridiculously high column compared to the rest, any gain from the skip list
could be lost. So an upper limit to the column height is necessary. Fortunately,
the height can be relatively small and still support a large list. A skip list with
a maximum height of 16, for example, could hold approximately 216 nodes.
Even a skip list of moderate height, when full, can quickly take up most of the
fast memory of modern systems (at the time of writing).
So it is reasonable to say that the maximum height
for a skip list will be somewhere between 16 and 24, for most real world applications.
If there are a small number of items then a less sophisticated structure is more
likely to be faster simply because it does less work for each operation. Anyway,
with the maximum height in hand, the problem is still not as simple as returning
a random number between 0 and MAX.
First, a random height generator should not return
0, because every valid node in a skip list must contain at least one link to the
next node on the lowest level. A height of 0 is illogical. Second, most random number
generators return a random number with uniform distribution, which means that every
height would be equally probable, and this does not encourage the efficient data
structure that we want.
What we need is a function that returns a weighted
random number based on a probability equation. To get the binary tree-like structure
in a skip list, each height must be about half as probable as the height below it,
or in more precise terms, the random height should be chosen with a probability
of 1/2. To find a number like this, one only has to perform a series of coin flips
until the maximum height is reached, or a probability test fails:
1 int rheight ( int max )
2 {
3 int h = 1; /* Never return 0 */
4
5 while ( h < max && rand() < RAND_MAX / 2 )
6 ++h;
7
8 return h;
9 }
This is a simple solution, but it is not practical
for the real world because it is too inefficient. Notice how rand()
is called for every iteration of the loop. Because max is expected to be relatively
small, we can expect that at most, max calls to rand() will be
made. However, this can prove to be a significant bottleneck in the skip list algorithm
if care is not taken.
Notice how we are always testing for “yes”
or “no”, up to a count of max. So rather than continually call
rand() and test whether it is bigger than half of RAND_MAX
or not, why not simply call rand() once and then use the bits of
that random number to determine the height? This bit stream approach is more efficient
because it only calls the expensive random number generator once. The solution is
simple. Just create a random number, then walk through the bits of the number testing
for a 0 or a 1. An implementation is longer, but straightforward:
1 int rheight ( int max )
2 {
3 static int bits = 0;
4 int h, found = 0;
5
6 for ( h = 0; !found; h++ ) {
7 if ( bits == 0 )
8 bits = rand();
9
10 found = bits % 2;
11 bits = bits / 2;
12 }
13
14 if ( h >= max )
15 h = max - 1;
16
17 return h;
18 }
This algorithm is tuned for speed without sacrificing
clarity. Notice how no test for h being greater than or equal to max is made until
outside of the loop. This is because if the random number has sufficiently random
bits, the loop will be short, and our probability requirements will be met. However,
even this can be improved by forcing all of the bits in the number to be used, thus
reducing the number of calls to the random number generator, and also by replacing
the divisions with faster bitwise operations:
1 int rheight ( int max )
2 {
3 static int bits = 0;
4 static int reset = 0;
5 int h, found = 0;
6
7 for ( h = 0; !found; h++ ) {
8 if ( reset == 0 ) {
9 bits = rand();
10 reset = sizeof ( int ) * CHAR_BIT;
11 }
12
13 found = bits & 1;
14 bits = bits >> 1;
15 --reset;
16 }
17
18 if ( h >= max )
19 h = max - 1;
20
21 return h;
22 }
One important point to keep in mind with the bit stream
solution is that it relies heavily on the bit length of the random number generator.
On many systems, rand() returns a 16-bit random number, so the
largest height available on such a system using rand() and the
bit stream solution is 15. Therefore, it makes sense to use a random number generator
that is known to have random bits and return at least a 32-bit value.
Also note that a probability of 1/2 is not the best,
theoretically. A probability of 1/3 has seen better performance in real applications,
and 1/e is the theoretical optimum. However, in my experience, there has been little
difference between the three, and 1/2 is easier to optimize for the bit stream approach.
Performance-wise, a skip list using 1/2 will have approximately half the links on
each level that are on the level below it, and on average, only two nodes per level
are visited during a search, so regardless of whether you choose 1/2, 1/3, 1/e,
or something else near that range, a skip list will be O(log N) with high probability.
Searching
Searching in a skip list is only marginally more difficult
than searching in a regular sorted linked list. However, because a skip list is
a two dimensional data structure, it must be noted that there are several ways to
implement one. The two most common implementations are the classic skip list and
the linked skip list, which use a linked list of arrays of links, and a two dimensional
network of nodes, respectively.
The classic skip list node and list structures will
look something like this in a realistic implementation:
1 struct jsw_node {
2 int data;
3 int height;
4 struct jsw_node **next;
5 };
6
7 struct jsw_skip {
8 struct jsw_node *head;
9 int height;
10 };
Notice how next is a pointer to a pointer so that
the size of a node is relatively small while still allowing memory to be allocated
to the column so that it can be used as an array of pointers to jsw_node. Also notice
that this, and most implementations, uses a header node that does not contain an
item. This simplifies the algorithms for insertion and deletion. The header node
should have the maximum height possible for the list, to avoid reallocation of memory.
However, for the purposes of this tutorial, a much
simpler alternative will be used so as to avoid memory allocations and better illustrate
the workings of the data structure:
1 struct jsw_node {
2 int data;
3 int height;
4 struct jsw_node *next[MAX];
5 };
6
7 struct jsw_skip {
8 struct jsw_node *head;
9 int height;
10 };
The height field in jsw_skip
is for the current height of the list, which will be less than an upper limit, in
this case defined as MAX with an unspecified value, which is used
as the parameter for the rheight function. The height
field changes if the height of a new node exceeds the current height of the list,
which starts at 0 for an empty list and never exceeds MAX - 1.
A linked skip list node and list structure typically
look like this, though the linked implementation is rare, and I have yet to see
one in practice, we will cover the algorithms in detail for completeness:
1 struct jsw_node {
2 int data;
3 struct jsw_node *next;
4 struct jsw_node *down;
5 };
6
7 struct jsw_skip {
8 struct jsw_node *head;
9 struct jsw_node *bottom;
10 int height;
11 };
The down and next
links do exactly what you would expect. The next link moves to
the next node on the current level and the down link moves to the
level below the node. What is not immediately obvious is the need for both a
head and a bottom node. Because the linked implementation
no longer uses an array of headers (though it could if you choose to do so), it
makes sense to save the beginning of the lowest level for traversals rather than
start at the top and move down to the bottom each time you want to traverse the
lowest level.
Now, you might think that MAX and
height are no longer needed, and while it is possible to implement
a linked skip list without them, it is much easier with them. MAX
can be variable in this implementation if you so choose, but for the purposes of
this tutorial, it will not change, just like in the classic implementation. Modification
of the algorithms to allow a dynamically sized maximum height is left as an exercise,
though keep in mind that such a change is probably not ever needed if MAX
is chosen carefully.
Classic search
The algorithm is simple. Start at the highest level,
follow the next link until an item that is larger than the key is found, then move
down to the next level and repeat until either the item is found, or until the spot
where the item would be if it were in the list is found. That is the basis of all
searching algorithms in a skip list, and searching is the basis of almost all other
operations on a skip list. The implementation for search in a classic skip list
is short and sweet:
1 int jsw_find ( struct jsw_skip *skip, int key )
2 {
3 struct jsw_node *it = skip->head;
4 int i;
5
6 for ( i = skip->height - 1; i >= 0; i-- ) {
7 while ( it->next[i] != NULL && key > it->next[i]->data )
8 it = it->next[i];
9 }
10
11 return it->next[0] ? it->next[0]->data : -1;
12 }
Notice how the search finishes on the node before
either the found item, or the place where the key would be. This can be useful for
modularizing a skip list so that only one search algorithm is implemented as a function
and all other algorithms that require a search can use the function. However, also
note that there is potential for the next link to be a null pointer, which requires
a bit of extra work to ensure that the correct type is returned and that a null
pointer is never dereferenced. In this example I make the assumption that
-1 will never be a valid item in the skip list. However, a more realistic
approach is to use a pointer to void for the item and simply return the next link
in the hope that the calling function will take care to test for a null pointer.
Linked search
Searching in a linked implementation is arguably even
simpler in concept than in a classic implementation because it involves straight
pointer chasing rather than a combination of pointer chasing and array indexing:
1 int jsw_find ( struct jsw_skip *skip, int key )
2 {
3 struct jsw_node *it = skip->head;
4
5 while ( it->down != NULL ) {
6 while ( it->next != NULL && key > it->next->data )
7 it = it->next;
8 it = it->down;
9 }
10
11 return it->next ? it->next->data : -1;
12 }
Finger search
Just like in Binary search trees, there is a locality
of reference issue where if certain groups of items are searched for regularly,
the skip list search still starts at the top and performs a complete search. This
is wasteful, and can easily be fixed using search fingers. A search finger is simply
the search path of the last successful or unsuccessful search. Future searches begin
by following the finger, rather than the full search path, in the hopes that there
will be locality of reference and the new item to be searched for is close to the
last search path. The simplest finger search by far is a dual search starting with
the finger:
1 int jsw_find ( struct jsw_skip *skip, int key )
2 {
3 struct jsw_node *it = fix[skip->height - 1];
4 int i = skip->height - 1;
5
6 for ( i = skip->height - 1; i >= 0; i-- ) {
7 while ( it->next[i] != NULL && key > it->next[i]->data )
8 it = it->next[i];
9 fix[i] = it;
10 }
11
12 if ( key != it->next[0]->item ) {
13 it = skip->head;
14
15 for ( i = skip->height - 1; i >= 0; i-- ) {
16 while ( it->next[i] != NULL && key > it->next[i]->data )
17 it = it->next[i];
18 fix[i] = it;
19 }
20 }
21
22 return it->next[0] ? it->next[0]->data : -1;
23 }
First the finger path is searched, and the finger
array is updated as the search progresses. If the finger search succeeds then the
algorithm returns successfully and the finger array reflects the most recent search.
If the finger search fails, a complete search is performed, taking care to also
update the finger array.
A fortunate side effect of traditional insertion and
deletion is the need for an auxiliary array of size MAX for maintaining
the search path. This is done so that the list can be correctly updated. The update
array can also be used as a finger if it is global or a member of the skip list
structure, because the last search will be saved in it unless explicitly cleared.
A finger search can be either significantly faster, or significantly slower, depending
on the locality of reference for a search, so if you use it, do so carefully.
Classic insertion
Insertion into a skip list is a simple extension of
the search algorithm. The only difference is that during the search, the path is
saved so that each level of the new column can be properly linked into the list.
Aside from that, insertion is just a repetitive insertion into a sorted singly linked
list. Of course, if the height of the new column is greater than the current height
of the list, the height of the list must grow to accomodate it. Consider the insertion
of 6 with height 4 into the following skip list:
*-----------3-----------7-----------*
*---1-------3---4-------7-------9---*
*---1---2---3---4---5---7---8---9---*
First, a search is performed, where the node before
where 6 would be is saved. Because the new node is taller than the list, the header
is used as the previous node for each level higher than the current list level.
Then the actual search begins and 3 is placed in the update array because 7 is greater
than 6, so 6 would be the next node:
fix[3] = head
fix[2] = 3
fix[1]
fix[0]
*===========3-----------7-----------*
*---1-------3---4-------7-------9---*
*---1---2---3---4---5---7---8---9---*
The search continues down at 3, then right to 4, and
4 is saved in the update array once again because 6 would go before 7 and 7 is directly
after 4:
fix[3] = head
fix[2] = 3
fix[1] = 4
fix[0]
*===========3-----------7-----------*
*---1-------3===4-------7-------9---*
*---1---2---3---4---5---7---8---9---*
The search moves down again because we are not at
the bottom level, and right to 5. The search stops at 5 because 7 is after 5, 6
would fit in that gap, and the bottom of the list has been reached. At this point
the update array is complete and the new node can be spliced into the list:
fix[3] = head
fix[2] = 3
fix[1] = 4
fix[0] = 5
*===========3-----------7-----------*
*---1-------3===<4-------7-------9---*
*---1---2---3---4===5---7---8---9---*
Splicing the new node into the list is a simple matter
of afixing 6's 0th next pointer to fix[0]'s next pointer, then
fix[0]'s next pointer to 6. Repeat the process for all of 6's height,
and the splice is complete:
fix[3] = head
fix[2] = 3
fix[1] = 4
fix[0] = 5
*=======================6---------------*
*-----------3===========6---7-----------*
*---1-------3---4=======6---7-------9---*
*---1---2---3---4---5===6---7---8---9---*
The code to perform this magic is short, simple, and
uses the same search algorithm except with a minor modification for adding to the
update array:
1 int jsw_insert ( struct jsw_skip *skip, int key )
2 {
3 struct jsw_node *it = skip->head;
4 struct jsw_node *item = new_node ( key );
5 int i;
6
7 for ( i = skip->height - 1; i >= 0; i-- ) {
8 while ( it->next[i] != NULL && key > it->next[i]->data )
9 it = it->next[i];
10 fix[i] = it;
11 }
12
13 if ( it->next[0] != NULL && it->next[0]->data == key )
14 return 0;
15 else {
16 int h;
17
18 if ( item->height > skip->height ) {
19 item->height = ++skip->height;
20 fix[item->height - 1] = skip->head;
21 }
22
23 h = item->height;
24
25 while ( --h >= 0 ) {
26 item->next[h] = fix[h]->next[h];
27 fix[h]->next[h] = item;
28 }
29 }
30
31 return 1;
32 }
Alternatively, you can avoid the auxiliary array by
splicing the links as you go down the search path, but this approach assumes that
the item is not already present in the list, or that duplicate keys are allowed.
Duplicate items may be allowed, in which case they will exhibit stack-like behavior,
but the typical use of a skip list will disallow duplicates. To disallow duplicates
in this apprach, an extra step that searches the list for existence is needed, otherwise
unwanted changes might be made to the structure before it is determined whether
or not a duplicate is present:
1 int jsw_insert ( struct jsw_skip *skip, int key )
2 {
3 if ( jsw_find ( skip, key ) == key )
4 return 0;
5 else {
6 struct jsw_node *it = skip->head;
7 struct jsw_node *item = new_node ( key );
8 int i;
9
10 if ( item->height > skip->height )
11 item->height = ++skip->height;
12
13 for ( i = skip->height - 1; i >= 0; i-- ) {
14 while ( it->next[i] != NULL && key > it->next[i]->data )
15 it = it->next[i];
16
17 if ( i < item->height ) {
18 item->next[i] = it->next[i];
19 it->next[i] = item;
20 }
21 }
22 }
23
24 return 1;
25 }
One of the key points in insertion is raising the
height of the list if the height of the new node is taller than the list. In these
examples, the list only every grows by one level, but a more common implementation
is to grow the list by the difference between the current height of the list and
the height of the new node. This variation is preferred because everyone probably
reads the pseudocode in skip list reports and not the text. ;-) The changes are
simple:
1 int jsw_insert ( struct jsw_skip *skip, int key )
2 {
3 struct jsw_node *it = skip->head;
4 struct jsw_node *item = new_node ( key );
5 int i;
6
7 for ( i = skip->height - 1; i >= 0; i-- ) {
8 while ( it->next[i] != NULL && key > it->next[i]->data )
9 it = it->next[i];
10 fix[i] = it;
11 }
12
13 if ( it->next[0] != NULL && it->next[0]->data == key )
14 return 0;
15 else {
16 int h;
17
18 while ( item->height > skip->height )
19 fix[skip->height++] = skip->head;
20
21 h = item->height;
22
23 while ( --h >= 0 ) {
24 item->next[h] = fix[h]->next[h];
25 fix[h]->next[h] = item;
26 }
27 }
28
29 return 1;
30 }
The top-down approach can also be modified for for
this variation:
1 int jsw_insert ( struct jsw_skip *skip, int key )
2 {
3 if ( jsw_find ( skip, key ) == key )
4 return 0;
5 else {
6 struct jsw_node *it = skip->head;
7 struct jsw_node *item = new_node ( key );
8 int i;
9
10 while ( item->height > skip->height )
11 ++skip->height;
12
13 for ( i = skip->height - 1; i >= 0; i-- ) {
14 while ( it->next[i] != NULL && key > it->next[i]->data )
15 it = it->next[i];
16
17 if ( i < item->height ) {
18 item->next[i] = it->next[i];
19 it->next[i] = item;
20 }
21 }
22 }
23
24 return 1;
25 }
Which of these approaches (forced growth by one or
unbounded growth) is better remains to be seen, though that leads me to believe
that either works just as well as the other. However, in my experience, the former
results in a faster algorithm, and the difference in structure is negligable, so
my preference is to limit list growth to a single level. The difference between
top-down and bottom-up insertion is also difficult to determine. The tradeoff of
speed for space is tempting, and the top down algorithm certainly has the benefit
of clarity. The choice, as usual, must be made on a case by case basis.
Linked insertion
Insertion into a linked skip list is a direct translation
of the description into some pointer chases. As with classic insertion, the links
can be added in a top-down manner as the search progresses, or by saving the previous
links in an array and splicing new links in after the search. However, there is
little point in using an auxiliary array as one of the primary benefits of a linked
skip list is the lack of need for arrays. As you already know, the top-down insertion
requires that a prior search for existence be made if the skip list does not allow
duplicate keys:
1 int jsw_insert ( struct jsw_skip *skip, int key )
2 {
3 if ( jsw_find ( skip, key ) == key )
4 return 0;
5 else {
6 struct jsw_node *it, *save = NULL;
7 int h = rheight ( MAX );
8 int i;
9
10 if ( h > skip->height ) {
11 skip->head = new_node ( -1, NULL, skip->head );
12 ++skip->height;
13 }
14
15 it = skip->head;
16
17 for ( i = skip->height; it != NULL; i-- ) {
18 while ( it->next != NULL && key > it->next->data )
19 it = it->next;
20
21 if ( i <= h ) {
22 if ( it->next == NULL || key != it->next->data )
23 it->next = new_node ( key, it->next, NULL );
24
25 if ( save != NULL )
26 save->down = it->next;
27
28 save = it->next;
29 }
30
31 it = it->down;
32 }
33 }
34
35 return 1;
36 }
This example shows a primary disadvantage of the linked
approach. Rather than simply creating an array of pointers, each column must be
constructed from multiple unique nodes. Not only is this a waste of space, it can
be an issue if extra data aside from the key is changed for a column. Either all
but the lowest level must be ignored, and thus result in an inaccurate list, or
extra work must be done to change every node in the column. It also results in more
calls to a memory manager, which is expensive. However, a free list of nodes can
improve this cost drastically if deletions are common.
A linked skip list has only three real advantages
over the classic skip list. First, a linked skip list is easier to visualize because
it does not attempt to form a hybrid of arrays and sorted linked lists. That concept
seems to be a stumbling block among beginning programmers and experienced programmers
who are comfortable with trees. As such, I have found the linked skip list to be
easier to implement correctly the first time whereas the classic skip list requires
a keen eye for subtle indexing errors.
The second advantage that a linked skip list has over
a classic skip list is that it is easier to convert a randomized skip list to a
balanced skip list efficiently and without bending over backward. We shall see why
this is so shortly. Finally, the linked skip list can more easily grow its maximum
height as a side effect of simply adding a new head node rather than growing a dynamic
array.
Aside from those two points, the classic skip list
dominates. It is faster, takes less space in memory, requires fewer comparisons
per search, requires fewer calls to the memory manager per insertion or deletion
(though a free list alleviates the problem in a linked solution), does not require
extra copies of the key, uses less expensive array indexing for one direction, has
slightly better cache performance due to the contiguous memory of arrays, and the
average and worst case time complexity is exactly the same. Concerning the growth
issue, because the maximum height will be relatively small for huge skip lists,
the cost of a few extra pointers in the head array of a classic skip list is minimal,
so the benefit of the linked skip list is also minimal.
All in all, while a linked skip list has a few advantages
and seems to be a superior first encounter for newcomers who are first learning
the data structure, the classic skip list has much better properties for the real
world. So my recommendation is to understand both, but unless you are teaching skip
lists to someone, stick to the classic algorithms.
Traversal
Traversing a skip list is incredibly simple. For the
classic skip list, it breaks down to traversing a linked list, starting at the lowest
level. There is no magic to traversal, but there are a few considerations when implementing
a skip list. First, the question of whether the list should be singly or doubly
linked arises. A doubly linked skip list on the lowest level can provide two-way
traversals with only a small amount of extra implementation effort. A doubly linked
skip list on all levels also encourages search optimizations based on interpolation.
If it could be more efficient to start a search at the back and move forward then
the speed of a search could be improved, and a doubly linked skip list allows that.
Another traversal consideration is traversal of the
other levels, or even a two dimensional traversal. The possibilities are endless,
and I encourage you to be creative in how you can traverse a skip list, because
you might find a use for it that you never saw before.
Classic traversal
The classic skip list promotes easy traversal because
you the 0th index of the head array can act as a starting point with no work at
all:
1 void jsw_traverse ( struct jsw_skip *skip, void (*action) ( struct jsw_node * ) )
2 {
3 struct jsw_node *it = skip->head->next[0];
4
5 while ( it != NULL ) {
6 action ( it );
7 it = it->next[0];
8 }
9 }
Linked traversal
A linked traversal is harder to design because the
skip list is meant to be searched efficiently from the top, not traversed from the
bottom. In a naive linked skip list, the traversal would end up starting at the
top, then finding the bottom, then performing a traversal in an L-shaped pattern:
1 void jsw_traverse ( struct jsw_skip *skip, void (*action) ( struct jsw_node * ) )
2 {
3 struct jsw_node *it = skip->head;
4
5 while ( it->down != NULL )
6 it = it->down;
7
8 while ( it != NULL ) {
9 it = it->next;
10 action ( it );
11 }
12 }
The linked skip list in this tutorial uses a bottom
pointer to keep track of the head of the lowest level so as to avoid that first
step. Now the traversal looks more like the classic traversal and is also more efficient
because it does not need to find the bottom first:
1 void jsw_traverse ( struct jsw_skip *skip, void (*action) ( struct jsw_node * ) )
2 {
3 struct jsw_node *it = skip->bottom->next;
4
5 while ( it != NULL ) {
6 action ( it );
7 it = it->next;
8 }
9 }
Classic deletion
Removing an item from a skip list is the inverse of
inserting one. As usual, a search is performed and the update array is used to save
the path. Then a simple linked list removal is performed for each level. The traditional
code for a classic skip list is:
1 int jsw_remove ( struct jsw_skip *skip, int key )
2 {
3 struct jsw_node *it = skip->head;
4 struct jsw_node *save;
5 int i;
6
7 for ( i = skip->height - 1; i >= 0; i-- ) {
8 while ( it->next[i] != NULL && key > it->next[i]->data )
9 it = it->next[i];
10 fix[i] = it;
11 }
12
13 if ( it->next[0] == NULL || it->next[0]->data != key )
14 return 0;
15 else {
16 save = fix[0]->next[0];
17
18 for ( i = 0; i < skip->height; i++ ) {
19 if ( fix[i]->next[i] != NULL )
20 fix[i]->next[i] = fix[i]->next[i]->next[i];
21 }
22
23 while ( skip->height > 0 ) {
24 if ( skip->head->next[skip->height - 1] != NULL )
25 break;
26
27 skip->head->next[--skip->height] = NULL;
28 }
29 }
30
31 free ( save );
32
33 return 1;
34 }
Just as with insertion, deletion can remove the item
as the search moves down the list rather than by building an array for fixing links.
This top-down structure change is actually a more elegant solution in the absence
of fingers because there is no need for an update array, and with a little extra
thought, there is no real need for an extra existence search beforehand as some
part of the search will be performed anyway. Of course, the code is easier to understand
with the preliminary search:
1 int jsw_remove ( struct jsw_skip *skip, int key )
2 {
3 if ( jsw_find ( skip, key ) != key )
4 return 0;
5 else {
6 struct jsw_node *it = skip->head;
7 struct jsw_node *save;
8 int i;
9
10 for ( i = skip->height - 1; i >= 0; i-- ) {
11 while ( it->next[i] != NULL && key > it->next[i]->data )
12 it = it->next[i];
13
14 if ( i == 0 )
15 save = it->next[0];
16
17 if ( it->next[i] != NULL )
18 it->next[i] = it->next[i]->next[i];
19
20 if ( skip->head->next[i] == NULL )
21 --skip->height;
22 }
23
24 free ( save );
25 }
26
27 return 1;
28 }
Linked deletion
The code for linked deletion is, like insertion, almost
identical to classic deletion. I hesitated to provide code for it to help keep the
tutorial shorter, but for completeness, here is the code for linked deletion using
a top-down approach that the linked structure encourages:
1 int jsw_remove ( struct jsw_skip *skip, int key )
2 {
3 if ( jsw_find ( skip, key ) != key )
4 return 0;
5 else {
6 struct jsw_node *it = skip->head;
7 struct jsw_node *save = NULL;
8
9