By Julienne Walker
License: Public Domain

I'm confident that you already know what sorting is. Not only is it a concept that saturates computer science, it's also pretty intuitive. Basically you take things that are out of order, and put them into order. Examples of sorting are a phone book being ordered by name, or a coin machine that puts the same type of coins into those neat little packages so that the bank will give you paper money in exchange. :-) The idea behind sorting is simple. It makes life easier if things are in order, not the least important is finding things. It'd be a nightmare to try and find somebody's phone number if the phone book wasn't ordered by name.

Beyond the practical uses for sorting, the algorithms are fundamental to computer science, and fluency in the techniques used by sorting algorithms provides insight into the craft of programming in many unexpected areas. Sorting also gives one a certain measure of confidence in designing and analyzing algorithms because most of the techniques used appear in the context of sorting as well.

Finally, it has been said that computers have spent more time sorting data than anything else, so it makes sense to have a strong background in the concept and theory of sorting, as well as a firm understanding of the different sorting methods and their relative strengths and weaknesses. Since sorting efficiently is surprisingly complicated, programmers should try to avoid adding to the time wasted by using a poor algorithm for the job.

Many generic sorting libraries already exist and are ready to be used, such as C's qsort, or C++'s std::sort or std::partial_sort. However, these general algorithms are not always the best option. Yes, they should be first on your list of possible solutions because you're basically given a tested and debugged algorithm that's easy to use. That's far superior to writing your own algorithm, testing it, and debugging it before it can be used with any confidence.

However, there are two distinct problems with the standard library sorting algorithms. First, they're designed to work for as many cases as possible, and may not be practical or efficient for your needs. For example, if qsort is too slow, how to do you fix the problem? You can find another library, but what are the chances that it's any better than qsort? What if you need a sorting algorithm that's tailored to your data? The chances of finding a library that meets your needs are slim indeed unless your needs are very common.

Second, libraries are generally designed as black boxes, so users of the functions are have no idea what algorithm is actually being used. This isn't as much of a problem in the standard C++ library, where the sorting algorithms have well defined performance attributes that must be met, but many people are surprised to learn that qsort may actually be implemented as anything as long as it eventually sorts the array correctly. The name suggests the quicksort algorithm, but in reality it could be bubble sort, or worse, a very poor implementation of quicksort that works for many cases but runs very very slow in common worst cases.

As such, it's beneficial to understand what algorithms are available, what their advantages and disadvantages are, and how to implement your own algorithm that's custom tailored to your data if the need arises. And you can brag that you know half a dozen sorting algorithms and sneer at your friends who can barely write bubble sort off the top of their heads. Bragging rights are a pretty darned good reason to learn something in my opinion. ;-)

 

Sorting Considerations

To sort or not to sort. That's the question, isn't it? Sorting an unordered array or list is time consuming, even if you do it efficiently, so it makes sense to avoid sorting when you can. That usually amounts to using a better data structure such as a hash table (where sorting probably isn't necessary), or a binary search tree (where items are sorted by default). Sometimes you don't even need to sort! It could be just as fast, or faster, to just brute force your way through things. It really isn't noticeably slower to use a sequential search on an array of about 50 items than it is to sort the same array and then use a binary search on it. To use a crude analogy, it doesn't take a wrecking ball to destroy a sand castle.

Okay, let's pretend that we want to sort. What about stability? Stability, you say? What's stability? No, it's not a part of my psychological diagnosis that's immediately preceeded by “lack of”. Stability is what happens when two items compare as equal when sorting. There are two kinds of sorting algorithms concerning stability, stable and unstable (obviously!). The only difference is how they treat duplicate items.

In a stable algorithm, two items that compare as equal will retain their original relation to each other as in the original unsorted array or list. This makes absolutely no difference if the items being compared consist only of, say, integral values. When 2 compares equal to 2, it really doesn't matter which 2 is placed before the other in the final sorted sequence. However, consider the more common situation where items consist of more than one value, and only one of those values is used as the item for sorting:

 |(5:4)|(6:6)|(3:1)|(5:5)|(1:0)|(3:2)|(4:3)|(6:7)|

When sorting on the first half of the item, a stable sort has to give a result that retains the original order of the second half of the item. In other words, the second values have to be in sorted ascending order after the sort is finished, or the algorithm isn't stable:

 |(1:0)|(3:1)|(3:2)|(4:3)|(5:4)|(5:5)|(6:6)|(6:7)|

An unstable sort has no such restriction. Of the algorithms that we'll cover in this tutorial, the following are naturally stable, but the others can be made stable at the cost of using more space or taking more time. For example, selection sort is not usually stable, but can easily be made so by inserting the new item (shifting all items after its location to make a hole, like insertion sort) instead of simply swapping items:

What about performance? This is a tricky area because sorting algorithms are all across the board. The performance complexities you'll see primarily are O(N2), O(Nlog N), and O(N). It should be relatively obvious that the upper bound for any sorting algorithm is infinite as long as it manages to sort the items eventually. Naturally, this suggests a realistic upper bound of O(N!), because sorting can be thought of as picking a sorted permutation of the items. However, the canonical awful algorithm, called bogosort, is probably the worst possible sorting algorithm. Bogosort is equivalent to throwing a deck of cards in the air, then testing to see if they're sorted when you pick them up at random. Bogosort could potentially never end due to the its random nature.

Lower bounds aren't as obvious. The lowest possible bound for most sorting algorithms is Ω(Nlog N). This is because most sorting algorithms use item comparisons to determine the relative order of items. Any algorithm that sorts by comparison will have a minimum lower bound of Ω(Nlog N) because a comparison tree is used to select a permutation that's sorted. A comparison tree for the three numbers 1,2, and 3 can be easily constructed:

                             1 < 2

               1 < 3                       1 < 3

       2 < 3           3,1,2       2,1,3           2 < 3

  1,2,3     1,3,2                            2,3,1     3,2,1

Notice how every item is compared with every other item, and that each path results in a valid permutation of the three items. The height of the tree determines the lower bound of the sorting algorithm. Because there must be as many leaves as there are permutations for the algorithm to be correct, the smallest possible height of the comparison tree is logn!, which is equivalent to Ω(Nlog N).

So are we doomed to Ω(Nlog N) as the best possible performance for sorting algorithms? Well, yes, if you use item comparisons to determine relative order. Therefore, the most general of sorting algorithms will be limited by this lower bound. However, it's possible to meet the safe lower bound of O(N) for sorting. We'll look at some of these algorithms later when we look at the counting and radix sorts.

It's easy to see that O(N) is a safe lower bound on general sorting algorithms for the obvious reasons; you can't sort what you haven't seen. Well, technically, you could rely on the law of quantum indeterminacy whereby not observing any of the items, you place them in a state of superposition and it's both sorted and unsorted, and the sorting process would be O(1). But as of yet, nobody has suggested a practical use for such a sorting algorithm, or even whether it would work correctly in anything but a theoretical sense. :-)

 

Preliminaries

Before we start, there are a few helper functions that I'll be using in much of the code. Rather than keep repeating myself, I'll supply it all here. The first is a swap function that takes pointers to two items and swaps the items:

1 void jsw_swap ( int *a, int *b ) 2 { 3 int c = *a; 4 *a = *b; 5 *b = c; 6 }

Next, it's useful to be able to quickly determine if an array is already sorted or not for testing, and we'll use it as an optimization feature later, so here is a function that returns 1 if the array is sorted and 0 if it isn't:

1 int is_sorted ( int a[], int n ) 2 { 3 int i; 4 5 for ( i = 0; i < n - 1; i++ ) { 6 if ( a[i] > a[i + 1] ) 7 return 0; 8 } 9 10 return 1; 11 }

Finally, some of the algorithms work with linked lists. To save you the trouble of figuring out what I mean, here's the node and list structure that I built the lists with:

1 struct jsw_node { 2 int data; 3 struct jsw_node *next; 4 }; 5 6 struct jsw_list { 7 struct jsw_node *head; 8 int size; 9 };

 

Selection Sort

One of the two most obvious ways to sort data is to find the largest (or smallest) item in the collection, and then move it to the back. This straight selection process is repeated for all other items in the list except for the last of them, then again for all items except for the last two, and so on until all items are ignored. When all items are ignored, they are all in their sorted position, and the algorithm is complete. An example of this simple process follows:

 |41|67|34|00|69|24|78|58|62|64<
 |41|67|34|00|69|24|64|58|62<78|
 |41|67|34|00|62|24|64|58<69|78|

 |41|58|34|00|62|24|64<67|69|78|
 |41|58|34|00|62|24<64|67|69|78|
 |41|58|34|00|24<62|64|67|69|78|
 |41|24|34|00<58|62|64|67|69|78|
 |00|24|34<41|58|62|64|67|69|78|

 |00|24<34|41|58|62|64|67|69|78|
 |00<24|34|41|58|62|64|67|69|78|
 <00|24|34|41|58|62|64|67|69|78|

The code to do this is as simple as finding the largest item in the array. A few notes should be made though. If the largest item is the current end of the array, no swap needs to be done, so we use a conditional test to avoid a function call and the operations of swapping an item with itself. This algorithm looks at every item in the array to find the largest. It has to, because no other order has been imposed on the array to make selection easier, so it's slow:

1 void jsw_selection ( int a[], int n ) 2 { 3 while ( --n > 0 ) { 4 int i, max = n; 5 6 for ( i = 0; i < n; i++ ) { 7 if ( a[i] > a[max] ) 8 max = i; 9 } 10 11 if ( max != n ) 12 jsw_swap ( &a[n], &a[max] ); 13 } 14 }

There are a few important little factoids about straight selection sort that you need to remember. First, all of the items have to be present before the sort can begin. That means that selection sort isn't a viable solution for sorting items as they come though an input stream or a random number generator. The array has to be completely filled before you can sort it. Second, straight selection sort is slow because the process of selection is slow. Since straight selection is an O(N) operation, and the algorithm does it approximately N times, straight selection sort is a quadratic sorting algorithm. Quadratic sorts are the slowest practical algorithms you'll see in the real world. Finally, selection sort is very efficient in terms of data movement. If comparisons are significantly cheaper than swaps, selection sort performs well.

As presented above, the straight selection sort is not stable. That is, if the items have more data attached to them that aren't a part of the comparison, the relative order of duplicate items is unspecified by the algorithm. To make the sort stable such that duplicate items retain their relative order, we can select the last of any duplicates and shift instead of swap Of course, this removes the primary benefit of selection sort, lack of data movement, because we're moving items around in the shift:

1 void jsw_selection ( int a[], int n ) 2 { 3 while ( --n > 0 ) { 4 int i, max = n; 5 6 for ( i = 0; i < n; i++ ) { 7 if ( a[i] >= a[max] ) 8 max = i; 9 } 10 11 if ( max != n ) { 12 int save = a[max]; 13 14 for ( i = max; i < n; i++ ) 15 a[i] = a[i + 1]; 16 17 a[n] = save; 18 } 19 } 20 }

Let's look at how that works with a quick trace by adding some extra data to each item and throwing in a few duplicates. Notice how the secondary numbers are in the same order after the sort as before the sort. For example, 69,4 is in front of 69,5 both before and after the algorithm touches them:

 |41,0|41,1|34,2|00,3|69,4|69,5|78,6|78,7|62,8|64,9<
 |41,0|41,1|34,2|00,3|69,4|69,5|78,6|62,8|64,9<78,7|
 |41,0|41,1|34,2|00,3|69,4|69,5|62,8|64,9<78,6|78,7|

 |41,0|41,1|34,2|00,3|69,4|62,8|64,9<69,5|78,6|78,7|
 |41,0|41,1|34,2|00,3|62,8|64,9<69,4|69,5|78,6|78,7|
 |41,0|41,1|34,2|00,3|62,8<64,9|69,4|69,5|78,6|78,7|
 |41,0|41,1|34,2|00,3<62,8|64,9|69,4|69,5|78,6|78,7|
 |41,0|34,2|00,3<41,1|62,8|64,9|69,4|69,5|78,6|78,7|

 |34,2|00,3<41,0|41,1|62,8|64,9|69,4|69,5|78,6|78,7|
 |00,3<34,2|41,0|41,1|62,8|64,9|69,4|69,5|78,6|78,7|
 <00,3|34,2|41,0|41,1|62,8|64,9|69,4|69,5|78,6|78,7|

 

Heap Sort

The straight selection sort is probably the simplest sorting algorithm, but that simplicity is really the only benefit. Selection sort has quadratic time complexity, which is the worst complexity of the practical sorting algorithms that you'll see in the real world. There are a handful of quadratic sorting algorithms in common use, and while selection sort isn't the worst of them, you're really comparing the lowest common denominators with each other. Maybe we should work smarter instead of trying to pick the best of the worst. ;-)

The very name of selection sort suggests a way to improve the algorithm. Since the biggest problem with selection sort is the linear time it takes to select the largest item, we can find the most improvement by solving the selection problem more efficiently. In real code, we would do that with a priority queue, where items are ordered so that extracting the largest item is quick and easy. The code to do that if we assume an existing priority queue library is simplicity itself:

1 void jsw_selection ( int a[], int n ) 2 { 3 pqueue pq; 4 int i; 5 6 for ( i = 0; i < n; i++ ) 7 pq_insert ( pq, a[i] ); 8 9 while ( !pq_empty ( pq ) ) 10 a[--n] = pq_max ( pq ); 11 }

Okay, but what if that nifty little library doesn't already exist? Well, we need to do it manually. The best known priority queue implementation is implemented with a max heap. The heap data structure has two very simple rules. It's a complete binary tree where the children of a node cannot be larger (or smaller if you want a min heap) than their parent. If you have a valid max heap then the largest item is the root of the tree. This structure can be easily converted into an array because the tree is complete, so there are no gaps:

 |962|500|724|467|464|334|478|358|41|169|

 is equivalent to

                           962

                    /              \

                500                   724

           /           \            /     \

       467               464    334         478

     /     \           /

 358         41    169

By converting an array into a heap structure, we can then easily extract the largest item from index 0. After doing that, the heap needs to be fixed so that it's still a valid heap again, and the process is repeated. This method of sorting is called heap sort, and it's one of the faster sorting algorithms with a worst case that has the same time complexity as the average case! By working smarter with the selection problem, we've turned one of the worst sorting algorithms into one of the best. Not bad, eh? :-)

Cool, so how do we do all of that junk? Well, naturally we need to figure out how to force an array to have the heap property. Basically, we can say that for each index i, the child nodes are at i * 2 + 1 and i * 2 + 2. In the above tree, we can see that 467 is a child of 500. In the array, 500 is at index 1 and 1 * 2 + 1 is index 3, which is where 467 is in the array. Try this with any of the indices and you'll see that this property holds for all of them. Also notice that the nodes with children are roughly the first half of the array and the nodes without children are the last half. So to build a heap, all we need to do is force the heap property from the lowest nodes to the root. Let's look at an example:

 |1|67|34|0|69|24|78|58|62|64|


                         1

                 /              \

              67                  34

         /          \           /    \

       0              69     24        78

     /   \          /

 58        62    64

To build a heap out of this array (I'll also include a tree representation to make things more obvious), we need to work with subtrees first. We start with 69 and its one child. Since 64 is smaller than 69, the heap property holds. Then we move to 0 and its children. 62 is the largest of the three, so we swap 62 and 0. Note that the relative order of the children is irrelevant:

 |1|67|34|62|69|24|78|58|0|64|


                         1

                 /              \

              67                  34

         /          \           /    \

       62             69     24        78

     /    \         /

 58         0    64

Now we move up a level and all the way to the right. In the array we're just decrementing the parent index though. At 34, we test the 3 nodes and find that 78 is the largest. So we swap 78 and 34 to restore the heap property:

 |1|67|78|62|69|24|34|58|0|64|


                         1

                 /              \

              67                  78

         /          \           /    \

       62             69     24        34

     /    \         /

 58         0    64

Now we go left to 67 and its two children. Since 69 is the greatest of the children and is greater than 67, we swap 67 and 69:

 |1|69|78|62|67|24|34|58|0|64|


                         1

                 /              \

              69                  78

         /          \           /    \

       62             67     24        34

     /    \         /

 58         0    64

Finally, at the root of the tree, testing 1 and its children we find that 78 is greater than both 1 and 69, so we make 78 the new root of the tree. Uh oh, this isn't a valid heap though. 78 is greater than 1, but 1 isn't greater than either 24 or 34, so we've got a problem. It appears that we can't just do local changes to a parent and its two children, we need to push the parent down as long as it violates the heap property. We can do this because we build a heap from the bottom up, so the lower subtrees are always valid heaps unless a change higher up the tree violates it. Let's do the final step by fixing 78's right subtree:

 |1|69|78|62|67|24|34|58|0|64|


                        78

                 /             \

              69                 34

         /          \          /    \

       62             67    24        1

     /    \         /

 58         0    64

Now we have a valid heap! Let's write a jsw_do_heap function that walks down the tree, making sure that the heap property is met. This is a fairly obvious method. The outer loop of jsw_do_heap handles cases where multiple parents need to be fixed down the tree. The first step inside the loop finds the largest child, the second step swaps the largest child and the parent if the child is larger. Notice how the i * 2 + 1 trick is used to find the first child. jsw_heapsort is just a translation of the priority queue pseudocode:

1 void jsw_do_heap ( int a[], int i, int n ) 2 { 3 int k = i * 2 + 1; 4 5 while ( k < n ) { 6 if ( k + 1 < n && a[k] < a[k + 1] ) 7 ++k; 8 9 if ( a[i] < a[k] ) 10 jsw_swap ( &a[i], &a[k] ); 11 12 k = ++i * 2 + 1; 13 } 14 } 15 16 void jsw_heapsort ( int a[], int n ) 17 { 18 int i = n / 2; 19 20 while ( i-- > 0 ) 21 jsw_do_heap ( a, i, n ); 22 23 while ( --n > 0 ) { 24 jsw_swap ( &a[0], &a[n] ); 25 jsw_do_heap ( a, 0, n ); 26 } 27 }

This is a surprisingly simple algorithm for how good heap sort is supposed to be, but it's not very fast, and we can do much better with only minor changes to jsw_do_heap. The current algorithm walks down every node in the tree until the end of the array. That means we're doing a lot of unnecessary work if only one push down is required in an array of 1,000,000 items. We also don't need to call jsw_swap in the loop. We can simply save the first root that we're pushing down, move children into its place, and after the loop is finished, put that saved node into its final resting place:

1 void jsw_do_heap ( int a[], int i, int n ) 2 { 3 int k = i * 2 + 1; 4 int save = a[i]; 5 6 while ( k < n ) { 7 if ( k + 1 < n && a[k] < a[k + 1] ) 8 ++k; 9 10 if ( save >= a[k] ) 11 break; 12 13 a[i] = a[k]; 14 k = ++i * 2 + 1; 15 } 16 17 a[i] = save; 18 }

That's much faster than our last attempt, but because we're only incrementing i by 1 each time, we're basically fixing the entire heap when we only need to fix the subtree that we swapped with, not both subtrees. So by making the simple change of setting i to k instead of incrementing it, we improve the speed of the algorithm by several orders of magnitude. jsw_heapsort doesn't change throughout all of these improvements, and we come to the final (and fastest) incarnation of heapsort for this tutorial:

1 void jsw_do_heap ( int a[], int i, int n ) 2 { 3 int k = i * 2 + 1; 4 int save = a[i]; 5 6 while ( k < n ) { 7 if ( k + 1 < n && a[k] < a[k + 1] ) 8 ++k; 9 10 if ( save >= a[k] ) 11 break; 12 13 a[i] = a[k]; 14 i = k; 15 k = i * 2 + 1; 16 } 17 18 a[i] = save; 19 }

Okay, I'm a liar. :-) We'll look at another alternative before moving on. A lot of people are surprised to learn that heap sort can be implemented any number of ways. The heap property is a very flexible rule when it comes to how you want to enforce it, so there can be a lot of variation in how heap sort is implemented. The best way to find alternative implementations is to think about how heaps work and look for patterns. One pattern in particular that we will look at is the relationship between each index and its child indices. If you think about the simple calculation for finding a child, you'll come up with the following pattern:

 |41|67|34|0|69|24|78|58|62|64|
 |41|67|34|
    |67|  |0|69|
       |34|    |24|78|
          |0|        |58|62|
            |69|           |64|

This shows the parents and their direct children. You can verify that the right children go with the right parents by using the i * 2 + 1 trick. 41 is index 0, 0 * 2 + 1 is 1 and 0 * 2 + 2 is 2. Okay, that's an easy one, how about the last parent and child? 69 is index 4, 4 * 2 + 1 is 9, which is the index for 64. So we know that the pattern matches what we were doing by finding k with i * 2 + 1. But since the parent index increments by 1 each time and the index of the first child increments by 2, we can work directly with the pattern by using two indices (once again, jsw_heapsort remains unchanged):

1 #define jmax(p,q) ( *(p) > *(q) ? (p) : (q) ) 2 #define maxof3(p,q,r) jmax ( (p), jmax ( (q), (r) ) ) 3 4 void jsw_do_heap ( int a[], int i, int n ) 5 { 6 int k; 7 8 for ( k = i + 1; k < n; i++, k += 2 ) { 9 if ( k == n - 1 ) { 10 if ( a[k] > a[i] ) 11 jsw_swap ( &a[i], &a[k] ); 12 } 13 else 14 jsw_swap ( &a[i], maxof3 ( &a[i], &a[k], &a[k + 1] ) ); 15 } 16 }

But this is really two steps backward since it's even slower than our first attempt at heapsort, and implementing the same optimizations that we used before would make the code much more complicated, so this is really a substandard solution even though it's intellectually interesting to play around with alternatives. :-) I like to think that if you understand the variations we've looked at, you'll have a good feel for heap sort.

One problem that heap sort has (though you won't see it here) is that the traditional algorithm assumes 1-based array indices, while C uses 0-based arrays. That's proven to be a very frustrating problem because getting a 1-based algorithm to work with 0-based arrays can be tricky. As such, you'll probably see different tricks that avoid the issue, like ignoring the 0th index, or shifting the array so that the 0th index actually refers to one element in front of the array. The former is awkward and the latter is undefined by C. Other attempts are just wrong in subtle ways, and while they may seem to work most of the time, they'll break, usually at the worst possible time. ;-) No worries here though, all of my implementations are written specifically with 0-based arrays in mind. You're welcome.

 

Insertion Sort

The other most obvious way to sort a collection of items is by insertion. Insertion takes each new item, then locates the current ordered position in the collection and inserts the item there. The common example is sorting a bridge hand, but I don't play bridge and I don't know anyone that does, so an execution trace on our list of numbers as an example would fare better:

  1)  |41|
  2)  |41|67|
  3)  |34|41|67|
  4)  |00|34|41|67|
  5)  |00|34|41|67|69|
  6)  |00|24|34|41|67|69|
  7)  |00|24|34|41|67|69|78|
  8)  |00|24|34|41|58|67|69|78|
  9)  |00|24|34|41|58|62|67|69|78|
 10)  |00|24|34|41|58|62|64|67|69|78|

Each new item is moved into the correct position as it arrives. Unlike sorting by selection, this suggests that sorting by insertion is well suited to sorting an unknown number of items, and that the items don't need to be present before sorting can begin. So acquiring items through a stream or a generator is perfectly acceptable when sorting by insertion.

However, that very sequential nature suggests that sorting by insertion has potential performance issues when used with arrays. Insertion into the middle of an array requires that a hole be created by shifting all items after the hole to the right. For a large array, that can be incredibly slow and expensive. On the other hand, sorting by insertion into a linked list can be far more efficient in theory because the shifting step is not required.

Note that sorting by insertion can be done in-place without using any extra memory. Using insertion sort on an existing array of items would result in the following execution:

  1)  |41>67|34|00|69|24|78|58|62|64|
  2)  |41|67>34|00|69|24|78|58|62|64|
  3)  |34|41|67>00|69|24|78|58|62|64|
  4)  |00|34|41|67>69|24|78|58|62|64|
  5)  |00|34|41|67|69>24|78|58|62|64|
  6)  |00|24|34|41|67|69>78|58|62|64|
  7)  |00|24|34|41|67|69|78>58|62|64|
  8)  |00|24|34|41|58|67|69|78>62|64|
  9)  |00|24|34|41|58|62|67|69|78>64|
 10)  |00|24|34|41|58|62|64|67|69|78|

The code to do this is pretty simple. All we need to do is pick each item in the array, then make a hole where it needs to go and insert it. In fact, insertion sort is considered to be one of the simplest sorting algorithms both in concept and in implementation:

1 void jsw_insertion ( int a[], int n ) 2 { 3 int i; 4 5 for ( i = 1; i < n; i++ ) { 6 int j, save = a[i]; 7 8 for ( j = i; j >= 1 && a[j - 1] > save; j-- ) 9 a[j] = a[j - 1]; 10 11 a[j] = save; 12 } 13 }

Unfortunately, being one of the simplest sorting algorithms also means that insertion sort is also one of the least efficient. It's in the class of sorting algorithms that have quadratic time complexity, which means that it doesn't scale well to large arrays. However, insertion sort has one really awesome feature that makes it useful as a part of more sophisticated algorithms: insertion sort is blazing fast on arrays that are sorted or almost sorted. This makes it ideal for finishing up a partial quicksort, which we'll see shortly.

Insertion sort on a linked list is almost as simple as insertion sort on an array. With linked lists we may not be able to go backward, but that's hardly a stumbling block since we can easily save the sorted items in a temporary list by extracting the nodes from the original list, inserting them into the temporary list, then pointing the original list pointer back to the temporary list before returning. The temporary list only has the overhead of an extra variable, so it's space efficient:

1 void jsw_insertion ( struct jsw_list *list ) 2 { 3 struct jsw_node result = {0}; 4 struct jsw_node *i, *step, *next; 5 6 for ( step = list->head; step != NULL; step = next ) { 7 next = step->next; 8 9 for ( i = &result; i->next != NULL; i = i->next ) { 10 if ( i->next->data > step->data ) 11 break; 12 } 13 14 step->next = i->next; 15 i->next = step; 16 } 17 18 list->head = result.next; 19 }

 

Shell Sort

While straight insertion is particularly efficient on already sorted or almost sorted arrays or lists, it's still a quadratic sort because the overall result only sorts the items by one position at a time. Now, while this may sound like a given, it's possible to force the algorithm to move items over wider ranges by dividing the array into multiple interleaved groups. For example, dividing the insertion sort example into two groups of five, then sorting each corresponding value in those groups, then combining the groups and sorting the one group results in the following execution:

  1)  |41|67|34|00|69|   |24|78|58|62|64|
  2)  |24|67|34|00|69|   |41|78|58|62|64|
  3)  |24|67|34|00|69|   |41|78|58|62|64|
  4)  |24|67|34|00|69|   |41|78|58|62|64|
  5)  |24|67|34|00|64|   |41|78|58|62|69|
  6)  |00|24|67|34|64|41|78|58|62|69|
  7)  |00|24|34|64|67|41|78|58|62|69|
  8)  |00|24|34|41|64|67|78|58|62|69|
  9)  |00|24|34|41|58|64|67|78|62|69|
 10)  |00|24|34|41|58|62|64|67|78|69|
 11)  |00|24|34|41|58|62|64|67|69|78|

Hmm, not much better, is it? How about we divide the array into four groups instead of two? That might be better:

  1)  |41|67|   |34|00|69|   |24|78|   |58|62|64|
  2)  |24|67|   |34|00|69|   |41|78|   |58|62|64|
  3)  |24|00|   |34|62|69|   |41|67|   |58|78|64|
  4)  |24|00|   |34|62|64|   |41|67|   |58|78|69|
  5)  |00|24|34|62|64|41|67|58|78|69|
  6)  |00|24|34|41|62|64|67|58|78|69|
  7)  |00|24|34|41|58|62|64|67|78|69|
  8)  |00|24|34|41|58|62|64|67|69|78|

Sorting interleaved groups is the idea behind the diminishing increment sort, better known by the popular name Shell sort. Donald Shell capitalized on the fact that sorting by insertion is very fast for small arrays that are almost sorted. He then introduced the concept of breaking the array into increasing groups of smaller numbers (usually called the h-increment) of items. As long as the groups eventually shrink down to N groups of 1, the final step is a straight insertion and the algorithm terminates successfully:

1 void jsw_shellsort ( int a[], int n ) 2 { 3 int i, h = 1; 4 5 while ( h <= n / 9 ) 6 h = 3 * h + 1; 7 8 while ( h > 0 ) { 9 for ( i = h; i < n; i++ ) { 10 int j, save = a[i]; 11 12 for ( j = i; j >= h && a[j - h] > save; j -= h ) 13 a[j] = a[j - h]; 14 15 a[j] = save; 16 } 17 18 h /= 3; 19 } 20 }

The only difference between this code and the insertion sort is that we've added an extra loop around the insertion sort that handles the step increments and replaced references to 1 with h. By moving items that are separated by a large number of items, they get closer to their final resting place faster. The examples above don't do the algorithm justice, but until quicksort was developed, Shell sort was the fastest sorting algorithm available.

A careless choice of values for the h-increments, such as powers of two, could result in strikingly bad performance, while a seemingly arbitrary sequence can result in very good performance. To date, nobody has determined what the very best sequence is. Two sequences known to be good are the square of the Fibonacci sequence (1,4,9,25,64,169,441,...), and a sequence recommended by Donald Knuth that's fast and easy to calculate (1,4,13,40,121,364,1093,3280,9841...). While there are better sequences, these two are not substantially worse, and they have the benefit of being easy to create with minimal cost. The function above uses Knuth's sequence.

The Shell sort is simple to understand, easy to implement, and has good performance for small to medium arrays, but its time complexity is difficult to analyze and depends heavily on the sequence of h-increments chosen. In general, Shell sort should be the first sort used because it's easier to implement than the higher order sorting algorithms (with the possible exception of heap sort) and has good performance. If Shell sort turns out to be too slow, then some of the faster algorithms can be considered.

 

Bubble Sort

Sorting by exchanging is just that. Pairs of items that are out of order are exchanged repeatedly until no more pairs exist. When no more pairs exist that are out of order, the array must be completely sorted. Sorting by exchanging is particularly dangerous because the efficient solutions aren't nearly as obvious as the hideously inefficient ones. This is true for all sorting categories, but sorting by exchanging has the great honor of boasting the worst sorting algorithm that you might see in the real world (if you're unlucky). Here is a look at the most obvious algorithm:

  1)  |41|67|34|00|69|24|78|58|62|64|
  2)  |41>34|67|00|69|24|78|58|62|64|
  3)  |41|34>00|67|69|24|78|58|62|64|
  4)  |41|34|00|67>24|69|78|58|62|64|
  5)  |41|34|00|67|24|69>58|78|62|64|
  6)  |41|34|00|67|24|69|58>62|78|64|
  7)  |41|34|00|67|24|69|58|62>64|78|
  8)  >34|41|00|67|24|69|58|62|64|78|
  9)  |34>00|41|67|24|69|58|62|64|78|
 10)  |34|00|41>24|67|69|58|62|64|78|
 11)  |34|00|41|24|67>58|69|62|64|78|
 12)  |34|00|41|24|67|58>62|69|64|78|
 13)  |34|00|41|24|67|58|62>64|69|78|
 14)  >00|34|41|24|67|58|62|64|69|78|
 15)  |00|34>24|41|67|58|62|64|69|78|
 16)  |00|34|24|41>58|67|62|64|69|78|
 17)  |00|34|24|41|58>62|67|64|69|78|
 18)  |00|34|24|41|58|62>64|67|69|78|
 19)  |00>24|34|41|58|62|64|67|69|78|

As you can clearly see, this particular exchange sort is exceptionally inefficient, even when compared to straight selection and insertion, which have similar time complexity properties, even when the execution left out any of the steps that didn't result in a swap. Hey, I don't want to spend half my life typing out those executions, so I made it shorter. In the process I made the algorithm look better than it is, and it STILL looks bad. ;-)

Unfortunately, because the algorithm is simple and easy to understand this so-called “bubble sort” is commonly taught to beginning programmers as the first sorting algorithm, and often as a first algorithm entirely. I think that's a shame, because bubble sort only has practical use in very specific and unlikely circumstances, and for everything else it just sucks too much. For comparison and completeness, I'll provide an implementation of the bubble sort as it's usually taught:

1 void jsw_bubblesort ( int a[], int n ) 2 { 3 int i; 4 5 for ( i = 0; i < n; i++ ) { 6 int j; 7 8 for ( j = n - 1; j > 0; j-- ) { 9 if ( a[j] < a[j - 1] ) 10 jsw_swap ( &a[j], &a[j - 1] ); 11 } 12 } 13 }

Well, you have to admit that it's simple. Of course, once you optimize it a bit, that simplicity starts to go away. A flag for testing whether a swap has been made is critical for bubble sort, because otherwise it would make several unnecessary passes over the collection before terminating. While strictly optional, a test for early completion is considered a mandatory optimization to the bubble sort algorithm. No further optimizations are necessary as they won't improve the performance significantly, and simpler algorithms, such as insertion sort, will almost always outperform it. Here's the “optimized” bubble sort:

1 void jsw_bubblesort ( int a[], int n ) 2 { 3 int i; 4 5 for ( i = 0; i < n; i++ ) { 6 int j, done = 1; 7 8 for ( j = n - 1; j > 0; j-- ) { 9 if ( a[j] < a[j - 1] ) { 10 jsw_swap ( &a[j], &a[j - 1] ); 11 done = 0; 12 } 13 } 14 15 if ( done ) 16 break; 17 } 18 }

Bubble sort is a quadratic algorithm, but it's frighteningly inefficient even if you only compare it with other quadratic algorithms. Words can't describe my loathing for this algorithm, and I've gotten into very heated debates about whether it should ever be used or not, even as an introductory algorithm. My recommendation: forget bubble sort exists. You'll be better off for it. :-)

 

Quicksort

Bubble sort is unmercifully slow. It's the worst possible sorting algorithm that you'll see in the real world, and even then only because the author of the code probably doesn't know any better. Fortunately, we can improve bubble sort in much the same way that we improved the straight selection sort. But since we don't have a neat and tidy concept to improve, along with a way to improve it (ie. the selection problem), we need another way to improve the performance of exchanging.

The good news is that we can use recursion to split the array at a median value at each step by moving the median value to its sorted position, then recursively sort each subarray the same way in a very good algorithm called quicksort. By dividing the problem at the median each time, we can guarantee that this part of the algorithm is O(log N). Now it's just a simple matter of figuring out a good way to sort the subarrays that doesn't take longer than O(N) and we'll have an optimal comparison sort.

Okay, maybe it's not that simple. :-) We have two immediate problems. First we need a way to partition the array. Partitioning is forcing a median value to its sorted position while making sure that all items less than the median come before it and all items greater than or equal to the median come after it. Then we need a way to find the median value without taking too much time. To really find the median requires at least an O(N) algorithm on top of the partitioning which will at best be O(N). Now, O(N) + O(N) is still O(N), but the algorithm will run more slowly than if we only had one O(N) part, and we want this to be as fast as possible.

Rather than waste cycles trying to find the true median, we'll just guess. Our first attempt at guessing is the simplest. We just take the first item in the subarray and pretend that it's the median value. :-) This could be a serious problem if the first item is the smallest item in the array, but we'll look at that a bit later. One problem at a time. Assuming a partition function, we can easily write a recursive function to partition, then recursively do the same thing on each subarray as long as the subarray has more than one item:

1 void jsw_quicksort_r ( int a[], int first, int last ) 2 { 3 if ( first < last ) { 4 int pivot = jsw_partition ( a, first, last ); 5 jsw_quicksort_r ( a, first, pivot - 1 ); 6 jsw_quicksort_r ( a, pivot + 1, last ); 7 } 8 } 9 10 void jsw_quicksort ( int a[], int n ) 11 { 12 jsw_quicksort_r ( a, 0, n - 1 ); 13 }

Let's look at a sample execution to see how this function could possibly work. It's not obvious at first unless you're familiar with recursion and can see how partitioning forces an item to its sorted position recursively results in a sorted file. In this example, we take the first item in each subarray and use it as the median and ignore cases where the subarray is too small to partition:

 |41|67|34|00|69|24|78|58|62|64|
 |41|34|00|24|58|62|64|69|67|78|
 |41|34|00|24|58|62|
 |41|34|00|24|58|
 |00|24|41|34|
 |00|24|34|41|58|62|64|
                      |69|67|78|
 |00|24|34|41|58|62|64|67|69|78|

The base case for recursion is when the indices first and last cross. So we stop the recursion if we get to a subarray with only a single item. jsw_quicksort passes an inclusive array to jsw_quicksort_r, which basically means that both a[first] and a[last] are valid items. This is different from most of our other algorithms (such as heap sort) where last is the first index after the last element of the array. This is for the benefit of jsw_partition, which is more easily written if both first and last refer to valid indices. The pivot variable refers to the index of the median value so there's no need to include it in the recursive calls as it's already sorted after partitioning. Here's the code that performs the partition magic:

1 int jsw_partition ( int a[], int first, int last ) 2 { 3 int pivot = first; 4 5 while ( first < last ) { 6 if ( a[first] <= a[last] ) 7 jsw_swap ( &a[pivot++], &a[first] ); 8 9 ++first; 10 } 11 12 jsw_swap ( &a[pivot], &a[last] ); 13 14 return pivot; 15 }

Partitioning an array is surprisingly difficult to get right. The algorithm above is a relatively safe, simple, and efficient one that gets the job done in O(N) time. The idea is to walk across the array, swapping the pivot with items that are less than the pivot as we go, and resetting the pivot to the next item when we swap. When we get to the end of the array, the current pivot is the median value that's in its sorted position. Note that the median isn't necessarily the pivot that we chose at the beginning of the partition.

That's all there is to quicksort! Once you have a partition algorithm that works, writing the code to do a quicksort is trivial, and then you have the fastest non-hybrid sorting algorithm at your disposal. But we're not done yet because we can improve the algorithm. There are a few problems that need to be fixed and optimizations that can make quicksort even faster.

The first problem is the biggest. If the pivot value chosen in jsw_partition is always the smallest item in the array, such as when the array is already sorted, no swaps will be made and the median will be the first item in the array. This results in a vastly unbalanced split where the left half has one item and the right half has the rest of the array. This is a degenerate case where quicksort really isn't that quick, with a quadratic time complexity.

There are two widely accepted ways to minimize the chances of hitting a degenerate case. First, we can use a random number generator to pick which value in the array to use as a pivot. Then we guarantee that nothing untoward can happen by swapping that value with the first item in the array because the partition algorithm assumes that pivot starts at first. Otherwise we would end up walking off the end of the array, and that's never a good thing. ;-) The changes are minimal:

1 int jsw_partition ( int a[], int first, int last ) 2 { 3 int pivot = first + rand() % ( last - first + 1 ); 4 5 jsw_swap ( &a[pivot], &a[first] ); 6 pivot = first; 7 8 while ( first < last ) { 9 if ( a[first] <= a[last] ) 10 jsw_swap ( &a[pivot++], &a[first] ); 11 12 ++first; 13 } 14 15 jsw_swap ( &a[pivot], &a[last] ); 16 17 return pivot; 18 }

Notice that the code to find a random index within the range is somewhat complicated. We need to take into account that first is not always zero, so we use last - first + 1 to find the number of items in the subarray (we add 1 because last is the last valid item, not one beyond the last valid item), then use that as the modulus for rand (ignoring the issues with rand). Finally, because the resulting number is ranged from zero to the number of elements, we add first to fit it within the range of [first,last] as a valid index.

The second, and preferred, option is to take the median of three items picked at random from the array (though the first, last, and middle are most common), and use that as the pivot. The process is the same, except instead of picking a random index and swapping its item with a[first], we pick the median of a[first], a[mid], and a[last], then swap that median with a[first]:

1 int jsw_partition ( int a[], int first, int last ) 2 { 3 int pivot, mid = ( first + last ) / 2; 4 5 /* Largest of two */ 6 pivot = a[first] > a[mid] ? first : mid; 7 8 /* Smallest of remaining */ 9 if ( a[pivot] > a[last] ) 10 pivot = last; 11 12 jsw_swap ( &a[pivot], &a[first] ); 13 pivot = first; 14 15 while ( first < last ) { 16 if ( a[first] <= a[last] ) 17 jsw_swap ( &a[pivot++], &a[first] ); 18 19 ++first; 20 } 21 22 jsw_swap ( &a[pivot], &a[last] ); 23 24 return pivot; 25 }

The next improvement is an optimization that doesn't touch jsw_partition (yay!). Since it's really a waste to recurse on small subarrays, we can set a threshold for when to stop calling jsw_quicksort_r recursively. As it is, we're recursing until first and last cross, which is equivalent to a threshold of 0. But if we set a cutoff of, say, 15 or 20, we can mostly sort a large array, then hand it off to insertion sort, which is much faster for small or nearly sorted arrays. This suggests two possible variations. First, we could call insertion sort inside jsw_quicksort_r, if we get to the cutoff. Robert Sedgewick studied this thoroughly and discovered that it's slightly more efficient to call insertion sort after quicksort finishes, which brings us to this (where THRESHOLD is a macro with whatever cutoff value you choose):

1 void jsw_quicksort_r ( int a[], int first, int last ) 2 { 3 if ( last - first > THRESHOLD ) { 4 int pivot = jsw_partition ( a, first, last ); 5 jsw_quicksort_r ( a, first, pivot - 1 ); 6 jsw_quicksort_r ( a, pivot + 1, last ); 7 } 8 } 9 10 void jsw_quicksort ( int a[], int n ) 11 { 12 jsw_quicksort_r ( a, 0, n - 1 ); 13 jsw_insertion ( a, n ); 14 }

The next optimization that we'll look at handles the problem of already sorted subarrays. If the subarrays are already sorted or have a large number of duplicates, quicksort will take its sweet time going through them all. There's no point in using a sophisticated sort on an already sorted subarray, and we can make quicksort consistently fast by testing if the subarray is already sorted and doing nothing if it is. The change is made to jsw_quicksort_r, and we can use the is_sorted function given earlier:

1 void jsw_quicksort_r ( int a[], int first, int last ) 2 { 3 if ( last - fir