Friday Algorithms: Sorting a Set of Integers – Far Quicker than Quicksort!

Yes! It’s really really fast, and it’s far quicker than the quicksort algorithm, which is considered as the fastest sorting algorithm in practice. However how it’s possible to be faster than the quicksort, which is the fastest algorithm?! Is that true? Actually it’s true, but only in few cases. It works with integers, you’ve to know the first and the last element from that set and you’ve to be sure that every element is unique

Imagine you’ve a set of numbers all of them greater than 1 and lesser than 1000. Of course you’re not suppose to have all of the integers between 1 and 1000, but only few of them – think of 500 numbers between 1 and 1000! Here’s important to note – that this is only an example, you can have far more than only few numbers between 1 and 1000 – what about the numbers between 1 and 1,000,000 – this is a big set, isn’t it.

The question is – if there are so many constraints, why should I use that algorithm instead of quicksort, or another sorting algorithm, that works with everything. The answer is clear – yes, you’d prefer quicksort if you’ve to sort some arbitrary data, but when it comes to integers, and you’ve, let’s say, 1,000,000 integers, my advice is – use this algorithm!

Sorting the Set

1. First Pass

First we have an unsorted array, but we know the minimum and maximum of the set.

an unsorted array of integers

On the first pass initialize an empty array with as many elements, as they are between the first and the last element of the set – for a set between 1 and 1000 – that will be an array with 1000 elements – each of which will be a zero in the beginning.

temporary array with 0 on every element

Than loop trough the set and for every element in the set – you should put a 1 on it’s place

sort a set - first pass

Now we have an array of 0 and 1.

2. Second Pass

After the first pass, you’d guess what you’ve to do – loop trough the second array and print the keys of the elements different from 0 – those that are 1.

a sorted set of integers

Now the array is sorted!

Source Code

var a = [34, 203, 3, 746, 200, 984, 198, 764];
 
function setSort(arr)
{
    var t = [], len = arr.length;
    for (var i = 0; i < 1000; i++) {
        t[i] = 0;
    }
 
    for (i = 0; i < len; i++) {
        t[arr[i]] = 1;
    }
 
    for (i = 0; i < 1000; i++) {
        if (1 == t[i]) {
            console.log(i);
        }
    }
}
 
setSort(a);

Now that you’ve seen that algorithm, perhaps you’d guess that it’s no so difficult to change from integers to any other set, and once again I should say that in many cases this is the best algorithm for sorting! Very often quicksort is preferred, but not always there isn’t something faster!

14 thoughts on “Friday Algorithms: Sorting a Set of Integers – Far Quicker than Quicksort!

  1. Nice trick! But not all elements have to be unique with a little adjustment:
    change the line

        t[arr[i]] = 1;

    to

        t[arr[i]]++;

    and

        if (1 == t[i]) {
            console.log(i);
        }

    to

        for(j = 1; j <= t[i]; j++ ) {       
            console.log(i);
        }
  2. @jeroen – first of all thanks for the comment! However despite I wrote that one of the conditions is to have only unique elements, I know that we can have different items, and than loop as you showed in your example. But let me say that this slightly changes the complexity of the algorithm! For a set where all the elements are equal – the complexity is quite different with that solution.

    Also let me add that the code I wrote is JavaScript – something that is not evident from the example.

  3. Only thing that might come up with this one is it might be fairly memory intensive if you have a very wide range of numbers (like say 1 to 1 billion+). It might work very well for small numbers (1-1000 or so), but it might not work so well on a larger scale.

  4. Congratulations for noticing the Radix sort algorithm. The reason why it’s not used 99% of the time is that while it is arguably very fast computationally, it is insanely heavy on memory. Also, you have to know the COMPLETE DOMAIN of all possible objects/values beforehand.

    Split the difference, use something like a tree sort that is O(nlog n), keep your memory footprint to a worst case of O(N) and be the better for it.

  5. There are several problems with this approach.

    1) It requires extra memory space equal to the array’s largest value – the smallest value. This is a major scaling problem. Lets pretend I have a set of 50 numbers, but the range of values is in the billions. I will need a massive ancillary array to sort a small situation. Your solution does not scale well at all potentially requiring a memory requirement many times N.

    2) It does not actually sort the array. For this to be a useful sort, you need to be able to return a sorted array. This requires making and returning a third array based on T.

    3) The speed is debatable. The iteration over T(twice) could be MANY times larger than the iteration over a, even in an nlog(n) situation. It certainly works in small situations, but again, it fails to scale well.

    That said, the algorithm has its uses, limited as they may be, but it is actually very similar to a hash sort. If you hash your inputs and then insert the hash values into hash table, you get a very rough hash sort. Of course, this greatly increases the complexity, but it would reduce your memory footprint while increasing your speed, especially if you hash all new inputs as they come in to maintain a sorted table at all time.

  6. @kjordan – of course this is an algorithm using transformation instead of comparison like the quicksort, and thus it’s using more memory. That’s why you should be very careful when choosing this algorithm!

  7. @Matthew – thanks for the comment – I completely agree with you! Indeed sometimes choosing an algorithm is as difficult as the implementation itself!

  8. Also known as bitmap sort. This algorithm is very useful if you want to sort integers in a range and when you have to use minimal primary memory but have secondary memory available.

  9. FYI, this tends to be really slow in JavaScript. JS uses hashs for their arrays, so what should be O(n) becomes O(n log n). In IE, it’s even worse because they use linked lists. So O(n) becomes O(n^2) or worse.

    I remember using this Radix trick on my professors in school to be a prick. It was always fun to make them include a memory big-O with their questions.

  10. Here’s a faster and smaller version.

    function sort(a){
        var d=[],f=[],e=Math.min.apply(Math,a),g=Math.max.apply(Math,a),c=g-1;
        do d[c]=0;while(c--);
        for(c in a)d[a[c]]=1;
        do d[e]&&f.push(e);while(e++<g);
        return f
    };
    
  11. CORRECTION. Looks like this form doesn’t like — or minus minus.

    My code suppose to be while(c–); //c minus minus
    and NOT while(c–); //WRONG from auto correct

  12. Even smaller

    function sort(a){var d=[],o=[],e=Math.min.apply(Math,a),g=Math.max.apply(Math,a);for(var i in a)d[a[i]]=1;do d[e]&&o.push(e);while(e++<g);return o};

  13. @Eddy Vang – first of all do you think this code is maintainable? Yes, it’s javascript so it should be minified. What about “function s(a) …” ;)?

    However by saying “faster” do you consider the speed of Math functions? Have you benchmarked this code?

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>