The binary search tree is a very useful data structure in which searching can be significantly faster than searching a linked list. However, in some cases searching a binary tree can be as slow as searching a linked list, and this depends mainly on the input sequence. Indeed, if the input is sorted, the binary tree will look much like a linked list and the search will be slow.
To overcome this we must change the data structure a bit so that it stays well balanced. It’s intuitively clear that searching will go better if the tree is well branched. That is when finding an item takes the least effort.
Since we know how to construct a binary search tree, the only thing left is to keep it balanced. Obviously we will need to rebalance the tree on each insert and delete, which makes this data structure more difficult to maintain than a non-balanced search tree, but searching it will be significantly faster. Continue reading Computer Algorithms: Balancing a Binary Search Tree→
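To make the effect of the input order concrete, here is a minimal Python sketch (not code from the original post) that builds a plain, non-balancing binary search tree and measures its height. The names Node, insert, height and build are hypothetical helpers introduced only for this illustration.

```python
class Node:
    """A node of a plain (non-balancing) binary search tree."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key iteratively and return the (possibly new) root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    if root is None:
        return 0
    best = 0
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return best


def build(keys):
    """Build a tree by inserting the keys in the given order."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root


# Sorted input: every key goes to the right, so the tree is a chain.
print(height(build(range(1, 101))))          # 100
# The same kind of data in a "middle first" order gives a bushy tree.
print(height(build([4, 2, 6, 1, 3, 5, 7])))  # 3
```

With sorted input the height equals the number of keys, so a search walks the nodes one by one, just as it would in a linked list.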
I wrote about binary search in my previous post. It is indeed a very fast searching algorithm, but in some cases we can achieve even faster results. One such algorithm is the “interpolation search”, perhaps the most interesting of all searching algorithms. However, we shouldn’t forget that the data must satisfy some constraints. First, the array must be sorted. We must also know the bounds of the interval.
Why is that? Well, this algorithm tries to follow the way we search for a name in a phone book or a word in the dictionary. We humans know in advance that if the name we’re searching for starts with a “B”, like “Bond” for instance, we should start searching near the beginning of the phone book. Likewise, if we’re searching for the word “algorithm” in the dictionary, we know that it should be somewhere near the beginning. This is because we know the order of the letters, we know the interval (a-z), and somehow we intuitively know that the words are evenly distributed. These facts are enough to realize that the binary search can be a bad choice. Indeed, the binary search algorithm divides the list into two equal sub-lists, which is wasteful if we know in advance that the searched item is somewhere near the beginning or the end of the list. Yes, we can also use jump search if the item is near the beginning, but if it is near the end that algorithm is not so effective.
So the interpolation search is based on some simple facts. The binary search divides the interval into two equal sub-lists, as shown in the image below.
What will happen if we don’t use the constant ½, but another, more accurate constant “C” that can lead us closer to the searched item?
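As an illustration of choosing “C”, here is a minimal Python sketch of interpolation search. It assumes the commonly used estimate C = (target - arr[low]) / (arr[high] - arr[low]), that is, the values are treated as roughly evenly spaced between the two bounds. The excerpt above does not give the formula, so take this as the standard textbook choice rather than the author’s exact code. The function name interpolation_search is introduced only for this example.

```python
def interpolation_search(arr, target):
    """Return an index of target in the sorted list arr, or -1 if absent."""
    low, high = 0, len(arr) - 1
    # The target can only be present while it lies within the current bounds.
    while low <= high and arr[low] <= target <= arr[high]:
        if arr[low] == arr[high]:
            # All values in the interval are equal; this also avoids
            # a division by zero in the estimate below.
            return low if arr[low] == target else -1
        # Estimate the position proportionally instead of taking the middle.
        pos = low + (target - arr[low]) * (high - low) // (arr[high] - arr[low])
        if arr[pos] == target:
            return pos
        if arr[pos] < target:
            low = pos + 1
        else:
            high = pos - 1
    return -1


values = list(range(0, 100, 10))          # 0, 10, 20, ..., 90
print(interpolation_search(values, 70))   # 7, found on the first probe
print(interpolation_search(values, 35))   # -1
```

On evenly spaced data like this the first estimate lands directly on the item. On badly skewed data the estimates can be poor and the search may need many more probes than binary search.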
The binary search is perhaps the most famous and most suitable search algorithm for sorted arrays. Indeed, when the array is sorted it is pointless to check every single item against the desired value. A better approach is to jump straight to the middle item of the array, and if that item’s value is greater than the desired one, jump back to the middle of the lower half. Thus the new interval is half the size of the initial one.
If the searched value is greater than the one in the middle of the sorted array, we can jump forward. Again, on each step the list under consideration is half as long as the list on the previous step, as shown in the image below.
Here’s a sample implementation of this algorithm in PHP. Obviously the nature of this approach leads us to a recursive implementation, but as we know, recursion can sometimes be dangerous. That’s why here we can see both the recursive and the iterative solutions. Continue reading Computer Algorithms: Binary Search→
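The PHP listing itself is not part of this excerpt. As a rough stand-in, here is a minimal Python sketch of both variants described above. The function names and the convention of returning -1 for a missing value are assumptions made for this example, not the author’s code.

```python
def binary_search_iterative(arr, target):
    """Return an index of target in the sorted list arr, or -1 if absent."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] > target:
            high = mid - 1   # continue in the lower half
        else:
            low = mid + 1    # continue in the upper half
    return -1


def binary_search_recursive(arr, target, low=0, high=None):
    """Same search, expressed recursively on the interval [low, high]."""
    if high is None:
        high = len(arr) - 1
    if low > high:
        return -1
    mid = (low + high) // 2
    if arr[mid] == target:
        return mid
    if arr[mid] > target:
        return binary_search_recursive(arr, target, low, mid - 1)
    return binary_search_recursive(arr, target, mid + 1, high)


data = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
print(binary_search_iterative(data, 21))   # 6
print(binary_search_recursive(data, 4))    # -1
```

Because the interval halves on every call, the recursion depth grows only logarithmically with the list length. The iterative form avoids the call overhead altogether.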
In my previous article I discussed how the sequential (linear) search can be used on an ordered list, but there we were limited by the specific features of the given task. Obviously the sequential search on an ordered list is inefficient, because we check every one of its elements in turn. Is there any way we can optimize this approach? Well, because we know that the list is sorted, we can check some of its items but not all of them. When an item is checked and it is less than the desired value, we can skip some of the following items by jumping ahead and then check again. If the checked element is greater than the desired value, we can be sure that the desired value, if present, is somewhere between the previously checked element and the currently checked one. If not, we jump ahead again. A good approach is to use a fixed step. Let’s say the list length is n and the step’s length is k. Basically we check list(0), then list(k-1), list(2k-1), etc. Once we find the interval where the value might be (m*k - 1 < x <= (m+1)*k - 1), we can perform a sequential search between the last two checked positions. By choosing this approach we avoid many of the weaknesses of the sequential search algorithm, because many of its comparisons are eliminated.
How to choose the step’s length
We know that it is good practice to use a fixed-size step. In fact, when the step is 1, the algorithm is the traditional sequential search. The question is what the length of the step should be, and whether there is any relation between the length of the list (n) and the length of the step (k). Indeed there is such a relation, and you can often see sources stating directly that the best length is k = √n. Why is that?
Well, in the worst case we do n/k jumps, and if the last checked value is greater than the desired one, we do at most k - 1 more comparisons. This means n/k + k - 1 comparisons. Now the question is for what value of k this function reaches its minimum. For those of you who remember maths classes, this is found by setting the derivative with respect to k to zero: -n/(k^2) + 1 = 0. Solving gives k^2 = n, so the minimum of the function is reached at k = √n.
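For readers who want the intermediate steps, the minimization can be written out as follows. This is a sketch that treats k as a continuous positive variable, which the argument above implicitly assumes.

```latex
\begin{aligned}
f(k)   &= \frac{n}{k} + k - 1, \\
f'(k)  &= -\frac{n}{k^{2}} + 1 = 0
          \;\Longrightarrow\; k^{2} = n
          \;\Longrightarrow\; k = \sqrt{n}, \\
f''(k) &= \frac{2n}{k^{3}} > 0 \quad (k > 0), \\
f(\sqrt{n}) &= \frac{n}{\sqrt{n}} + \sqrt{n} - 1 = 2\sqrt{n} - 1 .
\end{aligned}
```

The positive second derivative confirms that k = √n is a minimum. For a list of n = 16 items this gives k = 4 and a worst case of about 2·4 - 1 = 7 comparisons.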
Of course you don’t need to prove this every time you use this algorithm. Instead you can directly assign √n to be the step length. However it is good to be familiar with this approach when trying to optimize an algorithm.
Let’s consider the following list: (0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610). Its length is 16. Jump search will find the value of 55 with the following steps.
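The step-by-step figure is not reproduced in this text, so here is a minimal Python sketch that traces a jump search over this list. It assumes a step of k = √n = 4 and probes positions 0, k, 2k, and so on, which is one common variant and may differ slightly from the positions in the original illustration. The function name jump_search and the returned list of probed positions are introduced only for this example.

```python
import math


def jump_search(arr, target):
    """Search the sorted list arr for target.

    Returns (index, probed), where index is -1 if target is absent and
    probed lists every position that was compared, in order.
    """
    n = len(arr)
    probed = []
    if n == 0:
        return -1, probed
    step = max(1, math.isqrt(n))   # k = floor(sqrt(n))
    prev = 0
    curr = 0
    # Jump phase: move ahead by `step` until we reach or pass the target.
    while curr < n:
        probed.append(curr)
        if arr[curr] == target:
            return curr, probed
        if arr[curr] > target:
            break
        prev = curr
        curr += step
    # Linear phase: scan strictly between the last two probed positions.
    for i in range(prev + 1, min(curr, n)):
        probed.append(i)
        if arr[i] == target:
            return i, probed
        if arr[i] > target:
            break
    return -1, probed


fib = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]
print(jump_search(fib, 55))   # (10, [0, 4, 8, 12, 9, 10])
```

Under these assumptions the search probes the values 0, 3, 21 and 144, sees that 144 is greater than 55, and then scans forward from position 9 to find 55 at position 10. That is six comparisons instead of the eleven a plain sequential search would need.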
The expression “linear search in sorted lists” itself sounds strange. Why should we use this algorithm for sorted lists when there are lots of other algorithms that are far more effective? As I mentioned in my previous post, the sequential search is very inefficient in most cases and is primarily used for unordered lists. Indeed, sometimes it is more useful to sort the data first and then use a faster algorithm like the binary search. On the other hand, the analysis shows that for lists with fewer than ten items the linear search is much faster than the binary search. Although binary search, for instance, is more effective on sorted lists, sequential search with minor changes can be a better solution in some specific cases. The problem is that when developers hear the expression “sorted list” they immediately choose an algorithm other than the linear search. Perhaps the problem lies in the way we understand what an ordered list is?
What is a sorted list?
We are used to thinking that this list (1, 1, 2, 3, 5, 8, 13) is sorted. Actually we think so because it is … sorted, but the list (3, 13, 1, 3, 3.14, 1.5, -1) is also sorted, except that we don’t know how. Thus we can think of any array as sorted, although it is not always obvious how. There are basically two cases when sequential search can be very useful: first, when the list is very short, and second, when we know in advance that some values are searched for very frequently. Continue reading Computer Algorithms: Linear Search in Sorted Lists→