Efficiently Searching a Sorted ArrayBy Scott Mitchell
Pop quiz, hotshot: you have an array of strings consisting of the various IP addresses of the visitors of the last 24 hours to your Web site. You want to determine if your site has been visited by the IP address 126.96.36.199. How can you quickly figure out if the string "188.8.131.52" is in the array?
Well, the answer depends largely upon whether or not the array is sorted. If the array is not sorted the best you can do is iterate through all of the items in the array, checking each element, until you either find the matching string or reach the end of the array. This approach is said to run in linear time since the amount of time required for the algorithm to run is linearly proportional to the length of the array. That is, if the array contains 1,000 elements, you will have to check at most 1,000 elements (assuming the element is in the last position of the array, or is not found at all in the array); if the array contains 1,000,000 elements, you will have to check at most 1,000,000 elements. Hence, there is a linear proportionality between the amount of work that needs to be done to search the array and the size of the array (namely, a proportion of 1).
However, if your array is sorted there is a much faster algorithm you can employ to determine whether or
not a particular element exists within the array, known as the binary search algorithm. In this article
we will examine how this algorithm works, its running time, and how to use the
method, which searches a sorted array using the binary search algorithm.
Dissecting the Binary Search Algorithm
In order to use the binary search algorithm it is vital that the array be sorted. This means your array must be storing data that can be sorted in some defined order (commonly referred to as lexicographical order). A number of built-in data types are lexicographically sortable, such as integers and strings, so, for the time being, let's focus on an array of strings that are sorted in alphabetical order. So, assume we have an array whose contents are each word of the string "There are few girls in Rolla, Missouri." The array, in sorted order, would be:
A = "are" A = "few" A = "girls" A = "in" A = "Missouri." A = "Rolla," A = "There"
Now, say that you wanted to determine if the string "Rolla" existed in the array. If the array was not sorted you would have to iterate through all seven array elements, checking each to see if the value "Rolla" was found in the particular array element. With a sorted array, however, you can use the binary search algorithm, which performs the following steps:
a = 0and
b = 6(or more generally,
bis the upper bounds of the array, which is 6 in our example above)
k = (b - a) / 2.
A[k]equals "Rolla" then we've found the string "Rolla", end searching.
A[k]does not equal "Rolla", determine if
A[k]appears lexicographically before or after "Rolla". If it appears after "Rolla", then, if "Rolla" exists in the array it must be somewhere between
k. So, set
b = kand return to step 2. If, however,
A[k]appears lexicographically before "Rolla" then, if "Rolla" exists, it must be somewhere between
b, so, let
a = kand return to step 2.
The above steps are the nuts and bolts of the binary search algorithm. In plain English, the algorithm starts by checking the midpoint of the array. If, in sorted order, the element in the middle is appears after the element we're looking for then the element we're looking for must, intuitively, be somewhere between the start of the array and the midpoint, so we repeat our search using the midpoint of the beginning of the array and the midpoint of the array. If, however, the element at the array's midpoint appears before the element we're looking for, then we need only search the latter half of the array.
If you're still having a bit of trouble wrapping your head around the binary search algorithm, consider how you might lookup a given word in the dictionary, say, "efficient"; assume that the dictionary has 800 pages. You know that "efficient", if it exists in the dictionary, must be located somewhere between page 0 and page 800, so you start by opening up the dictionary to page 400 (the middle of the current range) and checking to see what word was at the top of the page. Assume it was "neither". Since you know "efficient" comes before "neither" you know that "efficient" must be found somewhere between pages 0 and 400, so you turn to the middle of that range, page 200. Say the word you find there is "hibernate". Again, since "efficient" comes before "hibernate" you know that "efficient" must be found somewhere between pages 0 and 200, so you turn to page 100. There, you find the word "clammy"; hence, you know that "efficient" must be somewhere between page 100 and 200, so you turn to page 150... this would repeat until you find the word "efficient" or until your range reaches 1 page and "efficient" is not found (meaning it does not exist in the dictionary).
|Binary Search is a Recursive Algorithm|
|The binary search algorithm is known as a recursive algorithm since it can be defined in terms of itself, with a shrinking problem space to consider. Recursion is a useful programming technique and is quite common in algorithm design and analysis. To learn more about this neat topic be sure to read: Recursion: Why It's Cool!|
The Running Time of the Binary Search Algorithm
The running time of the binary search algorithm is logarithmic to the size of the array, specifically log2 N, where N is the array size. This means that as the array increases in size, the amount of work required to search the array only increases logarithmically. Specifically, doubling the array size only incurs one extra unit of work! That is, if your array has 1,000 elements the binary search algorithm will have to examine, at most, 10 elements. If your array has 2,000 elements, the binary search algorithm will have to examine, at most, 11 elements! This is much less than the iterative approach, obviously.
Of course, to see the benefits of the binary search algorithm you must have your array in sorted order. Assume you have an unsorted array and you want to determine if an element exists. Would it make sense to sort the array and then use binary search? If you were planning on only using binary search once, then, no, it would not be smart, because the best sorting algorithms take time proportional to the size of the array times log2 of the size of the array. This means it would be quicker to use the iterative search method than to sort the array and then use the binary search method. Of course, if you planned on using binary search on the contents of a sorted array more than "log2 of the size of the array" times, then it would make sense to first sort the array and then use binary search.
While the binary search algorithm is not too difficult to implement, ASP.NET developers don't need to both implementing it themselves due to the built-in
Array.BinarySearchmethod. (If you are using classic ASP and wish to implement the binary search algorithm, take a gander at this code provided by Brandon Hunt.) This method, in its simplest form, takes two input parameters: the Array to search and the value to look for. This form assumes not only assumes that the array is sorted but that the Array elements implement the
IComparableinterface, meaning that the values of the array can be compared against the value to search for. If your Array contains standard data types, like integers, doubles, strings, etc., you can use this form of the
BinarySearch method returns an integer indicating the position the array element
was found at; if the array element was not found, a negative value is returned.
A simple example can be seen below:
This simple example starts with an unsorted array and then calls the
Array.Sort method to
sort the array. Next, it uses the
Array.BinarySearch method to quickly determine if a
particular value exists within the array. Be sure to check out the live
demo to see an example of the above code in action.
Now that we've seen how to use the
Array.BinarySearch method on arrays whose elements implement
IComparable interface, let's turn our attention to seeing how to use the
Array.BinarySearch method on arrays whose elements do not implement the