A few days ago Andrei Alexandrescu published a possibly new idea for a search-datastructure that tries to be yet another compromise: faster than plain search in an unordered container, while on the other hand not requiring a completely sorted one either.
The idea is basically this:

* Partition the underlying array into consecutive buckets, where the $k$-th bucket holds $k$ elements (the last one may be partially filled).
* Keep the buckets ordered relative to each other: every element of a bucket is less than or equal to every element of the next bucket.
* Organize each bucket internally as a binary max-heap, so that the first element of every bucket is its maximum.

Note that this does require the array to be partially sorted, but not completely.
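To make the layout concrete, here is a minimal sketch in C++ of how such a structure could be built; the function name and the element type are mine, and building via a full sort is just the straightforward approach (more on the creation-complexity below):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Build the described layout in-place: bucket k (1-based) occupies the
// k elements starting at the triangular number k*(k-1)/2, every bucket
// is a binary max-heap, and the buckets are ascending relative to each
// other. (Hypothetical helper, not code from Andrei's post.)
void make_bucket_layout(std::vector<int>& data) {
	// Sorting establishes the bucket contents and their relative order.
	std::sort(data.begin(), data.end());
	std::size_t begin = 0, size = 1;
	while (begin < data.size()) {
		const std::size_t end = std::min(begin + size, data.size());
		// Establish the heap-property inside the bucket; this only
		// permutes elements within the bucket.
		std::make_heap(data.begin() + begin, data.begin() + end);
		begin = end;
		++size;
	}
}
```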
So, what is the advantage? Basically that we can speed up insertion in the middle while keeping a clearly sublinear search.
In the following I will attempt to analyze the algorithmic complexities of each of the relevant operations.
In order to do that, we will need some knowledge about the buckets.

First: How many elements can $m$ buckets hold? The answer is obviously:

$$\sum_{k=1}^{m} k$$

Which far less obviously is equal to:

$$\frac{m \cdot (m+1)}{2}$$

This implies that if the container holds $n$ elements, there are $\Theta(\sqrt{n})$ buckets and the largest one contains $\Theta(\sqrt{n})$ elements.
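To see where the square root comes from, set the capacity equal to $n$ and solve for the number of buckets $m$:

$$\frac{m \cdot (m+1)}{2} = n \quad\Longrightarrow\quad m = \frac{\sqrt{8n+1}-1}{2} \in \Theta(\sqrt{n})$$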
Searching for an element consists of two operations:

1. Finding the bucket that may contain the element.
2. Finding the element within that bucket.
The first part can be done using a binary search over the bucket-maxima, resulting in a runtime in

$$O(\log(\sqrt{n})) = O(\log(n))$$

Which is certainly okay.
The second part is somewhat worse though: Since the buckets are only organized as binary heaps, there is basically no way around doing a linear search, giving us a runtime in

$$O(\sqrt{n})$$
This clearly is the dominating term and thereby also the runtime of a search-operation. While it really isn't great, it still is a lot better than doing a linear search in an unsorted array.
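A minimal sketch of such a search, under the same assumptions as the layout-code above (flat array, bucket $b$ starting at the triangular number $b \cdot (b-1)/2$; the helper names are mine):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical search in the layout sketched above. The first element of
// each bucket is its maximum (the root of its max-heap), and these maxima
// are ascending, so we can binary-search over them.
bool contains(const std::vector<int>& data, int value) {
	const auto bucket_start = [](std::size_t b) { return b * (b - 1) / 2; };

	// Number of buckets; computed by counting for simplicity, this could
	// be done in O(1) by inverting the triangular numbers.
	std::size_t m = 0;
	while (bucket_start(m + 1) < data.size()) ++m;

	// Binary search for the first bucket whose maximum is >= value.
	std::size_t lo = 1, hi = m + 1;
	while (lo < hi) {
		const std::size_t mid = lo + (hi - lo) / 2;
		if (data[bucket_start(mid)] < value) lo = mid + 1;
		else hi = mid;
	}
	if (lo > m) return false; // bigger than everything in the container

	// Linear search through the one candidate bucket: O(sqrt(n)).
	const std::size_t begin = bucket_start(lo);
	const std::size_t end = std::min(bucket_start(lo + 1), data.size());
	return std::find(data.begin() + begin, data.begin() + end, value)
	       != data.begin() + end;
}
```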
Surprisingly, inserting is faster than searching. The algorithm is basically this:

1. Find the target bucket via binary search over the bucket-maxima.
2. Push the new element onto that bucket's heap.

Considering that, the complexity must be in:

$$O(\log(\sqrt{n}) + \log(\sqrt{n})) = O(\log(n))$$
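A sketch of that insertion. Note two assumptions of mine that go beyond the original description: the buckets are stored as separate vectors (so that a middle bucket can grow without shifting the rest of a flat array), and the bucket-sizes are simply allowed to drift away from the exact $1, 2, 3, \ldots$ pattern:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical insertion: find the target bucket via binary search over
// the bucket-maxima, then push onto that bucket's max-heap.
void insert(std::vector<std::vector<int>>& buckets, int value) {
	// First bucket whose maximum (= front of its max-heap) is >= value;
	// all earlier buckets contain only smaller elements.
	auto it = std::lower_bound(
		buckets.begin(), buckets.end(), value,
		[](const std::vector<int>& bucket, int v) { return bucket.front() < v; });
	if (it == buckets.end()) {
		// Bigger than everything so far: it belongs into the last bucket.
		if (buckets.empty()) buckets.emplace_back();
		it = std::prev(buckets.end());
	}
	it->push_back(value);                   // O(1) amortized
	std::push_heap(it->begin(), it->end()); // O(log(bucket-size))
}
```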
Deletion on the other hand is horrible:

1. Search for the element: $O(\sqrt{n})$.
2. Remove it from its bucket's heap: $O(\log(n))$.
3. Refill the bucket by pulling the minimum out of the next bucket, and repeat that for every following bucket.

The problem is the third step: a binary max-heap offers no fast way to find its minimum, so we have to scan about half of every following bucket, which adds up to about half the elements of the container. Obviously searching through half the elements of the container is an $O(n)$-operation. The good news is that this is the same for both the sorted and the unsorted array, so we don't actually lose algorithmic performance compared to them.
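Spelled out (assuming the element is deleted from bucket $i$ and that the leaves, which make up about half of each heap, have to be scanned to find its minimum):

$$\sum_{k=i}^{m} \frac{k}{2} \;\le\; \frac{1}{2} \sum_{k=1}^{m} k \;=\; \frac{n}{2} \;\in\; O(n)$$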
Another nice thing about the datastructure is that it has already done half of the job if we want to completely sort it: the buckets are already ordered relative to each other, so only their insides still need sorting. What remains is therefore basically to call `sort_heap()` on each bucket, giving us an overall complexity in

$$O(n \cdot \log(n))$$
Well, that is simply the minimal complexity that any comparison-based sort-operation can reach on unsorted data, so it sadly isn't overly impressive; but I am willing to believe that it may in practice still be quite a bit faster to sort this particular datastructure than something without any structure at all on it.

A counting argument suggests that we cannot do better: the heaps still admit $2^{\Theta(n \cdot \log(n))}$ different arrangements of their elements, so a comparison-based sort still has to extract $\Theta(n \cdot \log(n))$ bits of information. While this is not yet a proof that sorting this structure cannot become faster, I consider it a strong argument.
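A sketch of that sorting-procedure on the flat layout from above; since every bucket already is a max-heap, `sort_heap()` does all the remaining work:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sort the whole structure: the buckets are already ordered relative to
// each other, so sorting each bucket in-place sorts the entire array.
void sort_structure(std::vector<int>& data) {
	std::size_t begin = 0, size = 1;
	while (begin < data.size()) {
		const std::size_t end = std::min(begin + size, data.size());
		std::sort_heap(data.begin() + begin, data.begin() + end); // O(k*log(k))
		begin = end;
		++size;
	}
}
```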
Sadly the creation of this datastructure cannot become faster than $\Theta(n \cdot \log(n))$ either, based on a similar reasoning as the argument that sorting cannot become faster:
The number of legal bucket-fillings (that is, orderings of the input range that already form legal buckets before we call `make_heap` on them) is

$$\prod_{k=1}^{m} k!$$
Whereas the number of total permutations of the range is trivially $n!$.
Using binary comparisons we can therefore not do better than using

$$\log_2\left(\frac{n!}{\prod_{k=1}^{m} k!}\right) \in \Theta(n \cdot \log(n))$$

operations to create such a structure from arbitrary inputs. Sadly that means that creating the structure isn't asymptotically less complex than just sorting the whole array.
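For completeness, a quick sanity-check of that bound via Stirling's approximation (using $m \approx \sqrt{2n}$ from above and dropping $O(n)$ terms):

$$\log_2\left(\frac{n!}{\prod_{k=1}^{m} k!}\right) \approx n \log_2(n) - \sum_{k=1}^{m} k \log_2(k) \approx n \log_2(n) - n \log_2\!\left(\sqrt{2n}\right) \approx \frac{n \cdot \log_2(n)}{2}$$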
Getting the element at a certain index is obviously an $O(1)$-operation, since the structure is backed by a plain array.
Finding the element with a certain rank basically works like searching, except that the right bucket can be computed directly instead of having to binary-search for it; the linear-time selection within the bucket remains, resulting in a complexity in $O(\sqrt{n})$.
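A sketch of that as well; the closed-form inversion of the triangular numbers and the use of `std::nth_element` on a scratch-copy are my choices:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical selection of the element with the given 0-based rank
// (assumed to be < data.size()). Bucket b (1-based) holds exactly the
// ranks [b*(b-1)/2, b*(b+1)/2).
int element_with_rank(const std::vector<int>& data, std::size_t rank) {
	// Invert the triangular numbers to jump to the right bucket in O(1).
	auto b = static_cast<std::size_t>(
		(std::sqrt(8.0 * static_cast<double>(rank) + 1.0) + 1.0) / 2.0);
	if (b * (b - 1) / 2 > rank) --b;  // guard against rounding down...
	if (b * (b + 1) / 2 <= rank) ++b; // ...and against rounding up

	const std::size_t begin = b * (b - 1) / 2;
	const std::size_t end = std::min(b * (b + 1) / 2, data.size());

	// Select within the bucket on a scratch-copy: O(sqrt(n)).
	std::vector<int> scratch(data.begin() + begin, data.begin() + end);
	std::nth_element(scratch.begin(), scratch.begin() + (rank - begin),
	                 scratch.end());
	return scratch[rank - begin];
}
```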
The following table provides an overview of the previously argued complexities.
| Operation | Complexity |
|---|---|
| Searching | $O(\sqrt{n})$ |
| Insertion | $O(\log(n))$ |
| Deletion | $O(n)$ |
| Sorting | $O(n \cdot \log(n))$ |
| Creation | $O(n \cdot \log(n))$ |
| Access | $O(1)$ |
| nth-Element | $O(\sqrt{n})$ |
It should be noted that a sorted list, in combination with an index-tree, can achieve the same and often much better complexities for every operation aside from accessing the element at a certain index. It does however come with significant indirection in memory and may (or may not) lose with regards to real-world performance for many practical applications, due to the vastly superior cache-friendliness of Andrei's datastructure.
Furthermore I would like to point out that a lot of the performance-advantages disappear in the Big-O calculation, implying potentially big constant factors that were neglected in this analysis. This is also where I assume that some tuning, for instance picking another growth-factor for the bucket-sizes, may still have merit (it would have to be super-polynomial to solve that problem, but must also be sub-exponential, since otherwise the biggest bucket would have a size in $\Theta(n)$).