Best worst case sorting algorithm

I want to know which sorting algorithm performs best in its own worst case, compared with other algorithms each in their own worst case, when sorting arrays.

To better understand what I want to know, here are some questions that lead toward the answer.

Is there a consensus on which array-oriented comparison sorting algorithm is best when compared with the others, each in its respective worst case?

Given that merge sort, timsort, smoothsort, introsort and heap sort are all O(n*log(n)) algorithms in comparisons and time, which one has the lowest constant factor on the n*log(n) term?

Is there another traditional algorithm that surpasses these five in the worst case? Or a widely known variant of some algorithm (like quick sort, since even the simple pivot selection criterion has many variations with a lot of impact), or a combination of algorithms that beats them (like introsort + insertion sort)?

Anyway, what is the best algorithm for worst-case sorting? Take into account that this is about sorting arrays, and that not only asymptotic complexity counts but also the constant factor of that term in the comparison formula, or experimental time measurements on large problems.

Author: RHER WOLF, 2020-08-21

1 answer

We have to watch out for expectations versus reality when dealing with arrays. Since arrays are structures with little dynamism, implementing sorting algorithms on them brings complications and inconveniences that can even feel like gotchas.

Outside of arrays, things are much less restrictive. In arrays, quick access to an element by its index is a highly valued benefit, used to get good performance in various operations, but it does not guarantee low complexity for every operation, and it has its price and inconveniences in certain circumstances, which makes choosing an algorithm difficult.

  • Merge sort on linked lists

For example, in linked lists we only need to splice an element into a position in the chain, without the shifting required in an array. This makes merge sort much more suitable for linked lists because, besides not needing O(n) extra memory (unlike the array version) and being stable, one can traverse the list O(log(n)) times, merging the pieces with O(n) comparisons per pass, using a few pointers to move between the pieces being merged, and moving nodes simply by unlinking them from where they are and relinking them at their destinations.

This not only results in 1 + n*floor(log2(n)) + n - 2^(floor(log2(n))+1) comparisons on lists (the same as on arrays), asymptotically about n*log2(n) = O(n*log(n)), but also pointer handling (of varying cost depending on the implementation of the list and of the algorithm) that, done well, can total, depending on the rounding in the recurrence formula, at least n + ceil(n/2)*floor(log2(n)) - 2^floor(log2(n)) complete node moves plus additional moves of up to floor(2^ceil(log2(n)-1)/3), all summing to about n*log2(n)/2 = O(n*log(n)) moves.
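To make the idea concrete, here is a minimal Python sketch (top-down, for brevity) of merge sort on a singly linked list; the Node class and function names are mine, just for illustration. Nodes are relinked rather than copied, so no O(n) auxiliary buffer and no shifting is needed.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        value: int
        next: "Optional[Node]" = None

    def merge(a: Optional[Node], b: Optional[Node]) -> Optional[Node]:
        # Merge two sorted lists by relinking nodes; nothing is copied or shifted.
        dummy = Node(0)
        tail = dummy
        while a and b:
            if a.value <= b.value:        # "<=" keeps the merge stable
                tail.next, a = a, a.next
            else:
                tail.next, b = b, b.next
            tail = tail.next
        tail.next = a if a else b
        return dummy.next

    def merge_sort(head: Optional[Node]) -> Optional[Node]:
        # Top-down merge sort on a linked list: O(n*log(n)) comparisons.
        if head is None or head.next is None:
            return head
        slow, fast = head, head.next      # split the list in half (slow/fast pointers)
        while fast and fast.next:
            slow, fast = slow.next, fast.next.next
        mid, slow.next = slow.next, None
        return merge(merge_sort(head), merge_sort(mid))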

  • Merge sort on arrays

These moves in arrays are done by assigning new values over the old ones at each position. The cost of moving data in an array merge sort depends on whether there is enough memory to use an auxiliary array; with it, as in the linked-list case, the worst case stays at O(n*log(n)) not only in comparisons but also in movements. Otherwise, with no memory available, you have to shift the elements (after all, there is nowhere else to put them), and that results in O(n²*log(n)) complexity in the worst case (O(log(n)) merge levels, O(n) comparisons per level and O(n) assignments per comparison). Never mind the exact number of assignments; it is absurdly costly. In other words, with arrays, sorting with good performance using merge sort depends on having that memory available.
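For comparison, a sketch (my own, not from the original answer) of the array version with an auxiliary array, the trade-off described above: O(n) extra memory buys O(n*log(n)) movements in the worst case.

    def merge_sort_array(a):
        # Bottom-up merge sort with one auxiliary array of size n.
        n = len(a)
        aux = a[:]                        # the O(n) auxiliary memory
        src, dst = a, aux
        width = 1
        while width < n:
            for lo in range(0, n, 2 * width):
                mid = min(lo + width, n)
                hi = min(lo + 2 * width, n)
                i, j = lo, mid
                for k in range(lo, hi):   # merge src[lo:mid] and src[mid:hi] into dst
                    if j >= hi or (i < mid and src[i] <= src[j]):
                        dst[k] = src[i]; i += 1
                    else:
                        dst[k] = src[j]; j += 1
            src, dst = dst, src           # ping-pong between the two buffers
            width *= 2
        if src is not a:                  # make sure the result ends up in a
            a[:] = src
        return a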

  • Tree sort on binary trees

In trees, although it is complicated, one can temporarily adapt the pointers so that each node works as a list node, apply merge sort, and then reassemble the tree in order. But apparently the best algorithm in this case is tree sort because, besides being stable, it inserts each element neatly into the tree. After all, you can implement the structure as a balanced binary search tree from the start and run one iteration of the algorithm each time you have to insert an element, keeping it sorted without ever re-sorting everything. Even if the tree was not built that way, one can simply rebuild it correctly based on the algorithm, without having to store vast additional data. Several of these conveniences are not found when sorting arrays.
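As a reference point, a small sketch of tree sort with a plain (unbalanced) binary search tree; without balancing the worst case degrades to O(n²) comparisons, while the AVL variant discussed below adds rotations to keep it at O(n*log(n)). The class and function names here are illustrative only.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class TreeNode:
        value: int
        left: "Optional[TreeNode]" = None
        right: "Optional[TreeNode]" = None

    def tree_sort(values):
        # Insert every element into a BST, then read them back in order.
        root = None
        for v in values:
            node, parent, go_left = root, None, False
            while node is not None:
                parent, go_left = node, v < node.value   # equal keys go right: stable
                node = node.left if go_left else node.right
            new = TreeNode(v)
            if parent is None:
                root = new
            elif go_left:
                parent.left = new
            else:
                parent.right = new
        out = []
        def inorder(n):                   # in-order traversal yields sorted order
            if n:
                inorder(n.left); out.append(n.value); inorder(n.right)
        inorder(root)
        return out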

If the tree is complete, the number of comparisons in the worst case is equal to that of merge sort, but tree sort has almost the same performance in its best and worst cases, which gives merge sort an advantage in this respect, since merge sort has almost 50% fewer comparisons in its best case than in its worst. In addition, each insertion may require O(n) rotations throughout the tree to keep it complete, which gives the algorithm O(n²) movements in the worst case.

The height of a complete tree, which roughly determines the cost in comparisons per insertion, is floor(log2(n))+1, while the height of an AVL tree ranges from floor(log2(n))+1 up to floor(log((n+1.5)/(0.5+sqrt(0.45)))/log(sqrt(1.25)+0.5)), which in the worst case is about 1.44*log2(n).

So if instead the tree is an AVL tree, the rotations per insertion that preserve balance occur only O(log(n)) times even in the worst case, keeping the sort at O(n*log(n)) not only in comparisons but also in movements. However, as seen in the previous paragraph, the number of comparisons is greater than in merge sort (despite the same complexity), around 44% more (x1.44) in the worst case. Thus, even with the huge gain, it still loses in performance.

  • Tree sort on arrays

Anyway, as far as I know, there is no widely studied way of applying this algorithm to an array, or of simulating a complete or AVL tree inside an array, in such a way that one can sort by tree sort. Even if someone finds a way to do it, it does not seem worth it, because in theory tree sort loses to merge sort, unless there is some brilliant insight that simplifies the operations, and so far I have not found or thought of anything like that.

Curiosity:

It is possible to implement insertion sort with O(n*log(n)) comparisons (binary search for the correct position at each insertion). The number of comparisons in the worst case is the same as complete tree sort and merge sort. In fact, the comparison steps can be seen as similar to those of a complete tree; it is as if it were tree sort in terms of comparisons. And guess what? That doesn't matter. The problem of insertion in arrays is having to shift, moving elements with O(n) cost in the worst case at each insertion, which guarantees O(n²). It doesn't really behave like a tree.
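A sketch of that "binary" insertion sort (using Python's bisect for the binary search): the comparison count drops to O(n*log(n)), but the shift keeps the movements at O(n²).

    from bisect import bisect_right

    def binary_insertion_sort(a):
        # O(n*log(n)) comparisons, but still O(n^2) moves because of the shift.
        for i in range(1, len(a)):
            x = a[i]
            pos = bisect_right(a, x, 0, i)   # bisect_right keeps equal keys in order (stable)
            a[pos + 1:i + 1] = a[pos:i]      # the O(n) shift that dominates the cost
            a[pos] = x
        return a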

  • Insertion sort

Speaking of insertion sort, let's state the obvious: it sorts arrays and is stable, but it shifts elements and has O(n²) comparisons and moves in the worst case, so it's a bad idea for big problems. Okay, but what about small problems? Have you ever had to sort many small sequences that add up to a significant total? Well, it may be worth testing the performance of insertion sort in cases like these, even in the worst case.

Insertion sort is considered a great algorithm for sorting short arrays. If you have to sort many small arrays whose combined size is significant, it may be a good option. Tests with integers that I did some time ago indicate that in the average case it is usually good up to about twenty elements, and in the worst case around eight. Even with many comparisons, the instructions are very simple and fast. The problem is when comparisons themselves are expensive, such as with strings (even worse with similar prefixes) or records with many repeated fields and lots of tiebreakers (such as exam candidates with tied scores, similar names, in the same age groups).

In addition, many algorithms break arrays into pieces that are sorted separately. Merge sort, for example, can split the array into 32 runs sorted with insertion sort, then merge them into 16, 8, 4, 2 and finally the whole sorted array. Quick sort also keeps producing, on either side of the pivot, smaller and smaller pieces that at some point may be so small that it is best to finish them with insertion sort. You may find it a small benefit, but it is a benefit and can come in handy in many small and medium-sized arrays, even in worst cases (provided you choose very carefully up to what size insertion sort will do the sorting).
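A sketch of such a mix, assuming a quick sort with a simple last-element (Lomuto) partition and a cutoff of 16 elements; both the partition scheme and the cutoff value are my assumptions and would have to be tuned experimentally.

    INSERTION_CUTOFF = 16   # assumption: the best cutoff must be found experimentally

    def insertion_sort_range(a, lo, hi):
        # Plain insertion sort on the slice a[lo..hi] (inclusive).
        for i in range(lo + 1, hi + 1):
            x, j = a[i], i - 1
            while j >= lo and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x

    def hybrid_quicksort(a, lo=0, hi=None):
        # Quick sort that hands pieces below the cutoff to insertion sort.
        if hi is None:
            hi = len(a) - 1
        if hi - lo + 1 <= INSERTION_CUTOFF:
            insertion_sort_range(a, lo, hi)
            return
        pivot = a[hi]                     # last-element pivot (Lomuto partition)
        i = lo
        for j in range(lo, hi):
            if a[j] <= pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        hybrid_quicksort(a, lo, i - 1)
        hybrid_quicksort(a, i + 1, hi)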

  • Heap sort

Unlike tree sort, this algorithm is also geared towards sorting arrays, even though it conceptually treats them as a tree structure. Its complexity, both in data comparisons and in data movements, is O(n*log(n)). Also, it has a benefit that merge sort does not: no auxiliary array. There is no need to allocate memory to sort with maximum performance, and there is no shifting.
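A minimal heap sort sketch, to make the description concrete: build a max-heap in place, then repeatedly swap the maximum to the end and sift down.

    def heap_sort(a):
        # In-place heap sort: O(n*log(n)) comparisons and swaps, no auxiliary array,
        # but unstable because distant elements get swapped.
        n = len(a)

        def sift_down(root, end):
            # Push a[root] down until the max-heap property holds in a[root..end].
            while True:
                child = 2 * root + 1
                if child > end:
                    return
                if child + 1 <= end and a[child] < a[child + 1]:
                    child += 1            # pick the larger child
                if a[root] >= a[child]:
                    return
                a[root], a[child] = a[child], a[root]
                root = child

        for i in range(n // 2 - 1, -1, -1):   # build the heap
            sift_down(i, n - 1)
        for end in range(n - 1, 0, -1):       # extract the maximum n-1 times
            a[0], a[end] = a[end], a[0]
            sift_down(0, end - 1)
        return a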

But it has two weaknesses. First, it repeatedly swaps pairs of elements from different parts of the array in a way that breaks stability. Second, the number of comparisons in the worst case of heap sort is around twice that of merge sort. It is like inserting each element into a complete binary tree twice: in the worst case each element travels as far as possible from the depth where it was initially placed.

Thus, it apparently loses to the previous two algorithms, but at least it works on arrays and needs no auxiliary array. So far, then, it seems the way to go is to use merge sort if you can; if you cannot allocate the auxiliary array, you unfortunately have to choose between stability (say, "binary" insertion sort) and performance (heap sort). Of course, with extra data (associating each element with its initial index) one can make heap sort stable, but this comparison tiebreaker and the extra data movement have their cost.

  • Quick Sort

It is an algorithm known to be applicable to arrays and considered fast in the average case (O(n*log(n)) comparisons and swaps). But it is also known to be unstable, losing to insertion sort in the best case (O(n*log(n)) against O(n)) and losing to heap sort (and therefore merge sort) in the worst case (O(n²) against O(n*log(n))). In addition, there is no known way to implement it without stacking data (memory complexity O(log(n)) if well implemented), either with recursive calls or "manually" with arrays.

Still, the truth is that it has flexible potential for variants with distinct speeds. Quick sort follows the principle of selecting a pivot (by some criterion; there is no single standard, but several popular ones) and splitting the array into two parts to be sorted next: one to the left of the pivot with the values that should end up to its left when the sort finishes, and another to the right with those that should end up to its right. The choice of criterion alone has a great impact on performance.
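For concreteness, a sketch of quick sort with the classic median-of-three criterion discussed further below; it is only one of many possible pivot criteria, and the partition details are my own choices.

    def median_of_three(a, lo, hi):
        # Sort a[lo], a[mid], a[hi] in place and return the middle index as the pivot.
        mid = (lo + hi) // 2
        if a[mid] < a[lo]:
            a[lo], a[mid] = a[mid], a[lo]
        if a[hi] < a[lo]:
            a[lo], a[hi] = a[hi], a[lo]
        if a[hi] < a[mid]:
            a[mid], a[hi] = a[hi], a[mid]
        return mid

    def quicksort_median3(a, lo=0, hi=None):
        # Quick sort whose pivot is the median of three; the worst-case split
        # becomes pieces of size 1 and n-2 instead of 0 and n-1.
        if hi is None:
            hi = len(a) - 1
        if lo >= hi:
            return
        mid = median_of_three(a, lo, hi)
        a[mid], a[hi - 1] = a[hi - 1], a[mid]   # park the pivot next to the end
        pivot = a[hi - 1]
        i = lo
        for j in range(lo, hi - 1):
            if a[j] <= pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi - 1] = a[hi - 1], a[i]
        quicksort_median3(a, lo, i - 1)
        quicksort_median3(a, i + 1, hi)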

To analyze this, I wrote in Maple 15 the following procedure, which builds recursive formulas with memoized (reusable) results to speed up the calculation and thus make it feasible to compute relatively large values. It will now be used to compute the number of quick sort comparisons in the worst case from the recurrences.

[image: Maple procedure, omitted]

If the pivot selection criterion, even a random one, immediately picks a value without comparing it with others, the worst case is that this value ends up at one end of the array, splitting an array of size "n" into one of size "0" and another of size "n-1". That is, now you have to sort an array of size "n-1". This results in the same number of comparisons as insertion sort in the worst case, i.e. quadratic complexity.

[image: Maple output, omitted]

If the criterion is the classic median of three (sort any three elements to find which is in the middle), the worst-case split is into "1" and "n-2" elements, and this greatly improves performance. Using the Maple procedure, we see that the speedup apparently grows from x1 and converges to x2.

[image: Maple output, omitted]

If we go for the median of five, the worst case splits into pieces of size "2" and "n-3". The speedup goes from x1 to x3.

[image: Maple output, omitted]

Median of seven? The split is "3" and "n-4". Speedup from x1 to x4.

[image: Maple output, omitted]

In other words, generically, to guarantee a minimum piece of "m" elements one sorts "p = 2*m+1" values (an odd quantity), so "m = (p-1)/2". Notice that if "p = n" then you sort the whole array just to obtain the pivot that would be used to sort the array, i.e. fetching the pivot makes no sense because the array is already sorted; we must have "p < n".

As a result, at one recursion level the algorithm sorts "p" elements, one of them becomes the pivot, "m" stay to its left and "m" to its right; the remaining "n-p" elements still have to be compared with the pivot, and in the worst case they all end up on the same side, after which the two sections are sorted recursively. So the worst-case cost of sorting an array of size "n" is the cost of sorting "p" elements, plus the cost of comparing and moving the "n-p" remaining elements to one side of the pivot, plus the cost of sorting "m" elements, plus the cost of sorting "m+n-p" (= n-m-1) elements.
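Since the original Maple code is only shown as an image, here is a rough Python equivalent of that recurrence (my reconstruction; I assume the p pivot candidates and pieces of size <= p are sorted with insertion sort, which does not change the asymptotic picture).

    def worst_case_comparisons(n_max, p):
        # W[n] = worst-case comparisons of quick sort whose pivot is the median of p
        # elements (p odd), following the recurrence above, for every n up to n_max.
        m = (p - 1) // 2
        W = [0] * (n_max + 1)
        for n in range(2, n_max + 1):
            if n <= p:
                W[n] = n * (n - 1) // 2               # small piece: insertion sort (assumption)
            else:
                # sort p candidates + compare the other n-p with the pivot
                # + worst split into pieces of size m and n-m-1
                W[n] = p * (p - 1) // 2 + (n - p) + W[m] + W[n - m - 1]
        return W

    # Example: pivot taken directly (p=1) versus the median of three (p=3)
    W1 = worst_case_comparisons(9999, 1)
    W3 = worst_case_comparisons(9999, 3)
    print(W1[9999] / W3[9999])   # speedup factor, close to x2 as described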

Also, apparently increasing the sorted stretch used for pivot selection improves the worst-case performance of large arrays. Does it? Taking the entire array for this causes infinite recursion, of course. What if we take "n-1" elements?

[image: Maple output, omitted]

When sorting "n-1" elements and selecting the pivot, only one more element has to be compared with it to decide which side it goes to. On one side there are floor((n-2)/2) elements, on the other ceil((n-2)/2)+1 (the "+1" because this larger side includes the element that was not used to choose the pivot), and despite the split into almost equal pieces (it is almost the best case), it required sorting almost everything to get there. The result was a speed multiplier of almost zero, that is, speed was lost. What if we look for the pivot among "n-2" elements?

[image: Maple output, omitted]

The factor is much greater than the previous one, but it remains practically zero. That is, for small stretches the search improves by enlarging them and for large ones it improves by shrinking them, which means there must be a break-even point.

Notice also that switching from one element to three in the pivot search improves performance for arrays of every size, while going from 3 to 5 worsens arrays of size n = 6, 7, 8, 9, 10, ties with all the other sizes below "n = 11" and improves from there on. From 5 to 7 there is also a size from which it pays off, but it is much larger.

In other words, apparently the larger the array, the larger this correct quantity of values to be used in the pivot search. Is it possible to find a formula for this optimal value? If we find it and apply it at every recursion level of quick sort, does the worst-case complexity drop to O(n*log(n))? Does it beat at least heap sort, or, who knows, merge sort?

[image: Maple output, omitted]

Although I did not find answers to all the questions, it was still possible to implement a recursion that finds the worst-case comparison count and the optimal value by means of a recursion over two values, [Comparações, ValorÓtimo], and to answer some of the most important ones.
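A Python sketch of that two-value recursion, under the assumption (mine) that the p candidates are themselves sorted with the already-computed optimal worst case W[p]:

    def optimal_quicksort_worst_case(n_max):
        # For each size n, try every odd p (1 <= p <= n-1) and keep the p that
        # minimizes the worst-case comparison count: [Comparações, ValorÓtimo].
        W = [0] * (n_max + 1)        # W[n]: optimal worst-case comparison count
        P = [0] * (n_max + 1)        # P[n]: the p that achieves it
        for n in range(2, n_max + 1):
            best_cost, best_p = None, None
            for p in range(1, n, 2):                 # odd candidate-set sizes
                m = (p - 1) // 2
                cost = W[p] + (n - p) + W[m] + W[n - m - 1]
                if best_cost is None or cost < best_cost:
                    best_cost, best_p = cost, p
            W[n], P[n] = best_cost, best_p
        return W, P

    # Usage (quadratic time in n_max, so keep it moderate):
    W, P = optimal_quicksort_worst_case(2000)
    print(P[2000], W[2000])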

First, notice that among the optimal quantities "p = 1, 3, 5, ..." the only one that appears just once is 19, while 17, right before it, appears several times. It does not seem easy to find a formula that computes the optimal "p", right? The sequence looks quite unstable.

Second, notice that the speedup at "n = 9999" was x58. That means the improvement over the most basic variant, with constant "p = 1", is absurdly high. On the other hand, using the merge sort comparison formula we find a further x7 speedup when going from this quick sort to merge sort at "n = 9999". That means that, apparently, not only merge sort but also heap sort beats it in number of comparisons in the worst case.

[image: Maple output, omitted]

But notice that initially the comparison counts of this quick sort and merge sort are equal and only gradually drift apart; that is, up to a certain point the improved quick sort beats heap sort, and beyond that point the roles are reversed.

There are more variants of quick sort, for example in how the position of the stretch where the pivot is searched is chosen, adaptations to gain stability, the number of pivots, etc. But that is a matter of creativity; the possibilities are endless and never-ending if you insist on them. In addition, delving into this does not seem worthwhile in this context, because if you look at the examples I cited:

  • the position of the stretch does not change complexity or stability,
  • stability is gained with shifting or with memory allocation (the same pair of options as when implementing merge sort, which is better in the worst case), and
  • more pivots are experimentally good (in near-average cases) but do not seem efficient in the worst case, since there are more pivots multiplying the comparisons and the elements to be moved at each recursion level, with little (or no) reduction in the number of levels.

In other words, so far the best seems to be to use merge sort if there is memory available; otherwise you can choose between heap sort and "optimal quick sort" depending on the size of the array. Since these two algorithms are unstable, if you prioritize stability over performance even at a cost in complexity, you can think of insertion sort.

  • Conclusion

The performance of a sorting algorithm differs noticeably not only between the various ways of implementing it on the same kind of sortable structure but also between versions on different kinds of structures. There are algorithms specific to certain structures; others work on several but are better suited to one than another. There are even cases of heavier structures in which an algorithm nonetheless beats the others because those structures offer features that allow lower complexities. There is a vastness of facts to consider.

With the use of pointers, one can structure data so that it can be inserted, accessed, moved and removed in very particular ways, far beyond copying values and overwriting old ones to simulate data movement (as one does in arrays). While in arrays we read and write by locating with an index, other structures have benefits such as ways to search for an item and insert another in its vicinity. This certainly impacts the performance of a sorting algorithm; after all, an expensive shift in an array is not needed in a list, for example, only relinking with the right pointers. Finding an item instantly by index is better than in O(log(n)), but inserting it between two others is simpler in a tree.

Because of this, we cannot talk as if algorithms will always deliver the performance we expect of them. What is good in one situation may not be good in another. Each case demands attention. Not only can we shift elements in an insertion sort, where shifting several elements to insert one is natural, but we can also get the idea of doing this in merge sort, quick sort and other algorithms (which were not made for it) to bolt on artificial stability, avoid memory allocation and other things that ultimately hurt performance even while preserving the complexity of comparisons. Care must be taken not to fall into traps like these and be surprised later. The main costs are instructions in general and time, not comparisons, as has become popular to assume.

Without certain resources, choosing an algorithm for arrays becomes difficult. Sorting may require allocating space (as is traditionally done, for example, in array merge sort, creating a memory demand), shifting values (as is traditionally done in insertion sort, creating performance problems with long shifts), swapping values (as is traditionally done in quick sort, destroying the stability of the sort), giving up stability, or giving up the desired complexity, all for lack of resources one might wish for.

  • Verdict

How, then, do we define the best algorithm for worst cases?

Let's start with the simplest. Small array? Insertion sort. Comparisons of simple data types? Even better.

And large volumes of data? It may be appropriate to mix algorithms, such as one that sorts small pieces with insertion sort, medium ones with "optimal quick sort", and large ones with heap sort or merge sort. Yes, the main algorithm can be the one that creates the divisions, or the one that fully sorts a large array but not the medium and small ones. The intermediate algorithm can break up the array (or the piece of it that it has to sort) even further, the small one sorts the smallest pieces, and if after that the intermediate or main algorithm still has work to do (like merge sort, which does the biggest merge at the end), it takes these sorted pieces and completes the job with the strategy that most reduces complexity.

But to choose the algorithms we need to look at their properties when sorting arrays and evaluate the conditions.

As we noted earlier, the only flaw of merge sort is the memory demand. If you have the space, it is a good choice. If you have huge and complex data structures already taking up a good part of memory, you may need to give up the benefits of this algorithm.

If you prioritize performance without taking risks in the worst case, heap sort does the job without the memory demand. If you accept a little risk, quick sort is considered a good idea. Perhaps the best thing to do, thinking about worst cases, is: for a large array, apply heap sort (entirely, or mostly and leave the rest to insertion sort), and for a not-so-large array, apply "optimal quick sort", perhaps refining it into small pieces sorted with insertion sort.

I do not know of a stable algorithm with reasonable performance that saves memory.

I still have to take a good look at timsort and smoothsort.

(To be continued.)

Author: RHER WOLF, 2020-09-04 02:10:45