incubator-stdcxx-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Gardner (JIRA)" <j...@apache.org>
Subject [jira] Commented: (STDCXX-397) std::sort introsort implementation error
Date Mon, 30 Apr 2007 14:38:15 GMT

    [ https://issues.apache.org/jira/browse/STDCXX-397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492679
] 

Todd Gardner commented on STDCXX-397:
-------------------------------------

Martin,

This code can generate a killer set for the problem:

#include <vector>
#include <fstream>

const int SetSize = 1000;
int main(){
  using std::vector;
  vector<int> KillerSet;

  KillerSet.push_back(9998);
  KillerSet.push_back(9997);
  KillerSet.push_back(9997);

  int CurrNumber = 9996;
  while(KillerSet.size() < SetSize) {
    vector<int>::iterator Middle = KillerSet.end()-((KillerSet.size()/2));
    int MidValue = *Middle;
    *Middle = CurrNumber;
    KillerSet.push_back(MidValue);
    KillerSet.push_back(CurrNumber-1);
    CurrNumber-=2;
  }
  ofstream OutFile("killerset2.txt");
  for(vector<int>::reverse_iterator KSCurr = KillerSet.rbegin(), KSEnd = KillerSet.rend();
KSCurr !=KSEnd; ++KSCurr) {
    OutFile << *KSCurr << "\n";
  }
  OutFile.close();
}


Modifying the algorithm to keep track of depth on the right side recursion:

// David R. Musser's Introspective Sorting algorithm
// O(N * log (N)) worst case complexity
_EXPORT
template <class _RandomAccessIter, class _Dist, class _Compare>
void __introsort_loop (_RandomAccessIter __first, _RandomAccessIter __last,
                       _Dist __max_depth, _Compare __comp, int Depth)   // Pass in 0 on the
initial call
{
    std::cout << "Depth: " << Depth << endl;
    for (; __last - __first > __rw_threshold; __max_depth /= 2) {
        if (0 == __max_depth) {
            __partial_sort (__first, __last, __last,
                            _RWSTD_VALUE_TYPE (_RandomAccessIter), __comp);
            break;
        }

        _RandomAccessIter __cut =
            __unguarded_partition (__first, __last,
                                   __median (*__first,
                                             *(__first + (__last - __first) /2),
                                             *(__last - 1), __comp), __comp);

        // limit the depth of the recursion tree to log2 (last - first)
        // where first and last are the initial values passed in from sort()
        __introsort_loop (__cut, __last, __max_depth, __comp, Depth+1);
        __last = __cut;
    }
} 

And then running it on the output of the first code gives the output:
Depth: 0
Depth: 1
Depth: 2
Depth: 3
Depth: 4
...
Depth: 489
Depth: 490
Depth: 491
Depth: 492
Depth: 493

Showing that the recursion doesn't have a logarithmic limit on the right hand side.

> std::sort introsort implementation error
> ----------------------------------------
>
>                 Key: STDCXX-397
>                 URL: https://issues.apache.org/jira/browse/STDCXX-397
>             Project: C++ Standard Library
>          Issue Type: Bug
>          Components: 25. Algorithms
>    Affects Versions: 4.1.3
>         Environment: Bug in algorithm.cc effects all platforms
>            Reporter: Joshua Lehrer
>
> introsort is designed to detect when an input set would push quicksort into its worst-case
scenario N^2 and fall back on a slower, yet still NLogN algorithm.
> The implementation in __introsort_loop has a bug, however, and it fails to catch all
of the scenarios.  While I can not supply an exact input set to demonstrate the bug, I can
explain the bug very easily.
> First, allow me to paste in the code:
> // David R. Musser's Introspective Sorting algorithm
> // O(N * log (N)) worst case complexity
> _EXPORT
> template <class _RandomAccessIter, class _Dist, class _Compare>
> void __introsort_loop (_RandomAccessIter __first, _RandomAccessIter __last,
>                        _Dist __max_depth, _Compare __comp)
> {
>     for (; __last - __first > __rw_threshold; __max_depth /= 2) {
>         if (0 == __max_depth) {
>             __partial_sort (__first, __last, __last,
>                             _RWSTD_VALUE_TYPE (_RandomAccessIter), __comp);
>             break;
>         }
>         _RandomAccessIter __cut =
>             __unguarded_partition (__first, __last,
>                                    __median (*__first,
>                                              *(__first + (__last - __first) /2),
>                                              *(__last - 1), __comp), __comp);
>         // limit the depth of the recursion tree to log2 (last - first)
>         // where first and last are the initial values passed in from sort()
>         __introsort_loop (__cut, __last, __max_depth, __comp);
>         __last = __cut;
>     }
> }
> the variable '__max_depth' is supposed to be cut in half on each subsequent "recursive"
call.  Once it reaches zero, LogN recurisve calls have been made, and the algorithm falls
back on a different sorting algorithm for the remainder.
> The algorithm, as implemented, uses real recursion and tail recursion.
> First, the pivot is selected, the pivot is done, and the algorithm has a left and a right
half, hopefully balanced.
> Consider what happens for the LEFT half, which is done using tail recursion.  '__last'
gets assigned '__cut', then the code goes to the top of the 'for' loop.  The test condition
of the loop is run, which divides '__max_depth' by two, bringing it closer to zero.
> Now consider what happens for the RIGHT half, which is done using real recursion.  The
function is called recurisvely on the right.  __max_depth is NOT cut in half.
> What would happen if a poor pivot was selected causing the right half to be large and
the left half to be small?  What if that happens again and again?  The real-recursion case
is failing to decrement __max_depth until it starts working on the left half.  You can see
how if the algorithm continually built right-halves that were relatively large that __max_depth
never gets decremented, and the algorithm never detects that it has made LogN recurisve calls.
> I believe the proper fix is as follows:
> // David R. Musser's Introspective Sorting algorithm
> // O(N * log (N)) worst case complexity
> _EXPORT
> template <class _RandomAccessIter, class _Dist, class _Compare>
> void __introsort_loop (_RandomAccessIter __first, _RandomAccessIter __last,
>                        _Dist __max_depth, _Compare __comp)
> {
>     for (; __last - __first > __rw_threshold; ) {
>         if (0 == __max_depth) {
>             __partial_sort (__first, __last, __last,
>                             _RWSTD_VALUE_TYPE (_RandomAccessIter), __comp);
>             break;
>         }
>         _RandomAccessIter __cut =
>             __unguarded_partition (__first, __last,
>                                    __median (*__first,
>                                              *(__first + (__last - __first) /2),
>                                              *(__last - 1), __comp), __comp);
>         // limit the depth of the recursion tree to log2 (last - first)
>         // where first and last are the initial values passed in from sort()
>         __max_depth /= 2;
>         __introsort_loop (__cut, __last, __max_depth, __comp);
>         __last = __cut;
>     }
> }
> "__max_depth/=2" is removed from the "for" loop and placed just above the two recursive
calls.
> This fixes the worst-case sample set that I have generated.
> I look forward to your response,
> joshua lehrer
> http://www.lehrerfamily.com/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message