hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-3442) QuickSort may get into unbounded recursion
Date Tue, 10 Jun 2008 07:55:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603792#action_12603792
] 

chris.douglas edited comment on HADOOP-3442 at 6/10/08 12:53 AM:
-----------------------------------------------------------------

bq.  QuickSort.getMaxDepth can be more aggressive: since the index are int, we have n <=
2^32 and lg n <= 32. So, we could safely let QuickSort.getMaxDepth be something around
(lg n)^3 or it should depend on the maximum size of system stack.

The 2*log\(n) heuristic was intended to bail out of a worst case, not to protect against the
StackOverflowError. The recursion on the system stack is already limited to log\(n), but quicksort
will always have O(n^2) cases that this needs to handle. I'm happy to discuss an appropriate
value for k, but k*log\(n) seems appropriate to this purpose.

bq. HeapSort is a singleton. Define a static variable HeapSort.INSTANCE. Then, we don't have
to new HeapSort() in QuickSort.
bq. There are quite a few sorting classes/interfaces. Consider create a new package for them.

Of course, you're right; all the algorithms should be singletons, and further, having sort
algorithms as objects at all is not the best design. Still, given that MapTask remains the
only user, adding factories, moving these into a separate package, etc. seems like overkill
at this point. Would it be sufficient to add a singleton instance of HeapSort to QuickSort?

bq. public methods HeapSort.sort(IndexedSortable s, int p, int r) and QuickSort.sort(...)
need javadoc.

OK.

bq. Remove BatcherSort in this patch. Create a new issue if it is useful.

Fair enough. BatcherSort is quick for small sets of data (particularly when compared to QuickSort,
which uses an insertion sort for small partitions), but small datasets aren't exactly a typical
use case. :)

      was (Author: chris.douglas):
    bq.  QuickSort.getMaxDepth can be more aggressive: since the index are int, we have n
<= 2^32 and lg n <= 32. So, we could safely let QuickSort.getMaxDepth be something around
(lg n)^3 or it should depend on the maximum size of system stack.

The 2*log\(n) heuristic was intended to bail out of a worst case, not to protect against the
StackOverflowError. The recursion on the system stack is already limited to log\(n), but quicksort
will always have O(n^2) cases that this needs to handle. I'm happy to discuss an appropriate
value for k, but k*log\(n) seems appropriate to this purpose.

bq. HeapSort is a singleton. Define a static variable HeapSort.INSTANCE. Then, we don't have
to new HeapSort() in QuickSort.
bq. There are quite a few sorting classes/interfaces. Consider create a new package for them.

Of course, you're right; all the algorithms should be singletons, and further, having sort
algorithms as objects at all is not the best design. Still, given that MapTask remains the
only user, adding factories, moving these into a separate package, etc. seems like overkill
at this point. Would it be sufficient to add a singleton instance of HeapSort to QuickSort?

bq. public methods HeapSort.sort(IndexedSortable s, int p, int r) and QuickSort.sort(...)
need javadoc.

OK.

bq. Remove BatcherSort in this patch. Create a new issue if it is useful.

Fair enough. BatcherSort is quick for small sets of data (particularly when compared to QuickSort,
which uses an insertion sort for small partitions), but small datasets aren't exactly a typical
use case. :)

The quicksort implementation might also benefit from some adaptive partitioning, but that
should probably be left to a separate issue.
  
> QuickSort may get into unbounded recursion
> ------------------------------------------
>
>                 Key: HADOOP-3442
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3442
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.0
>            Reporter: Runping Qi
>            Assignee: Chris Douglas
>            Priority: Blocker
>             Fix For: 0.17.1, 0.18.0
>
>         Attachments: 3442-0.patch, 3442-0v17.patch, 3442-1.patch, 3442-2.patch, 3442-3.patch,
CheckSortBuffer.java, HADOOP-3442.patch, overflow.zip, spillbuffers.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message