hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3442) QuickSort may get into unbounded recursion
Date Thu, 05 Jun 2008 19:17:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602752#action_12602752
] 

Chris Douglas commented on HADOOP-3442:
---------------------------------------

bq. if p+r < 0, it is a problem.

Until the first compare, when the IndexedSortable (i.e. MapOutputBuffer) will throw an ArrayIndexOutOfBoundsException.

bq. Is this a new bug? Has the sort code been change recently or something that feeds it?

The QuickSort is new in 0.17; we used to use MergeSort.

bq. Don't the classic texts have chapters on pivot selection?

The default HashPartitioner- which almost everybody uses- should produce fairly random distributions,
save equal keys (which is why increasing the number of reducers usually makes this go away).
It was verified that the "median-of-three" partitioning would be sufficient to handle the
sorted, single-reducer case, which is the canonical worst-case for quicksort. In the literature
reviewed, cases where "median-of-three" partitioning fails are spoken of as being deliberately
crafted, i.e. as possible DoS vectors, not as something common to real-world data. Someone
was able to get a sharable spill and indices, so we'll do some analysis to figure out why
it's more common than we thought it would be.

bq. Randomize the input data before calling quicksort (O\(n)) cost

Randomizing the indices should be pretty cheap, so this is probably a good idea. Since we're
confident that this really is a problem with the recursion, we should also investigate Chad's
suggestion of Introsort. This is what the STL uses: http://www.sgi.com/tech/stl/sort.html
(see [2])

> QuickSort may get into unbounded recursion
> ------------------------------------------
>
>                 Key: HADOOP-3442
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3442
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.0
>            Reporter: Runping Qi
>            Assignee: Chris Douglas
>         Attachments: 3442-0.patch, 3442-0v17.patch, CheckSortBuffer.java, HADOOP-3442.patch,
overflow.zip, spillbuffers.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message