hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3442) QuickSort may get into unbounded recursion
Date Thu, 05 Jun 2008 22:38:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602823#action_12602823 ]

Chris Douglas commented on HADOOP-3442:
---------------------------------------

Analysis of the data (thanks to everyone who provided their test cases) led us to consider
the following degenerate case:

Consider a partition:
{noformat}
a_n, a_1, a_2, ... , a_n-2, a_n-1
{noformat}

where {{a_1 ... a_n-1}} are sorted and {{a_n}} is the largest element. The median-of-three
partitioning will consider {{a_n}}, {{a_n/2}}, and {{a_n-1}} and select {{a_n-1}} as the
pivot. As the sort runs, the partition becomes:
{noformat}
a_n-1, a_1, a_2, ... , a_n-2, a_n
{noformat}

The left index will run all the way to {{a_n}} and swap the pivot into place, yielding the
following:
{noformat}
a_n-2, a_1, a_2, ... , a_n-3, a_n-1, a_n
{noformat}

So the next partition will get:
{noformat}
a_n-2, a_1, a_2, ... , a_n-4, a_n-3
{noformat}
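
The walk-through above can be reproduced with a small stand-alone sketch. It is not the Hadoop
QuickSort source: the class {{DegenerateQuickSortDemo}} and its methods are hypothetical names,
and the partition loop is a simplified one that only assumes the pivot sits at the front of the
partition after the median-of-three step, as in the illustration. It sorts {{int}} values and
reports the deepest recursion level reached, so the two input shapes can be compared.

```java
// Illustrative sketch only; not the Hadoop QuickSort source.
final class DegenerateQuickSortDemo {

    // Builds a_n, a_1, a_2, ..., a_n-1 using the values 1..n.
    static int[] nearlySorted(int n) {
        int[] a = new int[n];
        if (n > 0) a[0] = n;
        for (int k = 1; k < n; k++) a[k] = k;
        return a;
    }

    // Sorts a[lo, hi) and returns the deepest recursion level reached.
    static int sort(int[] a, int lo, int hi, int depth) {
        if (hi - lo < 2) return depth;
        int mid = (lo + hi) >>> 1, last = hi - 1;
        // Median of three: leave the median at lo and the largest at last.
        if (a[mid] > a[lo]) swap(a, mid, lo);
        if (a[mid] > a[last]) swap(a, mid, last);
        if (a[lo] > a[last]) swap(a, lo, last);
        int pivot = a[lo];
        int i = lo + 1, j = last;
        while (true) {
            while (i <= j && a[i] < pivot) i++; // left index
            while (i <= j && a[j] > pivot) j--; // right index
            if (i >= j) break;
            swap(a, i++, j--);
        }
        swap(a, lo, j); // move the pivot into its final slot
        return Math.max(sort(a, lo, j, depth + 1), sort(a, j + 1, hi, depth + 1));
    }

    static void swap(int[] a, int x, int y) {
        int t = a[x];
        a[x] = a[y];
        a[y] = t;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] sorted = new int[n];
        for (int k = 0; k < n; k++) sorted[k] = k + 1;
        System.out.println("sorted input depth:        " + sort(sorted, 0, n, 0));
        System.out.println("nearly sorted input depth: " + sort(nearlySorted(n), 0, n, 0));
    }
}
```

On the nearly sorted input each call hands a partition only two elements smaller to the next
level, so the reported depth grows in proportion to {{n}} rather than to {{log n}}.
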
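So while sorted data will yield a series of optimal partitions, nearly sorted data like this
can cause the sort to fall into a degenerate case: each pass shrinks the partition by only two
elements, so the recursion depth grows linearly with its size. Among the suggestions to
ameliorate this: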
# Consider the median and two random offsets for the median-of-three partitioning (or three
random offsets, etc.)
# Always pick a random pivot
# After swapping the pivot into place, swap what it replaced into a random position in the
left partition
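
As a rough illustration of the second suggestion (always pick a random pivot), the sketch below
swaps a randomly chosen element to the front of the partition before partitioning. The class
name {{RandomPivotQuickSort}}, the simplified partition loop, and the use of a seeded
{{java.util.Random}} are assumptions made for the example, not part of any attached patch.

```java
import java.util.Random;

// Illustrative sketch only; not taken from the attached patches.
final class RandomPivotQuickSort {

    // Sorts a[lo, hi) and returns the deepest recursion level reached.
    static int sort(int[] a, int lo, int hi, int depth, Random rnd) {
        if (hi - lo < 2) return depth;
        // Pick the pivot uniformly at random and move it to the front.
        swap(a, lo, lo + rnd.nextInt(hi - lo));
        int pivot = a[lo];
        int i = lo + 1, j = hi - 1;
        while (true) {
            while (i <= j && a[i] < pivot) i++; // left index
            while (i <= j && a[j] > pivot) j--; // right index
            if (i >= j) break;
            swap(a, i++, j--);
        }
        swap(a, lo, j); // move the pivot into its final slot
        return Math.max(sort(a, lo, j, depth + 1, rnd),
                        sort(a, j + 1, hi, depth + 1, rnd));
    }

    static void swap(int[] a, int x, int y) {
        int t = a[x];
        a[x] = a[y];
        a[y] = t;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] a = new int[n];
        a[0] = n; // a_n, a_1, a_2, ..., a_n-1
        for (int k = 1; k < n; k++) a[k] = k;
        System.out.println("depth with random pivot: " + sort(a, 0, n, 0, new Random(42)));
    }
}
```

A random pivot does not rule out deep recursion, but no fixed input pattern, including the
nearly sorted one above, can trigger it reliably.
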

Randomizing the input data makes this case far less common, and Introsort treats it as an
inevitable degenerate case, switching to heapsort once the recursion gets too deep; both are
also sound additions.
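
The Introsort idea can be sketched as a quicksort that carries a depth budget and hands the
current partition to heapsort when the budget runs out. Everything in the block is an
assumption made for illustration: the class name {{IntroSortSketch}}, the simplified
median-of-three partition, and the budget of {{2 * floor(log2(n))}}, which is a commonly used
limit rather than anything stated in this issue.

```java
// Illustrative sketch only; names and the depth limit are assumptions.
final class IntroSortSketch {

    // Sorts the whole array; returns how many partitions fell back to heapsort.
    static int sort(int[] a) {
        int n = a.length;
        int budget = n < 2 ? 0 : 2 * (31 - Integer.numberOfLeadingZeros(n));
        return sort(a, 0, n, budget);
    }

    static int sort(int[] a, int lo, int hi, int depthLeft) {
        if (hi - lo < 2) return 0;
        if (depthLeft == 0) { // too deep: stop recursing on this partition
            heapSort(a, lo, hi);
            return 1;
        }
        int p = partition(a, lo, hi);
        return sort(a, lo, p, depthLeft - 1) + sort(a, p + 1, hi, depthLeft - 1);
    }

    // Median-of-three partition of a[lo, hi); returns the pivot's final index.
    static int partition(int[] a, int lo, int hi) {
        int mid = (lo + hi) >>> 1, last = hi - 1;
        if (a[mid] > a[lo]) swap(a, mid, lo);
        if (a[mid] > a[last]) swap(a, mid, last);
        if (a[lo] > a[last]) swap(a, lo, last);
        int pivot = a[lo];
        int i = lo + 1, j = last;
        while (true) {
            while (i <= j && a[i] < pivot) i++;
            while (i <= j && a[j] > pivot) j--;
            if (i >= j) break;
            swap(a, i++, j--);
        }
        swap(a, lo, j);
        return j;
    }

    // In-place heapsort of a[lo, hi) using a max-heap rooted at lo.
    static void heapSort(int[] a, int lo, int hi) {
        int n = hi - lo;
        for (int root = n / 2 - 1; root >= 0; root--) siftDown(a, lo, root, n);
        for (int end = n - 1; end > 0; end--) {
            swap(a, lo, lo + end);
            siftDown(a, lo, 0, end);
        }
    }

    static void siftDown(int[] a, int base, int root, int size) {
        while (true) {
            int child = 2 * root + 1;
            if (child >= size) return;
            if (child + 1 < size && a[base + child + 1] > a[base + child]) child++;
            if (a[base + root] >= a[base + child]) return;
            swap(a, base + root, base + child);
            root = child;
        }
    }

    static void swap(int[] a, int x, int y) {
        int t = a[x];
        a[x] = a[y];
        a[y] = t;
    }
}
```

The fallback bounds the recursion depth regardless of the input, at the cost of running a
slower sort on the partitions that trigger it.
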

> QuickSort may get into unbounded recursion
> ------------------------------------------
>
>                 Key: HADOOP-3442
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3442
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.0
>            Reporter: Runping Qi
>            Assignee: Chris Douglas
>         Attachments: 3442-0.patch, 3442-0v17.patch, CheckSortBuffer.java, HADOOP-3442.patch, overflow.zip, spillbuffers.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

