[ https://issues.apache.org/jira/browse/HADOOP-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602823#action_12602823 ]
Chris Douglas commented on HADOOP-3442:

Analysis of the data (thanks to everyone who provided their test cases) led us to the following degenerate case.
Consider a partition:
{noformat}
a_n, a_1, a_2, ... , a_n-2, a_n-1
{noformat}
Where {{a_1 ... a_n-1}} are sorted. The median-of-three partitioning will consider {{a_n}}, {{a_n/2}}, and {{a_n-1}} and select {{a_n-1}} as the pivot. With the pivot swapped to the front, the partition looks like this while the sort runs:
{noformat}
a_n-1, a_1, a_2, ... , a_n-2, a_n
{noformat}
The left index will run all the way to {{a_n}} and swap the pivot into place, yielding the following:
{noformat}
a_n-2, a_1, a_2, ... , a_n-3, a_n-1, a_n
{noformat}
So the next partition will get:
{noformat}
a_n-2, a_1, a_2, ... , a_n-4, a_n-3
{noformat}
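The walk-through above can be reproduced with a small standalone sketch. This is not the Hadoop QuickSort: the class name, the {{int[]}} keys, and the partition loop are assumptions chosen only to mirror the steps described (median of first, middle, and last; pivot parked at the front; pivot swapped into place).

```java
// Hypothetical demo; names and partition scheme are assumptions, not Hadoop code.
final class DegenerateCaseDemo {
    private static int maxDepth;

    /** Builds a_n, a_1, ..., a_n-1 as the values n, 1, 2, ..., n-1 (n >= 1). */
    static int[] nearlySorted(int n) {
        int[] a = new int[n];
        a[0] = n;
        for (int i = 1; i < n; i++) a[i] = i;
        return a;
    }

    /** Sorts a in place and returns the deepest recursion level reached. */
    static int sortAndMeasureDepth(int[] a) {
        maxDepth = 0;
        sort(a, 0, a.length - 1, 1);
        return maxDepth;
    }

    private static void sort(int[] a, int lo, int hi, int depth) {
        if (hi <= lo) return;
        maxDepth = Math.max(maxDepth, depth);
        // Median of first, middle, and last element is parked at a[lo] as the pivot.
        swap(a, lo, medianIndex(a, lo, lo + (hi - lo) / 2, hi));
        int pivot = a[lo];
        int i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < pivot) if (i == hi) break; // left index runs right
            while (a[--j] > pivot) if (j == lo) break; // right index runs left
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j); // pivot into its final place
        sort(a, lo, j - 1, depth + 1);
        sort(a, j + 1, hi, depth + 1);
    }

    /** Returns whichever of i, j, k indexes the median of a[i], a[j], a[k]. */
    private static int medianIndex(int[] a, int i, int j, int k) {
        if (a[i] < a[j]) {
            if (a[j] < a[k]) return j;
            return a[i] < a[k] ? k : i;
        }
        if (a[i] < a[k]) return i;
        return a[j] < a[k] ? k : j;
    }

    private static void swap(int[] a, int x, int y) {
        int t = a[x]; a[x] = a[y]; a[y] = t;
    }
}
```

Under these assumptions, {{sortAndMeasureDepth(nearlySorted(n))}} grows roughly like n/2, while fully sorted input of the same size stays near log2(n).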
So while sorted data will yield a series of optimal partitions, nearly sorted data like this can cause the sort to fall into a degenerate case: each pass strips only two elements ({{a_n-1}} and {{a_n}}) from the partition and leaves the remainder in the same shape. Among the suggestions to ameliorate this:
# Consider the median and two random offsets for the median-of-three partitioning (or three random offsets, etc.)
# Always pick a random pivot
# After swapping the pivot into place, swap what it replaced into a random position in the left partition
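As an illustration of the second suggestion only, here is a minimal sketch that picks the pivot uniformly at random from the current range. The class name, the seeded {{java.util.Random}}, and the {{int[]}} keys are assumptions for the example and are not taken from any of the attached patches; the other suggestions would change only the pivot-selection line or add a swap after the pivot is placed.

```java
import java.util.Random;

// Hypothetical sketch of "always pick a random pivot"; not the Hadoop implementation.
final class RandomPivotQuickSort {
    private final Random random;
    private int maxDepth;

    RandomPivotQuickSort(long seed) {
        this.random = new Random(seed);
    }

    /** Sorts a in place and returns the deepest recursion level reached. */
    int sortAndMeasureDepth(int[] a) {
        maxDepth = 0;
        sort(a, 0, a.length - 1, 1);
        return maxDepth;
    }

    private void sort(int[] a, int lo, int hi, int depth) {
        if (hi <= lo) return;
        maxDepth = Math.max(maxDepth, depth);
        // Random index in [lo, hi]; the chosen element is parked at a[lo] as the pivot.
        swap(a, lo, lo + random.nextInt(hi - lo + 1));
        int pivot = a[lo];
        int i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < pivot) if (i == hi) break;
            while (a[--j] > pivot) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j); // pivot into its final place
        sort(a, lo, j - 1, depth + 1);
        sort(a, j + 1, hi, depth + 1);
    }

    private static void swap(int[] a, int x, int y) {
        int t = a[x]; a[x] = a[y]; a[y] = t;
    }
}
```

Because the pivot no longer depends on where the out-of-place element sits, the nearly sorted layout has no special effect on the split; a bad sequence of pivots remains possible but is unlikely.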
Randomizing the input data makes this case far less common, and Introsort regards it as an inevitable degenerate case; both are also sound additions.
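The Introsort idea can be sketched as a quicksort that tracks its recursion depth and hands a subrange to heapsort once a budget is exhausted. Everything below is an assumption made for illustration: the class name, the {{int[]}} keys, and the budget of 2 * floor(log2(n)), which is a conventional choice rather than anything specified in this issue.

```java
// Hypothetical introsort-style sketch; not the code from any attached patch.
final class DepthLimitedQuickSort {
    private static int maxDepth;

    /** Sorts a in place and returns the deepest quicksort level reached. */
    static int sortAndMeasureDepth(int[] a) {
        maxDepth = 0;
        if (a.length < 2) return 0;
        // Assumed budget: 2 * floor(log2(n)).
        int limit = 2 * (31 - Integer.numberOfLeadingZeros(a.length));
        sort(a, 0, a.length - 1, 1, limit);
        return maxDepth;
    }

    private static void sort(int[] a, int lo, int hi, int depth, int limit) {
        if (hi <= lo) return;
        if (depth > limit) {          // budget exhausted: stop recursing
            heapSort(a, lo, hi);
            return;
        }
        maxDepth = Math.max(maxDepth, depth);
        swap(a, lo, medianIndex(a, lo, lo + (hi - lo) / 2, hi));
        int pivot = a[lo];
        int i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < pivot) if (i == hi) break;
            while (a[--j] > pivot) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j);
        sort(a, lo, j - 1, depth + 1, limit);
        sort(a, j + 1, hi, depth + 1, limit);
    }

    /** In-place heapsort of a[lo..hi], inclusive. */
    private static void heapSort(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, lo, i, n);
        for (int end = n - 1; end > 0; end--) {
            swap(a, lo, lo + end);    // move current maximum to its final slot
            siftDown(a, lo, 0, end);
        }
    }

    private static void siftDown(int[] a, int base, int root, int size) {
        while (true) {
            int child = 2 * root + 1;
            if (child >= size) return;
            if (child + 1 < size && a[base + child] < a[base + child + 1]) child++;
            if (a[base + root] >= a[base + child]) return;
            swap(a, base + root, base + child);
            root = child;
        }
    }

    private static int medianIndex(int[] a, int i, int j, int k) {
        if (a[i] < a[j]) {
            if (a[j] < a[k]) return j;
            return a[i] < a[k] ? k : i;
        }
        if (a[i] < a[k]) return i;
        return a[j] < a[k] ? k : j;
    }

    private static void swap(int[] a, int x, int y) {
        int t = a[x]; a[x] = a[y]; a[y] = t;
    }
}
```

The fallback does not prevent the bad partitions; it only caps how deep they can drive the recursion, which is why it combines well with the randomizing suggestions rather than replacing them.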
> QuickSort may get into unbounded recursion
> 
>
> Key: HADOOP-3442
> URL: https://issues.apache.org/jira/browse/HADOOP-3442
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.17.0
> Reporter: Runping Qi
> Assignee: Chris Douglas
> Attachments: 3442-0.patch, 3442-0v17.patch, CheckSortBuffer.java, HADOOP-3442.patch, overflow.zip, spillbuffers.patch
>
>
