hadoop-pig-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1306) [zebra] Support of locally sorted input splits
Date Sat, 27 Mar 2010 02:34:27 GMT

    [ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850460#action_12850460 ]

Hadoop QA commented on PIG-1306:
--------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12439840/PIG-1306.patch
  against trunk revision 927640.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 32 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/251/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/251/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/251/console

This message is automatically generated.

> [zebra] Support of locally sorted input splits
> ----------------------------------------------
>
>                 Key: PIG-1306
>                 URL: https://issues.apache.org/jira/browse/PIG-1306
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Yan Zhou
>            Assignee: Yan Zhou
>             Fix For: 0.7.0
>
>         Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, PIG-1306.patch
>
>
> Currently, Zebra supports sorted or unsorted input splits on sorted tables or sorted table unions. The sorted input splits are based on key ranges that do not overlap, so the splits are in effect globally sorted: each split is locally sorted, and the key ranges of different splits do not overlap.
> The biggest problem with key-range splits is the performance hit suffered when data skew is present, particularly when a key range consists solely of one duplicated key. The data chunk holding the duplicate keys is then virtually unsplittable regardless of how many mappers are available: it just has to be processed by a single mapper.
> On the other hand, there are scenarios where globally sorted splits are overkill and locally sorted splits are good enough. One example is the use of Zebra sorted tables as the probe table in a map-side merge inner join.
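
The contrast described in the issue can be sketched in a few lines of Python. This is an illustration only, not Zebra code: the function names, the boundary keys, and the sample data are hypothetical, and real splits are computed over table storage rather than an in-memory list.

```python
from bisect import bisect_right


def key_range_splits(sorted_keys, boundaries):
    """Globally sorted splits: assign each row to a non-overlapping key range.

    Rows with the same key always land in the same split, so a heavily
    duplicated key cannot be spread over several mappers.
    """
    splits = [[] for _ in range(len(boundaries) + 1)]
    for key in sorted_keys:
        splits[bisect_right(boundaries, key)].append(key)
    return splits


def locally_sorted_splits(sorted_keys, num_splits):
    """Locally sorted splits: cut the sorted rows by position.

    Each split is sorted internally, but adjacent splits may share a key,
    so their key ranges can overlap.
    """
    size = -(-len(sorted_keys) // num_splits)  # ceiling division
    return [sorted_keys[i:i + size] for i in range(0, len(sorted_keys), size)]


# Hypothetical skewed data: key 5 accounts for 8 of the 12 rows.
keys = [1, 2] + [5] * 8 + [7, 9]

by_range = key_range_splits(keys, boundaries=[3, 6])
by_position = locally_sorted_splits(keys, num_splits=3)

print([len(s) for s in by_range])     # rows per split with key-range splits
print([len(s) for s in by_position])  # rows per split with positional splits
```

With key ranges, the split holding key 5 carries most of the rows no matter how many mappers are available. Cutting by position balances the load, at the cost of key 5 appearing in more than one split, which is acceptable when only local order is needed.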

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

