hadoop-pig-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-951) Reset parallelism to 1 for indexing job in MergeJoin
Date Fri, 11 Sep 2009 01:33:57 GMT

    [ https://issues.apache.org/jira/browse/PIG-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753941#action_12753941 ]

Hadoop QA commented on PIG-951:

+1 overall.  Here are the results of testing the latest attachment 
  against trunk revision 813601.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/23/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/23/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/23/console

This message is automatically generated.

> Reset parallelism to 1 for indexing job in MergeJoin
> ----------------------------------------------------
>                 Key: PIG-951
>                 URL: https://issues.apache.org/jira/browse/PIG-951
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>         Attachments: pig-951.patch
> After sampling one tuple from every block, one reducer is used to sort the index entries
> in the reduce phase to produce the sorted index used in the actual join job. Thus, the
> parallelism of the index job should be explicitly set to 1. Currently, it is not.
> Currently, this is a non-issue, since we don't allow any blocking operators in the pipeline
> before merge-join. However, when we later do allow blocking operators, the parallelism of
> the indexing job will be that of the preceding blocking operator. Even then, the job will
> complete successfully, because all tuples will go to only one reducer, since we are grouping
> on only one key, "all". However, it will waste cluster resources by starting all the extra
> reducers, which get no data and thus do nothing.
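
The waste described above can be illustrated with a small simulation. This is only a sketch, not Pig or Hadoop code: the `partition` function is a hypothetical stand-in for a hash partitioner, and the entry count and reducer counts are made-up numbers. It shows that when every index entry carries the single key "all", only one reducer receives data regardless of how many are started.

```python
from collections import Counter


def partition(key: str, num_reducers: int) -> int:
    """Hypothetical toy partitioner: map a key to a reducer index.

    A real framework would use its own key hash; this deterministic
    byte sum is an assumption made purely for illustration.
    """
    return sum(key.encode("utf-8")) % num_reducers


def reducer_load(keys, num_reducers):
    """Return the number of records each reducer would receive."""
    load = Counter(partition(k, num_reducers) for k in keys)
    return [load.get(r, 0) for r in range(num_reducers)]


# Every sampled index entry is grouped under the single key "all".
index_entries = ["all"] * 1000

# Parallelism inherited from a preceding operator (assumed to be 10 here).
load_with_10 = reducer_load(index_entries, 10)
# Parallelism explicitly reset to 1, as the issue proposes.
load_with_1 = reducer_load(index_entries, 1)

# Reducers that were started but received no data.
idle_reducers = sum(1 for n in load_with_10 if n == 0)

print("load with 10 reducers:", load_with_10)
print("idle reducers:", idle_reducers)
print("load with 1 reducer:", load_with_1)
```

In both configurations the same single reducer does all the sorting; the only difference is how many idle reducers are launched alongside it.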

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
