pig-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-895) Default parallel for Pig
Date Tue, 28 Jul 2009 03:36:14 GMT

    [ https://issues.apache.org/jira/browse/PIG-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735903#action_12735903 ]

Hadoop QA commented on PIG-895:

+1 overall.  Here are the results of testing the latest attachment 
  against trunk revision 797290.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/143/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/143/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/143/console

This message is automatically generated.

> Default parallel for Pig
> ------------------------
>                 Key: PIG-895
>                 URL: https://issues.apache.org/jira/browse/PIG-895
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>    Affects Versions: 0.3.0
>            Reporter: Daniel Dai
>             Fix For: 0.4.0
>         Attachments: PIG-895-1.patch, PIG-895-2.patch, PIG-895-3.patch
> For Hadoop 20, if the user doesn't specify the number of reducers, Hadoop uses 1 reducer
> as the default value. This differs from previous versions of Hadoop, in which the default
> reducer number was usually good. One reducer is certainly not what users want. Although users
> can use the "parallel" keyword to specify the number of reducers for each statement, this is
> wordy. We need a convenient way for users to express a desired number of reducers. Here is
> my proposal:
> 1. Add a property "default_parallel" to Pig. Users can set default_parallel in a script:
>    set default_parallel '10';
> 2. default_parallel is a hint to Pig. Pig is free to optimize the number of reducers
> (unlike with the parallel keyword). Currently, since we do not have a mechanism to determine
> the optimal number of reducers, default_parallel will always be honored unless it is
> overridden by "parallel".
> 3. If a user puts multiple default_parallel settings in a script, the last entry is used.
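
The three rules proposed in the description can be modeled in a few lines. The sketch below is only an illustration in Python, not Pig's implementation: the function name `effective_reducers`, its parameters, and the constant `HADOOP_20_DEFAULT_REDUCERS` are all hypothetical, and it assumes Pig applies no optimization of its own (which the description says is the current situation).

```python
# Hypothetical model of the rules proposed in PIG-895; this is not Pig's code.
from typing import Optional, Sequence

# Per the description, Hadoop 20 uses 1 reducer when none is specified.
HADOOP_20_DEFAULT_REDUCERS = 1


def effective_reducers(default_parallel_settings: Sequence[int],
                       parallel_clause: Optional[int] = None) -> int:
    """Return the reducer count for one statement under the proposed rules.

    default_parallel_settings: values of every "set default_parallel" in the
        script, in the order they appear.
    parallel_clause: value of the statement's "parallel" keyword, if any.
    """
    # Rule 2: an explicit "parallel" on the statement overrides the default.
    if parallel_clause is not None:
        return parallel_clause
    # Rule 3: with multiple default_parallel entries, the last one is taken.
    if default_parallel_settings:
        return default_parallel_settings[-1]
    # Neither is set: fall back to the Hadoop 20 default of one reducer.
    return HADOOP_20_DEFAULT_REDUCERS


print(effective_reducers([10]))         # default_parallel only
print(effective_reducers([10, 20]))     # two settings, last one wins
print(effective_reducers([10, 20], 4))  # parallel keyword overrides
print(effective_reducers([]))           # nothing set
```

If Pig later gains a way to estimate an optimal reducer count, rule 2 lets it treat the default as a hint, so a real implementation would not necessarily return the configured value unchanged.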

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
