pig-dev mailing list archives

From "Bill Graham (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2779) Refactoring the code for setting number of reducers
Date Wed, 25 Jul 2012 04:56:35 GMT

    [ https://issues.apache.org/jira/browse/PIG-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422001#comment-13422001 ]

Bill Graham commented on PIG-2779:

Jie, as part of this cleanup it would be really useful to record the various parallelism
values in the job conf for later analysis. I was thinking we capture defaultParallel, requestedParallelism
and runtimeParallelism (which should == {{mapred.reduce.tasks}}, right?). That way we can see
later which values were set and which one was actually used. It would be great to know whether parallelism
was determined by a {{PARALLEL}} statement, via estimation, or via a default. This would be
in addition to the following related params we currently capture:


Do you want to add this to this issue, or do you think we should do this in a separate JIRA?
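
To make the proposal concrete, here is a minimal sketch of what recording these values could look like. It is not taken from the Pig code base: the class name, the {{example.parallel.*}} keys, and the precedence order (PARALLEL statement, then default, then estimate) are all assumptions for illustration, and a plain map stands in for the job conf. Only {{mapred.reduce.tasks}} is a name from the discussion above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Pig code base.
class ParallelismRecorder {
    // Illustrative key names only (assumptions, not real Pig properties).
    static final String DEFAULT_KEY = "example.parallel.default";
    static final String REQUESTED_KEY = "example.parallel.requested";
    static final String ESTIMATED_KEY = "example.parallel.estimated";
    static final String SOURCE_KEY = "example.parallel.source";
    // Property named in the discussion; should equal the runtime parallelism.
    static final String RUNTIME_KEY = "mapred.reduce.tasks";

    /**
     * Chooses a reducer count and records every candidate value plus the
     * source of the decision. A value <= 0 means "not set". The precedence
     * used here is an assumption made for this sketch.
     */
    static Map<String, String> record(int requested, int defaultParallel, int estimated) {
        int runtime;
        String source;
        if (requested > 0) {
            runtime = requested;
            source = "PARALLEL";
        } else if (defaultParallel > 0) {
            runtime = defaultParallel;
            source = "default";
        } else {
            // Fall back to the estimate, but never below one reducer.
            runtime = Math.max(estimated, 1);
            source = "estimation";
        }

        // A plain map stands in for the job conf in this sketch.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(DEFAULT_KEY, String.valueOf(defaultParallel));
        conf.put(REQUESTED_KEY, String.valueOf(requested));
        conf.put(ESTIMATED_KEY, String.valueOf(estimated));
        conf.put(SOURCE_KEY, source);
        conf.put(RUNTIME_KEY, String.valueOf(runtime));
        return conf;
    }

    public static void main(String[] args) {
        // No PARALLEL statement, default of 10, estimate of 4.
        record(-1, 10, 4).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

With all candidate values and the decision source stored side by side, a later analysis can tell whether a job's reducer count came from a statement, a default, or estimation without re-running anything.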

I use IntelliJ and I've just set the syntax to match Apache's.
> Refactoring the code for setting number of reducers
> ---------------------------------------------------
>                 Key: PIG-2779
>                 URL: https://issues.apache.org/jira/browse/PIG-2779
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Jie Li
>            Assignee: Jie Li
>             Fix For: 0.11
>         Attachments: PIG-2779.0.patch, PIG-2779.1.patch, PIG-2779.2.patch, TestNumberOfReducers.java,
> As PIG-2652 observed, the code for setting the number of reducers is currently a little messy.
MapReduceOper.requestedParallelism seems to be misused in some places, and now we support
runtime estimation of #reducer, which further complicates the problem.
> For example, if we specify parallel 1 for the order-by, the estimated #reducer will be
used. If we specify parallel 2 while it estimates 4, order-by will fail due to "Illegal partition
for Null". If we specify parallel 4 while it estimates 2, then some reducers will have nothing
to do. 
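
The last two cases can be illustrated with a small sketch. It assumes, as one plausible reading of the description, that the order-by partitioning is prepared for the estimated number of reducers while the job runs with the specified number. The class and method names are hypothetical, the split points are made up, and the range check is a simplified stand-in for the framework's partition validation, not Pig's or Hadoop's actual code.

```java
// Hypothetical sketch of a partition/reducer count mismatch; not Pig code.
class PartitionMismatchDemo {
    /**
     * Maps a key to a partition using sorted split points. With n split
     * points there are n + 1 partitions, numbered 0..n.
     */
    static int partitionFor(int key, int[] splitPoints) {
        int p = 0;
        while (p < splitPoints.length && key >= splitPoints[p]) {
            p++;
        }
        return p;
    }

    /** Simplified stand-in for the check that a partition index is below the reducer count. */
    static void checkPartition(int partition, int numReducers) {
        if (partition < 0 || partition >= numReducers) {
            throw new IllegalStateException(
                "Illegal partition " + partition + " for " + numReducers + " reducers");
        }
    }

    public static void main(String[] args) {
        // Split points prepared for 4 partitions, job run with 2 reducers.
        int[] forFour = {25, 50, 75};
        int p = partitionFor(80, forFour);
        try {
            checkPartition(p, 2);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }

        // Split points prepared for 2 partitions, job run with 4 reducers:
        // only partitions 0 and 1 are ever produced, so reducers 2 and 3 stay idle.
        int[] forTwo = {50};
        System.out.println("max partition used: " + partitionFor(Integer.MAX_VALUE, forTwo));
    }
}
```

Under this assumption, too few reducers makes some partition indexes invalid, while too many leaves the extra reducers without input, which matches the two outcomes described above.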

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

