pig-dev mailing list archives

From "Jie Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2779) Refactoring the code for setting number of reducers
Date Fri, 29 Jun 2012 17:48:42 GMT

    [ https://issues.apache.org/jira/browse/PIG-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404063#comment-13404063 ]

Jie Li commented on PIG-2779:

For the order-by, we need to pass its *final* #reducers (not the estimated one) to the sample
job that generates the partition file. Otherwise the partition file will not match the number
of reducers the order-by actually runs with, which causes errors.

The final #reducers is derived from the requested value and the estimated value, the latter
being computed from the input data size. Luckily, the sample job reads the same input data as
the order-by, so it can compute the order-by's final #reducers in advance.

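
A minimal Java sketch of this idea, assuming a simple rule in which an explicit PARALLEL request (> 0) wins and otherwise the estimate is one reducer per `bytesPerReducer` bytes of input, clamped to [1, `maxReducers`]. The class and method names (`FinalReducers`, `estimateReducers`, `finalReducers`, `boundaries`) and the thresholds are hypothetical, not Pig's actual code. The point it illustrates is that the final count is computed once, from the input size both jobs share, and the same value drives both the sample's boundary keys and the order-by's reducer count.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Pig's code): compute the order-by's final
 * reducer count once and reuse it for the partition boundaries.
 */
public class FinalReducers {

    // Assumed estimation rule: one reducer per bytesPerReducer of input,
    // clamped to the range [1, maxReducers].
    static int estimateReducers(long inputBytes, long bytesPerReducer, int maxReducers) {
        long est = (inputBytes + bytesPerReducer - 1) / bytesPerReducer; // ceiling division
        return (int) Math.max(1, Math.min(maxReducers, est));
    }

    // Assumed combination rule: an explicit request (> 0) wins, else use the estimate.
    static int finalReducers(int requested, long inputBytes, long bytesPerReducer, int maxReducers) {
        return requested > 0 ? requested : estimateReducers(inputBytes, bytesPerReducer, maxReducers);
    }

    // Pick numReducers - 1 boundary keys from a non-empty sorted sample;
    // these stand in for the contents of the partition file.
    static int[] boundaries(int[] sortedSample, int numReducers) {
        int[] cuts = new int[numReducers - 1];
        for (int i = 1; i < numReducers; i++) {
            cuts[i - 1] = sortedSample[(int) ((long) i * sortedSample.length / numReducers)];
        }
        return cuts;
    }

    public static void main(String[] args) {
        long inputBytes = 3_500_000_000L; // same input for the sample job and the order-by
        int requested = -1;               // -1: no PARALLEL clause given
        int n = finalReducers(requested, inputBytes, 1_000_000_000L, 999);
        int[] sample = {1, 3, 5, 7, 9, 11, 13, 15};
        // The same n must later be used as the order-by job's reducer count.
        System.out.println(n + " reducers, boundaries " + Arrays.toString(boundaries(sample, n)));
    }
}
```

Running `main` prints `4 reducers, boundaries [5, 9, 13]`.

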
> Refactoring the code for setting number of reducers
> ---------------------------------------------------
>                 Key: PIG-2779
>                 URL: https://issues.apache.org/jira/browse/PIG-2779
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Jie Li
>             Fix For: 0.11
> As PIG-2652 observed, the code for setting the number of reducers is currently a little messy.
MapReduceOper.requestedParallelism seems to be misused in some places, and the runtime estimation
of #reducers that we now support further complicates the problem.
> For example, if we specify parallel 1 for the order-by, the estimated #reducers will be used
instead. If we specify parallel 2 while it estimates 4, the order-by will fail with "Illegal partition
for Null". If we specify parallel 4 while it estimates 2, some reducers will have nothing
to do.
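
The last two cases above can be reproduced with a toy range partitioner, sketched here under the assumption that the partition file simply holds the sorted boundary keys computed for the sampled #reducers, while the order-by job may run with a different count. `RangePartitionDemo`, `partition` and `load` are hypothetical names, not Pig or Hadoop APIs; in a real Hadoop job it is the framework, not user code, that rejects an out-of-range partition index.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: route keys by boundaries sampled for one reducer
 * count while the job actually runs with another.
 */
public class RangePartitionDemo {

    // Partition index = number of boundaries strictly below the key.
    static int partition(int key, int[] cuts, int numReducers) {
        int p = 0;
        while (p < cuts.length && key > cuts[p]) {
            p++;
        }
        if (p >= numReducers) {
            // Index has no matching reducer (the "Illegal partition" case).
            throw new IllegalStateException("Illegal partition " + p + " for " + numReducers + " reducers");
        }
        return p;
    }

    // Count how many keys each of numReducers reducers receives.
    static int[] load(int[] keys, int[] cuts, int numReducers) {
        int[] counts = new int[numReducers];
        for (int k : keys) {
            counts[partition(k, cuts, numReducers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] keys = {1, 3, 5, 7, 9, 11, 13, 15};
        int[] cutsFor2 = {7};        // boundaries sampled for 2 reducers
        int[] cutsFor4 = {3, 7, 11}; // boundaries sampled for 4 reducers

        // Consistent: sampled for 4, run with 4.
        System.out.println(Arrays.toString(load(keys, cutsFor4, 4)));
        // Sampled for 2, run with 4 (parallel 4, estimate 2): reducers 2 and 3 stay idle.
        System.out.println(Arrays.toString(load(keys, cutsFor2, 4)));
        // Sampled for 4, run with 2 (parallel 2, estimate 4): keys above 7 have no reducer.
        try {
            load(keys, cutsFor4, 2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `[2, 2, 2, 2]`, then `[4, 4, 0, 0]`, then `Illegal partition 2 for 2 reducers`, which is why the sample job and the order-by must agree on one final #reducers.

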
