pig-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-3928) Reducer estimator gets wrong configuration for ORDER_BY job
Date Tue, 28 Apr 2015 23:07:06 GMT

     [ https://issues.apache.org/jira/browse/PIG-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-3928:
----------------------------
    Fix Version/s:     (was: 0.15.0)
                   0.16.0

> Reducer estimator gets wrong configuration for ORDER_BY job
> -----------------------------------------------------------
>
>                 Key: PIG-3928
>                 URL: https://issues.apache.org/jira/browse/PIG-3928
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.12.1, 0.13.0
>            Reporter: Aniket Mokashi
>             Fix For: 0.16.0
>
>
> The SAMPLER job requires a parameter that must equal the number of reducers used by the
> ORDER_BY job. This value is obtained by getting the successor of the SAMPLER job and estimating
> reducers for it, as in the following code. However, the job (conf) passed to
> calculateRuntimeReducers corresponds to the SAMPLER job rather than the ORDER_BY job, which
> causes problems in some custom reducer estimators that depend on the configuration.
> {code}
> // inside org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>     public void adjustNumReducers(MROperPlan plan, MapReduceOper mro,
>             org.apache.hadoop.mapreduce.Job nwJob) throws IOException {
>         int jobParallelism = calculateRuntimeReducers(mro, nwJob);
>         if (mro.isSampler() && plan.getSuccessors(mro) != null) {
>             // We need to calculate the final number of reducers of the next job (order-by or skew-join)
>             // to generate the quantfile.
>             MapReduceOper nextMro = plan.getSuccessors(mro).get(0);
>             // Here we use the same conf and Job to calculate the runtime #reducers of the next job
>             // which is fine as the statistics comes from the nextMro's POLoads
>             int nPartitions = calculateRuntimeReducers(nextMro, nwJob);
>             // set the runtime #reducer of the next job as the #partition
>             ParallelConstantVisitor visitor =
>                     new ParallelConstantVisitor(mro.reducePlan, nPartitions);
>             visitor.visit();
>         }
> {code}
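
For illustration only, here is a minimal standalone sketch of why a conf-dependent estimator can return a different value when it is handed the SAMPLER job's configuration instead of the ORDER_BY job's. Nothing in it is Pig or Hadoop API: the class ReducerConfDemo, the key demo.bytes.per.reducer, the method estimateReducers, and all of the numbers are hypothetical, and a plain Map stands in for a job configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone model of the problem; none of these types are Pig or Hadoop classes.
public class ReducerConfDemo {

    // Hypothetical per-job setting that a custom estimator might read from the conf.
    static final String BYTES_PER_REDUCER = "demo.bytes.per.reducer";

    // Hypothetical estimator: divides the input size by a value taken from the given conf.
    static int estimateReducers(Map<String, String> conf, long inputBytes) {
        long bytesPerReducer = Long.parseLong(conf.get(BYTES_PER_REDUCER));
        // Ceiling division, with a floor of one reducer.
        return (int) Math.max(1, (inputBytes + bytesPerReducer - 1) / bytesPerReducer);
    }

    public static void main(String[] args) {
        // Assumed for the demo: the two jobs carry different values for the same key.
        Map<String, String> samplerConf = new HashMap<>();
        samplerConf.put(BYTES_PER_REDUCER, "1000");

        Map<String, String> orderByConf = new HashMap<>();
        orderByConf.put(BYTES_PER_REDUCER, "250");

        long orderByInputBytes = 2000;

        // The situation the issue describes: the successor is estimated with the sampler's conf.
        int fromSamplerConf = estimateReducers(samplerConf, orderByInputBytes);
        // What a conf-dependent estimator would expect: the successor's own conf.
        int fromOrderByConf = estimateReducers(orderByConf, orderByInputBytes);

        System.out.println("partitions computed with sampler conf:  " + fromSamplerConf);
        System.out.println("reducers computed with order-by conf:   " + fromOrderByConf);
    }
}
```

With these made-up values the two calls disagree, so the partition count written for the sampler would not match the reducer count an estimator would choose from the ORDER_BY job's own configuration.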



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
