hadoop-pig-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-858) Order By followed by "replicated" join fails while compiling MR-plan from physical plan
Date Sun, 19 Jul 2009 19:53:15 GMT

    [ https://issues.apache.org/jira/browse/PIG-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733057#action_12733057 ]

Ashutosh Chauhan commented on PIG-858:

While POFRJoin is being compiled in MRCompiler, it needs to identify, for each of its
predecessors in the physical plan, which compiled MROper that predecessor is part of. Currently,
this is assumed to be one of the compiledInputs (an array of MROpers that are the immediate
predecessors of the current MROper in the MROper DAG).
This is usually true, but it may not hold when one physical operator results in two or more
MR operators, as is the case here. When there is an order-by before an FRJoin, one of the
inputs of POFRJoin will be a POSort, but the POSort operator will be in the first of the two
generated MROpers and thus will not be found in compiledInputs (which contains the second
MROper). Thus, the current way of identifying the MROper that corresponds to a physical
operator is unreliable.
This bug also affects the implementation of merge-sort join
(https://issues.apache.org/jira/browse/PIG-845), since POMergeJoin needs to know which MROper
corresponds to its left input and which corresponds to its right. It can find this out by
looking into compiledInputs only as long as there is no order-by (or similar PO that results
in multiple MROpers) among its predecessors. However, doing an order-by before a merge join
is a natural use case there.
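As a rough illustration of the lookup failure described above, here is a toy Java model; it is not Pig's actual MRCompiler code. MROper is reduced to the set of physical-operator names compiled into it, and findInput is a hypothetical helper that searches compiledInputs and returns -1 when nothing matches. Using that -1 as an array index mirrors the `ArrayIndexOutOfBoundsException: -1` in the stack trace below; whether visitFRJoin computes its index exactly this way is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only (hypothetical names, not Pig's MRCompiler): an MR operator is
// reduced to the set of physical-operator names compiled into it.
final class CompiledInputsLookup {

    static final class MROper {
        final String name;
        final Set<String> physOps;

        MROper(String name, String... physOps) {
            this.name = name;
            this.physOps = new HashSet<>(Arrays.asList(physOps));
        }
    }

    // Hypothetical helper encoding the current assumption: the MR operator holding a
    // physical predecessor is one of compiledInputs. Returns -1 if no input matches.
    static int findInput(List<MROper> compiledInputs, String physPred) {
        for (int i = 0; i < compiledInputs.size(); i++) {
            if (compiledInputs.get(i).physOps.contains(physPred)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        MROper loadA = new MROper("mr-load-A", "POLoad(A)");
        // The order-by yields two MR operators; POSort sits in the first one...
        MROper sortFirst = new MROper("mr-sort-1", "POSort(B)");
        // ...but only the second one is an immediate predecessor of the join.
        MROper sortSecond = new MROper("mr-sort-2");
        List<MROper> compiledInputs = new ArrayList<>(List.of(loadA, sortSecond));

        System.out.println(findInput(compiledInputs, "POLoad(A)")); // 0
        System.out.println(findInput(compiledInputs, "POSort(B)")); // -1
        System.out.println("POSort lives in " + sortFirst.name + ", which is not a compiled input");

        MROper[] inputs = compiledInputs.toArray(new MROper[0]);
        try {
            MROper pred = inputs[findInput(compiledInputs, "POSort(B)")];
            System.out.println("found " + pred.name);
        } catch (ArrayIndexOutOfBoundsException e) {
            // Same exception type as in the reported stack trace.
            System.out.println("ArrayIndexOutOfBoundsException");
        }
    }
}
```

The load input is found at index 0, while the sort input is not found at all, because the MROper that contains POSort is not among the immediate predecessors of the join.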

The proposal is to introduce a new private member variable in MRCompiler, phyToMROperMap
(similar to logToPhyMap), which can be used to identify the leaf MROper for a given
physical operator. Thoughts?
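A minimal sketch of the proposed phyToMROperMap, assuming the map is filled in as each physical operator is compiled and always points at the leaf (last) MROper that operator produces. The class, the compileSimple/compileMulti/resolveInputs methods and the string-keyed operators are hypothetical stand-ins, not Pig's API. A join can then resolve its inputs in predecessor order, which would also give a merge join its left and right MROpers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the proposed phyToMROperMap idea; all names here are hypothetical.
final class PhyToMROperMapSketch {

    static final class MROper {
        final String name;
        MROper(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    // Maps each physical operator to the leaf MR operator that produces its output.
    private final Map<String, MROper> phyToMROperMap = new HashMap<>();
    private final List<MROper> plan = new ArrayList<>();

    // A physical operator that compiles into a single MR operator.
    MROper compileSimple(String phyOp) {
        MROper mr = new MROper("mr-" + phyOp);
        plan.add(mr);
        phyToMROperMap.put(phyOp, mr);
        return mr;
    }

    // A physical operator (e.g. an order-by) that compiles into several MR operators;
    // only the last (leaf) one produces its output, so that is the one recorded.
    MROper compileMulti(String phyOp, int stages) {
        if (stages < 1) {
            throw new IllegalArgumentException("stages must be >= 1");
        }
        MROper leaf = null;
        for (int i = 1; i <= stages; i++) {
            leaf = new MROper("mr-" + phyOp + "-" + i);
            plan.add(leaf);
        }
        phyToMROperMap.put(phyOp, leaf);
        return leaf;
    }

    // Used by a join (replicated or merge) to resolve each physical input, in order.
    List<MROper> resolveInputs(List<String> phyPreds) {
        List<MROper> result = new ArrayList<>();
        for (String pred : phyPreds) {
            MROper mr = phyToMROperMap.get(pred);
            if (mr == null) {
                throw new IllegalStateException("no MR operator recorded for " + pred);
            }
            result.add(mr);
        }
        return result;
    }

    public static void main(String[] args) {
        PhyToMROperMapSketch c = new PhyToMROperMapSketch();
        c.compileSimple("POLoad(A)");
        c.compileMulti("POSort(B)", 2);
        // Left input first, right input second, as a merge join would need.
        System.out.println(c.resolveInputs(List.of("POLoad(A)", "POSort(B)")));
        // prints [mr-POLoad(A), mr-POSort(B)-2]
    }
}
```

Here the order-by input resolves to the second of its two MR operators, so the join no longer depends on where POSort itself was placed.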

> Order By followed by "replicated" join fails while compiling MR-plan from physical plan
> ---------------------------------------------------------------------------------------
>                 Key: PIG-858
>                 URL: https://issues.apache.org/jira/browse/PIG-858
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.3.0
>            Reporter: Ashutosh Chauhan
>             Fix For: 0.4.0
> Consider the query:
> {code}
> A = load 'a';
> B = order A by $0;
> C = join A by $0, B by $0;
> explain C;
> {code}
> works. But if replicated join is used instead
> {code}
> A = load 'a';
> B = order A by $0;
> C = join A by $0, B by $0 using "replicated";
> explain C;
> {code}
> this fails with ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2034: Error compiling operator POFRJoin
> relevant stacktrace:
> {code}
> Caused by: java.lang.RuntimeException: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException: ERROR 2034: Error compiling operator POFRJoin
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:306)
>         at org.apache.pig.PigServer.explain(PigServer.java:574)
>         ... 8 more
> Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException: ERROR 2034: Error compiling operator POFRJoin
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:942)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:173)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:342)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:327)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:233)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:301)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.explain(MapReduceLauncher.java:278)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:303)
>         ... 9 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:901)
>         ... 16 more
> {code}

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
