hadoop-pig-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-747) Logical to Physical Plan Translation fails when temporary alias are created within foreach
Date Fri, 04 Dec 2009 20:08:20 GMT

    [ https://issues.apache.org/jira/browse/PIG-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786124#action_12786124 ]

Pradeep Kamath commented on PIG-747:

I did some investigation and here are some observations:
Consider the following nested foreach, which is similar to the script in the issue description:
b = foreach a {
 X = 10;
 Y = X + X;
 generate Y;
};

Currently the logical plan appears to connect the same LOConst instance (X) twice to the
LOAdd (Y). In LogToPhyTranslationVisitor, each successor of an operator is supposed to get its
own instance of that operator as its predecessor: the inner foreach plan is visited with
DependencyOrderWalkerWOSeenChk, so a new physical operator is created every time a logical
operator is seen, even if it is the same logical operator instance. However,
LogToPhyTranslationVisitor also maintains LogToPhyMap, a HashMap from each logical operator to
its translated physical operator. Because this is a HashMap and not a MultiMap, each new
translation overwrites the previous one: the LOConst ends up mapped to the last POConst
created, and the POAdd is connected to that single POConst twice.
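
The overwrite can be reproduced outside Pig with a plain HashMap. The sketch below is hypothetical. LogicalOp and PhysicalOp are stand-ins, not Pig's LogicalOperator and PhysicalOperator. A list-valued HashMap plays the role of the MultiMap alternative, and no particular MultiMap library is assumed. With a single-valued map, both input slots of the successor resolve to the same physical operator. The list-valued map keeps one distinct copy per visit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: LogicalOp/PhysicalOp are stand-ins, NOT Pig classes. */
public class LogToPhyMapSketch {
    static final class LogicalOp {
        final String name;
        LogicalOp(String name) { this.name = name; }
    }

    static final class PhysicalOp {
        final String name;
        PhysicalOp(String name) { this.name = name; }
    }

    /** Single-valued map: a second put for the same key replaces the first. */
    static List<PhysicalOp> inputsWithHashMap(LogicalOp shared, int visits) {
        Map<LogicalOp, PhysicalOp> logToPhy = new HashMap<>();
        for (int i = 0; i < visits; i++) {
            // A walker without a seen-check visits the shared node once per
            // successor edge and creates a fresh physical operator each time.
            logToPhy.put(shared, new PhysicalOp("POConst#" + i));
        }
        // The successor looks up each predecessor slot; every lookup
        // returns whatever was stored last.
        List<PhysicalOp> inputs = new ArrayList<>();
        for (int i = 0; i < visits; i++) {
            inputs.add(logToPhy.get(shared));
        }
        return inputs;
    }

    /** List-valued map (a simple multimap): every translated copy is kept. */
    static List<PhysicalOp> inputsWithMultiMap(LogicalOp shared, int visits) {
        Map<LogicalOp, List<PhysicalOp>> logToPhy = new HashMap<>();
        for (int i = 0; i < visits; i++) {
            logToPhy.computeIfAbsent(shared, k -> new ArrayList<>())
                    .add(new PhysicalOp("POConst#" + i));
        }
        // Each input slot of the successor can consume a distinct copy.
        return new ArrayList<>(logToPhy.get(shared));
    }

    public static void main(String[] args) {
        LogicalOp x = new LogicalOp("LOConst X");
        List<PhysicalOp> a = inputsWithHashMap(x, 2);
        System.out.println(a.get(0) == a.get(1)); // true: same POConst twice
        List<PhysicalOp> b = inputsWithMultiMap(x, 2);
        System.out.println(b.get(0) == b.get(1)); // false: two distinct copies
    }
}
```

A list-valued map alone is not the whole fix. The translator would still have to hand out the copies in order, one per successor input, which is part of why the change may be involved.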

Options to solve this:
1) Change LogToPhyTranslationVisitor to use a MultiMap, so that every physical copy of a
logical operator is kept. This might be fairly involved; I am not sure of the extent of the
changes required.
2) Change the parser to create separate copies of the shared operator in the nested foreach's
LogicalPlan to begin with, so that LogToPhyTranslationVisitor does not need to handle this
case. This seems cleaner, though again I am unsure how easy it is.
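
Option 2 amounts to turning each nested expression DAG into a tree before translation, so that no logical operator has more than one successor. The following is a minimal sketch of that idea. The Expr class and the copyAsTree helper are hypothetical; they are not Pig's parser or LogicalPlan API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of "copy shared sub-expressions per use"; not Pig code. */
public class UnshareSketch {
    static final class Expr {
        final String op;
        final List<Expr> inputs;
        Expr(String op, Expr... inputs) {
            this.op = op;
            this.inputs = new ArrayList<>(Arrays.asList(inputs));
        }
    }

    /**
     * Rebuilds the expression so every use of a sub-expression gets its own
     * copy. A node reached along two paths (X in X + X) is copied twice, so
     * no node in the result has more than one consumer.
     */
    static Expr copyAsTree(Expr e) {
        Expr[] copiedInputs = new Expr[e.inputs.size()];
        for (int i = 0; i < copiedInputs.length; i++) {
            copiedInputs[i] = copyAsTree(e.inputs.get(i));
        }
        return new Expr(e.op, copiedInputs);
    }

    /** Counts distinct node instances (by identity) reachable from e. */
    static int distinctNodes(Expr e) {
        Set<Expr> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        countInto(e, seen);
        return seen.size();
    }

    private static void countInto(Expr e, Set<Expr> seen) {
        if (seen.add(e)) {
            for (Expr in : e.inputs) {
                countInto(in, seen);
            }
        }
    }

    public static void main(String[] args) {
        Expr x = new Expr("const 10");
        Expr y = new Expr("add", x, x);                   // Y = X + X shares one X
        System.out.println(distinctNodes(y));             // 2
        System.out.println(distinctNodes(copyAsTree(y))); // 3
    }
}
```

The trade-off is that copying happens per use. A sub-expression reached along many paths is duplicated once for each path, which can grow exponentially for deeply nested sharing. For short aliases such as D*D, the duplication is small.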

> Logical to Physical Plan Translation fails when temporary alias are created within foreach
> ------------------------------------------------------------------------------------------
>                 Key: PIG-747
>                 URL: https://issues.apache.org/jira/browse/PIG-747
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Viraj Bhat
>            Assignee: Daniel Dai
>             Fix For: 0.7.0
>         Attachments: physicalplan.txt, physicalplanprob.pig, PIG-747-1.patch
> Consider the following Pig script, which calculates a new column F inside the foreach:
> {code}
> A = load 'physicalplan.txt' as (col1,col2,col3);
> B = foreach A {
>    D = col1/col2;
>    E = col3/col2;
>    F = E - (D*D);
>    generate
>    F as newcol;
> };
> dump B;
> {code}
> This gives the following error:
> =======================================================================================================================================
> Caused by: org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException: ERROR 2015: Invalid physical operators in the physical plan
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:377)
>         at org.apache.pig.impl.logicalLayer.LOMultiply.visit(LOMultiply.java:63)
>         at org.apache.pig.impl.logicalLayer.LOMultiply.visit(LOMultiply.java:29)
>         at org.apache.pig.impl.plan.DependencyOrderWalkerWOSeenChk.walk(DependencyOrderWalkerWOSeenChk.java:68)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:908)
>         at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:122)
>         at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:41)
>         at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
>         at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:246)
>         ... 10 more
> Caused by: org.apache.pig.impl.plan.PlanException: ERROR 0: Attempt to give operator of type org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Divide multiple outputs.  This operator does not support multiple outputs.
>         at org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:158)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.connect(PhysicalPlan.java:89)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:373)
>         ... 19 more
> =======================================================================================================================================
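
The innermost PlanException comes from OperatorPlan.connect, which refuses a second outgoing edge from an operator that does not support multiple outputs. Below is a hypothetical sketch of such a guard, modeled only on the error message above. Op, Plan and MultipleOutputsException are not Pig classes. The sketch replays what happens for D*D when both inputs of the Multiply resolve to the same Divide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a single-output guard; not Pig's OperatorPlan. */
public class ConnectGuardSketch {
    static final class Op {
        final String type;
        final boolean supportsMultipleOutputs;
        Op(String type, boolean supportsMultipleOutputs) {
            this.type = type;
            this.supportsMultipleOutputs = supportsMultipleOutputs;
        }
    }

    static final class MultipleOutputsException extends Exception {
        MultipleOutputsException(String msg) { super(msg); }
    }

    /** Minimal plan recording outgoing edges per operator (identity keys). */
    static final class Plan {
        private final Map<Op, List<Op>> fromEdges = new HashMap<>();

        void connect(Op from, Op to) throws MultipleOutputsException {
            List<Op> outs = fromEdges.get(from);
            // Reject a second outgoing edge from a single-output operator.
            if (outs != null && !outs.isEmpty() && !from.supportsMultipleOutputs) {
                throw new MultipleOutputsException("Attempt to give operator of type "
                        + from.type + " multiple outputs.");
            }
            fromEdges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        }
    }

    public static void main(String[] args) {
        Op divide = new Op("Divide", false);     // physical D = col1/col2
        Op multiply = new Op("Multiply", false); // physical D*D
        Plan plan = new Plan();
        try {
            // With a single-valued LogToPhyMap, both inputs of Multiply
            // resolve to the same Divide, so connect() runs twice.
            plan.connect(divide, multiply);
            plan.connect(divide, multiply);
        } catch (MultipleOutputsException e) {
            // prints "Attempt to give operator of type Divide multiple outputs."
            System.out.println(e.getMessage());
        }
    }
}
```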

This message is automatically generated by JIRA.
