pig-dev mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4790) Join after union fail due to UnionOptimizer
Date Wed, 10 Feb 2016 09:54:18 GMT

    [ https://issues.apache.org/jira/browse/PIG-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140582#comment-15140582
] 

Rohini Palaniswamy commented on PIG-4790:
-----------------------------------------

The difference in the complex script is that one of the edges is a vertex group. The fix has
a problem, though: it turns off UnionOptimizer for the simple case as well, where the edges
are normal, which the previous patch handled. We should avoid turning off UnionOptimizer as
much as possible, because the performance of UnorderedPartitionedKVOutput is currently very
bad and has not been fixed yet. It would be good to add the script to TestTezCompiler as well.

> Join after union fail due to UnionOptimizer
> -------------------------------------------
>
>                 Key: PIG-4790
>                 URL: https://issues.apache.org/jira/browse/PIG-4790
>             Project: Pig
>          Issue Type: Bug
>          Components: tez
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.16.0
>
>         Attachments: PIG-4790-1.patch, PIG-4790-2.patch
>
>
> The following script fails to run:
> {code}
> rmf ooo
> a = load 'student.txt' as (name:chararray, age:int, gpa:double);
> b = filter a by age > 65;
> c = filter a by age <=10;
> d = union b, c;
> e = join a by name left, d by name;
> store e into 'ooo';
> {code}
> Exception stack:
> {code}
> Caused by: java.lang.IllegalArgumentException: Edge [scope-43 : org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> [scope-55 : org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ SCATTER_GATHER : org.apache.tez.runtime.library.input.OrderedGroupedKVInput >> PERSISTED >> org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput >> NullEdgeManager }) already defined!
>         at org.apache.tez.dag.api.DAG.addEdge(DAG.java:272)
>         at org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:311)
>         at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:252)
>         at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56)
>         at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
>         at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
>         at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:65)
>         at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:111)
>         ... 20 more
> {code}
> With pig.tez.opt.union disabled, the script runs fine.
> It seems we should detect this pattern and disallow merging a vertex group into a pair of
> vertices that already has an edge.
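
The check suggested in the description can be illustrated with a toy model. This is only a sketch in Python, not Pig or Tez code: the function names, the plain set of (source, destination) pairs standing in for the DAG edges, and the vertex names are all hypothetical. It assumes the failure arises when a vertex feeding the union already has its own edge to the vertex that consumes the union, so merging would define the same edge twice.

```python
# Toy model only: these names are hypothetical and are not Pig or Tez APIs.

def union_members_with_existing_edge(existing_edges, union_members, successor):
    """Return the union members that already have a direct edge to successor.

    existing_edges: set of (source, destination) vertex-name pairs.
    union_members:  vertices that would form the vertex group.
    successor:      vertex that consumes the union output.
    """
    return [m for m in union_members if (m, successor) in existing_edges]


def can_merge_union(existing_edges, union_members, successor):
    """Allow the merge only if no member is already connected to successor."""
    return not union_members_with_existing_edge(
        existing_edges, union_members, successor
    )


# Case resembling the stack trace: scope-43 already feeds scope-55 directly
# and would also feed it through the vertex group.
conflicting = can_merge_union({("scope-43", "scope-55")}, ["scope-43"], "scope-55")

# Case with no prior edge between the members and the successor.
safe = can_merge_union({("load", "join")}, ["filter_b", "filter_c"], "join")
```

Checking per successor rather than disabling the optimization globally is in line with the comment above about keeping UnionOptimizer enabled wherever the edges are normal.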



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
