pig-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-4790) Join after union fail due to UnionOptimizer
Date Wed, 10 Feb 2016 01:20:18 GMT

     [ https://issues.apache.org/jira/browse/PIG-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-4790:
----------------------------
    Attachment: PIG-4790-2.patch

The patch fixes the script above. However, a more complex script fails with the same stack trace:
{code}
a = LOAD 'studenttab10k' AS (name:chararray, age:int, gpa:double);

SPLIT a INTO b IF age > 40,
             c IF age <= 40;

d = FOREACH c GENERATE name, age, gpa;

e = FILTER d BY gpa > 3;
f = FILTER d BY gpa <= 3;

g = JOIN e BY name LEFT, f BY name;
h = FOREACH g GENERATE e::name as name, e::age as age, e::gpa as gpa;

i = DISTINCT h;

j = FILTER b BY gpa > 3;
k = FILTER b BY gpa <= 3;

l = JOIN j BY name LEFT, k BY name;
m = FOREACH l GENERATE j::name as name, j::age as age, j::gpa as gpa;
n = DISTINCT m;

m = UNION e, i, j, n;

n = JOIN a BY name, m BY name;

STORE n INTO 'ooo';
{code}

Attached another patch that fixes both scripts.

> Join after union fail due to UnionOptimizer
> -------------------------------------------
>
>                 Key: PIG-4790
>                 URL: https://issues.apache.org/jira/browse/PIG-4790
>             Project: Pig
>          Issue Type: Bug
>          Components: tez
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.16.0
>
>         Attachments: PIG-4790-1.patch, PIG-4790-2.patch
>
>
> The following script fails to run:
> {code}
> rmf ooo
> a = load 'student.txt' as (name:chararray, age:int, gpa:double);
> b = filter a by age > 65;
> c = filter a by age <=10;
> d = union b, c;
> e = join a by name left, d by name;
> store e into 'ooo';
> {code}
> Exception stack:
> {code}
> Caused by: java.lang.IllegalArgumentException: Edge [scope-43 : org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> [scope-55 : org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ SCATTER_GATHER : org.apache.tez.runtime.library.input.OrderedGroupedKVInput >> PERSISTED >> org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput >> NullEdgeManager }) already defined!
>         at org.apache.tez.dag.api.DAG.addEdge(DAG.java:272)
>         at org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:311)
>         at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:252)
>         at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56)
>         at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
>         at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
>         at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:65)
>         at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:111)
>         ... 20 more
> {code}
> With pig.tez.opt.union disabled, the script runs fine.
> It seems we should detect this pattern and disallow merging a vertex group into a pair of vertices that already has an edge between them.
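
The check proposed above can be illustrated with a small standalone sketch. This is not the code from either attached patch and does not use Pig or Tez classes. The class UnionMergeCheck and its methods are hypothetical names made up for illustration. The vertex names scope-43 and scope-55 are taken from the stack trace; the assumption that scope-43 is the union's only input vertex is a guess based on the script.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of a DAG; not a Pig or Tez class.
class UnionMergeCheck {
    // For each vertex, the set of vertices it already has an outgoing edge to.
    private final Map<String, Set<String>> successors = new HashMap<>();

    // Adds an edge and rejects a second edge between the same pair,
    // imitating the "already defined!" failure seen in the stack trace.
    void addEdge(String from, String to) {
        Set<String> targets = successors.computeIfAbsent(from, k -> new HashSet<>());
        if (!targets.add(to)) {
            throw new IllegalArgumentException(
                "Edge " + from + " -> " + to + " already defined!");
        }
    }

    boolean hasEdge(String from, String to) {
        return successors.getOrDefault(from, Collections.<String>emptySet()).contains(to);
    }

    // Merging a union into its successor connects every union input directly
    // to that successor. The merge is only safe if none of those inputs
    // already has an edge to the successor.
    boolean canMergeUnion(List<String> unionInputs, String successor) {
        for (String input : unionInputs) {
            if (hasEdge(input, successor)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        UnionMergeCheck dag = new UnionMergeCheck();
        // The join vertex already reads relation "a" directly from scope-43.
        dag.addEdge("scope-43", "scope-55");
        // Assumption: the union's inputs also come from scope-43.
        boolean ok = dag.canMergeUnion(Collections.singletonList("scope-43"), "scope-55");
        System.out.println("can merge union: " + ok);
    }
}
```

In this model the optimizer would skip the merge whenever canMergeUnion returns false and keep the union as a separate vertex, which is the same outcome as running with pig.tez.opt.union disabled for that one union.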



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
