pig-dev mailing list archives

From "Daniel Dai (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2563) IndexOutOfBoundsException: while projecting fields from a bag
Date Wed, 07 Mar 2012 07:42:02 GMT

    [ https://issues.apache.org/jira/browse/PIG-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224076#comment-13224076 ]

Daniel Dai commented on PIG-2563:
---------------------------------

bq. Cheap code style comments
Sure, will change.

bq. More expensive code content comments
I'm not sure I completely understand your point, so let me explain the design of the foreach
nested plan and why I made the change. Let me know if you need further explanation. The uid
and schema inference process is at the core of the logical plan, and anyone who changes any
part of it must make sure existing functionality is not broken. In the patch, I changed the
way a projection infers its uid, because previously it did not generate a new uid for the new
bag produced by a nested foreach. Here is how uids work in the foreach inner plan:
# Every foreach inner plan starts with LOInnerLoad and ends with LOGenerate.
# A simple foreach should keep uids. E.g., for foreach a generate $1, $2, we keep the uids of $1 and $2 even if they are bag columns; a couple of places rely on this assumption.
# If the input column is a bag, LOInnerLoad takes the bag's inner schema. E.g., if $1 is bag#2{t#3(x#4, y#5)}, LOInnerLoad has the schema (x#4, y#5), and it can be followed by nested operators.
# LOGenerate regenerates the bag after the inner operator pipeline; in this case it is bag#2{t#3(x#4, y#5)}, and we need to keep its uid.
# Currently no nested operator changes uids except ForEach. That is the approach I took in the patch: unless a ForEach is present, reuse the uid.
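A minimal Java sketch of the rules above, as a toy model only: the classes `Field` and `Step`, the method `generateBag`, and the uid counter are hypothetical illustrations, not Pig's `LogicalSchema`, `LOInnerLoad` or `LOGenerate` API. It assumes LOInnerLoad exposes the bag's inner tuple schema with its uids, that nested operators other than ForEach pass uids through, and that LOGenerate allocates a fresh tuple uid and then a fresh bag uid when a ForEach appears in the pipeline (that ordering is chosen only so the output lines up with the third example).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical toy model of the uid rule for a nested foreach plan.
// None of these classes are Pig's real LogicalSchema / LOInnerLoad / LOGenerate.
public class NestedUidSketch {

    /** A schema field; bag fields also carry a tuple uid and inner fields. */
    static final class Field {
        final String name;
        final long uid;
        final Long tupleUid;       // null for non-bag fields
        final List<Field> inner;   // fields of the bag's inner tuple

        Field(String name, long uid) {
            this(name, uid, null, List.of());
        }

        Field(String name, long uid, Long tupleUid, List<Field> inner) {
            this.name = name;
            this.uid = uid;
            this.tupleUid = tupleUid;
            this.inner = inner;
        }

        @Override
        public String toString() {
            if (tupleUid == null) {
                return name + "#" + uid;
            }
            String fields = inner.stream().map(Field::toString)
                    .collect(Collectors.joining(", "));
            return name + ":bag#" + uid + "{t#" + tupleUid + "(" + fields + ")}";
        }
    }

    /** A nested operator; only FOREACH uses the projection list. */
    static final class Step {
        final String op;             // e.g. "FILTER", "DISTINCT", "FOREACH"
        final List<String> project;  // projected field names for FOREACH

        Step(String op, List<String> project) {
            this.op = op;
            this.project = project;
        }
    }

    private long nextUid;  // hypothetical stand-in for a global uid generator

    NestedUidSketch(long firstFreeUid) {
        this.nextUid = firstFreeUid;
    }

    /** Infers the bag that LOGenerate emits for one nested pipeline. */
    Field generateBag(String outName, Field inputBag, List<Step> pipeline) {
        // LOInnerLoad: a bag column is loaded as its inner tuple schema, uids kept.
        List<Field> schema = inputBag.inner;
        boolean sawForEach = false;
        for (Step step : pipeline) {
            if (step.op.equals("FOREACH")) {
                // Projection keeps the uids of the fields it selects...
                List<Field> projected = new ArrayList<>();
                for (Field f : schema) {
                    if (step.project.contains(f.name)) {
                        projected.add(f);
                    }
                }
                schema = projected;
                sawForEach = true;  // ...but the enclosing bag must get new uids.
            }
            // FILTER, DISTINCT, etc. leave the schema and its uids untouched.
        }
        // LOGenerate: rebuild the bag, reusing the input uids unless a ForEach ran.
        if (!sawForEach) {
            return new Field(outName, inputBag.uid, inputBag.tupleUid, schema);
        }
        long tupleUid = nextUid++;
        long bagUid = nextUid++;
        return new Field(outName, bagUid, tupleUid, schema);
    }

    public static void main(String[] args) {
        Field a2 = new Field("a2", 2, 3L,
                List.of(new Field("x", 4), new Field("y", 5)));
        NestedUidSketch sketch = new NestedUidSketch(6);
        System.out.println(sketch.generateBag("c", a2,
                List.of(new Step("FILTER", List.of()))));
        System.out.println(sketch.generateBag("c", a2,
                List.of(new Step("FOREACH", List.of("x")))));
    }
}
```

Starting from a2:bag#2{t#3(x#4, y#5)} with the counter at 6, the filter pipeline prints c:bag#2{t#3(x#4, y#5)} and the projection pipeline prints c:bag#7{t#6(x#4)}, which matches the second and third examples that follow.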

Here are complete examples:
{code}
-- a: (a0:xxxx, a1:chararray#1, a2:bag#2{t#3(x#4, y#5)})
b = foreach a generate a1, a2;

LOInnerLoad(a1:chararray)     LOInnerLoad(x#4, y#5)
                    \            /
                    LOGenerate(a1:chararray#1, a2:bag#2{t#3(x#4, y#5)})
{code}

{code}
-- a: (a0:xxxx, a1:chararray#1, a2:bag#2{t#3(x#4, y#5)})
b = foreach a { c = filter a2 by x == 1; generate a1, c; };

LOInnerLoad(a1:chararray)     LOInnerLoad(x#4, y#5)
                    \            /
                     \        LOFilter(x#4, y#5)
                      \        /
                    LOGenerate(a1:chararray#1, c:bag#2{t#3(x#4, y#5)})
{code}

{code}
-- a: (a0:xxxx, a1:chararray#1, a2:bag#2{t#3(x#4, y#5)})
b = foreach a { c = a2.x; generate a1, c; };

LOInnerLoad(a1:chararray)     LOInnerLoad(x#4, y#5)
                    \            /
                     \        LOForEach(x#4)
                      \        /
                    LOGenerate(a1:chararray#1, c:bag#7{t#6(x#4)})
{code}
                
> IndexOutOfBoundsException: while projecting fields from a bag
> -------------------------------------------------------------
>
>                 Key: PIG-2563
>                 URL: https://issues.apache.org/jira/browse/PIG-2563
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.1, 0.10
>            Reporter: Vivek Padmanabhan
>            Assignee: Daniel Dai
>             Fix For: 0.10, 0.11
>
>         Attachments: PIG-2563-1.patch
>
>
> The script below fails with Pig 0.9 / Pig 0.10 but works with Pig 0.8.
> {code}
> A = load 'i1' as (a,b,c:chararray);
> B = load 'i2' as (d,e,f:chararray);
> C = cogroup A by a, B by d;
> D = foreach C { 
>   tmp = B.d;
>   tmp_dis = distinct tmp;
>   generate A,B,tmp_dis ; } ;
> E = foreach D generate B.(d,e) as v;
> dump E;
> {code}
> The script fails with the exception below. It looks like DereferenceExpression is using the wrong schema to build the inner schema.
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
> 	at java.util.ArrayList.RangeCheck(ArrayList.java:547)
> 	at java.util.ArrayList.get(ArrayList.java:322)
> 	at org.apache.pig.newplan.logical.relational.LogicalSchema.getField(LogicalSchema.java:653)
> 	at org.apache.pig.newplan.logical.expression.DereferenceExpression.getFieldSchema(DereferenceExpression.java:167)
> 	at org.apache.pig.newplan.logical.relational.LOGenerate.getSchema(LOGenerate.java:88)
> 	at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:160)
> 	at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:242)
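One reading of the stack trace that is consistent with the explanation in the comment above, sketched as a hypothetical Java toy model (the `BagColumn` class, `indexByUid` and `dereference` are illustrations, not Pig's DereferenceExpression code): before the fix, `tmp = B.d` did not receive a new uid and `distinct` keeps uids, so `tmp_dis` in D would share B's uid. If the projection `B.(d,e)` in E then resolves the bag by uid and picks up the single-field schema `(d)`, asking for position 1 fails with the same kind of IndexOutOfBoundsException. The uid values 11 and 12 are arbitrary.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the PIG-2563 failure: two bag columns in D share a uid.
// Not Pig code; the uid-keyed lookup is an illustrative stand-in for schema resolution.
public class SharedUidSketch {

    /** A bag column: name, uid, and the field names of its inner tuple. */
    static final class BagColumn {
        final String name;
        final long uid;
        final List<String> innerFields;

        BagColumn(String name, long uid, List<String> innerFields) {
            this.name = name;
            this.uid = uid;
            this.innerFields = new ArrayList<>(innerFields);
        }
    }

    /** Indexes columns by uid; a later column with the same uid replaces an earlier one. */
    static Map<Long, BagColumn> indexByUid(List<BagColumn> columns) {
        Map<Long, BagColumn> byUid = new LinkedHashMap<>();
        for (BagColumn c : columns) {
            byUid.put(c.uid, c);
        }
        return byUid;
    }

    /** Projects the given positions from whatever bag schema the uid resolves to. */
    static List<String> dereference(Map<Long, BagColumn> byUid, long uid, int... positions) {
        BagColumn resolved = byUid.get(uid);
        List<String> out = new ArrayList<>();
        for (int p : positions) {
            out.add(resolved.innerFields.get(p));  // may throw IndexOutOfBoundsException
        }
        return out;
    }

    public static void main(String[] args) {
        long bUid = 11;  // hypothetical uid of bag B in relation D
        BagColumn b = new BagColumn("B", bUid, List.of("d", "e", "f"));
        // Before the fix: tmp = B.d keeps B's uid, and distinct keeps it too.
        BagColumn tmpDisOld = new BagColumn("tmp_dis", bUid, List.of("d"));
        // After the fix: the nested projection produces a bag with a new uid.
        BagColumn tmpDisNew = new BagColumn("tmp_dis", 12, List.of("d"));

        // B.(d,e) means positions 0 and 1 of B's inner schema (d, e, f).
        System.out.println(dereference(indexByUid(List.of(b, tmpDisNew)), bUid, 0, 1));
        try {
            dereference(indexByUid(List.of(b, tmpDisOld)), bUid, 0, 1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("before fix: " + e);
        }
    }
}
```

With distinct uids the projection returns [d, e]; with the shared uid the lookup resolves to tmp_dis, and `get(1)` on its one-element field list throws, mirroring the "Index: 1, Size: 1" in the report.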

--
This message is automatically generated by JIRA.
