hadoop-pig-dev mailing list archives

From Alan Gates <ga...@yahoo-inc.com>
Subject Re: LocalRearrange out of bounds exception - tips for debugging?
Date Tue, 13 Oct 2009 20:01:57 GMT
Have you checked that each record in your input data has at least the
number of fields you specify?  Have you checked that the field
separator in your data matches the default for PigPerformanceLoader
(^A, I think)?
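
One way to run both checks at once is to count fields per record in a
local copy of each input file. The following is only a minimal sketch:
it assumes plain text with one record per line and ^A (\x01) as the
separator, which I have not verified for PigPerformanceLoader, and the
function names are made up for illustration.

```python
from collections import Counter

CTRL_A = "\x01"  # assumed separator (^A); not verified against PigPerformanceLoader


def field_count_histogram(lines, sep=CTRL_A):
    """Map number-of-fields to the number of records with that many fields."""
    return Counter(len(line.rstrip("\n").split(sep)) for line in lines)


def short_records(lines, expected, sep=CTRL_A):
    """Return (line_number, field_count) for records with fewer than `expected` fields."""
    bad = []
    for line_number, line in enumerate(lines, start=1):
        count = len(line.rstrip("\n").split(sep))
        if count < expected:
            bad.append((line_number, count))
    return bad
```

Feeding it the lines of the Project file with expected=3 would list any
records that are too short for (id,emp_id,role). If nearly every record
comes back with a single field, the separator is probably not what the
loader expects, which might be consistent with the "Index: 1, Size: 1"
in the trace.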

Alan.

On Oct 13, 2009, at 10:28 AM, Dmitriy Ryaboy wrote:

> We ran into what looks like an edge-case bug in Pig, which causes it
> to throw an IndexOutOfBoundsException (stack trace below).  The script
> just joins two relations; it looks like our data was generated
> incorrectly, and the join is empty, which may be what's causing the
> failure. It also appears to only happen when at least one of the
> inputs is on the large side (at least a few hundred megs).  Any ideas
> on what could be happening and how to zoom in on the underlying cause?
> We are running off unmodified trunk.
>
> Script:
>
> register datagen.jar;
> E =  load 'Employee' using
> org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
> (id,name,cc,dc);
> D =  load 'Department' using
> org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
> (dept_id,dept_nm);
> P =  load 'Project' using
> org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
> (id,emp_id,role);
> R1 = JOIN E by dc, D by dept_id;
> R2 = JOIN R1 by E::id, P by emp_id;
> store R2 into 'TestCase2Output';
>
> R2 join fails with the stack trace below. It also fails if we
> pre-calculate R1, store it, and load it directly (so, load R1, load P,
> join R1 by $0, P by emp_id). We've verified that the records in R1 and
> R2 have the expected fields, etc.
>
>
> Stack Trace:
>
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>        at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>        at java.util.ArrayList.get(ArrayList.java:322)
>        at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:148)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:226)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:260)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>        at org.apache.hadoop.mapred.Child.main(Child.java:170)

