pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1131) Pig simple join does not work when it contains empty lines
Date Wed, 10 Feb 2010 01:38:27 GMT

     [ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ashutosh Chauhan updated PIG-1131:
----------------------------------

    Attachment: pig-1131.patch

In POLocalRearrange number of elements in tuple not present in key (and thus put in value)
is computed first time and then cached as an optimization. This patch removes this caching
because of the problem illustrated in the bug. Test case included which reproduces the bug.


> Pig simple join does not work when it contains empty lines
> ----------------------------------------------------------
>
>                 Key: PIG-1131
>                 URL: https://issues.apache.org/jira/browse/PIG-1131
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Viraj Bhat
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>         Attachments: junk1.txt, junk2.txt, pig-1131.patch, simplejoinscript.pig
>
>
> I have a simple script, which does a JOIN.
> {code}
> input1 = load '/user/viraj/junk1.txt' using PigStorage(' ');
> describe input1;
> input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001');
> describe input2;
> joineddata = JOIN input1 by $0, input2 by $0;
> describe joineddata;
> store joineddata into 'result';
> {code}
> The input data contains empty lines.  
> The join fails in the Map phase with the following error in the PRLocalRearrange.java
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
> 	at java.util.ArrayList.RangeCheck(ArrayList.java:547)
> 	at java.util.ArrayList.get(ArrayList.java:322)
> 	at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:464)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:360)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:159)
> I am surprised that the test cases did not detect this error. Could we add this data
which contains empty lines to the testcases?
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message