hadoop-pig-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-558) Distinct followed by a Join results in Invalid size 0 for a tuple error
Date Fri, 02 Jan 2009 23:59:44 GMT

     [ https://issues.apache.org/jira/browse/PIG-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-558:
-------------------------------

    Assignee: Pradeep Kamath
      Status: Patch Available  (was: Open)

Root cause: table1 has only one column, and that column is also the join key. A recent
optimization omits from the value any parts that are already present in the key, so
POLocalRearrange sends an empty tuple as the value. The POPackage that follows the
POLocalRearrange uses metadata it stores itself to work out how to reconstruct the value
from the key where necessary. However, when the POLocalRearrange is in a reduce and the
POPackage is in the next map, the POLocalRearrange output is written to DFS in BinStorage
format, so a tuple of size 0 is written out. When reading, BinStorage treats a tuple of
size 0 as a fatal error.

Fix:
The patch changes BinStorage to treat a tuple of size 0 as a valid tuple and to reconstruct
it as an empty tuple when reading. The POPackage then builds the correct value from the key.
The patch also includes a unit test for this case.
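
To illustrate the behaviour described here, the following is a minimal, self-contained
Java sketch. It is not the Pig code and not the patch: the class EmptyTupleSketch, its
write/read/rebuild methods, the allowEmpty flag and the wire format (a field count
followed by UTF strings) are all hypothetical, chosen only to show why a size-0 tuple
can be a legitimate value once key columns are omitted from it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the serialization and value-rebuilding steps.
// None of these names or formats come from Pig itself.
public class EmptyTupleSketch {

    // Writes a tuple as a field count followed by each field as a UTF string.
    static byte[] write(List<String> tuple) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(tuple.size());
        for (String field : tuple) {
            out.writeUTF(field);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Reads a tuple back. With allowEmpty == false a size of 0 is rejected,
    // which mimics the failure in the report; with allowEmpty == true an
    // empty tuple is returned as a valid value.
    static List<String> read(byte[] data, boolean allowEmpty) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int size = in.readInt();
        if (size < 0 || (size == 0 && !allowEmpty)) {
            throw new IOException("Invalid size " + size + " for a tuple");
        }
        List<String> tuple = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            tuple.add(in.readUTF());
        }
        return tuple;
    }

    // Rebuilds the full value. fromKey[i] == true means column i was omitted
    // from the value and must be taken from the (single-column) key.
    static List<String> rebuild(String key, List<String> value, boolean[] fromKey) {
        List<String> full = new ArrayList<>();
        int next = 0;
        for (boolean takenFromKey : fromKey) {
            full.add(takenFromKey ? key : value.get(next++));
        }
        return full;
    }

    public static void main(String[] args) throws IOException {
        // The only column is the key, so the stored value is an empty tuple.
        byte[] stored = write(new ArrayList<String>());
        List<String> value = read(stored, true);
        System.out.println(rebuild("a", value, new boolean[] {true})); // prints [a]
    }
}
```

Calling read(stored, false) in this sketch throws an IOException with the message
"Invalid size 0 for a tuple", matching the message in the reported stack trace.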

The unit test depends on functions added to MiniCluster and test/org/apache/pig/test/Util.java
by the patch in PIG-580. If PIG-580 is not committed before this patch, the "additional"
patch ("PIG-558-additional.patch") attached here should also be applied.

> Distinct followed by a Join results in Invalid size 0 for a tuple error
> -----------------------------------------------------------------------
>
>                 Key: PIG-558
>                 URL: https://issues.apache.org/jira/browse/PIG-558
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>            Reporter: Viraj Bhat
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>
>         Attachments: table1, table2
>
>
> The following Pig script does a right outer join after the DISTINCT.
> {code}
> nonuniqtable1 = LOAD 'table1' AS (f1:chararray);
> table1 = DISTINCT nonuniqtable1;
> table2 = LOAD 'table2' AS (f1:chararray, f2:int);
> temp = COGROUP table1 BY f1 INNER, table2 BY f1;
> DESCRIBE temp;
> explain temp;
> dump temp;
> {code}
> ========================================================================================================
> It results in the following error. This is true for other join types as well.
> ========================================================================================================
> java.io.IOException: Invalid size 0 for a tuple
> 	at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:57)
> 	at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:62)
> 	at org.apache.pig.builtin.BinStorage.getNext(BinStorage.java:90)
> 	at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:103)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:157)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:133)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
> ========================================================================================================

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

