hadoop-pig-dev mailing list archives

From "Viraj Bhat (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-537) Failure in Hadoop map collect stage due to type mismatch in the keys used in cogroup
Date Tue, 18 Nov 2008 04:27:44 GMT

     [ https://issues.apache.org/jira/browse/PIG-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-537:
---------------------------

    Attachment: explain_aliasC.log

Explain output for alias C

> Failure in Hadoop map collect stage due to type mismatch in the keys used in cogroup
> ------------------------------------------------------------------------------------
>
>                 Key: PIG-537
>                 URL: https://issues.apache.org/jira/browse/PIG-537
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Viraj Bhat
>            Priority: Critical
>             Fix For: types_branch
>
>         Attachments: explain_aliasC.log
>
>
> Consider the following Pig query, which demonstrates several problems in logical plan
> creation and in the subsequent execution of the M/R job. The query performs two cogroups.
> The first, between A and B, is used to generate the alias ABtemptable. The second cogroups
> A with ABtemptable on marks, which was read in as an int.
> ==================================================================================
> {code}
> A = load 'mymarks.txt' as (username:chararray,marks:int);
> B = load 'mygrades.txt' as (username:chararray,grade:chararray);
> ABtemp = cogroup A by username, B  by username;
> ABtemptable = foreach ABtemp generate
>            group as username,
>            flatten(A.marks) as newmarks;
> --describe ABtemptable;
> C = cogroup A by marks, ABtemptable by newmarks;
> --describe C;
> explain C;
> dump C;
> {code}
> ==================================================================================
> The schemas that Pig reports for ABtemptable and C:
> ==================================================================================
> {code}describe ABtemptable;{code} ABtemptable: {username: chararray,newmarks: int}
> {code}describe C;{code} C: {group: int,A: {username: chararray,marks: int},ABtemptable:
{username: chararray,newmarks: int}}
> ==================================================================================
> If you run the above query you get the following error:
> ==================================================================================
> 2008-11-18 03:57:14,372 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher
- Error message from task (map) task_200810152105_0156_m_000000java.io.IOException: Type mismatch
in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableIntWritable
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:97)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:172)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:82)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
> ==================================================================================
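The exception is raised where the map output is collected: the job is configured with a single key class, and each key the map emits has to be exactly that class. The Python sketch below models that check in a simplified form. The class and function names are stand-ins chosen for illustration, not Hadoop or Pig code, and the assumption that the job was configured for a chararray key is inferred from the error message.

```python
class NullableText:
    """Stand-in for a chararray key wrapper (illustrative only)."""
    def __init__(self, value):
        self.value = value


class NullableIntWritable:
    """Stand-in for an int key wrapper (illustrative only)."""
    def __init__(self, value):
        self.value = value


def collect(key, value, expected_key_class, buffer):
    """Append (key, value) to buffer only if the key has the configured class."""
    if type(key) is not expected_key_class:
        raise IOError(
            "Type mismatch in key from map: expected %s, received %s"
            % (expected_key_class.__name__, type(key).__name__)
        )
    buffer.append((key, value))


buffer = []
# Assumption: the plan typed the cogroup key as chararray, so the job
# expects NullableText keys.
collect(NullableText("42"), "row-1", NullableText, buffer)
# The data for `marks` is really an int, so an int key reaches the check.
try:
    collect(NullableIntWritable(42), "row-2", NullableText, buffer)
    error = None
except IOError as exc:
    error = str(exc)
```

In this model the first record is buffered and the second is rejected, which mirrors the shape of the failure in the log above.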
> Looking at the {code}explain C;{code} output, you can see that newmarks has become a chararray
> (surprising!), even though describe reports it as an int.
> ==================================================================================
> |---CoGroup viraj-Tue Nov 18 03:49:42 UTC 2008-25 Schema: {group: Unknown,{username:
bytearray,marks: int},ABtemptable: {username: chararray,newmarks: chararray}} Type: bag
>     |   |
>     |   Project viraj-Tue Nov 18 03:49:42 UTC 2008-23 Projections: [1] Overloaded: false
FieldSchema: marks: int Type: int
>     |   Input: SplitOutput[null] viraj-Tue Nov 18 03:49:42 UTC 2008-29
>     |   |
>     |   Project viraj-Tue Nov 18 03:49:42 UTC 2008-24 Projections: [1] Overloaded: false
FieldSchema: newmarks: chararray Type: chararray
>     |   Input: ForEach viraj-Tue Nov 18 03:49:42 UTC 2008-22
>     |---ForEach viraj-Tue Nov 18 03:49:42 UTC 2008-22 Schema: {username: chararray,newmarks:
chararray} Type: bag
> ==================================================================================
> In summary, this script demonstrates the following problems:
> 1) Logical plan creation: newmarks is typed as a chararray in the plan, although describe
> reports it as an int.
> 2) Cogrouping on fields of different types, which results in a group key of type Unknown,
> is not caught during the compile phase.
> Additionally, I am attaching to this JIRA the explain output for alias C and the test files
> needed to run the script.
> Viraj
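
A compile-phase check of the kind the report asks for could look roughly like the Python sketch below. It is a hypothetical illustration, not Pig's type checker: the function name, the dictionary-based schema representation, and the choice to raise an error (rather than insert a cast) are all assumptions.

```python
def cogroup_key_type(inputs):
    """Return the common key type of all cogroup inputs.

    `inputs` maps an alias to (schema, key_field), where schema maps
    field names to type names. Raises ValueError on a mismatch.
    """
    key_types = {}
    for alias, (schema, key_field) in inputs.items():
        key_types[alias] = schema[key_field]
    if len(set(key_types.values())) != 1:
        details = ", ".join(f"{a}: {t}" for a, t in sorted(key_types.items()))
        raise ValueError(f"cogroup keys have different types ({details})")
    return next(iter(key_types.values()))


a_schema = {"username": "chararray", "marks": "int"}

# Schemas as reported by describe: both keys are int.
described = cogroup_key_type({
    "A": (a_schema, "marks"),
    "ABtemptable": ({"username": "chararray", "newmarks": "int"}, "newmarks"),
})

# Schemas as shown by explain: newmarks has become a chararray.
try:
    cogroup_key_type({
        "A": (a_schema, "marks"),
        "ABtemptable": ({"username": "chararray", "newmarks": "chararray"}, "newmarks"),
    })
    message = None
except ValueError as exc:
    message = str(exc)
```

With the schemas that describe reports, the check passes and yields int. With the schemas from the explain output, it fails before any job is launched.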

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.