hadoop-pig-dev mailing list archives

From "Ankur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1108) Incorrect map output key type in MultiQuery optimization
Date Thu, 26 Nov 2009 07:40:39 GMT

    [ https://issues.apache.org/jira/browse/PIG-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782787#action_12782787 ]

Ankur commented on PIG-1108:
----------------------------

In my test run on the 0.6.0 branch, disabling MQ did not work. The Pig client logs showed that MQ
was still kicking in, and the mappers failed with the same error message as in the description.
It would be good if we could add a few points about "SecondaryKey" here: http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification

> Incorrect map output key type in MultiQuery optimization
> --------------------------------------------------------
>
>                 Key: PIG-1108
>                 URL: https://issues.apache.org/jira/browse/PIG-1108
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Ankur
>            Assignee: Richard Ding
>
> When trying to merge 2 split plans, one of which never progresses along an M/R boundary,
> Pig sets the map-output key type incorrectly, resulting in the following error:
> java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableTuple
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
> 	at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:159)
> Here is a small script to be used as a reproducible test case:
> rmf plan1
> rmf plan2
> A = LOAD 'data' USING PigStorage() as (a: int, b: chararray);
> SPLIT A into plan1 IF (a>5), plan2 IF (a<5);
> B = GROUP plan1 BY b;
> C = FOREACH B {
>               tmp = ORDER plan1 BY a desc;
>               GENERATE FLATTEN(group) as b, tmp;
>               };
> D = FILTER C BY b is not null;
> STORE D into 'plan1';
> STORE plan2 into 'plan2';
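
The failure mode in the stack trace above can be illustrated with a toy model. This is only a sketch, not Pig or Hadoop code: the classes below are hypothetical Python stand-ins that borrow the names from the error message, and the check in `collect` merely imitates the behaviour the error implies (the job declares one map-output key class, and a record with a different key class is rejected). The assumption that the GROUP/ORDER branch emits a tuple key while the job was declared with a text key is one plausible reading of the report, not a confirmed diagnosis.

```python
class NullableText:
    """Hypothetical stand-in for a chararray key wrapper."""
    def __init__(self, value):
        self.value = value


class NullableTuple:
    """Hypothetical stand-in for a tuple key wrapper."""
    def __init__(self, value):
        self.value = value


class MapOutputBuffer:
    """Toy collector that enforces a single declared map-output key class."""

    def __init__(self, key_class):
        self.key_class = key_class
        self.records = []

    def collect(self, key, value):
        # Reject any key whose class differs from the one declared for the job.
        if type(key) is not self.key_class:
            raise IOError(
                "Type mismatch in key from map: expected %s, received %s"
                % (self.key_class.__name__, type(key).__name__)
            )
        self.records.append((key, value))


def run_merged_map(declared_key_class, emitted_keys):
    """Feed keys from a (hypothetical) merged plan into one collector."""
    buf = MapOutputBuffer(declared_key_class)
    for key in emitted_keys:
        buf.collect(key, None)
    return buf


if __name__ == "__main__":
    # Assumed scenario: the job is declared with a text key, but the
    # branch with the nested ORDER emits a tuple key instead.
    try:
        run_merged_map(NullableText, [NullableTuple(("b1", 7))])
    except IOError as err:
        print(err)
```

In this model the error goes away only when the declared key class matches what the map side actually emits, which is why the key type chosen for the merged plan matters.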

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

