hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-425) Split -> distinct or order -> cogroup fails
Date Wed, 17 Sep 2008 23:17:44 GMT

    [ https://issues.apache.org/jira/browse/PIG-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632002#action_12632002 ]

Alan Gates commented on PIG-425:
--------------------------------

I found one significant downside to the approach in this patch.  Because the local rearrange
is no longer moved into the map, the combiner can no longer run.  So if you have a query like:

C = cogroup A, B;
D = foreach C generate flatten(A), flatten(B);
E = group D by $0;
F = foreach E generate group, COUNT(D);

the count in F will no longer be done in the combiner, so the group step loses its map-side pre-aggregation.
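
To make the cost concrete, here is a toy Python model of a grouped count with and without a combiner. It is only a sketch: it is not Pig or Hadoop code, the function names (`map_phase`, `combine`, `reduce_phase`, `run`) and the sample data are invented for illustration, and it assumes each inner list stands for the input of one map task.

```python
from collections import Counter

def map_phase(records):
    """Emit (key, 1) for each record, as a COUNT over a group-by would."""
    return [(rec[0], 1) for rec in records]

def combine(pairs):
    """Pre-aggregate one map task's output into one partial count per key."""
    partial = Counter()
    for key, n in pairs:
        partial[key] += n
    return list(partial.items())

def reduce_phase(shuffled):
    """Sum the (possibly partial) counts received for each key."""
    totals = Counter()
    for key, n in shuffled:
        totals[key] += n
    return dict(totals)

def run(splits, use_combiner):
    """Return the final counts and the number of records sent to the reducer."""
    shuffled = []
    for split in splits:
        pairs = map_phase(split)
        if use_combiner:
            pairs = combine(pairs)
        shuffled.extend(pairs)
    return reduce_phase(shuffled), len(shuffled)

# Hypothetical input: two map tasks, records keyed by their first field.
splits = [
    [("alice",), ("alice",), ("bob",)],
    [("alice",), ("bob",), ("bob",), ("bob",)],
]

counts_plain, shuffled_plain = run(splits, use_combiner=False)
counts_comb, shuffled_comb = run(splits, use_combiner=True)
```

Both runs produce the same counts, but in this model the run without the combiner ships one record per input row to the reducer, while the run with it ships at most one record per key per map task.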

> Split -> distinct or order -> cogroup fails
> -------------------------------------------
>
>                 Key: PIG-425
>                 URL: https://issues.apache.org/jira/browse/PIG-425
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>            Priority: Critical
>             Fix For: types_branch
>
>         Attachments: 425.patch
>
>
> A script like:
> {code}
> a = load 'myfile' as (name:chararray, age:int, gpa:double);
> split a into a1 if age > 50, a2 if name < 'm';
> b2 = distinct a2;
> b1 = order a1 by name;
> c = cogroup b2 by name, b1 by name;
> d = foreach c generate flatten(group), COUNT($1), COUNT($2);
> store d into 'OUTPATH';
> {code}
> Will abort with the error:
> {code}
> 08/09/09 11:46:50 ERROR mapReduceLayer.Launcher: Error message from task (map) tip_200809080906_0185_m_000000java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast to org.apache.pig.data.IndexedTuple
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:81)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:135)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:75)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
> {code}
> The issue is that the RearrangeAdjuster in MRCompiler does not recognize this as a cogroup,
> so it does not move the local rearrange out of the reduce and into the map.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

