crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-129) Cache the Iterable values for each key when a groupByKey op has multiple children
Date Thu, 11 Apr 2013 14:33:16 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628972#comment-13628972 ]

Gabriel Reid commented on CRUNCH-129:
-------------------------------------

[~joshwills] are the title and the description of this issue talking about the same
thing? It seems like the ClassCastException in the description is more of a planner (?)
issue, whereas the caching of the iterables for multiple children is more of an
execution issue.

Or is the ClassCastException just covering up the real iterable issue that would come up if
the code could get to the point of actually using the iterable?
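
To illustrate the execution-side concern, here is a minimal sketch that does not use Crunch at all. It assumes the grouped values behave like a single-pass Iterable (as reducer-side values typically do), so a second consumer sees nothing unless the values are cached first. The names SinglePassDemo, SinglePassIterable, count and cache are hypothetical and exist only for this example; this is not how Crunch implements or fixes the behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SinglePassDemo {

  /**
   * Hypothetical stand-in for grouped values: every call to iterator()
   * returns the same underlying cursor, so the data can be read only once.
   */
  static final class SinglePassIterable<T> implements Iterable<T> {
    private final Iterator<T> source;

    SinglePassIterable(Iterator<T> source) {
      this.source = source;
    }

    @Override
    public Iterator<T> iterator() {
      return source;
    }
  }

  /** Stands in for one "child" operation that consumes the values. */
  static <T> int count(Iterable<T> values) {
    int n = 0;
    for (T ignored : values) {
      n++;
    }
    return n;
  }

  /** Copies the values into a list so that several children can re-read them. */
  static <T> List<T> cache(Iterable<T> values) {
    List<T> copy = new ArrayList<T>();
    for (T value : values) {
      copy.add(value);
    }
    return copy;
  }

  public static void main(String[] args) {
    Iterable<String> raw =
        new SinglePassIterable<String>(Arrays.asList("a", "b", "c").iterator());
    System.out.println(count(raw)); // first child: prints 3
    System.out.println(count(raw)); // second child: prints 0, cursor is exhausted

    List<String> cached =
        cache(new SinglePassIterable<String>(Arrays.asList("a", "b", "c").iterator()));
    System.out.println(count(cached)); // prints 3
    System.out.println(count(cached)); // prints 3 again
  }
}
```

Note that caching this way holds all values for a key in memory at once, and copying references is only sufficient if the framework does not reuse value instances between iterations; a deep copy would be needed otherwise, which this sketch does not show.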
                
> Cache the Iterable values for each key when a groupByKey op has multiple children
> ---------------------------------------------------------------------------------
>
>                 Key: CRUNCH-129
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-129
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Jonathan Natkins
>
> Given a simple Avro pipeline like this:
>     PGroupedTable<String, MyAvroObject> processedData = data.parallelDo(new DoFn<String, Pair<String, MyAvroObject>>() {
>       public void process(String line, Emitter<Pair<String, MyAvroObject>> emitter) {
>         String key = getKey(line);
>         MyAvroObject value = convertToAvroObject(line);
>         emitter.emit(Pair.of(key, value));
>       }
>     }, Avros.tableOf(Avros.strings(), Avros.specifics(MyAvroObject.class)))
>     .groupByKey(3);
>     PTable<MyAvroGroup, Pair<String, Iterable<MyAvroObject>>> groupedData =
>         processedData.by(new MapFn<Pair<String, Iterable<MyAvroObject>>, MyAvroGroup>() {
>             @Override
>             public MyAvroGroup map(Pair<String, Iterable<MyAvroObject>> input) {
>               MyAvroGroup group = new MyAvroGroup();
>               group.objects = Lists.<MyAvroObject>newArrayList();
>              
>               for (MyAvroObject obj : input.second()) {
>                 group.objects.add(obj);
>               }
>              
>               return group;
>             }
>           },
>           Avros.specifics(MyAvroGroup.class));
> An exception is thrown when the by() code is run:
> 12/12/10 14:11:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Exception in thread "main" java.lang.ClassCastException: org.apache.crunch.types.avro.AvroGroupedTableType cannot be cast to org.apache.crunch.types.avro.AvroType
>     at org.apache.crunch.types.avro.Avros.tableOf(Avros.java:608)
>     at org.apache.crunch.types.avro.AvroTypeFamily.tableOf(AvroTypeFamily.java:135)
>     at org.apache.crunch.impl.mem.collect.MemCollection.by(MemCollection.java:222)
