crunch-user mailing list archives

From Micah Whitacre <mkwhita...@gmail.com>
Subject Re: How to count MemPipeline?
Date Wed, 11 Mar 2015 13:29:38 GMT
Kristoffer,
  Which PTypeFamily does the "tableOf(strings(), strings())" call use? The
first line calls Writables.strings() explicitly, but the later call relies
on static imports, so I wasn't sure whether you had switched to
AvroTypeFamily there instead.
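
For illustration, here is a minimal standalone sketch of the kind of guard
that produces a "Key type must be of class WritableType" message. It is not
Crunch's code: SketchType, WritableSketchType, AvroSketchType and
writableTableOf are hypothetical names invented for this example, and only
the error message text is taken from the stack trace quoted below. The
assumption is that a Writable-family table factory rejects any key type
that is not itself a Writable-family type.

```java
public class TypeFamilyCheckSketch {

    // Hypothetical stand-in for a serialization type descriptor (not a Crunch class).
    public interface SketchType {
        String family();
    }

    // Hypothetical type belonging to the "writable" family.
    public static final class WritableSketchType implements SketchType {
        @Override
        public String family() {
            return "writable";
        }
    }

    // Hypothetical type belonging to the "avro" family.
    public static final class AvroSketchType implements SketchType {
        @Override
        public String family() {
            return "avro";
        }
    }

    // Models a Writable-family table factory: both the key and the value type
    // must come from the same family, otherwise the call is rejected.
    public static String writableTableOf(SketchType key, SketchType value) {
        if (!(key instanceof WritableSketchType)) {
            throw new IllegalArgumentException("Key type must be of class WritableType");
        }
        if (!(value instanceof WritableSketchType)) {
            throw new IllegalArgumentException("Value type must be of class WritableType");
        }
        return "table<" + key.family() + "," + value.family() + ">";
    }

    public static void main(String[] args) {
        // Both types from the same family: accepted.
        System.out.println(writableTableOf(new WritableSketchType(), new WritableSketchType()));

        // Key type from a different family: rejected with the familiar message.
        try {
            writableTableOf(new AvroSketchType(), new WritableSketchType());
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If something like this is what is happening, spelling out the family on
every call (rather than relying on static imports) would at least make it
obvious which one each PType comes from.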

Micah

On Wed, Mar 11, 2015 at 8:17 AM, Kristoffer Sjögren <stoffe@gmail.com>
wrote:

> Hi
>
> I'm trying to count the occurrences of a key in a grouped table. But the
> following code snippet [1] fails [2] when calling count() on a MemPipeline
> in version 0.8.2+71-cdh4.6.0.
>
> Am I using the API incorrectly or is this a bug?
>
> Cheers,
> -Kristoffer
>
> [1]
>
> PCollection<String> lines =
>     MemPipeline.typedCollectionOf(Writables.strings(), input);
> lines.parallelDo(new DoFn<String, Pair<String, String>>() {
>   @Override
>   public void process(String input, Emitter<Pair<String, String>> emitter) {
>     emitter.emit(Pair.of(input, input));
>   }
> }, tableOf(strings(), strings()))
>     .groupByKey()
>     .count();
>
> [2]
>
> java.lang.IllegalArgumentException: Key type must be of class WritableType
>   at org.apache.crunch.types.writable.Writables.tableOf(Writables.java:351)
>   at org.apache.crunch.types.writable.WritableTypeFamily.tableOf(WritableTypeFamily.java:95)
>   at org.apache.crunch.lib.Aggregate.count(Aggregate.java:65)
>   at org.apache.crunch.lib.Aggregate.count(Aggregate.java:56)
>   at org.apache.crunch.impl.mem.collect.MemCollection.count(MemCollection.java:230)
>   at mapred.functions.FunctionsTest.testGroupActionCount(FunctionsTest.java:79)
>
