crunch-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kristoffer Sjögren <sto...@gmail.com>
Subject How to count MemPipeline?
Date Wed, 11 Mar 2015 13:17:44 GMT
Hi

I'm trying to count the occurrence of a key in a grouped table. But the
following code snippet [1] fails [2] when calling count() on a MemPipeline
in version 0.8.2+71-cdh4.6.0.

Am I using the API incorrectly or is this a bug?

Cheers,
-Kristoffer

[1]

PCollection<String> lines =
MemPipeline.typedCollectionOf(Writables.strings(), input);
lines.parallelDo(new DoFn<String, Pair<String, String>>() {
  @Override
  public void process(String input, Emitter<Pair<String, String>> emitter) {
    emitter.emit(Pair.of(input, input));
  }
}, tableOf(strings(), strings()))
.groupByKey()
.count();

[2]

java.lang.IllegalArgumentException: Key type must be of class WritableType
at org.apache.crunch.types.writable.Writables.tableOf(Writables.java:351)
at
org.apache.crunch.types.writable.WritableTypeFamily.tableOf(WritableTypeFamily.java:95)
at org.apache.crunch.lib.Aggregate.count(Aggregate.java:65)
at org.apache.crunch.lib.Aggregate.count(Aggregate.java:56)
at
org.apache.crunch.impl.mem.collect.MemCollection.count(MemCollection.java:230)
at
mapred.functions.FunctionsTest.testGroupActionCount(FunctionsTest.java:79)

Mime
View raw message