crunch-user mailing list archives

From Micah Whitacre <mkwhita...@gmail.com>
Subject Re: PGroupedTable.combineValues should allow composition with PGroupedTable.mapValues
Date Tue, 17 May 2016 13:20:10 GMT
Stan, thanks for the question.

If I'm understanding your question correctly, you would like to have code
like the following:

PTable<String, Set<String>> values = ...;
PTable<String, Set<String>> values2 = ...;

PTable<String, Integer> countedValues = values.union(values2)
    .groupByKey()
    .combineValues(combiner)
    .parallelDo(sizeCalculator, ptype);

// Runs once per key after combining: maps the merged set to its size.
static class SizeCalculator
    extends MapFn<Pair<String, Set<String>>, Pair<String, Integer>> {
  ...
}

// combineValues expects a CombineFn, whose input and output value types match.
static class Combiner extends CombineFn<String, Set<String>> {
  ...
}

I don't think you want the combiner itself to change the type to Integer.
A combiner may run zero, one, or several times on partial groups of values,
so its output has to be something it can consume again, and the operation
has to be associative and commutative. That is a requirement of a combiner
both in Crunch and in plain MR. I should also clarify that Crunch will still
do all of the work defined above inside a single MapReduce job. Crunch runs
multiple DoFns inside the map or reduce task, depending on where it is most
efficient to do the work, similar to Spark.
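
To make the type argument concrete, here is a minimal sketch in plain Java
with no Crunch involved. The class and method names (CombinerSketch, union,
size) are made up for illustration. It only shows that the set union can be
applied in stages without changing the answer, while taking sizes early and
summing them cannot.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; this is not Crunch API.
public class CombinerSketch {

    // Combine step: Iterable<Set<String>> -> Set<String>.
    // The output has the same type as each input value, so a partial
    // result can be fed back in as input to a later combine.
    static Set<String> union(Iterable<Set<String>> sets) {
        Set<String> result = new HashSet<>();
        for (Set<String> s : sets) {
            result.addAll(s);
        }
        return result;
    }

    // Final step, applied once per key after all combining is done.
    static int size(Set<String> set) {
        return set.size();
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("x", "y"));
        Set<String> b = new HashSet<>(Arrays.asList("y", "z"));
        Set<String> c = new HashSet<>(Arrays.asList("z"));

        // Everything combined in one pass.
        int direct = size(union(Arrays.asList(a, b, c)));

        // A partial combine of (a, b) first, then combined again with c.
        Set<String> partial = union(Arrays.asList(a, b));
        int staged = size(union(Arrays.asList(partial, c)));

        // Taking sizes early and summing double-counts shared elements.
        int sizesSummed = a.size() + b.size() + c.size();

        System.out.println(direct);      // 3
        System.out.println(staged);      // 3
        System.out.println(sizesSummed); // 5
    }
}
```

The staged and direct results agree because union is associative,
commutative, and keeps the value type; the size has to be taken only after
the last combine.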


On Tue, May 17, 2016 at 1:01 AM, Stan Rosenberg <stan.rosenberg@gmail.com>
wrote:

> Hi,
>
> I couldn't seem to find sufficient documentation or examples of using
> combiners in non-trivial ways. Say my map emits values of type Set<String>;
> after grouping by key I want to emit the _size_ of the union of the sets of
> strings, i.e., size(union(Iterable<Set<String>>)).  Thus, the combiner's
> type is Iterable<Set<String>> -> Set<String> but the reduce's type is
> Iterable<Set<String>> -> Int
>
> To my knowledge, both MapReduce and Spark allow a combiner to have a
> result type different from the reducer's.  However, unless I missed something,
> this is not expressible in Crunch.  Shouldn't PGroupedTable.combineValues
> return PGroupedTable to allow composition with mapValues?
>
> Thanks,
>
> stan
>
