crunch-user mailing list archives

From David Ortiz <dpo5...@gmail.com>
Subject Re: PGroupedTable.combineValues should allow composition with PGroupedTable.mapValues
Date Tue, 17 May 2016 13:15:22 GMT
You can still do a parallelDo on a PGroupedTable to map it to a
different type.  It would just be a new DoFn<Pair<Key, Set<String>>,
Pair<Key, Integer>>.
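
To make the shape of that two-step approach concrete, here is a minimal plain-Java sketch. It deliberately does not use the Crunch API: the class and method names (CombineThenMapSketch, union, unionSizes, setOf) are hypothetical and only model the data flow, with union standing in for the combiner (Iterable<Set<String>> -> Set<String>) and the size() call standing in for the follow-up map step (Set<String> -> Integer).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java model of "combine, then map"; none of these names are Crunch API.
class CombineThenMapSketch {

    // Combiner step: Iterable<Set<String>> -> Set<String>.
    // Input and output value types match, so it can run on partial groups.
    static Set<String> union(Iterable<Set<String>> sets) {
        Set<String> out = new TreeSet<>();
        for (Set<String> s : sets) {
            out.addAll(s);
        }
        return out;
    }

    // Follow-up map step: Set<String> -> Integer, applied once per key
    // after the combining is done.
    static Map<String, Integer> unionSizes(Map<String, List<Set<String>>> grouped) {
        Map<String, Integer> sizes = new TreeMap<>();
        for (Map.Entry<String, List<Set<String>>> e : grouped.entrySet()) {
            sizes.put(e.getKey(), union(e.getValue()).size());
        }
        return sizes;
    }

    // Small helper to build a set from literals.
    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        Map<String, List<Set<String>>> grouped = new TreeMap<>();
        grouped.put("a", Arrays.asList(setOf("x", "y"), setOf("y", "z")));
        grouped.put("b", Arrays.asList(setOf("x")));
        System.out.println(unionSizes(grouped));
    }
}
```

The point of splitting it this way is that the union step keeps the value type unchanged, so it is safe to apply repeatedly, while the change of type to Integer happens only once at the end.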

On Tue, May 17, 2016 at 2:01 AM Stan Rosenberg <stan.rosenberg@gmail.com>
wrote:

> Hi,
>
> I couldn't seem to find sufficient documentation or examples of using
> combiners in non-trivial ways. Say my map emits values of type Set<String>;
> after grouping by key I want to emit the _size_ of the union of the sets of
> strings, i.e., size(union(Iterable<Set<String>>)).  Thus, the combiner's
> type is Iterable<Set<String>> -> Set<String> but the reduce's type is
> Iterable<Set<String>> -> Int
>
> To my knowledge, both MapReduce and Spark allow a combiner to have a
> result type different from reducer's.  However, unless I missed something,
> this is not expressible in Crunch.  Shouldn't PGroupedTable.combineValues
> return PGroupedTable to allow composition with mapValues?
>
> Thanks,
>
> stan
>
