beam-commits mailing list archives

From "Daniel Halperin (JIRA)" <>
Subject [jira] [Commented] (BEAM-12) Apply GroupByKey transforms on PCollection of normal type other than KV
Date Sun, 14 Feb 2016 17:38:18 GMT


Daniel Halperin commented on BEAM-12:


I'm assuming that `ExtractFn` is some sort of function that turns `T` -> `KV<K, V>`.

Can you clarify what's awkward about Frances' idea of using a `ParDo`, or the more-Java-8-y

p.apply(MapElements.via(ExtractFn())) // or .via(lambda T: KV).withOutputTypeDescriptor(K,V)

to do the first step?

In general, we prefer modularity in the SDK. We want you to be able to make little reusable
bits of code here and there. Also, given that there is already a "1-liner" to turn a `T` into
`KV` using a `lambda`, we don't really want to add a second way. It only complicates SDKs
to have many ways of doing the exact same thing.

Note that if we added such a shortcut to `GroupByKey`, we really ought to also add it to `CoGroupByKey`
and `Combine.PerKey`. The latter two transforms have significantly more complicated semantics
than GBK, and they may take a non-zero number of arguments. So either we "double" the number
of ways to construct these transforms and users also have to worry about parameter order, or
we provide an inconsistent API surface -- neither of which is IMO good for our users -- or
we stick with the current behavior, which focuses on modularity.

I'd re-emphasize Frances' point: anywhere the extra 1-liner seems to complicate your code,
you can add a composite PTransform that does exactly what you want: wrap GBK with your ExtractFn(),
and use it that way.
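
As a rough illustration of that wrapping idea, here is a plain-Java analogy that runs without the Beam SDK. It is only a sketch: `GroupByExtracted` and `groupByExtracted` are hypothetical names, `java.util.stream` stands in for a pipeline, and `Map.Entry` stands in for `KV`. In Beam itself the same shape would be a composite `PTransform` that applies the extract step and then `GroupByKey`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class GroupByExtracted {
    // Hypothetical helper (not a Beam API): wraps two independent steps.
    // Step 1 maps each element T to a key/value pair (the "ExtractFn" role).
    // Step 2 groups the values by key (the "GroupByKey" role).
    public static <T, K, V> Map<K, List<V>> groupByExtracted(
            List<T> input, Function<T, Map.Entry<K, V>> extractFn) {
        return input.stream()
                .map(extractFn)
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        LinkedHashMap::new, // keep first-seen key order
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "avocado", "banana");
        // The extract function is an ordinary lambda; the grouping step is unchanged.
        Map<Character, List<String>> grouped =
                groupByExtracted(words, w -> Map.entry(w.charAt(0), w));
        System.out.println(grouped); // prints {a=[apple, avocado], b=[banana]}
    }
}
```

The grouping step never needs to know how keys are extracted, so the extract function stays a small reusable piece and the wrapper is the only place the two are combined.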


> Apply GroupByKey transforms on PCollection of normal type other than KV
> -----------------------------------------------------------------------
>                 Key: BEAM-12
>                 URL:
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: bakeypan
>            Assignee: Frances Perry
>            Priority: Trivial
> Currently the GroupByKey transform can only be applied to a PCollection<KV<K,V>>, so I
> have to transform a PCollection<T> into a PCollection<KV<K,V>> before I can
> apply GroupByKey.
> I think we can do better by applying GroupByKey to a PCollection of a normal type other than
> KV. The user could supply a custom key-extraction function, or we could offer a default
> key-extraction function. Something like this:
> PCollection<T> input = ...
> PCollection<KV<K,Iterable<V>>> result = input.apply(GroupByKey.<K, V>create(new ExtractFn()));
