beam-commits mailing list archives

From "bakeypan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-12) Apply GroupByKey transforms on PCollection of normal type other than KV
Date Sun, 14 Feb 2016 16:16:18 GMT

    [ https://issues.apache.org/jira/browse/BEAM-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146629#comment-15146629 ]

bakeypan commented on BEAM-12:
------------------------------

My point is that, for convenience, we could omit the "ParDo.of(new ExtractFn())" step by
applying GroupByKey directly to the PCollection and passing the ExtractFn to GroupByKey.
For example, given a PCollection<String> input that we want to group by prefix, we currently
have to write:
PCollection<KV<String,Iterable<String>>> result = input.apply(ParDo.of(new ExtractFn())).apply(GroupByKey.<String, String>create());
as in your earlier code.
But if GroupByKey could accept a key-extraction function, we could just write:
PCollection<KV<String,Iterable<String>>> result = input.apply(GroupByKey.<String, String>create(new ExtractFn()));
There would be no need to transform the PCollection<"NotKVType"> into a PCollection<KV<K, V>>
by applying one more ParDo.
What do you think?
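
To make the idea concrete, here is a minimal plain-Java sketch of the behaviour I have in mind. It uses ordinary java.util collections rather than PCollection, so it is only a model of the proposal and not SDK code. The class name GroupByExtractedKey, the method groupBy, and the "text before '-'" prefix rule are all hypothetical names and assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of the proposal; NOT the Beam/Dataflow SDK API.
public class GroupByExtractedKey {

    // Hypothetical stand-in for the proposed GroupByKey.create(extractFn):
    // derive each element's key with keyFn, then collect elements per key.
    public static <T, K> Map<K, List<T>> groupBy(
            Iterable<T> input, Function<? super T, ? extends K> keyFn) {
        // LinkedHashMap keeps keys in first-seen order for readable output.
        Map<K, List<T>> grouped = new LinkedHashMap<>();
        for (T element : input) {
            K key = keyFn.apply(element);
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(element);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("ab-1", "cd-1", "ab-2");
        // Assumed key rule for illustration: the prefix is the text before '-'.
        Map<String, List<String>> result =
                groupBy(input, s -> s.substring(0, s.indexOf('-')));
        System.out.println(result);
    }
}
```

The caller supplies only the key-extraction function, and the "wrap each element in a KV" step happens inside the grouping call, which is the convenience I am asking for.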


> Apply GroupByKey transforms on PCollection of normal type other than KV
> -----------------------------------------------------------------------
>
>                 Key: BEAM-12
>                 URL: https://issues.apache.org/jira/browse/BEAM-12
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: bakeypan
>            Assignee: Frances Perry
>            Priority: Trivial
>
> Currently the GroupByKey transform can only be applied to a PCollection<KV<K,V>>, so I
> have to transform a PCollection<T> into a PCollection<KV<K,V>> before I can
> apply GroupByKey.
> I think we can do better by allowing GroupByKey to be applied to a PCollection of an ordinary
> type other than KV. The user could supply a custom key-extraction function, or we could provide
> a default one. For example:
> PCollection<T> input = ...
> PCollection<KV<K,Iterable<V>>> result = input.apply(GroupByKey.<K, V>create(new ExtractFn()));



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
