flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stephan Ewen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-553) Add getGroupKey() method to group-at-time operators
Date Sat, 01 Nov 2014 19:46:33 GMT

    [ https://issues.apache.org/jira/browse/FLINK-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193412#comment-14193412

Stephan Ewen commented on FLINK-553:

Here is the gist of the discussions we had on that issue in the past:

The most tricky part of this issue is how to represent the key. Since keys in Flink can be
more than one field (in tuples or POJOs), the return type of that function is not easy to
  - Either it is always a tuple with all keys, making it again a bit inconvenient for the
standard case
  - Or we only return one field. In case of composite keys (more than one key field), we throw
an exception or we only return the first key field

There are two ways of implementing this issue:
  - Offer a "getKey()" method in the RichGroupReduceFunction and in the RichReduceFunction.
We need to do this in the rich function variants, because the functions themselves need to
be SAM interfaces (single abstract method).
  - Have a version of the GroupReduceFunction with an extended signature like "void reduceGroup(K
key, Iterable<V> records, Collector<O> out)". The iterable would still return
the whole records, but the key would be added extra, for convenience.

The first version has the disadvantage that programmers need to pick the rich function versions,
which prevents Java8 lambdas and is more clumsy in Scala. The second variant is a bit more
effort, but we have made this variations of the functions before (see for example FlatJoinFunction
vs JoinFunction)

> Add getGroupKey() method to group-at-time operators
> ---------------------------------------------------
>                 Key: FLINK-553
>                 URL: https://issues.apache.org/jira/browse/FLINK-553
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API, Scala API
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
> Group-at-a-time operators (Reduce & CoGroup) work on multiple records in one UDF
call. Often these UDFs need to access the key that is common to all records of a group.
> We could add a function to set a the key of a group before the UDF is called (``setGroupKey()``)
and a function to get the key (``getGroupKey()``) that can be called from the UDF.
> What do you think about this?
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/553
> Created by: [fhueske|https://github.com/fhueske]
> Labels: enhancement, java api, scala api, user satisfaction, 
> Assignee: [aalexandrov|https://github.com/aalexandrov]
> Created at: Mon Mar 10 22:28:27 CET 2014
> State: open

This message was sent by Atlassian JIRA

View raw message