flink-dev mailing list archives

From "Sebastian Kruse (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-1043) Alternative combine interface
Date Fri, 08 Aug 2014 13:38:12 GMT
Sebastian Kruse created FLINK-1043:
--------------------------------------

             Summary: Alternative combine interface
                 Key: FLINK-1043
                 URL: https://issues.apache.org/jira/browse/FLINK-1043
             Project: Flink
          Issue Type: Wish
            Reporter: Sebastian Kruse


The GroupReduce allows for the following combine-reduce sequence: {{InputType}} -> combine
-> {{InputType}} -> reduce -> {{OutputType}}. However, in the use cases I have stumbled
upon so far, it would make more sense to have the following steps: {{InputType}} -> combine
-> {{OutputType}} -> reduce -> {{OutputType}}. It seems more intuitive to me to create a set of
partial results with the combiners that will finally be merged within the reduce phase into an
overall result. The current requirement that the combiner emit {{InputType}} sometimes bars me
from using a combiner.
I provide some examples for this intuition.

* WordCount
** If you want to implement WordCount as a combinable GroupReduce, then you have to preprocess
all words into {{Tuple2<String, Integer>}} pairs of the form (word, 1). This could be avoided
if the combination result was not necessarily of the same type as the input.
* Creating a Bloom filter
** Bloom filters can be created locally on each node and later be merged into a final,
global Bloom filter, and thus lend themselves to a combine-reduce procedure. Doing this with
a combinable GroupReduce would currently require turning each input element into a singleton
Bloom filter before the combination phase.
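
To make the Bloom filter case concrete, here is a minimal sketch in plain Java with no Flink dependency. Everything in it is an assumption made for illustration: the class name {{BloomCombineSketch}}, the filter size, and the two toy hash positions are hypothetical and not tuned for real use. It only shows that the combine step can go directly from elements to a partial filter, and that the reduce step merges the partial filters with a bitwise OR.

```java
import java.util.BitSet;

public class BloomCombineSketch {

    // Illustrative filter size, chosen arbitrarily for this sketch.
    static final int NUM_BITS = 1024;

    // Toy Bloom filter insert: sets two bit positions derived from hashCode().
    static void add(BitSet filter, String element) {
        int h = element.hashCode();
        filter.set(Math.floorMod(h, NUM_BITS));
        filter.set(Math.floorMod(h * 31 + 17, NUM_BITS));
    }

    // Membership test: false means definitely absent, true means possibly present.
    static boolean mightContain(BitSet filter, String element) {
        int h = element.hashCode();
        return filter.get(Math.floorMod(h, NUM_BITS))
                && filter.get(Math.floorMod(h * 31 + 17, NUM_BITS));
    }

    // Combine step: InputType (String) -> OutputType (BitSet), one filter per partition.
    static BitSet combine(Iterable<String> elements) {
        BitSet local = new BitSet(NUM_BITS);
        for (String e : elements) {
            add(local, e);
        }
        return local;
    }

    // Reduce step: OutputType -> OutputType, merging partial filters with a bitwise OR.
    static BitSet reduce(Iterable<BitSet> partials) {
        BitSet global = new BitSet(NUM_BITS);
        for (BitSet p : partials) {
            global.or(p);
        }
        return global;
    }
}
```

With the current interface, the combine step would instead have to receive one single-element filter per input record.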

Therefore, it would be nice to have the ability to use {{OutputType}} as the combiner result.
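
As a rough sketch of what such an interface could look like, the plain-Java example below defines a hypothetical {{PartialCombineReduce}} interface and a WordCount-style implementation. None of these names exist in Flink; the {{run}} helper merely simulates a runtime that combines each local partition of a single group and then reduces the partial results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartialCombineSketch {

    // Hypothetical interface, not part of Flink: the combiner emits OUT rather than IN.
    interface PartialCombineReduce<IN, OUT> {
        OUT combine(Iterable<IN> values);    // InputType -> OutputType (partial result)
        OUT reduce(Iterable<OUT> partials);  // OutputType -> OutputType (final result)
    }

    // Counts the occurrences within one group (one word) without (word, 1) preprocessing.
    static class WordCounter implements PartialCombineReduce<String, Long> {
        @Override
        public Long combine(Iterable<String> words) {
            long count = 0;
            for (String ignored : words) {
                count++;
            }
            return count;
        }

        @Override
        public Long reduce(Iterable<Long> partialCounts) {
            long total = 0;
            for (long c : partialCounts) {
                total += c;
            }
            return total;
        }
    }

    // Simulates the runtime: combine each local partition, then reduce the partial results.
    static <IN, OUT> OUT run(PartialCombineReduce<IN, OUT> fn, List<List<IN>> partitions) {
        List<OUT> partials = new ArrayList<>();
        for (List<IN> partition : partitions) {
            partials.add(fn.combine(partition));
        }
        return fn.reduce(partials);
    }

    public static void main(String[] args) {
        // Two partitions holding records of the same group "flink".
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("flink", "flink"),
                Arrays.asList("flink"));
        System.out.println(run(new WordCounter(), partitions)); // prints 3
    }
}
```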



--
This message was sent by Atlassian JIRA
(v6.2#6252)
