flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Luis Mariano Guerra <mari...@event-fabric.com>
Subject Re: emit a single Map<String, T> per window
Date Fri, 02 Sep 2016 16:26:34 GMT
On Fri, Sep 2, 2016 at 5:24 PM, Aljoscha Krettek <aljoscha@apache.org>

> Hi,
> from this I would expect to get as many HashMaps as you have keys. The
> winFunction is also executed per-key so it cannot combine the HashMaps of
> all keys.
> Does this describe the behavior that you're seeing?

yes, it's the behaviour I'm seeing, I'm looking for a way to merge those
HashMaps from the same window into a single one, I can't find how.

> Cheers,
> Aljoscha
> On Fri, 2 Sep 2016 at 17:37 Luis Mariano Guerra <mariano@event-fabric.com>
> wrote:
>> hi!
>> I'm trying to collect some metrics by key per window and emiting the full
>> result at the end of the window to kafka, I started with a simple count by
>> key to test it but my requirements are a little more complex than that.
>> what I want to do is to fold the stream events as they come and then at
>> the end of the window merge them together and emit a singe result, I don't
>> want to accumulate all the events and calculate at the end of the window,
>> from my understanding of fold in other languages/libraries, this would be
>> what I need, using it via apply(stateIn, foldFun, windowFun) but it's not
>> working:
>> the basic is:
>>     input
>>                 .flatMap(new LineSplitter())
>>                 .keyBy(0)
>>                 .timeWindow(Time.of(5, TimeUnit.SECONDS))
>>                 .apply(new HashMap<String, Integer>(), foldFunction,
>> winFunction);
>> where foldFunction accumulates by key and winFunction iterate over the
>> hasmaps and merges them into a single result hashmap and emits that one at
>> the end.
>> this emits many one-key hash maps instead of only one with all the keys,
>> I tried setting setParallelism(1) in multiple places but still doesn't
>> work. More confusingly, in one run it emited a single map but after I ran
>> it again it went back to the previous behavior.
>> what I'm doing wrong? is there any other approach?
>> I can provide the implementation of foldFunction and winFunction if
>> required but I think it doesn't change much.
>> Reading the source code I see:
>>     Applies the given window function to each window. The window function
>> is called for each evaluation of the window for each key individually. The
>> output of the window function is interpreted as a regular non-windowed
>> stream.
>> emphasis on " for each key individually", the return type of apply is
>> SingleOutputStreamOperator which doesn't provide many operations to group
>> the emited values.
>> thanks in advance.

View raw message