flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Martin Neumann <mneum...@sics.se>
Subject How to preserve KeyedDataStream
Date Tue, 03 Nov 2015 10:28:25 GMT

I want to do the following thing:
1. Split a Stream of incoming Logs by host address.
2. For each Key, create time based windows
3. Count the number of items in the window
4. Feed it into a statistical model that is maintained for each host

Since I don't have fields to sum upon, I use a (window) fold function to
count the number of elements in the window. (Maybe there is a better way to
do this, or it could be part of the primitives)
My problem is now that I get back a DataStream so the distribution by key
is lost. Is there a way to preserve the distribution by key? Currently I
only store the count of element in the windows so I cannot simple do byKey

I could fold into tuples that have the count and also contain the host
address but that feels clumsy.

Any hints are welcome.

cheers Martin

View raw message