flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gyula Fóra <gyf...@apache.org>
Subject Periodic full stream aggregations
Date Mon, 20 Apr 2015 20:32:07 GMT
Hey all,

I think we are missing a quite useful feature that could be implemented
(with some slight modifications) on top of the current windowing api.

We currently provide 2 ways of aggregating (or reducing) over streams:
doing a continuous aggregation and always output the aggregated value
(which cannot be done properly in parallel) or doing aggregation in a
window periodically.

What we don't have at the moment is periodic aggregations on the whole
stream. I would even go as far as to remove the continuous outputting
reduce/aggregate it and replace it with this version as this in return can
be done properly in parallel.

My suggestion would be that a call:

dataStream.reduce(..)
dataStream.sum(..)

would return a windowed data stream where the window is the whole record
history, and the user would need to define a trigger to get the actual
reduced values like:

dataStream.reduce(...).every(Time.of(4,sec)) to get the actual reduced
results.
dataStream.sum(...).every(...)

I think the current data stream reduce/aggregation is very confusing
without being practical for any normal use-case.

Also this would be a very api breaking change (but I would still make this
change as it is much more intuitive than the current behaviour) so I would
try to push it before the release if we can agree.

Cheers,
Gyula

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message