flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Fabian Hueske <fhue...@gmail.com>
Subject Re: Flushing the result of a groupReduce to a Sink before all reduces complete
Date Wed, 26 Oct 2016 17:57:18 GMT
Hi Paul,

Flink pushes the results of operators (including GroupReduce) to the next
operator or sink as soon as they are computed. So what you are asking for
is actually happening.
However, before the GroupReduceFunction can be applied, the whole data is
sorted in order to group the data. This step is usually more expensive than
applying the GroupReduceFunction. Therefore, it looks like the output is
batched.
Flink does only support sort-based grouping, however also hash-based
grouping would not help, because Flink would not know when to close a group
until all data is consumed.

Please let me know if you have further questions.

Best, Fabian


2016-10-26 19:07 GMT+02:00 Paul Wilson <paulalexwilson@gmail.com>:

> Hi,
>
> DataSet API
> Flink 1.1.3
>
> I have an application where I'd like to perform some mapping before
> batching the results and passing them to the sink. I'm performing a
> 'composite' key selection to group the items by their natural key as well
> as a batch (itemCount / batchSize). When I reduce the batches and pass them
> to the sink, the whole flow is waiting for all reduces to complete before
> passing them to sink.
>
> Is there some way that the results of a single group reduce can be passed
> to the sink before all reduces are complete?
>
> Hope that makes sense,
> Regards,
> Paul
>

Mime
View raw message