flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Elias Levy <fearsome.lucid...@gmail.com>
Subject Re: Kafka producer sink message loss?
Date Tue, 07 Jun 2016 14:52:34 GMT
On Tue, Jun 7, 2016 at 4:52 AM, Stephan Ewen <sewen@apache.org> wrote:

> The concern you raised about the sink being synchronous is exactly what my
> last suggestion should address:
>
> The internal state backends can return a handle that can do the sync in a
> background thread. The sink would continue processing messages, and the
> checkpoint would only be acknowledged after the background sync did
> complete.
> We should allow user code to return such a handle as well.
>

Sorry.  Apparently I hadn't had enough coffee and completely missed the
last paragraph of your response.  The async solution you propose seems
ideal.

What message ordering guarantees are you worried about?

I don't think you can do much about guaranteeing message ordering within
Kafka in case of failure, and you'll replay some messages.  And there isn't
any guarantee if you are writing to a Kafka topic with multiple partitions
from multiple sinks using a message key distinct from the key you used in a
keyBy in Flink, as you'll be writing from multiple sink instances in
parallel in what is essentially a shuffle.  It would seem the only ordering
guarantee is if you write from a sink into a Kafka topic using a message
key that is the same as the key used in a keyBy in Flink, and even that
will be violated during a failure and replay by the sink.

Mime
View raw message