flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tzu-Li (Gordon) Tai" <tzuli...@apache.org>
Subject Re: Question about Checkpoint
Date Tue, 27 Jun 2017 05:30:25 GMT
Hi Desheng,

Welcome to the community!

What you’re asking alludes the question: How does Flink support end-to-end (from external
source to external sink, e.g. Kafka to database) exactly-once delivery?
Whether or not that is supported depends on the guarantees of the source and sink and how
they work with Flink’s checkpointing; see the overview of the guarantees here [1].

I think you already understand how sources work with checkpoints for exactly-once.
For sinks to be exactly-once, the external system needs to be able to participate in the checkpointing
mechanism, so it depends on what the sink is.
The participation is usually in some form of transaction, which is only committed once a job's
checkpoint is fully completed.
If the sink doesn’t support this, you can also achieve effective end-to-end exactly-once
by making your database writes / updates idempotent, so that replaying the stream since t1
does not affect the end results.


[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/guarantees.html

On 27 June 2017 at 11:36:00 AM, ZalaCheung (gzzhangdesheng@corp.netease.com) wrote:

Hi Flink Community,

I am new to Flink and now looking at checkpoint of Flink.

After reading the document, I am still confused. Here is scene:

I have a datastream finally flow to a database sink. I will update one of the field in database
based on the incomming stream. I have now complete a snapshot, say t1, and snapshot t2 is
on progress. After snapshot t1 complete, I update my database several times.  Then BEFORE
snapshot t2 complete, my task fail. 

Based on my understanding of checkpoint on Flink, I will recover from snapshot t1 and re-execute
the Streams between t1 and t2. But I've already update my database several times after t1.
Does flink deal with this issue?

Desheng Zhang
E-mail: gzzhangdesheng@corp.netease.com;

View raw message