flink-user mailing list archives

From Hequn Cheng <chenghe...@gmail.com>
Subject Re: State Recovery when job fails and auto-recovers
Date Thu, 18 Oct 2018 01:15:31 GMT
Hi Sameer,

In case of a failure, Flink restarts the operators and resets them to the
latest successful checkpoint. So if you turn off checkpointing, there is no
checkpoint to restore from, and all of the accumulated state will be lost.
Generally speaking, snapshots are lightweight and can be taken frequently
without much impact on performance. If checkpointing does hurt your job's
performance and you don't want to lose all of your state, you can try
increasing the checkpoint interval.

> // start a checkpoint every 600000 ms (10 min)
> StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
> env.enableCheckpointing(600000);
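To make the recovery semantics above concrete, here is a toy model in plain Java. It has no Flink dependency, and the class CheckpointModel and its methods are hypothetical, not Flink APIs. A map stands in for a keyed ValueState; a failure restores the copy taken at the last checkpoint, or an empty map if no checkpoint was ever taken. It deliberately does not model source replay: in a real Flink job with a replayable source (e.g. Kafka), the records processed after the last checkpoint are re-read after recovery rather than dropped.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical, not Flink's API) of checkpoint-based recovery
// for per-key state.
class CheckpointModel {
    // Live per-key accumulator, standing in for a keyed ValueState<Long>.
    private Map<String, Long> state = new HashMap<>();
    // Copy of the state taken at the last completed checkpoint; null if none.
    private Map<String, Long> lastCheckpoint = null;

    // Accumulate a value for a key, like updating ValueState per element.
    public void process(String key, long value) {
        state.merge(key, value, Long::sum);
    }

    // Take a snapshot (copy) of the current state.
    public void checkpoint() {
        lastCheckpoint = new HashMap<>(state);
    }

    // Simulate failure + restart: restore the last snapshot, or start empty.
    public void failAndRecover() {
        state = (lastCheckpoint == null) ? new HashMap<>() : new HashMap<>(lastCheckpoint);
    }

    public long get(String key) {
        return state.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        // Without checkpoints: everything accumulated so far is lost.
        CheckpointModel noCheckpoints = new CheckpointModel();
        noCheckpoints.process("a", 5);
        noCheckpoints.process("a", 7);
        noCheckpoints.failAndRecover();
        System.out.println("no checkpoints: " + noCheckpoints.get("a"));

        // With checkpoints: state rolls back only to the last snapshot.
        CheckpointModel withCheckpoints = new CheckpointModel();
        withCheckpoints.process("a", 5);
        withCheckpoints.checkpoint();
        withCheckpoints.process("a", 7);
        withCheckpoints.failAndRecover();
        System.out.println("with checkpoints: " + withCheckpoints.get("a"));
    }
}
```

Running main prints "no checkpoints: 0" and then "with checkpoints: 5". The longer the checkpoint interval, the older the last snapshot, so a failure rolls the state back further and more input has to be reprocessed.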


Best, Hequn

On Thu, Oct 18, 2018 at 7:19 AM Sameer Wadkar <sameer@axiomine.com> wrote:

> Hi,
>
> We have a job which is using ValueState. We have turned off checkpoints.
> The state is backed by rocksdb which is backed by S3.
>
> If the job fails with an exception (e.g. partitions not available or an
> occasional S3 404 error) and auto-recovers, is the entire state lost, or
> does it continue from the last saved state? We see that the job has the
> same identifier. We don't mind losing data during the short interval while
> the job is recovering. But because we are using ValueState as a custom
> global window to accumulate state for a key over a 3-hour window, we don't
> want to lose all of it.
>
> Checkpointing is not an option because each checkpoint takes a long time
> and the state is huge.
>
> Thanks,
> Sameer
>
> Sent from my iPhone
