flink-user mailing list archives

From Stefan Richter <s.rich...@data-artisans.com>
Subject Re: RocksDB / checkpoint questions
Date Mon, 05 Feb 2018 10:31:54 GMT
Hi,

you are correct that RocksDB has a "working directory" on local disk, while checkpoints and
savepoints go to a distributed filesystem.

- if I have 3 TaskManagers I should expect more or less (depending on how the tasks are balanced)
to find a third of my overall state stored on disk on each of these TaskManager nodes?

This question is not so much about RocksDB, but more about Flink's keyBy partitioning, i.e.
how work is distributed between the parallel instances of an operator. The answer is that
Flink applies hash partitioning on your event keys to distribute the keys (and their
state) across your 3 nodes. If your key space is very skewed, or there are heavy-hitter keys
with much larger state than most other keys, this can lead to some imbalance. If your keys
are not skewed and have similar state sizes, every node should hold roughly the same amount
of state.
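The effect can be sketched with a small simulation (plain Python, using an md5-based hash for illustration; this is not Flink's actual key-group assignment, which uses murmur hashing and key groups):

```python
# Sketch: how hash partitioning spreads keys (and their state) over 3 nodes.
# Simplified illustration only, not Flink's key-group logic.
from collections import Counter
import hashlib

parallelism = 3

def partition(key: str) -> int:
    # Stable hash of the key, modulo the number of parallel instances.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % parallelism

# 10,000 distinct, similarly sized keys -> roughly a third per node.
counts = Counter(partition(f"user-{i}") for i in range(10_000))
print(counts)

# A single "heavy hitter" key with huge state still lands on exactly one
# node, so state size can be imbalanced even if key counts look even.
print(partition("heavy-hitter-key"))
```

Note how a heavy-hitter key cannot be split further: all of its state lives on whichever node its hash selects, which is the imbalance described above.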

- if the local node/disk fails I will get the state back from the distributed disk and things
will start again and all is fine. However, what happens if the distributed disk fails? Will
Flink continue processing, waiting for me to mount a new distributed disk? Or will it stop?
May I lose data/reprocess things under that condition?

Starting from Flink 1.5, this is configurable; please see https://issues.apache.org/jira/browse/FLINK-4809
and https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/checkpointing.html,
in the section "fail/continue task on checkpoint errors". If you tolerate checkpoint failures,
you will not lose data: if your job fails, it can recover from the latest successful checkpoint
once your DFS is available again. If the job does not fail, it will eventually make another
checkpoint once the DFS is back. If you do not tolerate checkpoint failures, your job will simply
fail, restart from the last successful checkpoint, and recover once the DFS is back.
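The two modes can be illustrated with a toy simulation (plain Python, assuming a hypothetical DFS outage spanning checkpoint attempts 3-5; this sketches the semantics only, it is not Flink code):

```python
# Sketch of the two failure-handling modes: tolerate checkpoint failures
# vs. fail the job on a checkpoint failure. Illustration only.
def run_job(tolerate_failures, dfs_down=range(3, 6), attempts=8):
    last_successful = None   # latest checkpoint safely stored on the DFS
    restarts = 0
    for n in range(1, attempts + 1):
        dfs_ok = n not in dfs_down
        if dfs_ok:
            last_successful = n    # checkpoint written to DFS
        elif tolerate_failures:
            pass                   # checkpoint skipped, job keeps running
        else:
            restarts += 1          # job fails, restarts from last_successful
    return last_successful, restarts

# Tolerating failures: the job never restarts; a later checkpoint succeeds.
print(run_job(True))   # -> (8, 0)
# Not tolerating: each failed checkpoint restarts the job, which then
# recovers from the last successful checkpoint. No data is lost either way.
print(run_job(False))  # -> (8, 3)
```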

Best,
Stefan

> Am 03.02.2018 um 17:45 schrieb Christophe Jolif <cjolif@gmail.com>:
> 
> Thanks for sharing, Kien. Sounds like the logical behavior, but good to hear it is
> confirmed by your experience.
> 
> --
> Christophe
> 
> On Sat, Feb 3, 2018 at 7:25 AM, Kien Truong <duckientruong@gmail.com> wrote:
> 
> On Feb 3, 2018, at 10:48, Kien Truong <duckientruong@gmail.com> wrote:
> Hi, 
> Speaking from my experience, if the distributed disk fails, the checkpoint will fail as
> well, but the job will continue running. The checkpoint scheduler will keep running, so
> the first scheduled checkpoint after you repair your disk should succeed.
> 
> Of course, if you also write to the distributed disk inside your job, then your job may
> crash too, but this is unrelated to the checkpoint process.
> 
> Best regards, 
> Kien 
> 
> On Feb 2, 2018, at 23:30, Christophe Jolif <cjolif@gmail.com> wrote:
> If I understand well, RocksDB is using two disks: the TaskManager local disk for "local
> storage" of the state, and the distributed disk for checkpointing.
> 
> Two questions:
> 
> - if I have 3 TaskManagers I should expect more or less (depending on how the tasks are
> balanced) to find a third of my overall state stored on disk on each of these TaskManager nodes?
> 
> - if the local node/disk fails I will get the state back from the distributed disk and
> things will start again and all is fine. However, what happens if the distributed disk fails?
> Will Flink continue processing, waiting for me to mount a new distributed disk? Or will it
> stop? May I lose data/reprocess things under that condition?
> 
> -- 
> Christophe Jolif
> 

