flink-user mailing list archives

From Stefan Richter <s.rich...@data-artisans.com>
Subject Re: State Backend
Date Fri, 04 Aug 2017 08:45:02 GMT

If the question is whether the state backends place particular requirements on the filesystem
you use, then I think there might be a small misconception. Currently, all state backends in
Flink operate local to the task, i.e. they keep working state either in memory (e.g.
FsStateBackend) or on the local file system (RocksDBStateBackend) of the machine that runs the
task. The choice of distributed file system only affects checkpoints and savepoints, and should
not have a real impact on your job's performance. It can of course have an impact on the
checkpoint and restore duration.

Checkpoints/savepoints must be written to a stable store that offers fault tolerance, such as
HDFS. Writes and reads for checkpoints are sequential bulk operations and (within reasonable
bounds) do not care too much about latency. It is more important that your stable store offers
a useful consistency model.
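
To make the split between working state and checkpoints concrete, here is a toy sketch in
Python. It is not Flink code and does not use any Flink API: the class LocalKeyedState, the
file naming, and the temporary directory standing in for the distributed file system are all
assumptions made for illustration. Updates touch only local memory, while a checkpoint is a
single sequential bulk write to the stable store, published with a rename once it is complete.

```python
import json
import os
import tempfile


class LocalKeyedState:
    """Toy stand-in for task-local working state (hypothetical, not Flink's implementation)."""

    def __init__(self):
        # Working state lives in local memory; updates never touch the stable store.
        self._counts = {}

    def add(self, key, value):
        self._counts[key] = self._counts.get(key, 0) + value

    def get(self, key):
        return self._counts.get(key, 0)

    def checkpoint(self, checkpoint_dir, checkpoint_id):
        """Write the whole state as one sequential bulk file, then publish it with a rename."""
        final_path = os.path.join(checkpoint_dir, f"chk-{checkpoint_id}.json")
        tmp_path = final_path + ".inprogress"
        with open(tmp_path, "w") as f:
            json.dump(self._counts, f)
            f.flush()
            os.fsync(f.fileno())
        # Readers only ever see a complete checkpoint file under the final name.
        os.replace(tmp_path, final_path)
        return final_path

    @classmethod
    def restore(cls, path):
        """Rebuild the local state with one sequential bulk read."""
        state = cls()
        with open(path) as f:
            state._counts = json.load(f)
        return state


def run_demo():
    # A temporary directory stands in for the distributed file system (assumption).
    checkpoint_dir = tempfile.mkdtemp(prefix="checkpoints-")
    state = LocalKeyedState()
    for key, value in [("a", 1), ("b", 2), ("a", 3)]:
        state.add(key, value)
    path = state.checkpoint(checkpoint_dir, 1)
    state.add("a", 100)  # applied after the checkpoint, so a restore will not see it
    restored = LocalKeyedState.restore(path)
    return state.get("a"), restored.get("a"), restored.get("b")
```

In this sketch the stable store is only touched in checkpoint() and restore(), so a slower
store would lengthen those two calls but not add() or get(). The rename step also hints at why
the consistency model matters: the toy relies on a finished file becoming visible under its
final name all at once.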


> On 03.08.2017 at 16:45, Vijay Srinivasaraghavan <vijikarthi@yahoo.com> wrote:
> Hello,
> I would like to know if we have any latency requirements for choosing appropriate state
> backend. For example, if an HCFS implementation is used as Flink state backend (instead of
> stock HDFS), are there any implications that one needs to know with respect to the performance?
> - Frequency of read/write operations, random vs sequential reads
> - Load/Usage pattern (Frequent small updates vs bulk operation)
> - RocksDB->HCFS (Is this kind of recommended option to mitigate some of the challenges
> outlined above)
> - S3 Vs HDFS any performance numbers?
> Appreciate any inputs on this.
> Regards
> Vijay
