flink-user mailing list archives

From Mohit Anchlia <mohitanch...@gmail.com>
Subject Re: Clarification on state backend parameters
Date Sat, 04 Feb 2017 21:46:45 GMT
Thanks for the clarification!

On Sat, Feb 4, 2017 at 3:34 AM, Stefan Richter <s.richter@data-artisans.com> wrote:

> If you have configured RocksDB as backend, Flink typically has multiple
> RocksDB instances per job - one for each parallel operator instance with
> keyed state. Those RocksDB instances live local to their corresponding
> operator instances. Parameter state.backend.rocksdb.checkpointdir
> configures the working directory of those instances. Working directories
> are used to store files during the operation of RocksDB, so they
> should mainly allow for fast access, e.g. reside on a local disk
> filesystem. In contrast to that, state.backend.fs.checkpointdir specifies
> where checkpoint data is stored. Think of this as a backup directory, where
> the most important properties are availability and fault tolerance. This
> would typically be located on a distributed file system like HDFS that is
> also accessible from each node, so that operators can be recovered on
> different machines in case of machine failures.
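
As a rough illustration of the split described above, a minimal flink-conf.yaml sketch for Flink 1.2 might look like the following. The three keys are the ones named in this thread; the HDFS URI and the local path are hypothetical placeholders, not values taken from the thread.

```yaml
# Select RocksDB as the state backend.
state.backend: rocksdb

# Where checkpoint data is stored (the "backup" location). It should be
# reachable from every node. The URI below is a hypothetical example.
state.backend.fs.checkpointdir: hdfs://namenode:8020/flink/checkpoints

# Working directory of the local RocksDB instances. It should be on fast
# local disk. The path below is a hypothetical example.
state.backend.rocksdb.checkpointdir: /data/flink/rocksdb
```

Under these assumptions, RocksDB keeps its working files on each machine's local disk, while checkpoints land on the distributed filesystem so that operators can be restored on another machine.
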
> On 03.02.2017 at 20:55, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> I thought RocksDB is used as a store backend. If that is the case, then
> why are there 2 configuration parameters? Or in other words, what is
> the behavior if both state.backend.fs.checkpointdir and
> state.backend.rocksdb are set?
> On Fri, Feb 3, 2017 at 1:47 AM, Stefan Richter <
> s.richter@data-artisans.com> wrote:
>> Hi,
>> the purpose of the configuration parameter is described in the
>> documentation under https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/config.html.
>> In a nutshell,
>> state.checkpoints.dir contains the (small) metadata files for checkpoints,
>> which typically contain pointers to the files that hold the actual
>> state snapshot data. The state.backend.fs.checkpointdir is the directory
>> into which the actual state from the backends is written. Finally,
>> state.backend.rocksdb.checkpointdir is a poorly named key for the
>> directory of the RocksDB instance data and has in fact nothing to do with
>> checkpoints.
>> Best,
>> Stefan
>> On 03.02.2017 at 01:45, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
>> Trying to understand these 4 parameters:
>> state.backend
>> state.backend.fs.checkpointdir
>> state.backend.rocksdb.checkpointdir
>> state.checkpoints.dir
>> As I understand it, the stream of data and the state of operators are 2 different
>> concepts, and both need to be checkpointed. I am a bit confused about the
>> purpose of these parameters and their applicability.
