flink-user mailing list archives

From Vishnu Viswanath <vishnu.viswanat...@gmail.com>
Subject Re: Checkpointing very large state in RocksDB?
Date Tue, 05 Jul 2016 19:07:51 GMT
Hi,

Is there any other disadvantage to using fullyAsyncSnapshot, other than
it being slower? And would the slowness really matter, since it is async
anyway?

Thanks and Regards,
Vishnu Viswanath,

On Thu, Jun 30, 2016 at 8:07 AM, Aljoscha Krettek <aljoscha@apache.org>
wrote:

> Hi,
> are you talking about *enableFullyAsyncSnapshots()* in the RocksDB
> backend? If not, there is this switch, which is described in the JavaDoc:
>
> /**
>  * Enables fully asynchronous snapshotting of the partitioned state held in RocksDB.
>  *
>  * <p>By default, this is disabled. This means that RocksDB state is copied in a
>  * synchronous step, during which normal processing of elements pauses, followed by
>  * an asynchronous step of copying the RocksDB backup to the final checkpoint
>  * location. Fully asynchronous snapshots take longer (linear time requirement with
>  * respect to number of unique keys) but normal processing of elements is not paused.
>  */
> public void enableFullyAsyncSnapshots()
>
> This also describes the implications for checkpointing time, but please let
> me know if I should provide more details. We should probably also add more
> description to the documentation for this.
>
> Cheers,
> Aljoscha
>
> On Wed, 29 Jun 2016 at 23:04 Daniel Li <danielli90@gmail.com> wrote:
>
>> When RocksDB holds a very large state, is there a concern over the time it
>> takes to checkpoint the RocksDB data to HDFS? Is asynchronous
>> checkpointing a recommended practice here?
>>
>>
>>
>> https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/state_backends.html
>>
>> "The RocksDBStateBackend holds in-flight data in a RocksDB
>> <http://rocksdb.org/> data base that is (per default) stored in the
>> TaskManager data directories. Upon checkpointing, the whole RocksDB data
>> base will be checkpointed into the configured file system and directory.
>> Minimal metadata is stored in the JobManager’s memory (or, in
>> high-availability mode, in the metadata checkpoint).
>>
>> The RocksDBStateBackend is encouraged for:
>>
>>    - Jobs with very large state, long windows, large key/value states.
>>    - All high-availability setups."
>>
>>
>> thx
>> Daniel
>>
>
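
For readers following the thread, the trade-off described in the quoted JavaDoc can be illustrated with a toy model. This is a sketch only: it is plain Java, not Flink or RocksDB code, and every name in it (SnapshotModes, VersionedStore, copyAll, pinSnapshot, readAt, nextKey, exportAt) is hypothetical. It assumes a store that keeps old versions of each value so that a reader can see the state "as of" a pinned sequence number. copyAll() stands in for the default mode (one blocking copy of everything), while pinSnapshot() plus exportAt() stands in for the fully asynchronous mode (a cheap pin, then a walk over every key while writes continue).

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model only: NOT Flink or RocksDB code. All names are hypothetical. */
public class SnapshotModes {

    /** Keyed store that keeps every version of a value, tagged with a sequence number. */
    static final class VersionedStore {
        private final TreeMap<String, TreeMap<Long, String>> data = new TreeMap<>();
        private long seq = 0;

        /** Normal processing: record a new version of the value for this key. */
        synchronized void put(String key, String value) {
            data.computeIfAbsent(key, k -> new TreeMap<>()).put(++seq, value);
        }

        /** Default-style snapshot: copy everything while holding the lock, so writers wait. */
        synchronized Map<String, String> copyAll() {
            Map<String, String> copy = new TreeMap<>();
            for (Map.Entry<String, TreeMap<Long, String>> e : data.entrySet()) {
                copy.put(e.getKey(), e.getValue().lastEntry().getValue());
            }
            return copy;
        }

        /** Fully-async style, step 1: only remember the current sequence number. */
        synchronized long pinSnapshot() {
            return seq;
        }

        /** Fully-async style, step 2: read one key as of the pinned sequence number. */
        synchronized String readAt(String key, long pinned) {
            TreeMap<Long, String> versions = data.get(key);
            if (versions == null) {
                return null;
            }
            Map.Entry<Long, String> e = versions.floorEntry(pinned);
            return e == null ? null : e.getValue();
        }

        /** Returns the first key when {@code after} is null, else the next larger key, or null. */
        synchronized String nextKey(String after) {
            if (data.isEmpty()) {
                return null;
            }
            return after == null ? data.firstKey() : data.higherKey(after);
        }
    }

    /** Walks all keys one at a time (linear in the number of keys); the lock is held per key only. */
    static Map<String, String> exportAt(VersionedStore store, long pinned) {
        Map<String, String> out = new TreeMap<>();
        for (String k = store.nextKey(null); k != null; k = store.nextKey(k)) {
            String v = store.readAt(k, pinned);
            if (v != null) {
                out.put(k, v); // keys first written after the pin are skipped
            }
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        store.put("a", "1");
        store.put("b", "2");
        long pinned = store.pinSnapshot();
        store.put("a", "3"); // processing continues after the pin
        store.put("c", "4");
        System.out.println(exportAt(store, pinned)); // prints {a=1, b=2}
        System.out.println(store.copyAll());         // prints {a=3, b=2, c=4}
    }
}
```

In this model the export visits every key individually, which mirrors the "linear time requirement with respect to number of unique keys" in the JavaDoc, while writers are only ever blocked for a single-key read rather than for the whole copy.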
