flink-user mailing list archives

From Igor Berman <igor.ber...@gmail.com>
Subject Re: Accessing StateBackend snapshots outside of Flink
Date Sat, 16 Apr 2016 13:43:15 GMT
Thanks a lot for the info, it doesn't seem too complex.
I'll try to write a simple tool to read this state.
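
A rough sketch of the decoding step such a tool would need, based on the layout described in the quoted message below. Everything here is an assumption for illustration: the class name StateEntrySketch is hypothetical, keys and values are assumed to be plain longs written as 8 big-endian bytes, and the RocksDB key is assumed to be the serialized key followed by the single "0" byte for the null namespace. The real byte layout should be checked against the Flink version in use, and a real tool would use Flink's own TypeSerializer classes instead of java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper, not part of Flink. It mimics the ASSUMED layout
// (key bytes, then one 0 byte for the null namespace) -> value bytes.
final class StateEntrySketch {

    // Assumed key layout: 8-byte big-endian long, then a 0 byte for the null namespace.
    static byte[] encodeKey(long key) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeLong(key);
        out.writeByte(0);
        out.flush();
        return bytes.toByteArray();
    }

    // Reads the long key back and checks the namespace marker byte.
    static long decodeKey(byte[] raw) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
        long key = in.readLong();
        int namespaceMarker = in.readUnsignedByte();
        if (namespaceMarker != 0) {
            throw new IOException("unexpected namespace marker: " + namespaceMarker);
        }
        if (in.available() != 0) {
            throw new IOException("trailing bytes after namespace marker");
        }
        return key;
    }

    // Assumed value layout: a single 8-byte big-endian long.
    static byte[] encodeValue(long value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeLong(value);
        out.flush();
        return bytes.toByteArray();
    }

    static long decodeValue(byte[] raw) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
        return in.readLong();
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for one key/value pair as it might be read from a restored RocksDB.
        byte[] rawKey = encodeKey(42L);
        byte[] rawValue = encodeValue(7L);
        System.out.println("key=" + decodeKey(rawKey) + " value=" + decodeValue(rawValue));
    }
}
```

In a real tool the two byte arrays would come from iterating over the restored RocksDB database rather than from the encode methods, which exist here only to make the sketch self-contained.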

Aljoscha, does the key reflect the unique id of the operator in some way? Or is the key
just the "name" that is passed to ValueStateDescriptor?

thanks in advance

On 15 April 2016 at 15:10, Stephan Ewen <sewen@apache.org> wrote:

> One thing to add is that you can always trigger a persistent checkpoint
> via the "savepoints" feature:
> https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/savepoints.html
> On Fri, Apr 15, 2016 at 10:24 AM, Aljoscha Krettek <aljoscha@apache.org>
> wrote:
>> Hi,
>> for RocksDB we simply use a TypeSerializer to serialize the key and value
>> to a byte[] array and store that in RocksDB. For a ListState, we serialize
>> the individual elements using a TypeSerializer and store them in a
>> comma-separated list in RocksDB. The snapshots of RocksDB that we write to
>> HDFS are regular backups of a RocksDB database, as described here:
>> https://github.com/facebook/rocksdb/wiki/How-to-backup-RocksDB%3F. You
>> should be able to read them from HDFS and restore them to a RocksDB
>> database as described in the linked documentation.
>> tl;dr As long as you know the type of values stored in the state you
>> should be able to read them from RocksDB and deserialize the values using
>> TypeSerializer.
>> One more bit of information: Internally the state is keyed by (key,
>> namespace) -> value where namespace can be an arbitrary type that has a
>> TypeSerializer. We use this to store window state that is both local to key
>> and the current window. For state that you store in a user-defined function
>> the namespace will always be null and that will be serialized by a
>> VoidSerializer that simply always writes a "0" byte.
>> Cheers,
>> Aljoscha
>> On Fri, 15 Apr 2016 at 00:18 igor.berman <igor.berman@gmail.com> wrote:
>>> Hi,
>>> we are evaluating Flink for a new solution, and several people raised the
>>> concern of coupling too tightly to Flink:
>>> 1. We understand that to get full fault tolerance and the best
>>> performance we'll need to use Flink managed state (probably the RocksDB
>>> backend, due to the volume of state).
>>> 2. But if we later find that Flink doesn't meet our needs (for any
>>> reason), we'll need to extract this state in some way, since it's the
>>> only source of consistent state.
>>> In general I'd like to be able to take a snapshot of the backend and try to
>>> read it. Do you think it will be a trivial task?
>>> Say I'm holding list state per partitioned key: would it be easy to take
>>> the RocksDB file and open it?
>>> Any thoughts on how I can convince people on our team?
>>> Thanks in advance!
