flink-user mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Accessing StateBackend snapshots outside of Flink
Date Mon, 18 Apr 2016 07:23:08 GMT
Hi,
the key refers to the key extracted by your KeySelector. Right now, for every
named state (i.e. the name in the StateDescriptor) there is an isolated
RocksDB instance.

Cheers,
Aljoscha
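A minimal Java sketch of the layout described above, assuming a simplified in-memory model: each state name (the name given to the StateDescriptor) maps to its own isolated key-value store, standing in for a separate RocksDB instance, and the stored key comes from a key selector rather than from any operator ID. Under that model, an offline reader would restore and open one database per state name. The class `PerStateStores`, the `userId` key selector, and the UTF-8 key encoding are hypothetical illustrations, not Flink APIs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical in-memory model (not Flink code): one isolated key-value store
// per state name, standing in for one RocksDB instance per StateDescriptor name.
public class PerStateStores {
    private final Map<String, Map<ByteBuffer, byte[]>> stores = new HashMap<>();

    // Stores a value under the serialized key in the store for the given state name.
    public void put(String stateName, byte[] key, byte[] value) {
        stores.computeIfAbsent(stateName, n -> new HashMap<>())
              .put(ByteBuffer.wrap(key), value);
    }

    // Looks up a value; returns null if the state name or the key is absent.
    public byte[] get(String stateName, byte[] key) {
        Map<ByteBuffer, byte[]> store = stores.get(stateName);
        return store == null ? null : store.get(ByteBuffer.wrap(key));
    }

    // Number of named states, i.e. number of isolated stores in this model.
    public int stateCount() {
        return stores.size();
    }

    public static void main(String[] args) {
        // Hypothetical key selector: the key is the user ID extracted from an event.
        Function<String, String> userId = event -> event.split(":")[0];
        byte[] key = userId.apply("alice:click").getBytes(StandardCharsets.UTF_8);

        PerStateStores backend = new PerStateStores();
        backend.put("click-count", key, new byte[] {3});
        backend.put("last-seen", key, new byte[] {7});

        // The same key lives independently in each named state.
        System.out.println(backend.get("click-count", key)[0]); // 3
        System.out.println(backend.get("last-seen", key)[0]);   // 7
        System.out.println(backend.stateCount());               // 2
    }
}
```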

On Sat, 16 Apr 2016 at 15:43 Igor Berman <igor.berman@gmail.com> wrote:

> Thanks a lot for the info; it seems not too complex.
> I'll try to write a simple tool to read this state.
>
> Aljoscha, does the key reflect the unique ID of the operator in some way? Or
> is the key just the "name" that is passed to ValueStateDescriptor?
>
> thanks in advance
>
>
> On 15 April 2016 at 15:10, Stephan Ewen <sewen@apache.org> wrote:
>
>> One thing to add is that you can always trigger a persistent checkpoint
>> via the "savepoints" feature:
>> https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/savepoints.html
>>
>>
>>
>> On Fri, Apr 15, 2016 at 10:24 AM, Aljoscha Krettek <aljoscha@apache.org>
>> wrote:
>>
>>> Hi,
>>> for RocksDB we simply use a TypeSerializer to serialize the key and
>>> value to a byte[] and store that in RocksDB. For a ListState, we
>>> serialize the individual elements using a TypeSerializer and store them in
>>> a comma-separated list in RocksDB. The snapshots of RocksDB that we write
>>> to HDFS are regular backups of a RocksDB database, as described here:
>>> https://github.com/facebook/rocksdb/wiki/How-to-backup-RocksDB%3F. It
>>> should be possible to read them from HDFS and restore them to a RocksDB
>>> database as described in the linked documentation.
>>>
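The ListState layout described above (serialized elements stored as a comma-separated list) can be decoded with a small sketch like the one below. It assumes, purely for illustration, that each element is a long written as 8 big-endian bytes (as `DataOutputStream.writeLong` does) and that a single ',' byte separates elements; the real bytes depend on your TypeSerializer and on the backend version. The class `ListStateDecoder` is hypothetical, not a Flink API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical decoder for a ListState value laid out as: e1 ',' e2 ',' e3 ...
// Assumption: each element is a long serialized as 8 big-endian bytes.
public class ListStateDecoder {
    private static final byte DELIMITER = ',';

    // Builds a value in the layout this sketch assumes.
    public static byte[] encode(List<Long> elements) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (int i = 0; i < elements.size(); i++) {
            if (i > 0) {
                out.writeByte(DELIMITER);
            }
            out.writeLong(elements.get(i));
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Reads each element with the length-aware decoder, then consumes one delimiter.
    // Splitting the raw bytes on ',' would be wrong: a serialized element can itself
    // contain the byte 0x2C.
    public static List<Long> decode(byte[] value) throws IOException {
        List<Long> result = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(value));
        while (in.available() > 0) {
            result.add(in.readLong());
            if (in.available() > 0) {
                byte delimiter = in.readByte();
                if (delimiter != DELIMITER) {
                    throw new IOException("unexpected delimiter byte: " + delimiter);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // 44 is the ASCII code of ',', so a naive byte split would break this value.
        List<Long> original = Arrays.asList(1L, 44L, 1_000_000L);
        byte[] stored = encode(original);
        System.out.println(stored.length);  // 26 = 3 * 8 bytes + 2 delimiters
        System.out.println(decode(stored)); // [1, 44, 1000000]
    }
}
```

The design point is that the TypeSerializer, not the delimiter, determines where each element ends; the delimiter is only checked and skipped.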
>>> tl;dr: As long as you know the type of the values stored in the state, you
>>> should be able to read them from RocksDB and deserialize them using the
>>> TypeSerializer.
>>>
>>> One more bit of information: internally, the state is keyed by (key,
>>> namespace) -> value, where namespace can be an arbitrary type that has a
>>> TypeSerializer. We use this to store window state that is local to both the
>>> key and the current window. For state that you store in a user-defined
>>> function, the namespace will always be null, and that will be serialized by
>>> a VoidSerializer that simply always writes a "0" byte.
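The (key, namespace) composite key described above can be illustrated with a minimal sketch. It assumes a simplified layout: the serialized key followed directly by the serialized namespace, with the key written via `DataOutputStream.writeUTF`, a window namespace written as its start and end timestamps, and the null namespace written as a single 0 byte as the VoidSerializer does. Any extra separator bytes the real backend may insert are not modeled; `CompositeKeySketch` and `NamespaceWriter` are hypothetical names, not Flink APIs.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical sketch of a (key, namespace) composite key; not Flink's exact layout.
public class CompositeKeySketch {

    // Hypothetical namespace writer, playing the role of a namespace TypeSerializer.
    public interface NamespaceWriter<N> {
        void write(N namespace, DataOutputStream out) throws IOException;
    }

    // Null namespace for state in user-defined functions: always a single 0 byte.
    public static final NamespaceWriter<Void> VOID = (ns, out) -> out.writeByte(0);

    // Hypothetical window namespace: [start, end) written as two longs.
    public static final NamespaceWriter<long[]> WINDOW = (w, out) -> {
        out.writeLong(w[0]);
        out.writeLong(w[1]);
    };

    // Serialized key first, then the serialized namespace.
    public static <N> byte[] compositeKey(String key, N namespace, NamespaceWriter<N> writer)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(key);
        writer.write(namespace, out);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] userState = compositeKey("alice", null, VOID);
        byte[] window1 = compositeKey("alice", new long[] {0L, 60_000L}, WINDOW);
        byte[] window2 = compositeKey("alice", new long[] {60_000L, 120_000L}, WINDOW);

        // writeUTF emits a 2-byte length plus 5 bytes for "alice"; the null namespace adds 1.
        System.out.println(userState.length);                // 8
        System.out.println(userState[userState.length - 1]); // 0
        // Same key, different windows: distinct entries in the store.
        System.out.println(Arrays.equals(window1, window2)); // false
    }
}
```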
>>>
>>> Cheers,
>>> Aljoscha
>>>
>>> On Fri, 15 Apr 2016 at 00:18 igor.berman <igor.berman@gmail.com> wrote:
>>>
>>>> Hi,
>>>> we are evaluating Flink for a new solution, and several people raised a
>>>> concern about coupling too tightly to Flink:
>>>> 1. We understand that if we want full fault tolerance and the best
>>>> performance, we'll need to use Flink managed state (probably the RocksDB
>>>> backend, due to the volume of state).
>>>> 2. But if we later find that Flink doesn't meet our needs (for any
>>>> reason), we'll need to extract this state in some way (since it's the
>>>> only source of consistent state).
>>>> In general, I'd like to be able to take a snapshot of the backend and try
>>>> to read it... do you think it will be a trivial task?
>>>> Say I'm holding list state per partitioned key: would it be easy to take
>>>> the RocksDB file and open it?
>>>>
>>>> Any thoughts on how I can convince people in our team?
>>>>
>>>> Thanks in advance!
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Accessing-StateBackend-snapshots-outside-of-Flink-tp6116.html
>>>> Sent from the Apache Flink User Mailing List archive. mailing list
>>>> archive at Nabble.com.
>>>>
>>>
>>
>
