flink-user mailing list archives

From Josh <jof...@gmail.com>
Subject Re: Accessing StateBackend snapshots outside of Flink
Date Mon, 13 Jun 2016 08:14:49 GMT
I have a follow-up question to this: since Flink doesn't support state
expiration at the moment (e.g. expiring state which hasn't been updated for
a certain amount of time), would it be possible to clear up old UDF states
by:
- storing a "last_updated" timestamp in the state value
- periodically (e.g. monthly) going through all the state values in RocksDB,
deserializing them using TypeSerializer and reading the "last_updated" property
- deleting the key from RocksDB if the state's "last_updated" timestamp is
more than a month old

Is there any reason this approach wouldn't work, or anything to be careful
of?
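
The sweep proposed above can be sketched roughly as follows. This is only an illustration, not Flink or RocksDB code: a plain Python dict stands in for the RocksDB instance, and the value layout (an 8-byte big-endian "last_updated" timestamp in milliseconds followed by the payload) as well as the helper names `encode_value`, `decode_value` and `expire_old_state` are assumptions made up for the example. In a real job the values would be whatever bytes the state's TypeSerializer produced.

```python
import struct

MONTH_MS = 30 * 24 * 60 * 60 * 1000  # "a month", approximated as 30 days


def encode_value(last_updated_ms, payload):
    """Hypothetical value layout: 8-byte timestamp, then the payload bytes."""
    return struct.pack(">q", last_updated_ms) + payload


def decode_value(raw):
    """Inverse of encode_value: returns (last_updated_ms, payload)."""
    (last_updated_ms,) = struct.unpack(">q", raw[:8])
    return last_updated_ms, raw[8:]


def expire_old_state(store, now_ms, max_age_ms=MONTH_MS):
    """Delete every entry whose last_updated is older than max_age_ms.

    `store` is a dict of bytes -> bytes standing in for the key-value store.
    Returns the list of deleted keys.
    """
    expired = [
        key
        for key, raw in store.items()
        if now_ms - decode_value(raw)[0] > max_age_ms
    ]
    for key in expired:
        del store[key]
    return expired


# Example: one stale entry and one fresh entry.
now = 100 * MONTH_MS
store = {
    b"user-1": encode_value(now - 2 * MONTH_MS, b"stale"),
    b"user-2": encode_value(now - 1000, b"fresh"),
}
removed = expire_old_state(store, now)
```

Collecting the expired keys first and deleting them afterwards avoids mutating the store while iterating over it.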


On Mon, Apr 18, 2016 at 8:23 AM, Aljoscha Krettek <aljoscha@apache.org> wrote:

> Hi,
> key refers to the key extracted by your KeySelector. Right now, for every
> named state (i.e. the name in the StateDescriptor) there is an isolated
> RocksDB instance.
> Cheers,
> Aljoscha
> On Sat, 16 Apr 2016 at 15:43 Igor Berman <igor.berman@gmail.com> wrote:
>> thanks a lot for the info, seems not too complex
>> I'll try to write a simple tool to read this state.
>> Aljoscha, does the key reflect the unique id of the operator in some way? Or is
>> the key just a "name" that is passed to ValueStateDescriptor?
>> thanks in advance
>> On 15 April 2016 at 15:10, Stephan Ewen <sewen@apache.org> wrote:
>>> One thing to add is that you can always trigger a persistent checkpoint
>>> via the "savepoints" feature:
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/savepoints.html
>>> On Fri, Apr 15, 2016 at 10:24 AM, Aljoscha Krettek <aljoscha@apache.org>
>>> wrote:
>>>> Hi,
>>>> for RocksDB we simply use a TypeSerializer to serialize the key and
>>>> value to a byte[] array and store that in RocksDB. For a ListState, we
>>>> serialize the individual elements using a TypeSerializer and store them in
>>>> a comma-separated list in RocksDB. The snapshots of RocksDB that we write
>>>> to HDFS are regular backups of a RocksDB database, as described here:
>>>> https://github.com/facebook/rocksdb/wiki/How-to-backup-RocksDB%3F. It
>>>> should be possible to read them from HDFS and restore them to a RocksDB
>>>> database as described in the linked documentation.
>>>> tl;dr As long as you know the type of values stored in the state you
>>>> should be able to read them from RocksDB and deserialize the values using
>>>> TypeSerializer.
>>>> One more bit of information: Internally the state is keyed by (key,
>>>> namespace) -> value where namespace can be an arbitrary type that has a
>>>> TypeSerializer. We use this to store window state that is local to both the key
>>>> and the current window. For state that you store in a user-defined function
>>>> the namespace will always be null and that will be serialized by a
>>>> VoidSerializer that simply always writes a "0" byte.
>>>> Cheers,
>>>> Aljoscha
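
To make the (key, namespace) -> value layout described above concrete, here is a small sketch. It does not reproduce Flink's actual byte format: the length-prefixed UTF-8 encoding is a made-up stand-in for a TypeSerializer, the ordering of key and namespace bytes is an assumption, and `serialize_string`, `deserialize_string`, `composite_key` and `read_user_state` are hypothetical names. The only detail taken from the message is that a null namespace is written as a single 0 byte.

```python
import struct

NULL_NAMESPACE = b"\x00"  # per the message: the null namespace is a single 0 byte


def serialize_string(text):
    """Stand-in serializer: 2-byte big-endian length, then UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack(">H", len(data)) + data


def deserialize_string(raw):
    """Inverse of serialize_string; returns (text, bytes_consumed)."""
    (length,) = struct.unpack(">H", raw[:2])
    return raw[2:2 + length].decode("utf-8"), 2 + length


def composite_key(key):
    """Assumed layout: serialized key followed by the null-namespace byte."""
    return serialize_string(key) + NULL_NAMESPACE


def read_user_state(store):
    """Decode every (key, null namespace) -> value entry into a plain dict."""
    result = {}
    for raw_key, raw_value in store.items():
        key, used = deserialize_string(raw_key)
        if raw_key[used:] != NULL_NAMESPACE:
            continue  # skip entries that belong to some other namespace
        value, _ = deserialize_string(raw_value)
        result[key] = value
    return result


# A dict stands in for the restored RocksDB instance.
store = {
    composite_key("sensor-a"): serialize_string("42"),
    composite_key("sensor-b"): serialize_string("17"),
}
decoded = read_user_state(store)
```

The point of the sketch is only that, once the serializers for the key and value types are known, reading the state back is a matter of splitting each stored key into its parts and decoding the value bytes.
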
>>>> On Fri, 15 Apr 2016 at 00:18 igor.berman <igor.berman@gmail.com> wrote:
>>>>> Hi,
>>>>> we are evaluating Flink for a new solution, and several people raised
>>>>> concerns about coupling too tightly to Flink:
>>>>> 1. We understand that to get full fault tolerance and the best
>>>>> performance we'll need to use Flink managed state (probably the RocksDB
>>>>> backend, due to the volume of state).
>>>>> 2. But if we later find that Flink doesn't meet our needs (for any
>>>>> reason), we'll need to extract this state in some way (since it's the
>>>>> only source of consistent state).
>>>>> In general, I'd like to be able to take a snapshot of the backend and try
>>>>> to read it. Do you think it will be a trivial task?
>>>>> Say I'm holding list state per partitioned key: would it be easy to take
>>>>> the RocksDB file and open it?
>>>>> Any thoughts on how I can convince people on our team?
>>>>> thanks in advance!
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Accessing-StateBackend-snapshots-outside-of-Flink-tp6116.html
>>>>> Sent from the Apache Flink User Mailing List archive at Nabble.com.
