flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sand Stone <sand.m.st...@gmail.com>
Subject Re: Flink docs in regards to State
Date Wed, 26 Apr 2017 15:24:24 GMT
To be clear, I like the direction of Flink is going with State:
Querytable State, MapState etc. MapState in particular is a great
feature and I am trying to find more documentation and/or usage
patterns with it before I dive into the deep end of the code. As far
as I can tell, the key in MapState does not have to be associated with
the key in keyed stream. So in theory, I should be able to use
MapState almost anywhere that accepts "RichXXX" functions.

Also, I wonder if it makes sense to have "global state" (stored in a
rocksdb backend) to be instantiated by Env and maintained by
JobManager. Sure the state access is RPC but the database lifetime is
maintained by the Flink cluster. Right now I think I could use a "long
running" job to expose a Queryable State to emulate this.


On Wed, Apr 26, 2017 at 8:01 AM, Timo Walther <twalthr@apache.org> wrote:
> Hi,
> you are right. There are some limitation about RichReduceFunctions on
> windows. Maybe the new AggregateFunction `window.aggregate()` could solve
> your problem, you can provide an accumulator which is your custom state that
> you can update for each record. I couldn't find a documentation page, it
> might be created in next weeks after the feature freeze.
> Regarding the MapState I loop in Stefan, maybe he can give you some advice
> here.
> Timo
> Am 26/04/17 um 04:25 schrieb Sand Stone:
>> Hi, Flink newbie here.
>> I played with the API (built from GitHub master), I encountered some
>> issues but I am not sure if they are limitations or actually by
>> design:
>>      1. the data stream reduce method does not take a
>> RichReduceFunction. The code compiles but throws runtime exception
>> when submitted. [My intent is to maintain a MapState, more below]
>>       2. Flink seems to be picky on where the MapState is used at
>> runtime. MapState is restricted to keyed stream, and cannot be used
>> with certain operators. However I might need to maintain a MapState
>> for certain (persistent) keyed state for processing contexts. [I could
>> use an external kv store via async io API, but I am hoping Flink could
>> help to maintain the (rocksdb) db instances so I could avoid another
>> layer of external store].
>> Any pointer to blog/doc/video is greatly appreciated.
>> Thanks!

View raw message