flink-user mailing list archives

From Anwar Rizal <anriza...@gmail.com>
Subject Re: Apache Flink Operator State as Query Cache
Date Mon, 16 Nov 2015 14:00:39 GMT
Stephan,

Having a look at the brand new 0.10 release, I noticed that OperatorState
is not implemented for ConnectedStream, which is quite the opposite of what
you said below.

Or maybe I misunderstood your sentence here ?

Thanks,
Anwar.


On Wed, Nov 11, 2015 at 10:49 AM, Stephan Ewen <sewen@apache.org> wrote:

> Hi!
>
> In general, if you can keep state in Flink, you get better
> throughput/latency/consistency and have one less system to worry about
> (external k/v store). State outside means that the Flink processes can be
> slimmer and need fewer resources and as such recover a bit faster. There
> are use cases for that as well.
>
> Storing the model in OperatorState is a good idea, if you can. On the
> roadmap is to migrate the operator state to managed memory as well, so that
> should take care of the GC issues.
>
> We are just adding functionality to make the Key/Value operator state
> usable in CoMap/CoFlatMap as well (currently it only works in windows and
> in Map/FlatMap/Filter functions over the KeyedStream).
> Until then, you should be able to use a simple Java HashMap and use the
> "Checkpointed" interface to make it persistent.
>
> Greetings,
> Stephan
>
>
> On Sun, Nov 8, 2015 at 10:11 AM, Welly Tambunan <if05041@gmail.com> wrote:
>
>> Thanks for the answer.
>>
>> The approach I'm using right now is to create a base/marker interface so
>> that different types of message can be streamed to the same operator. I'm
>> not sure about the performance hit of this compared to the CoFlatMap
>> function.
>>
>> Basically this one provides a query cache, so I'm thinking that instead of
>> using an in-memory cache like redis, ignite etc., I can just use operator
>> state for this.
>>
>> I just want to gauge whether I need a memory cache or whether operator
>> state would be just fine.
>>
>> However, I'm concerned about Gen 2 garbage collection when caching our
>> own state without using operator state. Is there any clarification on that
>> one?
>>
>>
>>
>> On Sat, Nov 7, 2015 at 12:38 AM, Anwar Rizal <anrizal05@gmail.com> wrote:
>>
>>>
>>> Let me understand your case better here. You have a stream of models and
>>> a stream of data. To process the data, you will need a way to access your
>>> model from the subsequent stream operations (map, filter, flatmap, ..).
>>> I'm not sure in which cases Operator State is a good choice, but I think
>>> you can also live without it.
>>>
>>> val modelStream = .... // get the model stream
>>> val dataStream   =
>>>
>>> modelStream.broadcast.connect(dataStream).coFlatMap(  )
>>>
>>> Then you can keep the latest model in a CoFlatMapRichFunction, not
>>> necessarily as Operator State, although maybe OperatorState is a good
>>> choice too.
>>>
>>> Does it make sense to you ?
>>>
>>> Anwar
>>>
>>> On Fri, Nov 6, 2015 at 10:21 AM, Welly Tambunan <if05041@gmail.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> We have high-density data that requires downsampling. However, the
>>>> downsampling model is very flexible, depending on the client device and
>>>> user interaction, so it would be wasteful to precompute the results and
>>>> store them in a db.
>>>>
>>>> So we want to use Apache Flink to do the downsampling and cache the
>>>> result for subsequent queries.
>>>>
>>>> We are considering using Flink Operator state for that one.
>>>>
>>>> Is it the right approach to use operator state as a memory cache? Or is
>>>> it preferable to use a memory cache like redis etc.?
>>>>
>>>> Any comments will be appreciated.
>>>>
>>>>
>>>> Cheers
>>>> --
>>>> Welly Tambunan
>>>> Triplelands
>>>>
>>>> http://weltam.wordpress.com
>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>
>>>
>>>
>>
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>
>
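
For readers of this thread, the workaround Stephan describes (a plain Java HashMap made persistent through the "Checkpointed" interface, inside a two-input function) might look roughly like the sketch below. It is only an illustration under stated assumptions: `CheckpointedSketch` is a stand-in interface declared locally so the code compiles without Flink, and its method shapes should be checked against the real interface in the 0.10 API. `ModelCacheFunction`, the String/Double types, and the "key -> factor" model layout are all hypothetical and not taken from the thread.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Hypothetical stand-in for Flink's "Checkpointed" interface, declared locally
// so this sketch compiles without Flink on the classpath.
interface CheckpointedSketch<T extends Serializable> {
    T snapshotState(long checkpointId, long checkpointTimestamp);
    void restoreState(T state);
}

// Two-input function in the style of a CoFlatMap: input 1 carries model
// updates, input 2 carries the data to process with the latest model.
class ModelCacheFunction implements CheckpointedSketch<HashMap<String, Double>> {

    // Plain Java HashMap holding the latest model per key
    // (assumed layout: key -> downsampling factor).
    private HashMap<String, Double> models = new HashMap<>();

    // Input 1: remember the newest model for the key.
    public void flatMap1(String key, double factor) {
        models.put(key, factor);
    }

    // Input 2: apply the cached model; data without a model yet is dropped.
    public void flatMap2(String key, double value, List<Double> out) {
        Double factor = models.get(key);
        if (factor != null) {
            out.add(value * factor);
        }
    }

    @Override
    public HashMap<String, Double> snapshotState(long checkpointId, long checkpointTimestamp) {
        // Return a copy so later updates do not alter the snapshot.
        return new HashMap<>(models);
    }

    @Override
    public void restoreState(HashMap<String, Double> state) {
        // Replace the in-memory cache with the checkpointed contents.
        models = new HashMap<>(state);
    }
}
```

In a real job the class would extend the rich co-flat-map function type and emit through a collector rather than a List; the point here is only that the cache lives in an ordinary field and is handed to, and restored from, the checkpoint mechanism as a whole.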
