flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Lam <paullin3...@gmail.com>
Subject Re: How to clear keyed states periodically?
Date Fri, 14 Sep 2018 09:28:13 GMT
Hi David,

Your information is very helpful! Thank you!

BroadcastStream can definitely do the job, but I think it makes the architecture kind of complicated,
so it will be my last resort . 

I wonder if it’s possible to implement a clearAll() method for keyed states which clears
user states for all namespaces, and does it violate the principle of keyed states? 

Thanks again!

Best,
Paul Lam

> 在 2018年9月14日,16:00,David Anderson <david@data-artisans.com> 写道:
> 
> Paul,
> 
> Theoretically, processing-time timers will get the job done, but yes, you'd need a timer
per key -- and folks who've tried this with millions of keys, all firing at the same time,
have reported that this behaves badly. For some use cases it's workable to spread out the
timers over an interval, like an hour or two, to avoid this timer firing storm, but that doesn't
sound like it would work well for you.
> 
> You might instead try using broadcast state to deal with this. You would establish a
broadcast stream connected to your keyed stream that acts as a control stream for the keyed
state. Then in the processBroadcastElement method of a KeyedBroadcastProcessFunction you would
use applyToKeyedState to iterate over all the keyed state and clear everything. Unfortunately
it's not possible to use timers on broadcast state, so you'll have to find some other way
to trigger the event on the broadcast stream -- maybe a custom source that uses a ProcessingTimeCallback
to create events on the broadcast stream.
> 
> David
> 
> On Fri, Sep 14, 2018 at 7:18 AM Paul Lam <paullin3280@gmail.com <mailto:paullin3280@gmail.com>>
wrote:
> >
> > Hi vino,
> >
> > Thanks for the advice, but I think state TTL does not completely fit in my case.
> >
> > AFAIK, State TTL is per entry level and uses an inactive time threshold to expire
entries, but I need a TTL for the whole MapState, which does not depend on when the entries
are created or updated. Suppose I’m calculating stats of daily active users and use a userId
field as key, I want the state totally truncated at the very beginning of each day. 
> >
> > Thanks a lot!
> >
> > Best,
> > Paul Lam
> >
> >
> > 在 2018年9月14日,10:39,vino yang <yanghua1127@gmail.com <mailto:yanghua1127@gmail.com>>
写道:
> >
> > Hi Paul,
> >
> > Maybe you can try to understand the State TTL?[1]
> >
> > Thanks, vino.
> >
> > [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/state.html#state-time-to-live-ttl
<https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/state.html#state-time-to-live-ttl>
> >
> > Paul Lam <paullin3280@gmail.com <mailto:paullin3280@gmail.com>> 于2018年9月12日周三
下午6:06写道:
> >>
> >> Hi,
> >>
> >> I’m using MapState to deduplicate some ids and the MapState needs to be truncated
periodically. I tried to use ProcessingTimeCallback to call state.clear(), but in this way
I can only clear the state for one key, and actually I need a key group level cleanup. So
I’m wondering is there any best practice for my case? Thanks a lot!
> >>
> >> Best,
> >> Paul Lam
> >
> >
> 
> 
> --
> David Anderson | Training Coordinator | data Artisans
> --
> Join Flink Forward - The Apache Flink Conference
> Stream Processing | Event Driven | Real Time


Mime
View raw message