flink-user mailing list archives

From Niels Basjes <Ni...@basjes.nl>
Subject Re: Cleanup of OperatorStates?
Date Fri, 27 Nov 2015 13:19:47 GMT
Hi,

Thanks for the explanation.
I have clickstream data arriving in real time and I need to assign the
visitId and stream it out again (with the visitId now being part of the
record) into Kafka with the lowest possible latency.
Although the Window feature allows me to group and close the visit on a
timeout/expire (as shown to me by Aljoscha in a separate email) it does
make a 'window'.

So (as requested) I created a ticket for such a feature:
https://issues.apache.org/jira/browse/FLINK-3089

Niels

On Fri, Nov 27, 2015 at 11:51 AM, Stephan Ewen <sewen@apache.org> wrote:

> Hi Niels!
>
> Currently, state is released by setting the value for the key to null. If
> you are tracking web sessions, you can try to send an "end of session"
> element that sets the value to null.
>
> To be on the safe side, you probably want state that is automatically
> purged after a while. I would look into using Windows for that. The
> triggers there are flexible so you can schedule both actions on elements
> plus cleanup after a certain time delay (clock time or event time).
>
> The question about "state expiry" has come up a few times. People seem to
> like working on state directly, but it should clean up automatically.
>
> Can you see if your use case fits onto windows and, if it does not, open
> a ticket for state expiry?
>
> Greetings,
> Stephan
>
>
> On Thu, Nov 26, 2015 at 10:42 PM, Niels Basjes <Niels@basjes.nl> wrote:
>
>> Hi,
>>
>> I'm working on a streaming application that ingests clickstream data.
>> In a specific part of the flow I need to retain a little bit of state per
>> visitor (i.e. keyBy(sessionid)).
>>
>> So I'm using the Key/Value state interface (i.e. OperatorState<MyRecord>)
>> in a map function.
>>
>> Now in my application I expect to get a huge number of sessions per day.
>> Since these sessionids are 'random' and become unused after the visitor
>> leaves the website, over time the system will have seen millions of
>> those sessionids.
>>
>> So I was wondering: how are these OperatorStates cleaned?
>>
>>
>> --
>> Best regards / Met vriendelijke groeten,
>>
>> Niels Basjes
>>
>
>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes
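
For readers of the archive, here is a rough plain-Java sketch of the two
cleanup strategies discussed in this thread: releasing per-key state
explicitly on an "end of session" element, and purging state that has
been idle longer than a timeout. It is not Flink code and does not use
the Flink API. The class SessionStateSketch, its methods, and the visit
id format are all hypothetical names chosen for illustration, and the
timestamps are assumed to be milliseconds supplied by the caller.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Plain-Java illustration only; all names here are hypothetical, not Flink API. */
final class SessionStateSketch {

    /** Per-session state: the assigned visit id and the time of the last click. */
    static final class Visit {
        final String visitId;
        long lastSeenMillis;

        Visit(String visitId, long lastSeenMillis) {
            this.visitId = visitId;
            this.lastSeenMillis = lastSeenMillis;
        }
    }

    private final Map<String, Visit> stateBySession = new HashMap<>();
    private final long timeoutMillis;
    private long nextVisitNumber = 1;

    SessionStateSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Tags one click with a visit id immediately, starting a new visit if needed. */
    String onClick(String sessionId, long eventTimeMillis) {
        Visit visit = stateBySession.get(sessionId);
        boolean expired = visit != null
                && eventTimeMillis - visit.lastSeenMillis > timeoutMillis;
        if (visit == null || expired) {
            visit = new Visit(sessionId + "-" + nextVisitNumber++, eventTimeMillis);
            stateBySession.put(sessionId, visit);
        }
        visit.lastSeenMillis = eventTimeMillis;
        return visit.visitId;
    }

    /** Explicit release, analogous to setting the value for the key to null. */
    void onEndOfSession(String sessionId) {
        stateBySession.remove(sessionId);
    }

    /** Purges every session idle for longer than the timeout; returns the count removed. */
    int expireIdle(long nowMillis) {
        int removed = 0;
        Iterator<Visit> it = stateBySession.values().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().lastSeenMillis > timeoutMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** Number of sessions that currently hold state. */
    int size() {
        return stateBySession.size();
    }

    public static void main(String[] args) {
        SessionStateSketch sketch = new SessionStateSketch(30 * 60 * 1000L);
        System.out.println(sketch.onClick("session-a", 0L));
        sketch.onEndOfSession("session-a");
        System.out.println("sessions held: " + sketch.size());
    }
}
```

Each click is tagged and can be emitted right away, so nothing is
buffered. The explicit release only helps when an "end of session"
element actually arrives, which is why the periodic expireIdle pass is
there as the safety net.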
