Date: Fri, 27 Nov 2015 14:19:47 +0100
Subject: Re: Cleanup of OperatorStates?
From: Niels Basjes
To: user@flink.apache.org

Hi,

Thanks for the explanation.
I have clickstream data arriving in real time, and I need to assign the visitId and stream it out again (with the visitId now being part of the record) into Kafka with the lowest possible latency.
Although the Window feature allows me to group and close the visit on a timeout/expire (as shown to me by Aljoscha in a separate email), it does make a 'window'.

So (as requested) I created a ticket for such a feature:
https://issues.apache.org/jira/browse/FLINK-3089

Niels

On Fri, Nov 27, 2015 at 11:51 AM, Stephan Ewen <sewen@apache.org> wrote:
> Hi Niels!
>
> Currently, state is released by setting the value for the key to null. If
> you are tracking web sessions, you can try sending an "end of session"
> element that sets the value to null.
>
> To be on the safe side, you probably want state that is automatically
> purged after a while. I would look into using Windows for that. The
> triggers there are flexible, so you can schedule both actions on elements
> and cleanup after a certain time delay (clock time or event time).
>
> The question about "state expiry" has come up a few times. People seem to
> like working on state directly, but want it to be cleaned up automatically.
>
> Can you check whether your use case fits into windows? Otherwise, please
> open a ticket for state expiry.
>
> Greetings,
> Stephan
>
>
> On Thu, Nov 26, 2015 at 10:42 PM, Niels Basjes <Niels@basjes.nl> wrote:
>
>> Hi,
>>
>> I'm working on a streaming application that ingests clickstream data.
>> In a specific part of the flow I need to retain a little bit of state per
>> visitor (i.e. keyBy(sessionid)).
>>
>> So I'm using the Key/Value state interface (i.e. OperatorState<MyRecord>)
>> in a map function.
>>
>> Now in my application I expect to get a huge number of sessions per day.
>> Since these sessionids are 'random' and become unused after the visitor
>> leaves the website, over time the system will have seen millions of those
>> sessionids.
>>
>> So I was wondering: how are these OperatorStates cleaned?
>>
>>
>> --
>> Best regards / Met vriendelijke groeten,
>>
>> Niels Basjes
>
>

--
Best regards / Met vriendelijke groeten,

Niels Basjes
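Stephan's first suggestion, releasing a key's state by setting its value to null when an "end of session" element arrives, can be illustrated with a small stand-alone model. This is a hypothetical sketch in plain Java, not Flink's API: the class NullClearedState, its methods, and the session/visit ids are assumptions for illustration, and the HashMap only stands in for the per-key state that keyBy(sessionid) would scope.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory model of per-key value state (NOT Flink's API).
 * It only mimics the behaviour described in the thread: updating a key's
 * value to null releases the state held for that key.
 */
final class NullClearedState {
    // One entry per key (e.g. per sessionid) that currently holds state.
    private final Map<String, String> state = new HashMap<>();

    /** Returns the value stored for the key, or null if none is held. */
    String value(String key) {
        return state.get(key);
    }

    /** Stores a value; storing null drops the key's entry entirely. */
    void update(String key, String value) {
        if (value == null) {
            state.remove(key);
        } else {
            state.put(key, value);
        }
    }

    /** Number of keys that still hold state. */
    int size() {
        return state.size();
    }

    /**
     * Hypothetical per-element logic: reuse the session's visitId if one is
     * stored, otherwise assign newVisitId; an end-of-session element releases
     * the state by setting it to null.
     */
    String process(String sessionId, boolean endOfSession, String newVisitId) {
        String visitId = value(sessionId);
        if (visitId == null) {
            visitId = newVisitId; // first element seen for this session
        }
        update(sessionId, endOfSession ? null : visitId);
        return visitId;
    }

    public static void main(String[] args) {
        NullClearedState state = new NullClearedState();
        state.process("session-1", false, "visit-A");
        state.process("session-2", false, "visit-B");
        // The end-of-session element for session-1 releases its entry.
        state.process("session-1", true, "unused");
        System.out.println(state.size()); // prints 1
    }
}
```

The weak spot of this approach is that a session which never sends its "end of session" element keeps its entry indefinitely, which is why Stephan also recommends state that is purged automatically.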
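The automatic purging Stephan recommends, and the state expiry requested in FLINK-3089, can be modelled by keeping a last-seen timestamp per key and dropping keys that have been idle longer than a timeout. This is again a hypothetical plain-Java sketch, not Flink's API: ExpiringKeyedState, getOrAssign, purge, and the 30-minute timeout are assumptions. Timestamps are passed in explicitly so they can stand for either clock time or event time; in a Flink job the purge would have to be driven by something time-based, such as the window triggers mentioned in the thread.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (NOT Flink's API) of per-key state that is purged
 * once a key has been idle for longer than a timeout.
 */
final class ExpiringKeyedState {
    /** Stored value plus the timestamp of the last element for that key. */
    private static final class Entry {
        final String value;
        final long lastSeen;

        Entry(String value, long lastSeen) {
            this.value = value;
            this.lastSeen = lastSeen;
        }
    }

    private final Map<String, Entry> state = new HashMap<>();
    private final long timeoutMillis;

    ExpiringKeyedState(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Returns the stored value (or assigns newValue) and refreshes the key's timestamp. */
    String getOrAssign(String key, String newValue, long now) {
        Entry e = state.get(key);
        String value = (e == null) ? newValue : e.value;
        state.put(key, new Entry(value, now));
        return value;
    }

    /** Drops every key idle for more than timeoutMillis; returns how many were dropped. */
    int purge(long now) {
        int before = state.size();
        state.values().removeIf(e -> now - e.lastSeen > timeoutMillis);
        return before - state.size();
    }

    /** Number of keys that still hold state. */
    int size() {
        return state.size();
    }

    public static void main(String[] args) {
        ExpiringKeyedState state = new ExpiringKeyedState(30 * 60 * 1000L); // 30 minutes
        state.getOrAssign("session-1", "visit-A", 0L);
        state.getOrAssign("session-2", "visit-B", 1_000_000L);
        state.getOrAssign("session-1", "unused", 1_500_000L); // refreshes session-1
        System.out.println(state.purge(3_000_000L)); // prints 1 (session-2 idle too long)
        System.out.println(state.size());            // prints 1
    }
}
```

With this design a visitor who returns after the timeout simply gets a new visitId, which matches closing the visit on a timeout/expire as described in the thread, while each element is still emitted immediately rather than waiting for a window to fire.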