Subject: Re: Apache Flink Operator State as Query Cache
From: Welly Tambunan
To: user@flink.apache.org
Date: Thu, 12 Nov 2015 08:42:08 +0700

Hi Stephan,

> Storing the model in OperatorState is a good idea, if you can. On the
> roadmap is to migrate the operator state to managed memory as well, so
> that should take care of the GC issues.

Is this using off-heap memory? In which version do we expect this to be available?

Another question: when will the release version of 0.10 be out? We would love to upgrade to it when it's available. That version will be production-ready for streaming, right?

On Wed, Nov 11, 2015 at 4:49 PM, Stephan Ewen wrote:

> Hi!
>
> In general, if you can keep state in Flink, you get better
> throughput/latency/consistency and have one less system to worry about
> (external k/v store).
> State outside means that the Flink processes can be slimmer and need
> fewer resources, and as such recover a bit faster. There are use cases
> for that as well.
>
> Storing the model in OperatorState is a good idea, if you can. On the
> roadmap is to migrate the operator state to managed memory as well, so
> that should take care of the GC issues.
>
> We are just adding functionality to make the Key/Value operator state
> usable in CoMap/CoFlatMap as well (currently it only works in windows
> and in Map/FlatMap/Filter functions over the KeyedStream).
> Until then, you should be able to use a simple Java HashMap and use the
> "Checkpointed" interface to make it persistent.
>
> Greetings,
> Stephan
>
> On Sun, Nov 8, 2015 at 10:11 AM, Welly Tambunan wrote:
>
>> Thanks for the answer.
>>
>> Currently the approach I'm using is creating a base/marker interface
>> to stream different types of message to the same operator. I'm not sure
>> about the performance hit of this compared to the CoFlatMap function.
>>
>> Basically this provides a query cache, so I'm thinking that instead of
>> using an in-memory cache like Redis, Ignite, etc., I can just use
>> operator state for this.
>>
>> I just want to gauge whether I need a memory cache or whether operator
>> state would be fine.
>>
>> However, I'm concerned about Gen 2 garbage collection when caching our
>> own state without using operator state. Is there any clarification on
>> that?
>>
>> On Sat, Nov 7, 2015 at 12:38 AM, Anwar Rizal wrote:
>>
>>> Let me understand your case better here. You have a stream of models
>>> and a stream of data. To process the data, you will need a way to
>>> access your model from the subsequent stream operations (map, filter,
>>> flatMap, ...).
>>> I'm not sure in which case Operator State is a good choice, but I
>>> think you can also live without it.
>>>
>>> val modelStream = .... // get the model stream
>>> val dataStream  = ....
>>>
>>> modelStream.broadcast.connect(dataStream).coFlatMap( )
>>>
>>> Then you can keep the latest model in a RichCoFlatMapFunction, not
>>> necessarily as Operator State, although maybe OperatorState is a good
>>> choice too.
>>>
>>> Does it make sense to you?
>>>
>>> Anwar
>>>
>>> On Fri, Nov 6, 2015 at 10:21 AM, Welly Tambunan wrote:
>>>
>>>> Hi All,
>>>>
>>>> We have high-density data that requires downsampling. However, this
>>>> downsampling model is very flexible, depending on the client device
>>>> and user interaction, so it would be wasteful to precompute and
>>>> store the results in a db.
>>>>
>>>> So we want to use Apache Flink to do the downsampling and cache the
>>>> result for subsequent queries.
>>>>
>>>> We are considering using Flink operator state for that.
>>>>
>>>> Is that the right approach to use for a memory cache? Or is it
>>>> preferable to use a memory cache like Redis, etc.?
>>>>
>>>> Any comments will be appreciated.
>>>>
>>>> Cheers
>>>> --
>>>> Welly Tambunan
>>>> Triplelands
>>>>
>>>> http://weltam.wordpress.com
>>>> http://www.triplelands.com

--
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com
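[Editor's note] Anwar's snippet can be made slightly more concrete. Below is a Flink-independent sketch of the coFlatMap logic discussed in the thread: cache the latest broadcast model, use it to downsample the data stream. The `Model` type, the `rate` field, and the method names (mirroring `CoFlatMapFunction.flatMap1`/`flatMap2`) are illustrative assumptions, not actual Flink API.

```scala
// Sketch only: the core of a coFlatMap-style model cache, outside Flink.
// Hypothetical model: keep every rate-th data point.
final case class Model(rate: Int)

class ModelCache {
  private var current: Model = Model(1) // default: keep every point
  private var seen = 0L                 // data points seen so far

  // flatMap1: called for each element of the (broadcast) model stream
  def flatMap1(m: Model): Unit = current = m

  // flatMap2: called for each element of the data stream; emits the
  // point only if it survives the current downsample rate
  def flatMap2(value: Double): Option[Double] = {
    seen += 1
    if (seen % current.rate == 0) Some(value) else None
  }
}

// usage: deliver a model update, then feed data points through
val cache = new ModelCache
cache.flatMap1(Model(2))
val kept = (1 to 6).map(_.toDouble).flatMap(cache.flatMap2)
println(kept) // Vector(2.0, 4.0, 6.0)
```

In an actual job this logic would live in a `RichCoFlatMapFunction` applied to `modelStream.broadcast.connect(dataStream)`, and the cached model could be made fault-tolerant via the `Checkpointed` interface Stephan mentions above.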