    public void open(Configuration cfg) {
        // Per-key counter; 0L is the default for keys that have no state yet.
        state = getRuntimeContext().getState(
                new ValueStateDescriptor<Long>("count", LongSerializer.INSTANCE, 0L));
    }

    public Tuple2<MyType, Long> map(MyType value) {
        long count = state.value() + 1;
        // Store the updated count (not the element) back into the keyed state.
        state.update(count);
        return new Tuple2<>(value, count);
    }
});

Best,

Aljoscha

On Fri, 27 May 2016 at 18:59 Malgorzata Kudelska <m.j.kudelska@gmail.com> wrote:

Hi,
If I specify the userId as the key variable as you suggested, will the state variables be kept for every observed value of the key? I have a situation where I have a lot of userIds and many of them occur just once, so I don't want to keep the state for them forever. I need the possibility to set a timeout to forget the data regarding users that don't produce any events for a certain amount of time. Is that possible with Flink?
In order to add some custom information for every userId to the checkpointed state, do you suggest making a ValueState variable for a stream keyed by userId? If yes, could you give an example?

Cheers,
Gosia

Hi,
newly added nodes would sit idle, yes. Only when we finish the rescaling work mentioned in the link will we be able to dynamically adapt.

The internal implementation of this will in fact hash keys to a larger number of partitions than the number of individual partitions and use these "key groups" to allow scaling to differing numbers of partitions. Once this is in, it will also work on Yarn. Right now, running on Yarn does not allow a job to dynamically pick up new computing resources.
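
The "key groups" idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Flink's actual implementation: the class name, the constant NUM_KEY_GROUPS, and the range-based assignment are hypothetical choices made for the example.

```java
import java.util.Objects;

// Hypothetical illustration of two-level key routing; not Flink code.
final class KeyGroupSketch {
    // Assumed fixed number of key groups, chosen larger than any expected parallelism.
    static final int NUM_KEY_GROUPS = 128;

    // Step 1: a key always maps to the same key group, whatever the current parallelism is.
    static int keyGroupFor(Object key) {
        return Math.floorMod(Objects.hashCode(key), NUM_KEY_GROUPS);
    }

    // Step 2: key groups are spread over the partitions in contiguous ranges.
    // Only this step changes when the number of partitions changes.
    static int partitionFor(int keyGroup, int parallelism) {
        return keyGroup * parallelism / NUM_KEY_GROUPS;
    }

    public static void main(String[] args) {
        String userId = "user-42";
        int group = keyGroupFor(userId);
        System.out.println("key group: " + group);
        System.out.println("partition with 4 workers: " + partitionFor(group, 4));
        System.out.println("partition with 6 workers: " + partitionFor(group, 6));
    }
}
```

Because state would be stored per key group rather than per partition, rescaling in this sketch only means handing whole key groups to different partitions; the key-to-group mapping never changes.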

Cheers,
Aljoscha

On Thu, 26 May 2016 at 15:50 Malgorzata Kudelska <m.j.kudelska@gmail.com> wrote:
Hi,
So is there any possibility to utilize an extra node that joins the cluster or will it remain idle?
What if I use a custom key function that matches the key variable to a number of keys bigger than the initial number of nodes (following the idea from your link)?
What about running flink on yarn, would that solve anything?

Cheers,
Gosia

On 25 May 2016 at 22:54, "Aljoscha Krettek" <aljoscha@apache.org> wrote:
Hi,
first question: are you manually keying by "userId % numberOfPartitions"? Flink internally does roughly "key.hash() % numPartitions" so it is enough to specify the userId as your key.

Now, for your questions:
1. What Flink guarantees is that the state for a key k is always available when an element with key k is being processed. Internally, this means that elements with the same key will be processed by the same partition, though there would be other ways of achieving those guarantees.

2. Right now, when a node disappears the job will fail. Then recovery will kick in and restore from the latest checkpoint on a (possibly) new set of nodes. The system will make sure that the partitions and the state are correctly matched.

3. Also answered by the above, I hope at least :-)

4. This currently does not work but the ongoing work on this is tracked by https://issues.apache.org/jira/browse/FLINK-3755.
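
As a rough illustration of the "key.hash() % numPartitions" routing mentioned above, here is a minimal sketch. The class and method names are hypothetical and this is not Flink's internal code; it only shows why specifying the userId as the key is enough for elements with the same key to land on the same partition.

```java
// Hypothetical illustration of hash partitioning; not Flink code.
final class HashPartitionSketch {
    // Roughly "key.hash() % numPartitions"; floorMod keeps the result non-negative
    // even when hashCode() is negative.
    static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        long userId = 1234567L;
        // The same key with the same partition count always gives the same partition.
        System.out.println(partitionFor(userId, 4) == partitionFor(userId, 4));
        // A different partition count may route the same key elsewhere,
        // which is why the state has to be re-matched to partitions on recovery.
        System.out.println(partitionFor(userId, 4) + " vs " + partitionFor(userId, 5));
    }
}
```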

Cheers,
Aljoscha

On Wed, 25 May 2016 at 21:09 Malgorzata Kudelska <m.j.kudelska@gmail.com> wrote:
Hi,
I have the following situation.
- a keyed stream with a key defined as: userId % numberOfPartitions
- a custom flatMap transformation where I use a ValueState variable to keep the state of some calculations for each userId
- my questions are:
1. Does Flink guarantee that the users with a given key will always be processed by the same partition, assuming that the number of nodes is constant?
2. What will happen when one node disappears or a new one joins? How will Flink redistribute the users that were processed by the one that disappeared?
3. Will Flink restore the state variables of these users from the last checkpoint and redistribute them to the new processing nodes?
4. How will Flink redistribute the workload when a new node joins?

Cheers,
Gosia

Hi,
right now, this does not work but we are also actively working on that. This is the design doc for part one of the necessary changes: https://docs.google.com/document/d/1G1OS1z3xEBOrYD4wSu-LuBCyPUWyFd9l3T9WyssQ63w/edit?usp=sharing

Cheers,
Aljoscha

On Wed, 25 May 2016 at 13:32 Malgorzata Kudelska <m.j.kudelska@gmail.com> wrote:
Hi,
Thanks for your reply.

Is Flink able to detect that an additional server joined and rebalance the processing? How is it done if I have a keyed stream and some custom ValueState variables?

Cheers,
Gosia

2016-05-25 11:32 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
Hi Gosia,
right now, Flink is not doing incremental checkpoints. Every checkpoint is fully valid in isolation. Incremental checkpointing came up several times in mailing list discussions and we are planning to work on it once someone finds some free time.

Cheers,
Aljoscha

On Wed, 25 May 2016 at 09:29 Rubén Casado <ruben.casado@treelogic.com> wrote:
Hi Gosia

You can have a look at the PROTEUS project we are working on [1]. We are implementing incremental versions of analytics operations. For example, you can see in [2] the implementation of the incremental AVG. Maybe the code can give you some ideas :-)

[1] https://github.com/proteus-h2020/proteus-backend/tree/development
[2] https://github.com/proteus-h2020/proteus-backend/blob/development/src/main/java/com/treelogic/proteus/flink/incops/IncrementalAverage.java
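
For readers who do not want to open the links, the general idea of an incremental average can be sketched as below. This is a generic running-mean sketch written for illustration; it is not the PROTEUS code from [2], and the class name RunningMeanSketch is hypothetical.

```java
// Hypothetical illustration of an incremental (running) average; not the PROTEUS implementation.
final class RunningMeanSketch {
    private long count;
    private double mean;

    // Folds one new value into the mean without keeping earlier values around.
    void add(double value) {
        count++;
        mean += (value - mean) / count;
    }

    double mean() {
        return mean;
    }

    long count() {
        return count;
    }

    public static void main(String[] args) {
        RunningMeanSketch avg = new RunningMeanSketch();
        for (double v : new double[] {2.0, 4.0, 6.0}) {
            avg.add(v);
        }
        System.out.println("mean of " + avg.count() + " values: " + avg.mean());
    }
}
```

Only the pair (count, mean) has to be kept between elements, so in a keyed stream that pair is all that would need to go into per-key state.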

______________________________________
Dr. Rubén Casado
Head of Big Data
Treelogic
ruben.casado.treelogic

+34 902 286 386 - +34 607 18 28 06
Parque Tecnológico de Asturias · Parcela 30
E33428 Llanera · Asturias [Spain]
www.treelogic.com
______________________________________

----- Original message -----
From: "Malgorzata Kudelska" <m.j.kudelska@gmail.com>
To: user@flink.apache.org
Sent: Tuesday, 24 May 2016 22:01:28 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
Subject: Incremental updates

Hi,
I have the following question. Does Flink support incremental updates?

In particular, I have a custom StateValue object and during the checkpoints I would like to save only the fields that changed since the previous checkpoint. Is that possible?

Regards,
Gosia