flink-user mailing list archives

From Malgorzata Kudelska <m.j.kudel...@gmail.com>
Subject Re: Incremental updates
Date Wed, 25 May 2016 18:49:40 GMT
Hi,
I have the following situation.
- a keyed stream with a key defined as: userId % numberOfPartitions
- a custom flatMap transformation where I use a ValueState variable to keep
the state of some calculations for each userId
- my questions are:
1. Does Flink guarantee that the users with a given key will always be
processed by the same partition, assuming that the number of nodes is
constant?
2. What will happen when one node disappears or a new one joins? How will
Flink redistribute the users that were processed by the one that disappeared?
3. Will Flink restore the state variables of these users from the last
checkpoint and redistribute them to the new processing nodes?
4. How will Flink redistribute the workload when a new node joins?

Cheers,
Gosia
Hi,
right now, this does not work, but we are actively working on that.
This is the design doc for part one of the necessary changes:
https://docs.google.com/document/d/1G1OS1z3xEBOrYD4wSu-LuBCyPUWyFd9l3T9WyssQ63w/edit?usp=sharing

Cheers,
Aljoscha
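
To make the redistribution question above concrete, here is a minimal plain-Java
sketch of the key scheme from the message (userId % numberOfPartitions). It is
not Flink code and says nothing about how Flink assigns keys internally; the
class and method names are hypothetical. It only counts how many user ids would
map to a different partition if the partition count changed, which is the state
that would have to move.

```java
// Hypothetical sketch, not Flink's implementation: maps users to partitions
// with a modulo key and counts how many users move when the count changes.
class PartitionSketch {

    // Same key scheme as in the message: userId % numberOfPartitions.
    static int partitionFor(long userId, int numberOfPartitions) {
        return (int) Math.floorMod(userId, (long) numberOfPartitions);
    }

    // Counts the user ids in [0, users) whose partition differs between
    // the old and the new partition count.
    static int movedUsers(int users, int oldPartitions, int newPartitions) {
        int moved = 0;
        for (long userId = 0; userId < users; userId++) {
            if (partitionFor(userId, oldPartitions)
                    != partitionFor(userId, newPartitions)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // User 10 with 4 partitions.
        System.out.println(partitionFor(10, 4));
        // Users 0..11, growing from 4 to 5 partitions.
        System.out.println(movedUsers(12, 4, 5));
    }
}
```

With a plain modulo key, most users change partition when the partition count
changes, so their per-user state would need to follow them.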

On Wed, 25 May 2016 at 13:32 Malgorzata Kudelska <m.j.kudelska@gmail.com>
wrote:

> Hi,
> Thanks for your reply.
>
> Is Flink able to detect that an additional server joined and rebalance the
> processing? How is it done if I have a keyed stream and some custom
> ValueState variables?
>
> Cheers,
> Gosia
>
> 2016-05-25 11:32 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
>
>> Hi Gosia,
>> right now, Flink is not doing incremental checkpoints. Every checkpoint
>> is fully valid in isolation. Incremental checkpointing came up several
>> times on ML discussions and we are planning to work on it once someone finds
>> some free time.
>>
>> Cheers,
>> Aljoscha
>>
>> On Wed, 25 May 2016 at 09:29 Rubén Casado <ruben.casado@treelogic.com>
>> wrote:
>>
>>> Hi Gosia
>>>
>>> You can have a look at the PROTEUS project we are doing [1]. We are
>>> implementing incremental versions of analytics operations. For example you
>>> can see in [2] the implementation of the incremental AVG. Maybe the code
>>> can give you some ideas :-)
>>>
>>>
>>> [1] https://github.com/proteus-h2020/proteus-backend/tree/development
>>> [2]
>>> https://github.com/proteus-h2020/proteus-backend/blob/development/src/main/java/com/treelogic/proteus/flink/incops/IncrementalAverage.java
>>>
>>> ______________________________________
>>>
>>> *Dr. Rubén Casado*
>>> Head of Big Data
>>> Treelogic
>>> <http://es.linkedin.com/in/rcasadot/> <https://twitter.com/ruben_casado>
>>> *ruben.casado.treelogic*
>>>
>>> +34 902 286 386 - +34 607 18 28 06
>>> Parque Tecnológico de Asturias · Parcela 30
>>> E33428 Llanera · Asturias [Spain]
>>> <http://www.treelogic.com>www.treelogic.com
>>> ______________________________________
>>>
>>>
>>> ----- Original Message -----
>>> From: "Malgorzata Kudelska" <m.j.kudelska@gmail.com>
>>> To: user@flink.apache.org
>>> Sent: Tuesday, 24 May 2016 22:01:28 GMT +01:00 Amsterdam / Berlin
>>> / Bern / Rome / Stockholm / Vienna
>>> Subject: Incremental updates
>>>
>>>
>>> Hi,
>>> I have the following question. Does Flink support incremental updates?
>>>
>>> In particular, I have a custom ValueState object and during the
>>> checkpoints I would like to save only the fields that changed since the
>>> previous checkpoint. Is that possible?
>>>
>>> Regards,
>>> Gosia
>>>
>>
>
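
The incremental average mentioned in the quoted PROTEUS message can be
illustrated with a running mean that is updated per element instead of being
recomputed from all past values. This is a minimal plain-Java sketch under that
assumption, not the PROTEUS implementation and not a Flink operator; the class
and method names are hypothetical.

```java
// Hypothetical sketch of an incremental average: only a count and the
// current mean are kept, so each update is O(1) and needs no history.
class IncrementalAverageSketch {
    private long count = 0;
    private double mean = 0.0;

    // Folds one new value into the running mean.
    void add(double value) {
        count++;
        mean += (value - mean) / count;
    }

    double mean() {
        return mean;
    }

    long count() {
        return count;
    }

    public static void main(String[] args) {
        IncrementalAverageSketch avg = new IncrementalAverageSketch();
        for (double v : new double[] {2.0, 4.0, 6.0}) {
            avg.add(v);
        }
        System.out.println(avg.mean());
    }
}
```

The state that would need to be saved between updates is just the pair
(count, mean), regardless of how many values have been seen.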
