flink-dev mailing list archives

From Paris Carbone <par...@kth.se>
Subject Re: Making state in streaming more explicit
Date Thu, 30 Apr 2015 19:27:20 GMT
I agree with all suggestions, thanks for summing it up Stephan.

A few more points I have in mind at the moment:

- Regarding the acknowledgements, indeed we don’t need to make all operators commit back;
we just have to make sure that all sinks have acknowledged a checkpoint before considering it
complete back at the coordinator.
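To make the idea concrete, here is a rough sketch of the coordinator-side bookkeeping this implies: a checkpoint counts as complete once every sink task has acknowledged it, and acks from non-sink operators can simply be ignored. All names here (CheckpointTracker, ack, isComplete) are made up for illustration, not actual Flink API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical coordinator-side bookkeeping: a checkpoint is complete
// once all sink tasks (and only sink tasks) have acknowledged it.
public class CheckpointTracker {
    private final Set<Integer> sinkTasks;              // ids of all sink tasks
    private final Set<Integer> acked = new HashSet<>(); // sinks that have acked

    public CheckpointTracker(Set<Integer> sinkTasks) {
        this.sinkTasks = sinkTasks;
    }

    /** Record an acknowledgement; acks from non-sink operators are ignored. */
    public void ack(int taskId) {
        if (sinkTasks.contains(taskId)) {
            acked.add(taskId);
        }
    }

    /** The checkpoint is complete when every sink has acknowledged. */
    public boolean isComplete() {
        return acked.equals(sinkTasks);
    }
}
```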

- Do you think we should broadcast commit responses to sources that need them after every
successful checkpoint? The checkpoint interval does not always match the frequency at which
we want to initiate a compaction, for example on Kafka. One alternative would be to make
sources request a successful checkpoint id via a future, on demand.
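As a sketch of that on-demand alternative: instead of broadcasting every completed checkpoint id, a source could ask for the next one and receive a future that the coordinator completes. The class and method names below are hypothetical, just to show the shape of the interaction.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical on-demand service: sources request the next completed
// checkpoint id as a future instead of receiving broadcasts.
public class CheckpointIdService {
    private volatile CompletableFuture<Long> pending = new CompletableFuture<>();

    /** Called by a source that wants to react to the next completed checkpoint. */
    public CompletableFuture<Long> nextCompletedCheckpoint() {
        return pending;
    }

    /** Called by the coordinator when a checkpoint completes successfully. */
    public void checkpointCompleted(long checkpointId) {
        CompletableFuture<Long> f = pending;
        pending = new CompletableFuture<>();  // next callers wait on a fresh future
        f.complete(checkpointId);
    }
}
```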

- We have to update the current checkpointing approach to cover iterative streams. We need
to make sure we don’t send checkpoint requests to iteration heads, and we need to handle
downstream backup of records in transit during checkpoints accordingly.

cheers
Paris

> On 30 Apr 2015, at 20:47, Stephan Ewen <sewen@apache.org> wrote:
> 
> I was looking into the handling of state in streaming operators, and it is
> a bit hidden from the system.
> 
> Right now, functions can (if they want) put some state into their context.
> At runtime, state may or may not appear. Before runtime, the system cannot tell
> which operators are going to be stateful and which are going to be
> stateless.
> 
> I think it is a good idea to expose that. We can use it for optimizations,
> and we would know which operators need to checkpoint state and acknowledge the
> asynchronous checkpoint.
> 
> At this point, we need to assume that all operators need to send a
> confirmation message, which is unnecessary.
> 
> Also, I think we should expose which operations want a "commit"
> notification after the checkpoint completed. Good examples are
> 
>  - the KafkaConsumer source, which can then commit the offset that is safe
> to ZooKeeper
> 
>  - a transactional KafkaProducer sink, which can commit a batch of messages
> to the Kafka partition once the checkpoint is done (to get exactly-once
> guarantees that include the sink)
> 
> Comments welcome!
> 
> Greetings,
> Stephan
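To make Stephan's commit-notification idea concrete, one could imagine a listener interface that interested operators implement, with the system calling back once a checkpoint completes. The interface name and the offset bookkeeping below are illustrative assumptions, not the actual Flink API; committing to ZooKeeper is stubbed out as a field update.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical commit-notification interface: operators that want to know
// when a checkpoint has completed implement it and get called back.
interface CheckpointCommitter {
    void notifyCheckpointComplete(long checkpointId);
}

// A Kafka-style source would remember which offset belonged to each
// checkpoint and commit that offset (e.g. to ZooKeeper) once notified.
class OffsetCommittingSource implements CheckpointCommitter {
    private final Map<Long, Long> offsetsByCheckpoint = new HashMap<>();
    private long committedOffset = -1;  // stands in for the ZooKeeper commit

    /** Called when a checkpoint barrier passes: snapshot the current offset. */
    public void snapshot(long checkpointId, long currentOffset) {
        offsetsByCheckpoint.put(checkpointId, currentOffset);
    }

    @Override
    public void notifyCheckpointComplete(long checkpointId) {
        Long offset = offsetsByCheckpoint.remove(checkpointId);
        if (offset != null) {
            committedOffset = offset;  // here the real source would commit externally
        }
    }

    public long getCommittedOffset() {
        return committedOffset;
    }
}
```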
