flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: custom control messages from source
Date Wed, 20 Jul 2016 11:30:40 GMT
Hi Chen!

If I understand, you want to implement a custom way of triggering
checkpoints, based on messages in the input message queue (for example
based on Kafka events)? Basically to trigger a checkpoint when you have
received a special message through each Kafka partition?

Please let me know if that is the correct understanding of your question.

That is an interesting use case. My feeling is that the simplest way to go
about that would be to do the following:
  - Add to Flink a way to trigger a checkpoint via an RPC call. Such a call
already exists for triggering savepoints, so one for checkpoints could be
added in a similar way.
  - Implement a custom source that waits for these "trigger-events". When
it receives such an event, it blocks and sends an RPC call to a custom
service you implement.
  - Once the service has received a call from all sources, it sends an RPC
call to the checkpoint coordinator to trigger the checkpoint.
  - The sources continue after they triggered the checkpoint.
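
To make the steps above concrete, here is a rough sketch of what the custom
service could look like. It is only an illustration under stated assumptions,
not existing Flink code: the class TriggerEventService, the method
reportTriggerEvent, and the LongConsumer standing in for the RPC call to the
checkpoint coordinator are all hypothetical names. A source would call
reportTriggerEvent when it reads a trigger event and wait on the returned
future before it continues.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.function.LongConsumer;

/** Hypothetical service that collects trigger events from all sources. */
class TriggerEventService {
    private final int numSources;
    // Stand-in for the RPC call to the checkpoint coordinator (an assumption,
    // Flink does not expose such a hook under this name).
    private final LongConsumer checkpointTrigger;
    // Per trigger id: which source indices have reported so far.
    private final Map<Long, Set<Integer>> arrived = new HashMap<>();
    // Per trigger id: the future that all waiting sources block on.
    private final Map<Long, CompletableFuture<Void>> pending = new HashMap<>();

    TriggerEventService(int numSources, LongConsumer checkpointTrigger) {
        this.numSources = numSources;
        this.checkpointTrigger = checkpointTrigger;
    }

    /**
     * Called by a source that has read a trigger event. The returned future
     * completes once every source has reported the same trigger id and the
     * checkpoint trigger has been invoked.
     */
    synchronized CompletableFuture<Void> reportTriggerEvent(int sourceIndex, long triggerId) {
        CompletableFuture<Void> done =
                pending.computeIfAbsent(triggerId, id -> new CompletableFuture<>());
        Set<Integer> seen = arrived.computeIfAbsent(triggerId, id -> new HashSet<>());
        seen.add(sourceIndex); // a repeated report from the same source is counted once
        if (seen.size() == numSources) {
            arrived.remove(triggerId);
            pending.remove(triggerId);
            checkpointTrigger.accept(triggerId); // ask the coordinator to checkpoint
            done.complete(null);                 // release all waiting sources
        }
        return done;
    }
}
```

In a real deployment the method would sit behind an RPC endpoint and would
also need timeouts and failure handling, which the sketch leaves out.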

Greetings,
Stephan


On Mon, Jul 18, 2016 at 12:40 AM, Chen Qin <qinnchen@gmail.com> wrote:

> Hi there,
>
> So far, the checkpoint trigger is hardcoded in CheckpointCoordinator, which
> triggers checkpoints periodically and pushes control messages to the task
> managers. It is implemented orthogonally to the business logic implemented
> in jobs.
>
> Our scenario requires flow control messages from the master pipeline to
> travel along with the events in a distributed queue. When a follower
> pipeline source detects a control message
> (checkpoint_barrier_id / watermark / restore_to_checkpoint_id), it will
> block itself, send a message to the checkpoint coordinator, and request a
> specific checkpoint. Once it has received an ack from the checkpoint
> coordinator and acked back to the checkpoint coordinator, the source
> unblocks itself and keeps going. It would contain some message de-dupe
> logic that always honors the first control message seen and discards later
> duplicates, given the assumption that events consumed from the distributed
> queue are strongly ordered.
>
> Pros:
>
>    - Allows pipeline authors to define customized checkpointing logic
>    - Allows breaking down a large pipeline into smaller ones connected via
>    a streaming queue
>       - leverage data locality and run a sub-pipeline near its dependency.
>       - pipe late-arriving events to a side pipeline and consolidate there
>
> Cons:
>
>    - Overhead of blocking follower pipeline source
>    - Overhead of distributed queue latency
>    - Overhead of writing control messages to distributed queue
>
>
> Does that make sense?
>
> Thanks,
> Chen
>
