flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2727) Add a base class for MessageQueue-with-acknowledgement sources
Date Tue, 22 Sep 2015 11:34:04 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902419#comment-14902419 ]

ASF GitHub Bot commented on FLINK-2727:
---------------------------------------

GitHub user StephanEwen opened a pull request:

    https://github.com/apache/flink/pull/1163

    [FLINK-2727] [streaming] Add a base class for Message Queue Sources that acknowledge messages
by ID

    Several message queues (RabbitMQ, Amazon SQS) follow a pattern in which messages are retrieved
and then acknowledged by ID. This pull request adds a simple non-parallel base source
that provides tooling for:
    
      - Collecting the IDs of elements emitted between two checkpoints
      - Persisting them with the checkpoint, respecting proper serialization
      - Acknowledging them when a checkpoint is notified of completion.
    
    This assumes that the message queues retain unacknowledged messages and redeliver them after
the acknowledgement period has expired.
    
    ### From the class header
    
    The mechanism for this source assumes that messages are identified by a unique ID.
    When messages are taken from the message queue, the message must not be dropped immediately,
but must be retained until acknowledged. Messages that are not acknowledged within a certain
time interval will be served again (to a different connection, established by the recovered
source).
    
    Note that this source can give no guarantees about message order in the case of failures,
because messages that were retrieved but not yet acknowledged will be redelivered later,
after messages that had not been retrieved before the failure.
    
    Internally, this source gathers the IDs of elements it emits. Per checkpoint, the IDs
are stored and acknowledged when the checkpoint is complete. That way, no message is acknowledged
unless it is certain that it has been successfully processed throughout the topology and the
updates to any state caused by that message are persistent.
    
    All messages that are emitted and successfully processed by the streaming program will
eventually be acknowledged. In corner cases, the source may acknowledge certain IDs multiple
times, if a failure occurs while acknowledging.
    
    A typical way to use this base class in a source function is to implement the run() method
as follows:
    ```java
    public void run(SourceContext<Type> ctx) throws Exception {
        while (running) {
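            // Retrieve outside the lock so that checkpoints are not blocked while waiting.
            Message msg = queue.retrieve();
            // Emit the record and register its ID atomically with respect to checkpoints.
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(msg.getMessageData());
                addId(msg.getMessageId());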
            }
        }
    }
    ```
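    
    The bookkeeping described above can be sketched without any Flink dependency. The following
    is a minimal illustration under stated assumptions, not the code in this pull request:
    `AckTracker`, `snapshot`, and the `Consumer` standing in for the queue's acknowledge call
    are hypothetical names chosen for the example. The sketch is single-threaded; a real source
    would call these methods under the checkpoint lock.
    
    ```java
    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;
    import java.util.function.Consumer;
    
    /** Hypothetical helper (not Flink API): tracks emitted message IDs per checkpoint. */
    class AckTracker<I> {
    
        /** IDs emitted since the last snapshot. */
        private List<I> current = new ArrayList<>();
    
        /** IDs frozen at each snapshot, oldest checkpoint first. */
        private final Deque<Pending<I>> pending = new ArrayDeque<>();
    
        /** Stand-in for the message queue client's acknowledge call. */
        private final Consumer<List<I>> acknowledger;
    
        AckTracker(Consumer<List<I>> acknowledger) {
            this.acknowledger = acknowledger;
        }
    
        /** Records the ID of a message that was just emitted. */
        void addId(I id) {
            current.add(id);
        }
    
        /** Freezes the IDs collected so far under the given checkpoint ID. */
        void snapshot(long checkpointId) {
            pending.addLast(new Pending<>(checkpointId, current));
            current = new ArrayList<>();
        }
    
        /** Acknowledges the IDs of every checkpoint up to and including the completed one. */
        void notifyCheckpointComplete(long checkpointId) {
            while (!pending.isEmpty() && pending.peekFirst().checkpointId <= checkpointId) {
                Pending<I> oldest = pending.peekFirst();
                // Remove only after the acknowledge call returns, so that a failing
                // call leaves the IDs pending and they may be acknowledged again.
                acknowledger.accept(oldest.ids);
                pending.removeFirst();
            }
        }
    
        /** IDs belonging to one checkpoint. */
        private static final class Pending<T> {
            final long checkpointId;
            final List<T> ids;
    
            Pending(long checkpointId, List<T> ids) {
                this.checkpointId = checkpointId;
                this.ids = ids;
            }
        }
    }
    ```
    
    In a real source, `snapshot` and `notifyCheckpointComplete` would be driven by the
    checkpointing hooks, and `addId` would be called as in the `run()` example above. IDs added
    after a snapshot stay unacknowledged until a later checkpoint that contains them completes.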

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink messagequeuesource

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1163.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1163
    
----
commit 505bd0baa34560ca8a7f2744b3b7890152133a1e
Author: Stephan Ewen <sewen@apache.org>
Date:   2015-09-22T11:23:56Z

    [FLINK-2727] [streaming] Add a base class for Message Queue Sources that acknowledge messages
by ID.

----


> Add a base class for MessageQueue-with-acknowledgement sources
> --------------------------------------------------------------
>
>                 Key: FLINK-2727
>                 URL: https://issues.apache.org/jira/browse/FLINK-2727
>             Project: Flink
>          Issue Type: New Feature
>          Components: Streaming
>    Affects Versions: 0.10
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>
> Several message queues (RabbitMQ, Amazon SQS) have the pattern that you retrieve messages
> and acknowledge them back by ID.
> We can create a simple base non-parallel source that provides tooling for:
>   - Collecting the IDs of elements emitted between two checkpoints
>   - Persisting them with the checkpoint, respecting proper serialization
>   - Acknowledging them when a checkpoint is notified of completion.
> This assumes that the Message Queues retain unacknowledged messages and re-emit them
> after the acknowledgement period has expired.



