flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From StephanEwen <...@git.apache.org>
Subject [GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...
Date Wed, 29 Jul 2015 14:05:33 GMT
GitHub user StephanEwen opened a pull request:

    https://github.com/apache/flink/pull/951

    [FLINK-2407] [streaming] Add an API switch to choose between "exactly once" and "at least
once".

    Adds a switch to choose between **exactly once** and **at least once** checkpointing mode.
    
    Exactly Once
    ==========
    Sets the checkpointing mode to "exactly once". This mode means that the system will checkpoint
the operator and user function state in such a way that, upon recovery, every record will
be reflected exactly once in the operator state.
    
    For example, if a user function counts the number of elements in a stream, this number
will consistently be equal to the number of actual elements in the stream, regardless of failures
and recovery.
    
    Note that this does not mean that each record flows through the streaming data flow only
once. It means that upon recovery, the state of operators/functions is restored such that
the resumed data streams pick up exactly at after the last modification to the state.
     
    Note that this mode does not guarantee exactly-once behavior in the interaction with external
systems (only state in Flink's operators and user functions). The reason for that is that
a certain level of "collaboration" is required between two systems to achieve exactly-once
guarantees. However, for certain systems, connectors can be written that facilitate this collaboration.
    
    This mode sustains high throughput. Depending on the data flow graph and operations, this
mode may increase the record latency, because operators need to align their input streams,
in order to create a consistent snapshot point. The latency increase for simple dataflows
(no repartitioning) is negligible. For simple dataflows with repartitioning, the average latency
remains small, but the slowest records typically have an increased latency.
    
    
    At Least Once
    ===========
    
    Sets the checkpointing mode to "at least once". This mode means that the system will checkpoint
the operator and user function state in a simpler way. Upon failure and recovery, some records
may be reflected multiple times in the operator state.
    
    For example, if a user function counts the number of elements in a stream, this number
will equal to, or larger, than the actual number of elements in the stream, in the presence
of failure and recovery.
    
    This mode has minimal impact on latency and may be preferable in very-low latency scenarios,
where a sustained very-low latency (such as few milliseconds) is needed, and where occasional
duplicate messages (on recovery) do not matter.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink at_least_once_switch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/951.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #951
    
----
commit b089efa6ef688d61f41463d644768845393a913b
Author: Stephan Ewen <sewen@apache.org>
Date:   2015-07-29T12:12:42Z

    [FLINK-2407] [streaming] Add an API switch to choose between "exactly once" and "at least
once".

commit 71bd9c01aa24c9420766c9c79ff80618341b8e69
Author: Stephan Ewen <sewen@apache.org>
Date:   2015-07-29T12:49:23Z

    [hotfix] Code cleanups in the StreamConfig

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message