beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
Subject [28/50] [abbrv] incubator-beam git commit: Update Checkpoint Documentation
Date Wed, 06 Jul 2016 17:20:34 GMT
Update Checkpoint Documentation

Checkpoint finalization is best effort. A checkpoint that is committed
to durable state is permitted to be reused to start a reader, regardless
of if it is finalized.

Note that checkpoints which have an affect on their source (via
finalize) should generally require Deduping, due to the potential for
arbitrary checkpoint finalization failures.


Branch: refs/heads/runners-spark2
Commit: 89b22c88b7df353a7990cc36db8181a83d6b82a2
Parents: ef9d195
Author: Thomas Groh <>
Authored: Mon Jun 20 13:48:17 2016 -0700
Committer: Luke Cwik <>
Committed: Wed Jul 6 10:18:51 2016 -0700

 .../org/apache/beam/sdk/io/ | 26 +++++++++++---------
 1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/
index ea3004e..dded8e2 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/
@@ -91,6 +91,10 @@ public abstract class UnboundedSource<
    * <p>This is needed if the underlying data source can return the same record multiple
    * such a queuing system with a pull-ack model.  Sources where the records read are uniquely
    * identified by the persisted state in the CheckpointMark do not need this.
+   *
+   * <p>Generally, if {@link CheckpointMark#finalizeCheckpoint()} is overridden, this
method should
+   * return true. Checkpoint finalization is best-effort, and readers can be resumed from
+   * checkpoint that has not been finalized.
   public boolean requiresDeduping() {
     return false;
@@ -106,7 +110,7 @@ public abstract class UnboundedSource<
      * Called by the system to signal that this checkpoint mark has been committed along
      * all the records which have been read from the {@link UnboundedReader} since the
-     * previous finalized checkpoint was taken.
+     * previous checkpoint was taken.
      * <p>For example, this method could send acknowledgements to an external data
      * such as Pubsub.
@@ -120,15 +124,9 @@ public abstract class UnboundedSource<
      * {@link UnboundedReader} has not yet be finalized.
      * <li>In the absence of failures, all checkpoints will be finalized and they will
      * finalized in the same order they were taken from the {@link UnboundedReader}.
-     * <li>It is possible for a checkpoint to be taken but this method never called
-     * the checkpoint could not be committed for any reason.
-     * <li>If this call throws an exception then the entire checkpoint will be abandoned
and the
-     * reader restarted from an earlier, successfully-finalized checkpoint.
-     * <li>If a checkpoint fails for any reason then no later checkpoint will be allowed
to be
-     * finalized without the reader first being restarted.
-     * <li>If an {@link UnboundedReader} is restarted from an earlier checkpoint, the
-     * instance will be deserialized from the serialized form of the earlier checkpoint using
-     * {@link UnboundedSource#getCheckpointMarkCoder()}.
+     * <li>It is possible for a checkpoint to be taken but this method never called.
This method
+     * will never be called if the checkpoint could not be committed, and other failures
may cause
+     * this method to never be called.
      * <li>It is not safe to assume the {@link UnboundedReader} from which this checkpoint
      * created still exists at the time this method is called.
      * </ul>
@@ -230,8 +228,12 @@ public abstract class UnboundedSource<
      * <p>All elements read up until this method is called will be processed together
as a bundle.
      * (An element is considered 'read' if it could be returned by a call to {@link #getCurrent}.)
      * Once the result of processing those elements and the returned checkpoint have been
-     * committed, {@link CheckpointMark#finalizeCheckpoint} will (eventually) be called on
-     * returned {@link CheckpointMark} object.
+     * committed, {@link CheckpointMark#finalizeCheckpoint} will be called at most once at
+     * later point on the returned {@link CheckpointMark} object. Checkpoint finalization
+     * best-effort, and checkpoints may not be finalized. If duplicate elements may be produced
+     * checkpoints are not finalized in a timely manner, {@link UnboundedSource#requiresDeduping()}
+     * should be overridden to return true, and {@link UnboundedReader#getCurrentRecordId()}
+     * be overriden to return unique record IDs.
      * <p>The returned object should not be modified.

View raw message