flink-commits mailing list archives

From se...@apache.org
Subject [4/9] flink git commit: [hotfix] [docs] Move section about internal snapshot implementation from 'state_backends.md' to 'stream_checkpointing.md'
Date Wed, 11 Jan 2017 20:28:25 GMT
[hotfix] [docs] Move section about internal snapshot implementation from 'state_backends.md'
to 'stream_checkpointing.md'


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/d4a23be7
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/d4a23be7
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/d4a23be7

Branch: refs/heads/master
Commit: d4a23be7d296476db8dfb79636389dede1b383cd
Parents: 455c7a1
Author: Stephan Ewen <sewen@apache.org>
Authored: Tue Jan 10 09:55:25 2017 +0100
Committer: Stephan Ewen <sewen@apache.org>
Committed: Wed Jan 11 21:12:27 2017 +0100

----------------------------------------------------------------------
 docs/internals/state_backends.md       | 12 ------------
 docs/internals/stream_checkpointing.md | 14 +++++++++++++-
 2 files changed, 13 insertions(+), 13 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/d4a23be7/docs/internals/state_backends.md
----------------------------------------------------------------------
diff --git a/docs/internals/state_backends.md b/docs/internals/state_backends.md
index f6a4cc7..11d46ed 100644
--- a/docs/internals/state_backends.md
+++ b/docs/internals/state_backends.md
@@ -69,15 +69,3 @@ Examples are "ValueState", "ListState", etc. Flink's runtime encodes the states
 *Raw State* is state that users and operators keep in their own data structures. When checkpointed, they only write a sequence of bytes into
 the checkpoint. Flink knows nothing about the state's data structures and sees only the raw bytes.
 
-
-## Checkpointing Procedure
-
-When operator snapshots are taken, there are two parts: the **synchronous** and the **asynchronous** parts.
-
-Operators and state backends provide their snapshots as a Java `FutureTask`. That task contains the state where the *synchronous* part
-is completed and the *asynchronous* part is pending. The asynchronous part is then executed by a background thread for that checkpoint.
-
-Operators that checkpoint purely synchronously return an already completed `FutureTask`.
-If an asynchronous operation needs to be performed, it is executed in the `run()` method of that `FutureTask`.
-
-The tasks are cancelable, in order to release streams and other resource-consuming handles.
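
The synchronous/asynchronous split described in this section can be illustrated with plain `java.util.concurrent.FutureTask`. What follows is a minimal sketch, not Flink's actual snapshot code. The class `SnapshotSketch`, its methods, and the string-based "write" are hypothetical stand-ins. The synchronous part copies the state on the calling thread, and the returned task stays pending until a background thread calls `run()`. A purely synchronous snapshot returns a task that has already been run, and an abandoned task can be cancelled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

// Hypothetical operator illustrating synchronous vs. asynchronous snapshot parts.
public class SnapshotSketch {

    // Hypothetical operator state: a simple key/value map.
    private final Map<String, Long> state = new HashMap<>();

    // Regular processing: mutates the live state.
    public void update(String key, long value) {
        state.put(key, value);
    }

    // Synchronous part runs here, on the caller's thread; the asynchronous
    // part is left pending inside the returned task.
    public FutureTask<String> snapshotAsync(long checkpointId) {
        // Synchronous part: copy the state so that later updates cannot
        // change what is written for this checkpoint.
        final Map<String, Long> copy = new HashMap<>(state);
        // Asynchronous part: executes only when some thread calls run().
        return new FutureTask<>(() -> write(checkpointId, copy));
    }

    // Purely synchronous snapshot: all work is done before returning,
    // so the returned task is already completed.
    public FutureTask<String> snapshotSync(long checkpointId) {
        final String result = write(checkpointId, new HashMap<>(state));
        FutureTask<String> done = new FutureTask<>(() -> result);
        done.run(); // completes immediately; get() will not block
        return done;
    }

    // Stand-in for writing the snapshot bytes to a checkpoint stream.
    private static String write(long checkpointId, Map<String, Long> snapshot) {
        return "chk-" + checkpointId + ":" + new TreeMap<>(snapshot);
    }

    public static void main(String[] args) throws Exception {
        SnapshotSketch op = new SnapshotSketch();
        op.update("a", 1L);

        FutureTask<String> pending = op.snapshotAsync(1);
        op.update("a", 2L);                    // processing continues meanwhile
        System.out.println(pending.isDone());  // false: async part not run yet

        ExecutorService background = Executors.newSingleThreadExecutor();
        background.execute(pending);           // async part on a background thread
        System.out.println(pending.get());     // chk-1:{a=1}

        FutureTask<String> done = op.snapshotSync(2);
        System.out.println(done.isDone() + " " + done.get()); // true chk-2:{a=2}

        // An abandoned snapshot can be cancelled; a real backend would also
        // close any checkpoint streams it had opened (none are opened here).
        FutureTask<String> abandoned = op.snapshotAsync(3);
        System.out.println(abandoned.cancel(true)); // true

        background.shutdown();
    }
}
```

The copy taken in the synchronous part is what allows processing to continue: the update to `"a"` after `snapshotAsync(1)` does not leak into checkpoint 1.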

http://git-wip-us.apache.org/repos/asf/flink/blob/d4a23be7/docs/internals/stream_checkpointing.md
----------------------------------------------------------------------
diff --git a/docs/internals/stream_checkpointing.md b/docs/internals/stream_checkpointing.md
index 75493ca..e8b3e46 100644
--- a/docs/internals/stream_checkpointing.md
+++ b/docs/internals/stream_checkpointing.md
@@ -138,7 +138,7 @@ in *at least once* mode.
 
 Note that the above described mechanism implies that operators stop processing input records while they are storing a snapshot of their state in the *state backend*. This *synchronous* state snapshot introduces a delay every time a snapshot is taken.
 
-It is possible to let an operator continue processing while it stores its state snapshot, effectively letting the state snapshots happen *asynchronously* in the background. To do that, the operator must be able to produce a state object that should be stored in a way such that further modifications to the operator state do not affect that state object.
+It is possible to let an operator continue processing while it stores its state snapshot, effectively letting the state snapshots happen *asynchronously* in the background. To do that, the operator must be able to produce a state object that should be stored in a way such that further modifications to the operator state do not affect that state object. Examples of this are *copy-on-write* style data structures, such as those used in RocksDB.
 
 After receiving the checkpoint barriers on its inputs, the operator starts the asynchronous snapshot copying of its state. It immediately emits the barrier to its outputs and continues with the regular stream processing. Once the background copy process has completed, it acknowledges the checkpoint to the checkpoint coordinator (the JobManager). The checkpoint is now only complete after all sinks received the barriers and all stateful operators acknowledged their completed backup (which may be later than the barriers reaching the sinks).
 
@@ -152,3 +152,15 @@ entire distributed dataflow, and gives each operator the state that was snapshot
 stream from position <i>S<sub>k</sub></i>. For example in Apache Kafka, that means telling the consumer to start fetching from offset <i>S<sub>k</sub></i>.
 
 If state was snapshotted incrementally, the operators start with the state of the latest full snapshot and then apply a series of incremental snapshot updates to that state.
+
+## Operator Snapshot Implementation
+
+When operator snapshots are taken, there are two parts: the **synchronous** and the **asynchronous** parts.
+
+Operators and state backends provide their snapshots as a Java `FutureTask`. That task contains the state where the *synchronous* part
+is completed and the *asynchronous* part is pending. The asynchronous part is then executed by a background thread for that checkpoint.
+
+Operators that checkpoint purely synchronously return an already completed `FutureTask`.
+If an asynchronous operation needs to be performed, it is executed in the `run()` method of that `FutureTask`.
+
+The tasks are cancelable, in order to release streams and other resource-consuming handles.
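
The *copy-on-write* idea mentioned for asynchronous snapshots can be shown with a deliberately simple state holder. This is an illustrative sketch that assumes the whole state fits in one in-memory map. `CowState` is a hypothetical class, and it does not reflect how RocksDB implements its snapshots. Every update copies the current version and publishes a new immutable one. Taking a snapshot is therefore just reading the current reference, and later updates cannot modify it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical copy-on-write state holder, for illustration only.
public class CowState {

    // The published version is never mutated after it is assigned here.
    private volatile Map<String, Long> current = Collections.emptyMap();

    // Copy-on-write update: copy the current version, modify the copy, publish it.
    public synchronized void put(String key, long value) {
        Map<String, Long> next = new HashMap<>(current);
        next.put(key, value);
        current = Collections.unmodifiableMap(next);
    }

    public Long get(String key) {
        return current.get(key);
    }

    // O(1) snapshot: later put() calls create new versions and leave this one untouched.
    public Map<String, Long> snapshot() {
        return current;
    }

    public static void main(String[] args) {
        CowState state = new CowState();
        state.put("count", 1L);

        Map<String, Long> snap = state.snapshot(); // e.g. taken when the barrier arrives
        state.put("count", 2L);                    // processing continues

        System.out.println(snap.get("count"));     // 1
        System.out.println(state.get("count"));    // 2
    }
}
```

Copying the entire map on every write keeps the sketch short but costs O(n) per update. Practical copy-on-write structures typically copy only the parts that change.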

