flink-commits mailing list archives

From aljos...@apache.org
Subject flink git commit: [FLINK-6163] Document per-window state in ProcessWindowFunction
Date Fri, 10 Nov 2017 14:57:44 GMT
Repository: flink
Updated Branches:
  refs/heads/release-1.4 5f992e8de -> da435f121

[FLINK-6163] Document per-window state in ProcessWindowFunction

Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/da435f12
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/da435f12
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/da435f12

Branch: refs/heads/release-1.4
Commit: da435f121821fd1107c41352a54ee804f10cf7e3
Parents: 5f992e8
Author: Aljoscha Krettek <aljoscha.krettek@gmail.com>
Authored: Fri Nov 10 10:54:16 2017 +0100
Committer: Aljoscha Krettek <aljoscha.krettek@gmail.com>
Committed: Fri Nov 10 15:56:13 2017 +0100

 docs/dev/stream/operators/windows.md | 32 +++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/docs/dev/stream/operators/windows.md b/docs/dev/stream/operators/windows.md
index 7966ec8..3c0cd85 100644
--- a/docs/dev/stream/operators/windows.md
+++ b/docs/dev/stream/operators/windows.md
@@ -978,6 +978,38 @@ input
+### Using per-window state in ProcessWindowFunction
+In addition to accessing keyed state (as any rich function can), a `ProcessWindowFunction`
+can also use keyed state that is scoped to the window that the function is currently processing.
+In this context it is important to understand which window *per-window* state refers to.
+There are different "windows" involved:
+ - The window that was defined when specifying the windowed operation: This might be *tumbling
+ windows of 1 hour* or *sliding windows of 2 hours that slide by 1 hour*.
+ - An actual instance of a defined window for a given key: This might be *time window from
+ 12:00 to 13:00 for user-id xyz*. This is based on the window definition, and there will be many
+ windows based on the number of keys that the job is currently processing and based on which time
+ slots the events fall into.
+Per-window state is tied to the latter of those two. This means that if we process events for
+1000 different keys and events for all of them currently fall into the *[12:00, 13:00)* time window,
+then there will be 1000 window instances that each have their own keyed per-window state.
+There are two methods on the `Context` object that a `process()` invocation receives that
+allow access to the two types of state:
+ - `globalState()`, which allows access to keyed state that is not scoped to a window
+ - `windowState()`, which allows access to keyed state that is also scoped to the window
+This feature is helpful if you anticipate multiple firings for the same window, as can happen
+when you have late firings for data that arrives late or when you have a custom trigger that does
+speculative early firings. In such a case you would store information about previous firings
+or the number of firings in per-window state.
+When using windowed state it is important to also clean up that state when a window is cleared.
+This should happen in the `clear()` method.
 ### WindowFunction (Legacy)
 In some places where a `ProcessWindowFunction` can be used you can also use a `WindowFunction`.
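
The scoping of per-window state described in the section added by this commit can be illustrated with a small toy model. The sketch below is plain Java and does not use any Flink classes: `PerWindowStateSketch` and all of its methods are hypothetical names chosen for illustration. It assumes that "global" state can be modelled as one counter per key and "window" state as one counter per (key, window) pair, which mirrors what `globalState()` and `windowState()` expose, but it is not how Flink implements them.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: mimics the *scoping* of per-window vs. global keyed state
// with plain maps. None of these names are Flink classes or methods.
class PerWindowStateSketch {

    // Models state that is scoped to the key only (shared by all its windows).
    private final Map<String, Integer> perKeyCount = new HashMap<>();

    // Models state that is scoped to one window instance of one key.
    private final Map<String, Integer> perWindowCount = new HashMap<>();

    private static String scope(String key, String window) {
        return key + "@" + window;
    }

    /** Models one firing of the given window instance for the given key. */
    String process(String key, String window) {
        int inWindow = perWindowCount.merge(scope(key, window), 1, Integer::sum);
        int forKey = perKeyCount.merge(key, 1, Integer::sum);
        return key + " " + window + ": firing #" + inWindow
                + " of this window, firing #" + forKey + " for this key";
    }

    /** Models clear(): drops only the state of this one window instance. */
    void clear(String key, String window) {
        perWindowCount.remove(scope(key, window));
    }

    int windowFirings(String key, String window) {
        return perWindowCount.getOrDefault(scope(key, window), 0);
    }

    int globalFirings(String key) {
        return perKeyCount.getOrDefault(key, 0);
    }

    /** Number of window instances that currently hold per-window state. */
    int liveWindowStates() {
        return perWindowCount.size();
    }

    public static void main(String[] args) {
        PerWindowStateSketch sketch = new PerWindowStateSketch();
        System.out.println(sketch.process("xyz", "[12:00, 13:00)"));
        // A late firing of the same window sees the earlier per-window count.
        System.out.println(sketch.process("xyz", "[12:00, 13:00)"));
        // A different window of the same key starts with fresh window state.
        System.out.println(sketch.process("xyz", "[13:00, 14:00)"));
        // Clearing one window leaves the per-key state and other windows alone.
        sketch.clear("xyz", "[12:00, 13:00)");
        System.out.println("live window states: " + sketch.liveWindowStates());
    }
}
```

In this model, firing the same window twice increments the same per-window counter, a different window of the same key starts from zero, and the per-key counter accumulates across both. Skipping the `clear` call would leave one entry per window instance behind indefinitely, which is the kind of leak the cleanup advice in the section is meant to prevent.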
