Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 5F439200C1F for ; Sat, 18 Feb 2017 17:54:19 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 5DC26160B71; Sat, 18 Feb 2017 16:54:19 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 820A9160B66 for ; Sat, 18 Feb 2017 17:54:18 +0100 (CET) Received: (qmail 97235 invoked by uid 500); 18 Feb 2017 16:54:17 -0000 Mailing-List: contact commits-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@flink.apache.org Delivered-To: mailing list commits@flink.apache.org Received: (qmail 97226 invoked by uid 99); 18 Feb 2017 16:54:17 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 18 Feb 2017 16:54:17 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id 9B46DDFC63; Sat, 18 Feb 2017 16:54:17 +0000 (UTC) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: uce@apache.org To: commits@flink.apache.org Message-Id: X-Mailer: ASF-Git Admin Mailer Subject: flink git commit: [FLINK-5837][docs] improving readability of the queryable state docs Date: Sat, 18 Feb 2017 16:54:17 +0000 (UTC) archived-at: Sat, 18 Feb 2017 16:54:19 -0000 Repository: flink Updated Branches: refs/heads/release-1.2 451fe851e -> b21f9d11d [FLINK-5837][docs] improving readability of the queryable state docs Project: http://git-wip-us.apache.org/repos/asf/flink/repo Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/b21f9d11 Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/b21f9d11 Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/b21f9d11 Branch: refs/heads/release-1.2 Commit: b21f9d11dd2d1230402e70ec446c473ce186c21e Parents: 451fe85 Author: David Anderson Authored: Fri Feb 17 17:46:10 2017 +0100 Committer: Ufuk Celebi Committed: Sat Feb 18 17:54:12 2017 +0100 ---------------------------------------------------------------------- docs/dev/stream/queryable_state.md | 46 ++++++++++++++++----------------- 1 file changed, 23 insertions(+), 23 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/flink/blob/b21f9d11/docs/dev/stream/queryable_state.md ---------------------------------------------------------------------- diff --git a/docs/dev/stream/queryable_state.md b/docs/dev/stream/queryable_state.md index 7d337dc..4a63856 100644 --- a/docs/dev/stream/queryable_state.md +++ b/docs/dev/stream/queryable_state.md @@ -41,23 +41,24 @@ bottleneck in practice. Attention: Queryable state accesses keyed state from a concurrent thread rather than synchronizing with the operator and potentially blocking its operation. Since any state backend using Java heap space, e.g. MemoryStateBackend or - FsStateBackend, does not work with copies when retrieving values but instead the - references to the stored values, read-modify-write patterns are unsafe and may cause the + FsStateBackend, does not work with copies when retrieving values but instead directly + references the stored values, read-modify-write patterns are unsafe and may cause the queryable state server to fail due to concurrent modifications. The RocksDBStateBackend is safe from these issues. ## Making State Queryable -In order to make state queryable, first, the queryable state server needs to be enabled globally +In order to make state queryable, the queryable state server first needs to be enabled globally by setting the `query.server.enable` configuration parameter to `true` (current default). -Then, appropriate state needs to be made queryable by either -* a convenience `QueryableStateStream` which behaves like a sink and offers incoming values as +Then appropriate state needs to be made queryable by using either + +* a `QueryableStateStream`, a convenience object which behaves like a sink and offers incoming values as queryable state, or -* using `StateDescriptor#setQueryable(String queryableStateName)` for making keyed state of an +* `StateDescriptor#setQueryable(String queryableStateName)`, which makes the keyed state of an operator queryable. -The following sections explain the use of these two. +The following sections explain the use of these two approaches. ### Queryable State Stream @@ -85,18 +86,18 @@ QueryableStateStream asQueryableState(
- Note: There is no queryable list state sink as it would result in an ever-growing + Note: There is no queryable ListState sink as it would result in an ever-growing list which may not be cleaned up and thus will eventually consume too much memory.
A call to these methods returns a `QueryableStateStream`, which cannot be further transformed and currently only holds the name as well as the value and key serializer for the queryable state -stream. It is comparable to a sink, after which you cannot do further transformations. +stream. It is comparable to a sink, and cannot be followed by further transformations. -Internally, the `QueryableStateStream` gets translated to an operator, which uses all incoming +Internally a `QueryableStateStream` gets translated to an operator which uses all incoming records to update the queryable state instance. In a program like the following, all records of the keyed stream will be used to update the state -instance, i.e. either via `ValueState#update(value)` or `AppendingState#add(value)` depending on +instance, either via `ValueState#update(value)` or `AppendingState#add(value)`, depending on the chosen state variant: {% highlight java %} stream.keyBy(0).asQueryableState("query-name") @@ -107,8 +108,8 @@ This acts like the Scala API's `flatMapWithState`. Managed keyed state of an operator (see [Using Managed Keyed State]({{ site.baseurl }}/dev/stream/state.html#using-managed-keyed-state)) -can be made queryable by setting the appropriate state descriptor queryable via -`StateDescriptor#setQueryable(String queryableStateName)` as in the example below. +can be made queryable by making the appropriate state descriptor queryable via +`StateDescriptor#setQueryable(String queryableStateName)`, as in the example below: {% highlight java %} ValueStateDescriptor> descriptor = new ValueStateDescriptor<>( @@ -168,7 +169,7 @@ There are some serialization utils for key/namespace and value serialization inc ## Example -The following example extends the `CountWindowAverage` example from +The following example extends the `CountWindowAverage` example (see [Using Managed Keyed State]({{ site.baseurl }}/dev/stream/state.html#using-managed-keyed-state)) by making it queryable and showing how to query this value: @@ -203,8 +204,7 @@ public class CountWindowAverage extends RichFlatMapFunction, } {% endhighlight %} -Once used on a job, retrieve the job ID and query any key's current state of this operator via -(for any `Long key`): +Once used in a job, you can retrieve the job ID and then query any key's current state from this operator: {% highlight java %} final Configuration config = new Configuration(); @@ -236,12 +236,12 @@ Tuple2 value = ## Configuration -The following configuration parameters influence the queryable state server's and client's -behaviour. They are defined in `QueryableStateOptions`. +The following configuration parameters influence the behaviour of the queryable state server and client. +They are defined in `QueryableStateOptions`. ### Server * `query.server.enable`: flag to indicate whether to start the queryable state server -* `query.server.port`: port to bind internal `KvStateServer` to (0 => pick random available port) +* `query.server.port`: port to bind to the internal `KvStateServer` (0 => pick random available port) * `query.server.network-threads`: number of network (event loop) threads for the `KvStateServer` (0 => #slots) * `query.server.query-threads`: number of asynchronous query threads for the `KvStateServerHandler` (0 => #slots). @@ -253,11 +253,11 @@ behaviour. They are defined in `QueryableStateOptions`. ## Limitations * The queryable state life-cycle is bound to the life-cycle of the job, e.g. tasks register -queryable state on startup and unregister it on dispose. In future versions, it is desirable to -decouple this in order to allow queries after a task finishes and to speed up recovery via state +queryable state on startup and unregister it on disposal. In future versions, it is desirable to +decouple this in order to allow queries after a task finishes, and to speed up recovery via state replication. -* Notifications about available KvState happen via a simple tell. This should be improved to be -more robust with asks and acknowledgements in future. +* Notifications about available KvState happen via a simple tell. In the future this should be improved to be +more robust with asks and acknowledgements. * The server and client keep track of statistics for queries. These are currently disabled by default as they would not be exposed anywhere. As soon as there is better support to publish these numbers via the Metrics system, we should enable the stats.