Date: Fri, 6 Oct 2017 08:31:03 +0000 (UTC)
From: "Kostas Kloudas (JIRA)"
To: issues@flink.apache.org
Subject: [jira] [Created] (FLINK-7771) Make the operator state queryable

Kostas Kloudas created FLINK-7771:
-------------------------------------

             Summary: Make the operator state queryable
                 Key: FLINK-7771
                 URL: https://issues.apache.org/jira/browse/FLINK-7771
             Project: Flink
          Issue Type: Improvement
          Components: Queryable State
    Affects Versions: 1.4.0
            Reporter: Kostas Kloudas
            Assignee: Kostas Kloudas
             Fix For: 1.4.0

There have been some requests to make the operator (non-keyed) state queryable. This means that the user would specify the *uuid* of the operator and the *taskId*, and would then be able to access the state that corresponds to that operator for that specific task.

This issue will serve to document the discussion on the topic, so that everybody can participate.

Personally, I think that such a feature should wait until some aspects of state handling are stabilized (_e.g._ replication and checkpoint management).

My main concerns have to do with the semantics and guarantees that such a feature could offer *for now*.

First, operator state is essentially a list state that can be reshuffled arbitrarily upon restoring or rescaling.
This means that in a given execution attempt task1 may hold elements _A, B, C_, while after restoring (even without rescaling) it may hold _D, B, E_, without this implying that anything happened to elements _A_ and _C_: they were simply assigned to another task. This makes it hard to reason about the results that you get at any point in time, as it provides *no locality/consistency guarantees between executions*.

The above, in combination with the fact that (for now) it is not possible to query the state at a specific point in time (_e.g._ the last checkpointed state), means that there is no easy way to get a consistent view of the state of an operator. So in the example above, when querying _(operatorA, task1)_ and _(operatorA, task2)_, the user can get states belonging to different "points in time", which can result in duplicates, lost values, and all the other problems encountered in distributed systems when there are no consistency guarantees.

The above illustrates some of the consistency problems that such a feature could face now.

I am also linking [~till.rohrmann] and [~skonto], as he also mentioned that this feature could be helpful.
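
To make the reshuffling concern concrete, here is a minimal Python sketch of the effect described above. It is a toy simulation, not Flink code and not Flink's actual redistribution logic: the `redistribute` helper, the `seed` parameter, and the sample elements are all hypothetical and exist only for illustration. It pools the list-state elements of all tasks and deals them out again in an arbitrary order, as could happen on a restore.

```python
import random


def redistribute(task_states, parallelism, seed):
    """Pool all list-state elements and deal them out to `parallelism` tasks.

    Hypothetical stand-in for "arbitrary reshuffling on restore/rescale";
    the seeded shuffle only models that the assignment is not stable.
    """
    pooled = [element for state in task_states for element in state]
    rng = random.Random(seed)
    rng.shuffle(pooled)
    new_states = [[] for _ in range(parallelism)]
    for index, element in enumerate(pooled):
        new_states[index % parallelism].append(element)
    return new_states


def union(task_states):
    """All elements held by the operator, regardless of which task holds them."""
    return sorted(element for state in task_states for element in state)


# Execution attempt 1: two tasks of the same operator.
before = [["A", "B", "C"], ["D", "E", "F"]]

# Execution attempt 2: restored with the same parallelism (no rescaling).
after = redistribute(before, parallelism=2, seed=7)

print("task1 before restore:", before[0])
print("task1 after restore: ", after[0])
print("operator-wide contents unchanged:", union(before) == union(after))
```

The operator as a whole still holds the same elements, but a query addressed to a single _(operator, task)_ pair may return different elements before and after the restore, which is the missing locality guarantee discussed above.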