Date: Sat, 13 Aug 2016 15:57:20 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@flink.apache.org
Subject: [jira] [Commented] (FLINK-4340) Remove RocksDB Semi-Async Checkpoint Mode

[ https://issues.apache.org/jira/browse/FLINK-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419978#comment-15419978 ]

ASF GitHub Bot commented on FLINK-4340:
---------------------------------------

Github user wenlong88 commented on the issue:

    https://github.com/apache/flink/pull/2345
    @StephanEwen you are right. But in specific situations we may need a temporary compromise to make the system work well, and then remove the compromises as soon as possible.

    I think both approaches have shortcomings. When the state is large, such as millions of KVs per DB, the fully async approach can take a fully asynchronous backup, but restoring from it takes a long time, which may be intolerable during failover in production. So I think it is necessary to have both, with fully async as the default option.

    Considering that there is no truly perfect solution yet, I think it is OK to remove the semi-async mode right now to avoid blocking the key-group work, but a better solution should be reintroduced soon afterwards, if you agree that RocksDB is a good choice of state backend for large-state situations.

    Regarding the memory overhead of using separate DBs: RocksDB can share the same block cache across different DB instances, but I don't know how to reduce the cost of the memtables. That problem also exists in the current solution, which stores different states in a single DB using column families, because each column family has its own separate memtables.


> Remove RocksDB Semi-Async Checkpoint Mode
> -----------------------------------------
>
>                 Key: FLINK-4340
>                 URL: https://issues.apache.org/jira/browse/FLINK-4340
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.1.0
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>
> This seems to be causing too many problems and is also incompatible with the upcoming key-group/sharding changes that will allow rescaling of keyed state.
> Once this is done we can also close FLINK-4228.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
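A rough back-of-envelope sketch of the memory point in the comment (not from the thread; the class name and every size below are hypothetical, illustrative assumptions). A block cache shared across DB instances is counted once, while each column family in each DB keeps its own memtables (sized by RocksDB's write_buffer_size and max_write_buffer_number options), so memtable memory grows with DB instances times column families either way. The Java code treats those two options as plain inputs and computes a worst-case estimate in which every memtable slot is full.

```java
// Hypothetical back-of-envelope estimate; all sizes are illustrative assumptions,
// not measured values or recommended settings.
public final class RocksMemoryEstimate {

    /** Worst-case memtable memory: every column family in every DB instance
     *  may hold up to maxWriteBuffers memtables of writeBufferBytes each. */
    static long memtableBytes(int dbInstances, int columnFamiliesPerDb,
                              long writeBufferBytes, int maxWriteBuffers) {
        return (long) dbInstances * columnFamiliesPerDb * writeBufferBytes * maxWriteBuffers;
    }

    /** Total = one block cache shared by all instances (counted once) + all memtables. */
    static long totalBytes(long sharedBlockCacheBytes, int dbInstances, int columnFamiliesPerDb,
                           long writeBufferBytes, int maxWriteBuffers) {
        return sharedBlockCacheBytes
                + memtableBytes(dbInstances, columnFamiliesPerDb, writeBufferBytes, maxWriteBuffers);
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        // Assumed: 256 MB shared block cache, 64 MB write buffers, up to 2 per column family.
        long oneDbFourColumnFamilies = totalBytes(256 * MB, 1, 4, 64 * MB, 2);
        long fourDbsOneColumnFamilyEach = totalBytes(256 * MB, 4, 1, 64 * MB, 2);
        System.out.println("1 DB x 4 column families: " + oneDbFourColumnFamilies / MB + " MB");
        System.out.println("4 DBs x 1 column family:  " + fourDbsOneColumnFamilyEach / MB + " MB");
    }
}
```

With these assumed numbers both layouts come to 768 MB (256 MB of shared cache plus 512 MB of memtables), which mirrors the comment's observation: putting states into column families of a single DB shares the block cache but does not reduce memtable memory compared with separate DBs that share one cache.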