Date: Fri, 10 Mar 2017 12:45:05 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@flink.apache.org
Reply-To: dev@flink.apache.org
Subject: [jira] [Commented] (FLINK-5715) Asynchronous snapshotting for HeapKeyedStateBackend

    [ https://issues.apache.org/jira/browse/FLINK-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905007#comment-15905007 ]

ASF GitHub Bot commented on FLINK-5715:
---------------------------------------

Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3466#discussion_r105389424

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/state/filesystem/FsStateBackend.java ---
    @@ -97,6 +100,27 @@ public FsStateBackend(String checkpointDataUri) throws IOException {
         *
         * @param checkpointDataUri The URI describing the filesystem (scheme and optionally authority),
         *                          and the path to the checkpoint data directory.
    +    * @param asynchronousSnapshots Switch to enable asynchronous snapshots.
    +    *
    +    * @throws IOException Thrown, if no file system can be found for the scheme in the URI.
    +    */
    +   public FsStateBackend(String checkpointDataUri, boolean asynchronousSnapshots) throws IOException {
    --- End diff --

    We are getting one more parameter into the constructors with the change that makes the state backend handle all checkpoint/savepoint storage related business. That must be a constructor parameter, so if we can avoid further constructor parameters, that would help. Otherwise we really end up with 20 constructors.

> Asynchronous snapshotting for HeapKeyedStateBackend
> ---------------------------------------------------
>
>                 Key: FLINK-5715
>                 URL: https://issues.apache.org/jira/browse/FLINK-5715
>             Project: Flink
>          Issue Type: New Feature
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.3.0
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>
> Blocking snapshots render the HeapKeyedStateBackend practically unusable for many users in production. Their jobs cannot tolerate stopped processing for the time it takes to write gigabytes of data from memory to disk. Asynchronous snapshots would be a solution to this problem. The challenge for the implementation is coming up with a copy-on-write scheme for the in-memory hash maps that form the foundation of this backend. After taking a closer look, this problem is twofold. First, providing CoW semantics for the hash map itself, as a mutable structure, thereby avoiding costly locking or blocking where possible. Second, CoW for the mutable value objects, e.g. through cloning via serializers.
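
To make the two halves of the copy-on-write problem in the issue description concrete, here is a minimal sketch in Java. It is not the approach taken in the pull request and not Flink code: the class `CowStateMap`, its methods, and the `valueCopier` function (a stand-in for cloning a value through its serializer) are all hypothetical names chosen for illustration. The sketch is also deliberately coarse: it copies the whole table on the first write after a snapshot, which is itself a blocking O(n) step that a real implementation would want to avoid.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Illustrative only: a coarse copy-on-write map. All names are hypothetical.
 * Assumption: one owner thread calls every method; only the map returned by
 * snapshot() is handed (safely published) to a background writer thread.
 */
final class CowStateMap<K, V> {

    // Current working table; may be shared with an in-flight snapshot.
    private Map<K, V> table = new HashMap<>();

    // True while the current table has been handed out as a snapshot.
    private boolean shared = false;

    // Deep-copies a value; stands in for cloning via a serializer.
    private final UnaryOperator<V> valueCopier;

    CowStateMap(UnaryOperator<V> valueCopier) {
        this.valueCopier = valueCopier;
    }

    /** O(1): hands out a read-only view of the current table and marks it shared. */
    Map<K, V> snapshot() {
        shared = true;
        return Collections.unmodifiableMap(table);
    }

    /** Read-only access; callers must not mutate the returned value. */
    V get(K key) {
        return table.get(key);
    }

    /** Access for in-place mutation of the value object. */
    V getForUpdate(K key) {
        ensureWritable();
        return table.get(key);
    }

    void put(K key, V value) {
        ensureWritable();
        table.put(key, value);
    }

    // Before the first mutation after a snapshot, replace the shared table
    // with a private copy (structure and values), leaving the old table
    // untouched for the snapshot reader.
    private void ensureWritable() {
        if (!shared) {
            return;
        }
        Map<K, V> copy = new HashMap<>();
        for (Map.Entry<K, V> e : table.entrySet()) {
            copy.put(e.getKey(), valueCopier.apply(e.getValue()));
        }
        table = copy;
        shared = false;
    }
}
```

With a copier such as `a -> a.clone()` for `int[]` values, a snapshot taken before a `put` or an in-place update through `getForUpdate` keeps seeing the old entries and old value contents, while the owner continues on its private copy.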