Mailing-List: contact issues-help@flink.apache.org; run by ezmlm
Date: Mon, 13 Feb 2017 21:33:41 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@flink.apache.org
Reply-To: dev@flink.apache.org
Subject: [jira] [Commented] (FLINK-5788) Document assumptions about File Systems and persistence
archived-at: Mon, 13 Feb 2017 21:33:47 -0000

[ https://issues.apache.org/jira/browse/FLINK-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864489#comment-15864489 ]

ASF GitHub Bot commented on FLINK-5788:
---------------------------------------

GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/3301

[FLINK-5788] [docs] Improve documentation of FileSystem and spell out the data persistence contract

This writes down the contract that the Flink `FileSystem` and `FSDataOutputStream` implementations have to adhere to in order to support proper consistency and failure recovery.

The contract has so far been only implicitly defined and adhered to by the checkpointing and high-availability code.

## Contract

Data written to an `FSDataOutputStream` created from a `FileSystem` is considered persistent, if two requirements are met:

1. **Visibility Requirement:** It must be guaranteed that all other processes, machines, virtual machines, containers, etc. that are able to access the file see the data consistently when given the absolute file path.
This requirement is similar to the *close-to-open* semantics defined by POSIX, but restricted to the file itself (by its absolute path).

2. **Durability Requirement:** The file system's specific durability/persistence requirements must be met. These are specific to the particular file system. For example, the `LocalFileSystem` does not provide any durability guarantees for crashes of both hardware and operating system, while replicated distributed file systems (like HDFS) typically guarantee durability in the presence of up to *n* concurrent node failures, where *n* is the replication factor.

Updates to the file's parent directory (such that the file shows up when listing the directory contents) are not required to be complete for the data in the file stream to be considered persistent. This relaxation is important for file systems where updates to directory contents are only eventually consistent (like S3).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink filesystem_docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3301.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #3301

----

----

> Document assumptions about File Systems and persistence
> -------------------------------------------------------
>
> Key: FLINK-5788
> URL: https://issues.apache.org/jira/browse/FLINK-5788
> Project: Flink
> Issue Type: Improvement
> Components: Documentation
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 1.3.0
>
>
> We should add some description about the assumptions we make for the behavior of {{FileSystem}} implementations to support proper checkpointing and recovery operations.
> This is especially critical for file systems like {{S3}} with a somewhat tricky contract.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
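
The two requirements in the contract quoted above can be illustrated with a small sketch. It uses only JDK classes on a local file, not Flink's `FileSystem` or `FSDataOutputStream`. The class name `PersistenceContractSketch` and its two helper methods are hypothetical and exist only for this illustration. The `force(true)` call stands in for whatever a particular file system requires for durability. It is not a claim about what Flink's `LocalFileSystem` does.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration class; not part of Flink.
class PersistenceContractSketch {

    // Writes the data, asks the OS to flush it to the storage device, then closes.
    // This models the durability step for a local file only (an assumption of this sketch).
    static void writePersistently(Path file, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // Flush file content and metadata to the underlying device.
            channel.force(true);
        }
    }

    // Models the visibility check: the reader is handed the absolute path
    // and never relies on a listing of the parent directory.
    static byte[] readByAbsolutePath(Path file) throws IOException {
        return Files.readAllBytes(file.toAbsolutePath());
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("contract-sketch", ".bin");
        byte[] data = "checkpoint-state".getBytes(StandardCharsets.UTF_8);

        writePersistently(file, data);
        byte[] seen = readByAbsolutePath(file);

        System.out.println("data visible by absolute path: " + Arrays.equals(data, seen));
        Files.delete(file);
    }
}
```

The reader opens the file by its absolute path rather than discovering it through a directory listing. This mirrors the relaxation in the contract: the data counts as persistent even if the parent directory's contents are not yet up to date.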