flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5788) Document assumptions about File Systems and persistence
Date Mon, 13 Feb 2017 21:33:41 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864489#comment-15864489

ASF GitHub Bot commented on FLINK-5788:

GitHub user StephanEwen opened a pull request:


    [FLINK-5788] [docs] Improve documentation of FileSystem and spell out the data persistence

    This writes down the contract that the Flink `FileSystem` and `FSDataOutputStream` implementations
have to adhere to in order to support proper consistency and failure recovery. The contract
has so far been only implicitly defined and adhered to by the checkpointing and high-availability
    ## Contract
    Data written to an `FSDataOutputStream` created from a `FileSystem` is considered persistent,
if two requirements are met:
      1. **Visibility Requirement:** It must be guaranteed that all other processes, machines,
         virtual machines, containers, etc. that are able to access the file see the data
         when given the absolute file path. This requirement is similar to the *close-to-open*
         semantics defined by POSIX, but restricted to the file itself (by its absolute path).
      2. **Durability Requirement:** The file system's specific durability/persistence requirements
         must be met. These are specific to the particular file system. For example the
         `LocalFileSystem` does not provide any durability guarantees for crashes of both
         hardware and operating system, while replicated distributed file systems (like HDFS)
         guarantee typically durability in the presence of up to concurrent failure or *n*
         nodes, where *n* is the replication factor.
    Updates to the file's parent directory (such as that the file shows up when listing the
directory contents) are not required to be complete for the data in the file stream to be
considered persistent. This relaxation is important for file systems where updates to directory
contents are only eventually consistent (like S3).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink filesystem_docs

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3301


> Document assumptions about File Systems and persistence
> -------------------------------------------------------
>                 Key: FLINK-5788
>                 URL: https://issues.apache.org/jira/browse/FLINK-5788
>             Project: Flink
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>             Fix For: 1.3.0
> We should add some description about the assumptions we make for the behavior of {{FileSystem}}
implementations to support proper checkpointing and recovery operations.
> This is especially critical for file systems like {{S3}} with a somewhat tricky contract.

This message was sent by Atlassian JIRA

View raw message