hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13786) add output committer which uses s3guard for consistent commits to S3
Date Fri, 16 Dec 2016 17:49:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755057#comment-15755057 ]

Steve Loughran commented on HADOOP-13786:

A few of us (Thomas, Pieter, Mingliang, Aaron and Sean) had a quick conf call earlier this
week, where Thomas and Pieter outlined their proposed algorithm for implementing zero-rename
commits to any consistent S3 endpoint.

I've written up [my interpretation of the algorithm|https://github.com/steveloughran/hadoop/blob/s3guard/HADOOP-13786-committer/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delayed-put-commit.md]
for review, looking through the Hadoop and Spark commit code to see what appears to be going
on, though I do still need to document the various algorithms better.

Comments welcome, especially those containing proofs of correctness.

> add output committer which uses s3guard for consistent commits to S3
> --------------------------------------------------------------------
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
> A goal of this code is "support O(1) commits to S3 repositories in the presence of failures".
> Implement it, including whatever is needed to demonstrate the correctness of the algorithm
> (that is, assuming that s3guard provides a consistent view of the presence/absence of blobs,
> show that we can commit directly).
> I consider us free to expose the blobstore-ness of the s3 output streams (i.e.
> not visible until the close()), if we need to use that to allow us to abort commit operations.

