hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13786) Add S3Guard committer for zero-rename commits to consistent S3 endpoints
Date Mon, 06 Mar 2017 13:22:33 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-13786:
    Attachment: HADOOP-13786-HADOOP-13345-007.patch

Patch 007; (limited) tests now working.

I'm changing direction slightly here and working on making the first committer a derivative
of the [Netflix Committer|https://github.com/rdblue/s3committer]. This stages to the local
filesystem, then, in task commit, uploads the generated files as a multipart PUT; co-ordination
information is persisted via HDFS. While this appears to add some complexity to the writing
process, it avoids "magic" in the filesystem, and, by using HDFS, doesn't need DynamoDB.
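
To make that flow concrete, here is a minimal sketch in Python. It is a toy model, not the Netflix or Hadoop code: `FakeObjectStore` and `StagingCommitter` are hypothetical names, the object store is an in-memory stand-in for S3, and plain dictionaries stand in for the local filesystem and for the state persisted in HDFS. It also assumes that the multipart uploads are left uncompleted at task commit and only completed at job commit, which the description above implies but does not spell out.

```python
"""Toy model of a staging committer. All names here are hypothetical."""
import itertools


class FakeObjectStore:
    """In-memory stand-in for S3 multipart uploads (not a real client)."""

    def __init__(self):
        self.objects = {}    # key -> bytes; only completed uploads appear here
        self._uploads = {}   # upload_id -> (key, list of parts)
        self._ids = itertools.count(1)

    def initiate(self, key):
        upload_id = f"upload-{next(self._ids)}"
        self._uploads[upload_id] = (key, [])
        return upload_id

    def upload_part(self, upload_id, data):
        self._uploads[upload_id][1].append(data)

    def complete(self, upload_id):
        key, parts = self._uploads.pop(upload_id)
        self.objects[key] = b"".join(parts)  # the object becomes visible only now

    def abort(self, upload_id):
        self._uploads.pop(upload_id, None)


class StagingCommitter:
    def __init__(self, store, dest_prefix, part_size=4):
        self.store = store
        self.dest_prefix = dest_prefix
        self.part_size = part_size
        self.staged = {}   # task_id -> {name: bytes}; stands in for the local FS
        self.pending = {}  # task_id -> [upload_id]; stands in for state kept in HDFS

    def write(self, task_id, name, data):
        # Task output is staged locally; nothing touches the object store yet.
        self.staged.setdefault(task_id, {})[name] = data

    def commit_task(self, task_id):
        # Upload every staged file in parts, but do not complete the uploads.
        upload_ids = []
        for name, data in self.staged.pop(task_id, {}).items():
            upload_id = self.store.initiate(f"{self.dest_prefix}/{name}")
            for i in range(0, len(data), self.part_size):
                self.store.upload_part(upload_id, data[i:i + self.part_size])
            upload_ids.append(upload_id)
        self.pending[task_id] = upload_ids

    def abort_task(self, task_id):
        # Drop staged data and abort any uploads the task had started.
        self.staged.pop(task_id, None)
        for upload_id in self.pending.pop(task_id, []):
            self.store.abort(upload_id)

    def commit_job(self):
        # Completing the pending uploads is what publishes the output; no rename.
        for upload_ids in self.pending.values():
            for upload_id in upload_ids:
                self.store.complete(upload_id)
        self.pending.clear()
```

In this model, nothing is visible in `store.objects` after `commit_task()`; output appears only after `commit_job()`, and a task aborted in between leaves no object behind.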

What it also adds is: actual use in production, along with minicluster tests. Production use
means that failures and odd execution orderings are more likely to have already been
addressed; with my own committer I'd be relearning how things fail.

Accordingly, I think it'd be more likely to be ready for use.

Patch 007 doesn't include any of that; it's the "before" patch.

I'm now merging in the Netflix code, using S3A and the WriteOperationHelper as the means of
talking to S3. Their code is ASF licensed, but the copyright headers still say Netflix, so we
need it to be added to this JIRA as a patch before we can think about committing it to the
ASF codebase. In the meantime, I'll work on it locally.

> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch, HADOOP-13786-HADOOP-13345-002.patch,
HADOOP-13786-HADOOP-13345-003.patch, HADOOP-13786-HADOOP-13345-004.patch, HADOOP-13786-HADOOP-13345-005.patch,
HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-007.patch
> A goal of this code is "support O(1) commits to S3 repositories in the presence of failures".
Implement it, including whatever is needed to demonstrate the correctness of the algorithm
(that is, assuming that S3Guard provides a consistent view of the presence/absence of blobs,
show that we can commit directly).
> I consider us free to expose the blobstore-ness of the S3 output streams (i.e.
not visible until the close()), if we need to use that to allow us to abort commit operations.

This message was sent by Atlassian JIRA

