hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13786) add output committer which uses s3guard for consistent commits to S3
Date Fri, 16 Dec 2016 18:29:00 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Steve Loughran updated HADOOP-13786:
    Attachment: HADOOP-13786-HADOOP-13345-001.patch

Patch 001

# Modified FileOutputFormat to allow the committer to be chosen (as MRv1 did). There's actually
a factory which can be defined, so that you can have a committer factory which chooses the
committer based on the destination FS.
# There's an S3 factory, and a committer.
# The committer uses some special methods in the FS API which let it bypass a lot of the
checks that the full Hadoop FS API requires before operations. Here we can assume that the
committer knows that the destination isn't a directory, doesn't want a mock parent directory
created after deleting a path, etc.
# {{ITestS3AOutputCommitter}} is a clone of {{TestFileOutputCommitter}}, reworked to be against
S3. Note it could be used as a basis for testing commits to other filesystems; the basic one
assumes local FS.
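
As a rough illustration of the first point (a factory which picks the committer from the destination FS), here is a minimal standalone sketch. It is not the code in the patch: {{CommitterFactorySketch}}, {{Committer}}, {{CommitterFactory}} and the committer names are all hypothetical stand-ins, and the lookup simply assumes the URI scheme of the destination path is what identifies the filesystem.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; none of these types come from the Hadoop patch. */
public class CommitterFactorySketch {

    /** Stand-in for an output committer; it only reports its name. */
    public interface Committer {
        String name();
    }

    /** Stand-in for a factory which builds a committer for a destination. */
    public interface CommitterFactory {
        Committer create(URI destination);
    }

    // Factories registered per filesystem scheme (assumed key: the URI scheme).
    private final Map<String, CommitterFactory> byScheme = new HashMap<>();
    // Factory used when no scheme-specific factory is registered.
    private final CommitterFactory fallback;

    public CommitterFactorySketch(CommitterFactory fallback) {
        this.fallback = fallback;
    }

    public void register(String scheme, CommitterFactory factory) {
        byScheme.put(scheme, factory);
    }

    /** Choose the committer based on the destination filesystem. */
    public Committer committerFor(URI destination) {
        String scheme = destination.getScheme();
        CommitterFactory factory =
            scheme == null ? fallback : byScheme.getOrDefault(scheme, fallback);
        return factory.create(destination);
    }

    /** Example wiring: a file committer by default, an S3 one for s3a:// paths. */
    public static CommitterFactorySketch defaults() {
        CommitterFactorySketch chooser =
            new CommitterFactorySketch(dest -> () -> "file-committer");
        chooser.register("s3a", dest -> () -> "s3a-committer");
        return chooser;
    }

    public static void main(String[] args) {
        CommitterFactorySketch chooser = defaults();
        System.out.println(chooser.committerFor(URI.create("s3a://bucket/out")).name());
        System.out.println(chooser.committerFor(URI.create("hdfs://nn/out")).name());
    }
}
```

The point of the indirection is that the output format never names a committer class itself; anything not explicitly registered falls back to the default factory.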

Some of the new tests are failing; I haven't completely weaned the new tests off file://
and onto being able to simulate different failures of (a subclass of) S3.

> add output committer which uses s3guard for consistent commits to S3
> --------------------------------------------------------------------
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch
> A goal of this code is "support O(1) commits to S3 repositories in the presence of failures".
> Implement it, including whatever is needed to demonstrate the correctness of the algorithm.
> (That is, assuming that s3guard provides a consistent view of the presence/absence of blobs,
> show that we can commit directly.)
> I consider us free to expose the blobstore-ness of the S3 output streams (i.e. data is
> not visible until the close()), if we need to use that to allow us to abort commit operations.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
