hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-13786) Add S3Guard committer for zero-rename commits to consistent S3 endpoints
Date Mon, 13 Mar 2017 21:58:41 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923030#comment-15923030 ]

Steve Loughran edited comment on HADOOP-13786 at 3/13/17 9:58 PM:
------------------------------------------------------------------

patch 012:
* UUID reinstated by default; you can turn this off.
* Most of the unit tests are passing.
* The mocks in {{TestStagingPartitionedJobCommit}} and {{TestStagingDirectoryOutputCommitter}}
see an unexpected delete in their mock invocations (good? bad?), and because the staging
committer doesn't yet handle directories, some of the protocol tests are failing.
* Added {{AbstractITCommitProtocol}} subclasses for the directory and partition committers.
* The protocol IT tests which fail in the job/commit failure cases do so because the
tests aren't looking for the right exception.

One thing to highlight here is that when running these tests from my desktop, the staging
commits seem to be faster than the magic ones. Why? A lot less S3 communication over a
long-haul link during task setup/commit, and there's no real data to upload, so the cost of
delaying the upload until the task commit is negligible.


was (Author: stevel@apache.org):
patch 012:  
* Most of the tests are passing.
* The mocks in {{TestStagingPartitionedJobCommit}} and {{TestStagingDirectoryOutputCommitter}}
see an unexpected delete in their mock invocations (good? bad?), and because the staging
committer doesn't yet handle directories, some of the protocol tests are failing.

One thing to highlight here is that when running these tests from my desktop, the staging
commits seem to be faster than the magic ones. Why? A lot less S3 communication over a
long-haul link during task setup/commit, and there's no real data to upload, so the cost of
delaying the upload until the task commit is negligible.

> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch, HADOOP-13786-HADOOP-13345-002.patch,
HADOOP-13786-HADOOP-13345-003.patch, HADOOP-13786-HADOOP-13345-004.patch, HADOOP-13786-HADOOP-13345-005.patch,
HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-007.patch,
HADOOP-13786-HADOOP-13345-009.patch, HADOOP-13786-HADOOP-13345-010.patch, HADOOP-13786-HADOOP-13345-011.patch,
HADOOP-13786-HADOOP-13345-012.patch, s3committer-master.zip
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the presence of failures".
Implement it, including whatever is needed to demonstrate the correctness of the algorithm
(that is, assuming that s3guard provides a consistent view of the presence/absence of blobs,
show that we can commit directly).
> I consider us free to expose the blobstore-ness of the s3 output streams (i.e.
not visible until the close()), if we need to use that to allow us to abort commit operations.
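
The "not visible until the close()" property in the description can be illustrated with a toy
model. What follows is only a minimal sketch under stated assumptions: {{DeferredCommitSketch}}
and all of its methods are hypothetical names invented for illustration, the store is an
in-memory map rather than S3, and none of this is taken from the attached patches or the S3A
API. It shows the general idea that data written by a task stays invisible at its final
key until a commit publishes it there directly, so an abort only has to discard pending state
and no rename is involved.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory model of a blob store with deferred visibility.
 * Illustration only: this is not the S3A filesystem or committer API.
 */
final class DeferredCommitSketch {
    /** Blobs that readers can see. */
    private final Map<String, byte[]> visible = new HashMap<>();
    /** Blobs written by tasks but not yet published. */
    private final Map<String, byte[]> pending = new HashMap<>();

    /** Task side: stage data against its final destination key; nothing is visible yet. */
    void write(String key, byte[] data) {
        pending.put(key, data.clone());
    }

    /** Commit: publish every pending blob at its final key, with no rename step. */
    int commit() {
        int published = pending.size();
        visible.putAll(pending);
        pending.clear();
        return published;
    }

    /** Abort: drop all pending blobs; the visible state is left untouched. */
    int abort() {
        int discarded = pending.size();
        pending.clear();
        return discarded;
    }

    /** Reader side: only committed blobs exist. */
    boolean exists(String key) {
        return visible.containsKey(key);
    }

    public static void main(String[] args) {
        DeferredCommitSketch store = new DeferredCommitSketch();
        store.write("out/part-0000", new byte[] {1, 2, 3});
        System.out.println("visible before commit: " + store.exists("out/part-0000"));
        store.commit();
        System.out.println("visible after commit: " + store.exists("out/part-0000"));
    }
}
```

In this model a failed task or job calls {{abort()}} and leaves no partial output behind, which
is the behaviour the description wants to be able to rely on; how the real committers achieve
it is defined by the patches, not by this sketch.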



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

