hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13786) Add S3Guard committer for zero-rename commits to consistent S3 endpoints
Date Tue, 14 Mar 2017 21:33:42 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-13786:
------------------------------------
    Attachment: HADOOP-13786-HADOOP-13345-013.patch

HADOOP-13786 patch 013.
  
h3. All mock tests are working.

More specifically, I got them working, then tuned the commit logic to reduce the number
of S3 calls:
* no {{exists(path)}} check before a {{delete()}} if the policy is replace
* no check of the return code of {{delete()}}, as we know it never signals an error, merely
that the destination path didn't exist at the time of the call.
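
To illustrate the saving, here is a minimal sketch rather than the patch's actual code. {{CountingStore}} is a hypothetical in-memory stand-in for the S3 client that counts remote calls, and both method names in {{ReplaceCleanupSketch}} are invented for the example.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for an object store client; counts "remote" calls. */
class CountingStore {
    private final Set<String> paths = new HashSet<>();
    int calls = 0;

    /** Test setup only; not counted as a remote call. */
    void put(String path) {
        paths.add(path);
    }

    boolean exists(String path) {
        calls++;
        return paths.contains(path);
    }

    /** Returns false when the path was absent; false is not an error signal. */
    boolean delete(String path) {
        calls++;
        return paths.remove(path);
    }
}

class ReplaceCleanupSketch {
    /** Cautious version: probe first, then delete, then inspect the result. */
    static void cautiousReplace(CountingStore store, String dest) {
        if (store.exists(dest)) {
            if (!store.delete(dest)) {
                throw new IllegalStateException("delete failed: " + dest);
            }
        }
    }

    /** Tuned version: one unconditional delete, return value ignored. */
    static void tunedReplace(CountingStore store, String dest) {
        store.delete(dest);
    }

    public static void main(String[] args) {
        CountingStore a = new CountingStore();
        a.put("dest/part-0000");
        cautiousReplace(a, "dest/part-0000");

        CountingStore b = new CountingStore();
        b.put("dest/part-0000");
        tunedReplace(b, "dest/part-0000");

        System.out.println("cautious=" + a.calls + " tuned=" + b.calls);
    }
}
```

When the destination exists, the cautious version costs two calls and the tuned version one; when it is absent, both cost one call, so the tuned version is never worse in this model.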

The tests have also been tuned to be a bit more explicit about what they are declaring and
asserting, with less repetition in mock object setup.

Also added: the ability to turn up logging of mock operations, including stack traces of each
invocation. This is useful for working out things like why more {{delete()}} calls are made than expected.
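
As a rough idea of what such logging can look like, the sketch below wraps an interface in a {{java.lang.reflect.Proxy}} that records each call, optionally with the caller's stack. This is a substitute technique chosen to keep the example dependency-free; it is not how the patch's mock tests do it, and {{BlobOps}}, {{LoggingProxySketch}} and {{logged()}} are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal store interface, for illustration only. */
interface BlobOps {
    boolean delete(String path);
}

class LoggingProxySketch {
    /** One entry per invocation; with stacks enabled, the entry includes the call stack. */
    static final List<String> LOG = new ArrayList<>();

    @SuppressWarnings("unchecked")
    static <T> T logged(Class<T> iface, T target, boolean withStack) {
        InvocationHandler handler = (proxy, method, args) -> {
            StringBuilder entry = new StringBuilder(method.getName());
            if (withStack) {
                // Capture where the call came from, to explain unexpected invocations.
                for (StackTraceElement frame : new Throwable().getStackTrace()) {
                    entry.append("\n  at ").append(frame);
                }
            }
            LOG.add(entry.toString());
            return method.invoke(target, args);
        };
        return (T) Proxy.newProxyInstance(
            iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        BlobOps ops = logged(BlobOps.class, path -> true, true);
        ops.delete("dest/part-0000");
        ops.delete("dest/part-0000");
        // Two entries, each showing which caller issued the delete.
        LOG.forEach(System.out::println);
    }
}
```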

h3. Most of the committer IT tests are working

Everything is working except the IT protocol tests {{testMapFileOutputCommitter}} and {{testConcurrentCommitTaskWithSubDir}},
which expect directories to be handled. I will do that at least for the Directory Committer,
with some mock tests as well as a fixed IT test, and skip them in the partition committer.
h3. {{LambdaTestUtils}} tuning

Ryan's patch had some {{assertThrown()}} assertions which I've been moving to the common test
base.

While {{intercept()}} has some features {{assertThrown()}} lacked, it didn't support passing
down extra diagnostic messages. Fixed. We could go one step further and allow callers to
provide a closure {{() -> String}} for diagnostics, though maybe we can wait to
see what JUnit 5 has first.
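
The closure idea could look something like the sketch below. This is not the {{LambdaTestUtils}} API: {{InterceptSketch}} and this particular {{intercept()}} signature are assumptions made for illustration. The point is that the {{Supplier<String>}} is evaluated only when the assertion fails, so expensive diagnostics cost nothing on the success path.

```java
import java.util.concurrent.Callable;
import java.util.function.Supplier;

class InterceptSketch {
    /**
     * Hypothetical variant of an intercept-style assertion: expects eval to
     * throw an instance of clazz, and adds lazily built diagnostics to any
     * failure message.
     */
    static <E extends Throwable> E intercept(
            Class<E> clazz, Supplier<String> diagnostics, Callable<?> eval) {
        Object result;
        try {
            result = eval.call();
        } catch (Throwable t) {
            if (clazz.isInstance(t)) {
                return clazz.cast(t);   // the expected failure: hand it back
            }
            throw new AssertionError("Expected " + clazz.getName()
                + " but got " + t + ": " + diagnostics.get(), t);
        }
        throw new AssertionError("Expected " + clazz.getName()
            + " but call returned " + result + ": " + diagnostics.get());
    }

    public static void main(String[] args) {
        IllegalStateException e = intercept(
            IllegalStateException.class,
            () -> "pending uploads: none",   // only built on failure
            () -> { throw new IllegalStateException("boom"); });
        System.out.println("caught: " + e.getMessage());
    }
}
```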

h3. TODO
* directory trees in the directory committer
* move to direct API calls on S3A
* when S3Guard is enabled, make sure PUT commits update the entire metastore tree.


> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch, HADOOP-13786-HADOOP-13345-002.patch,
> HADOOP-13786-HADOOP-13345-003.patch, HADOOP-13786-HADOOP-13345-004.patch, HADOOP-13786-HADOOP-13345-005.patch,
> HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-007.patch,
> HADOOP-13786-HADOOP-13345-009.patch, HADOOP-13786-HADOOP-13345-010.patch, HADOOP-13786-HADOOP-13345-011.patch,
> HADOOP-13786-HADOOP-13345-012.patch, HADOOP-13786-HADOOP-13345-013.patch, s3committer-master.zip
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the presence of failures".
> Implement it, including whatever is needed to demonstrate the correctness of the algorithm.
> (that is, assuming that s3guard provides a consistent view of the presence/absence of blobs,
> show that we can commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output streams (ie.
> not visible until the close()), if we need to use that to allow us to abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

