hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13786) Add S3Guard committer for zero-rename commits to S3 endpoints
Date Tue, 17 Oct 2017 14:45:01 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207733#comment-16207733 ]

Steve Loughran commented on HADOOP-13786:

I recognise the fear of long-lived branches, and we can probably get away with incremental
s3guard stuff from now on. This committer is mostly done apart from some docs & tuning
of related bits. The sole current diff between this patch and my local source is a new {{AWSStatus500Exception}},
treating a 500 response as retriable even on ops considered non-idempotent, and some
more docs on mapreduce task commit/abort.

For review, you can split off looking at 

# the retry logic: {{Invoker}}, {{S3ARetryPolicy}} & how that's being used to wrap operations
in S3AFileSystem, a refactored WriteOperationsHelper and DynamoDBMetadataStore. Is the model
right (retry-around-closures), is the retry policy good, and is it being used correctly?
# Changes to S3ABlockOutputStream to let us control whether its completion is delayed or not, &
changes to S3AFS to recognise the special paths and so switch policy
# CommitOperations: the underlying integration with the FS to save/restore lists of PUTs to
complete, operations to commit them
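
For anyone reviewing the retry model, here is a minimal sketch of what "retry-around-closures" could look like, with a policy that treats a 500 as retriable even when the operation is not idempotent. Everything in it ({{RetrySketch}}, {{Status500Exception}}, {{limitedRetries}}) is a hypothetical stand-in written for illustration; it is not the {{Invoker}} or {{S3ARetryPolicy}} code from the patch, and it omits the backoff a real policy would need.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical sketch only; not the Invoker/S3ARetryPolicy classes from the patch. */
final class RetrySketch {

  /** Stand-in for an exception raised when the store answers with HTTP 500. */
  static final class Status500Exception extends IOException {
    Status500Exception(String message) {
      super(message);
    }
  }

  /** Decides whether a failed attempt should be repeated. */
  interface RetryPolicy {
    boolean shouldRetry(Exception failure, int attempt, boolean idempotent);
  }

  /**
   * Assumed policy: a 500 is always retried (up to the limit), while other
   * IOExceptions are retried only if the caller declared the operation idempotent.
   */
  static RetryPolicy limitedRetries(int maxAttempts) {
    return (failure, attempt, idempotent) -> {
      if (attempt >= maxAttempts) {
        return false;
      }
      if (failure instanceof Status500Exception) {
        return true;
      }
      return idempotent && failure instanceof IOException;
    };
  }

  /** Runs the closure, re-invoking it for as long as the policy allows. */
  static <T> T retry(boolean idempotent, RetryPolicy policy, Callable<T> operation)
      throws Exception {
    int attempt = 0;
    while (true) {
      attempt++;
      try {
        return operation.call();
      } catch (Exception e) {
        if (!policy.shouldRetry(e, attempt, idempotent)) {
          throw e;
        }
        // A real implementation would sleep/back off here before the next attempt.
      }
    }
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // A non-idempotent operation that fails twice with a simulated 500, then succeeds.
    String result = retry(false, limitedRetries(3), () -> {
      if (++calls[0] < 3) {
        throw new Status500Exception("simulated 500");
      }
      return "done after " + calls[0] + " calls";
    });
    System.out.println(result);
  }
}
```

The point of wrapping a closure rather than each low-level call is that the caller states once whether the whole operation is idempotent, and the policy alone decides what that means for each failure type.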

Finally, the committer, looking at {{AbstractS3GuardCommitter}} , {{StagingCommitter}} (Ryan's),
and {{MagicS3GuardCommitter}} which is the one using the special output streams. They all
use the same bindings to the FS and JSON file formats, so differ in: where work goes, how
the commit metadata is passed to the job committer. And for the staging committer, the conflict
policies of the two public implementations: Directory and Partitioned.
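
As a rough mental model of the shared idea (tasks leave behind lists of PUTs to complete, and the job commit completes them), here is a toy in-memory sketch. All names in it ({{PendingCommitSketch}}, {{ToyStore}}, {{JobCommitter}}) are hypothetical and invented for illustration; it does not reflect the JSON file formats or the FS bindings in the patch, and it ignores failure handling entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model; none of these names come from the patch. */
final class PendingCommitSketch {

  /** A PUT whose data has been uploaded but is not yet visible at its destination. */
  static final class PendingUpload {
    final String destination;
    final String data;

    PendingUpload(String destination, String data) {
      this.destination = destination;
      this.data = data;
    }
  }

  /** Toy object store: only completed uploads are visible. */
  static final class ToyStore {
    private final Map<String, String> visible = new TreeMap<>();

    PendingUpload upload(String destination, String data) {
      return new PendingUpload(destination, data);
    }

    void complete(PendingUpload pending) {
      visible.put(pending.destination, pending.data);
    }

    boolean exists(String key) {
      return visible.containsKey(key);
    }
  }

  /** Job-side record of the uploads each committed task wants completed. */
  static final class JobCommitter {
    private final Map<String, List<PendingUpload>> byTask = new HashMap<>();

    /** Task commit: save the list of pending uploads; nothing becomes visible yet. */
    void commitTask(String taskId, List<PendingUpload> pending) {
      byTask.put(taskId, new ArrayList<>(pending));
    }

    /** Task abort: forget the list (a real store would also abort the uploads). */
    void abortTask(String taskId) {
      byTask.remove(taskId);
    }

    /** Job commit: complete every saved upload; no rename is involved. */
    int commitJob(ToyStore store) {
      int completed = 0;
      for (List<PendingUpload> uploads : byTask.values()) {
        for (PendingUpload pending : uploads) {
          store.complete(pending);
          completed++;
        }
      }
      byTask.clear();
      return completed;
    }
  }

  public static void main(String[] args) {
    ToyStore store = new ToyStore();
    JobCommitter committer = new JobCommitter();

    List<PendingUpload> task1 = new ArrayList<>();
    task1.add(store.upload("out/part-0000", "a"));
    committer.commitTask("task1", task1);

    List<PendingUpload> task2 = new ArrayList<>();
    task2.add(store.upload("out/part-0001", "b"));
    committer.commitTask("task2", task2);
    committer.abortTask("task2");

    System.out.println("before job commit: " + store.exists("out/part-0000"));
    committer.commitJob(store);
    System.out.println("after job commit: " + store.exists("out/part-0000"));
    System.out.println("aborted task output: " + store.exists("out/part-0001"));
  }
}
```

In this model the committers would differ only in where the task's data sits before completion and how the saved lists reach the job committer, which is the distinction drawn above.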

The test {{AbstractITCommitProtocol}} is the one which pushes the commit protocol through
its lifecycle, trying to recreate the valid & failure workflows. That's inevitably where
there's scope to cover all the corner cases...I think I'll look again at speculation there.

Finally, new docs, including one on [committer architecture|https://github.com/steveloughran/hadoop/blob/s3guard/HADOOP-13786-committer/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committer_architecture.md].
That covers what the MR commit protocol is, which is something you need to understand before
looking at the commit internals. That doc is probably the most complete discussion of the
topic there is, and even it avoids bits I don't understand (preemption).

I can give a talk on this stuff on Wednesday or Thursday AM PST if people want.

> Add S3Guard committer for zero-rename commits to S3 endpoints
> -------------------------------------------------------------
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-036.patch, HADOOP-13786-037.patch, HADOOP-13786-038.patch,
HADOOP-13786-039.patch, HADOOP-13786-HADOOP-13345-001.patch, HADOOP-13786-HADOOP-13345-002.patch,
HADOOP-13786-HADOOP-13345-003.patch, HADOOP-13786-HADOOP-13345-004.patch, HADOOP-13786-HADOOP-13345-005.patch,
HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-007.patch,
HADOOP-13786-HADOOP-13345-009.patch, HADOOP-13786-HADOOP-13345-010.patch, HADOOP-13786-HADOOP-13345-011.patch,
HADOOP-13786-HADOOP-13345-012.patch, HADOOP-13786-HADOOP-13345-013.patch, HADOOP-13786-HADOOP-13345-015.patch,
HADOOP-13786-HADOOP-13345-016.patch, HADOOP-13786-HADOOP-13345-017.patch, HADOOP-13786-HADOOP-13345-018.patch,
HADOOP-13786-HADOOP-13345-019.patch, HADOOP-13786-HADOOP-13345-020.patch, HADOOP-13786-HADOOP-13345-021.patch,
HADOOP-13786-HADOOP-13345-022.patch, HADOOP-13786-HADOOP-13345-023.patch, HADOOP-13786-HADOOP-13345-024.patch,
HADOOP-13786-HADOOP-13345-025.patch, HADOOP-13786-HADOOP-13345-026.patch, HADOOP-13786-HADOOP-13345-027.patch,
HADOOP-13786-HADOOP-13345-028.patch, HADOOP-13786-HADOOP-13345-028.patch, HADOOP-13786-HADOOP-13345-029.patch,
HADOOP-13786-HADOOP-13345-030.patch, HADOOP-13786-HADOOP-13345-031.patch, HADOOP-13786-HADOOP-13345-032.patch,
HADOOP-13786-HADOOP-13345-033.patch, HADOOP-13786-HADOOP-13345-035.patch, cloud-intergration-test-failure.log,
objectstore.pdf, s3committer-master.zip
> A goal of this code is "support O(1) commits to S3 repositories in the presence of failures".
Implement it, including whatever is needed to demonstrate the correctness of the algorithm.
(that is, assuming that s3guard provides a consistent view of the presence/absence of blobs,
show that we can commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output streams (ie.
not visible until the close()), if we need to use that to allow us to abort commit operations.

