hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13786) Add S3Guard committer for zero-rename commits to consistent S3 endpoints
Date Thu, 02 Feb 2017 19:56:52 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-13786:
------------------------------------
    Attachment: HADOOP-13786-HADOOP-13345-004.patch

Patch 004

This passes the tests in a suite derived from {{org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter}};
I'm still looking at ways to simulate failure conditions, and at the failure semantics we want.

Essentially: once a pending commit has happened, there is no *retry*. Meaning: when a task
has committed once, any later commit attempt should fail, which it does with a
FileNotFoundException (FNFE) on the task attempt dir.
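
The commit-once behaviour can be pictured with a toy in-memory model. This is only a sketch, assuming that a successful task commit removes the task attempt dir; the class {{OnceOnlyCommit}} and its methods are hypothetical names for illustration and are not taken from the patch.

```java
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Set;

/** Toy model (hypothetical, not the patch): a task attempt dir can be committed once. */
final class OnceOnlyCommit {
    // Stand-in for the set of task attempt dirs that currently exist.
    private final Set<String> taskAttemptDirs = new HashSet<>();

    void createTaskAttempt(String taskAttempt) {
        taskAttemptDirs.add(taskAttempt);
    }

    /** Commit consumes the task attempt dir; a second call finds nothing and fails. */
    void commitTask(String taskAttempt) throws FileNotFoundException {
        if (!taskAttemptDirs.remove(taskAttempt)) {
            throw new FileNotFoundException("No task attempt dir: " + taskAttempt);
        }
        // Assumption: the real committer would complete the pending writes here.
    }

    public static void main(String[] args) throws FileNotFoundException {
        OnceOnlyCommit committer = new OnceOnlyCommit();
        committer.createTaskAttempt("task_attempt_0");
        committer.commitTask("task_attempt_0");      // first commit succeeds
        try {
            committer.commitTask("task_attempt_0");  // no retry: fails
        } catch (FileNotFoundException expected) {
            System.out.println("second commit rejected: " + expected.getMessage());
        }
    }
}
```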

Similarly, you can only commit a job once, even if all the job does is delete all child directories.

One change in this patch is the need to support pending subtrees, e.g. map output to the directories
part-0000/index and part-0000/data in the destination dir. This has been done by adding the
notion of a {{__base}} path element in the pending tree. When a {{__base}} element is a parent of a path,
the destination path is the parent of the {{__pending}} dir, with all children under {{__base}}
retained. With each task attempt dir being {{dest/__pending/$app/$app-attempt/$task_attempt/__base}},
this ensures that all data created in the task working dir ends up under the destination in
the same directory tree.
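
As an illustration of the {{__base}} mapping described above, here is a minimal sketch that works on plain "/"-separated strings. The class {{PendingPaths}} and the method {{finalDestination}} are hypothetical names, not code from the patch, and the sketch only handles paths that contain a {{__base}} element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not from the patch: resolves a pending path to its destination. */
final class PendingPaths {
    static final String PENDING = "__pending";
    static final String BASE = "__base";

    /**
     * The destination is the parent of the __pending dir, followed by
     * every path element that sits under __base.
     */
    static String finalDestination(String pendingPath) {
        List<String> parts = Arrays.asList(pendingPath.split("/"));
        int pending = parts.indexOf(PENDING);
        if (pending < 0) {
            throw new IllegalArgumentException("No " + PENDING + " element: " + pendingPath);
        }
        // Look for __base only below the __pending element.
        int baseOffset = parts.subList(pending + 1, parts.size()).indexOf(BASE);
        if (baseOffset < 0) {
            // Assumption: paths without __base are out of scope for this sketch.
            throw new IllegalArgumentException("No " + BASE + " element: " + pendingPath);
        }
        int base = pending + 1 + baseOffset;
        List<String> result = new ArrayList<>(parts.subList(0, pending));
        result.addAll(parts.subList(base + 1, parts.size()));
        return String.join("/", result);
    }

    public static void main(String[] args) {
        // The app, app-attempt and task-attempt names are made up for the example.
        System.out.println(finalDestination(
            "dest/__pending/app1/attempt1/task1/__base/part-0000/index"));
    }
}
```

With the example path in {{main}}, everything between {{__pending}} and {{__base}} is dropped, so the file lands at dest/part-0000/index.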

Issues:

* What about cleaning up {{__pending}}? Job commit?
* Need to stop someone creating a path {{__base/__pending}} and so sneaking in pending stuff or getting
very confused. Actually, stop {{__pending}} under {{__pending}}.
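
One possible shape for the second check is sketched below. It is an assumption about how such a guard could look, not the patch's code; the class {{PendingPathCheck}} and the method {{hasNestedPending}} are hypothetical names.

```java
/** Hypothetical guard, not from the patch: detects __pending under __pending. */
final class PendingPathCheck {
    static final String PENDING = "__pending";

    /** Returns true if the path contains more than one __pending element. */
    static boolean hasNestedPending(String path) {
        int count = 0;
        for (String element : path.split("/")) {
            if (PENDING.equals(element)) {
                count++;
            }
        }
        return count > 1;
    }

    public static void main(String[] args) {
        // A __pending element smuggled in below __base is flagged.
        System.out.println(hasNestedPending("dest/__pending/app/att/task/__base/__pending/x"));
        // A normal task attempt path is not.
        System.out.println(hasNestedPending("dest/__pending/app/att/task/__base/part-0000"));
    }
}
```

Because it counts every {{__pending}} element, this also covers the {{__base/__pending}} case, which is just one way of nesting.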

> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch, HADOOP-13786-HADOOP-13345-002.patch,
>                      HADOOP-13786-HADOOP-13345-003.patch, HADOOP-13786-HADOOP-13345-004.patch
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the presence of failures".
> Implement it, including whatever is needed to demonstrate the correctness of the algorithm.
> (that is, assuming that s3guard provides a consistent view of the presence/absence of blobs,
> show that we can commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output streams (ie.
> not visible until the close()), if we need to use that to allow us to abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

