hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15107) Prove the correctness of the new committers, or fix where they are not correct
Date Tue, 12 Dec 2017 15:43:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287767#comment-16287767
] 

Steve Loughran commented on HADOOP-15107:
-----------------------------------------

I think I'd like to have the option of being able to set an x-committed header on created
files; this would declare the job/task ID at the time the MPU was created.

This would allow us to examine the list of files in a destination, and assert that all objects
were created from a successful task attempt which we know of, not an unsuccessful one. 

We could also have the _SUMMARY file include a list of successful tasks
* add a task attempt ID to the .pending & pendingset files (the single commit has a task
ID, but I want the set: job, task, task attempt)
* job commit to build list of committed tasks as it reads the set of .pendingset files.
* we can add an optional post-commit check to verify all new files have the header which matches
their entries; that there were no files committed whose provenance was elsewhere.
* and of course, gives you some diagnostics when backtracking the provenance of stuff.
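
A minimal sketch of what the optional post-commit check could look like. It is plain Java and makes no store calls: it assumes the header values have already been read into a map of path to task attempt ID, and that job commit has built the set of committed task attempts from the .pendingset files. The class name, method name and the IDs used are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical post-commit provenance check; not part of any committer. */
public class ProvenanceCheck {

    /**
     * Returns, in sorted path order, every path whose recorded task attempt ID
     * is missing or is not one of the attempts the job knows it committed.
     *
     * @param attemptIdByPath   destination path to the task attempt ID taken
     *                          from the (assumed) header; null if no header
     * @param committedAttempts attempt IDs gathered during job commit
     */
    public static List<String> findUnexpectedFiles(
            Map<String, String> attemptIdByPath,
            Set<String> committedAttempts) {
        List<String> unexpected = new ArrayList<>();
        // Copy into a TreeMap so the report order is deterministic.
        for (Map.Entry<String, String> e
                : new TreeMap<>(attemptIdByPath).entrySet()) {
            String attemptId = e.getValue();
            if (attemptId == null || !committedAttempts.contains(attemptId)) {
                unexpected.add(e.getKey());
            }
        }
        return unexpected;
    }
}
```

An empty result means every listed file came from a task attempt the job committed; anything else is a file whose provenance was elsewhere.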

This is something which can be used in integration testing. In production, too, maybe.

Risks:
* leaks a bit of information
* uses up a header. Maybe: have a generic taskInfo header which can be extended to contain
a bit more than just task attempt ID


ps: Reviewed the magic committer. Confirmed: a committed task saves the .pendingset file to
$jobAttemptDir/$taskId.pendingset with overwrite=false. So: >1 task attempt may commit.

With overwrite=false, it's actually the first which wins; there's a tiny window of a risk of
overlap.

Proposed: allow overwrites, so guaranteeing that the last taskAttempt to do the write wins.
# the most likely last task attempt to write will be the successful one. Reason: a second attempt
is only committed if the first task attempt failed, or did not respond in a timely manner to a taskCommit
request.
# assuming time moves forwards, no GCs, etc, task Attempt #2 will inevitably be invoked after
attempt 1.
# if task attempt 1 had successfully committed, but not returned, then it is considered a
failure by the job. Therefore, attempt #2 should be the one which succeeds
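
To make the first-wins versus last-wins difference concrete, here is a small in-memory model. It is only a sketch: the map stands in for the job attempt directory, save() returns false where a real filesystem create with overwrite=false would fail, and it does not model the window of overlap between the existence check and the write. The class and the attempt IDs are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the $jobAttemptDir of .pendingset files. */
public class PendingsetStore {

    // path of the .pendingset file -> task attempt which wrote it
    private final Map<String, String> files = new HashMap<>();

    /**
     * Saves a .pendingset entry for a task attempt.
     *
     * @return true if the entry was written; false if overwrite was
     *         disabled and an earlier attempt had already written it
     */
    public boolean save(String path, String taskAttemptId, boolean overwrite) {
        if (!overwrite && files.containsKey(path)) {
            return false; // first writer wins
        }
        files.put(path, taskAttemptId); // with overwrite, last writer wins
        return true;
    }

    /** Returns the task attempt whose .pendingset is currently stored, or null. */
    public String owner(String path) {
        return files.get(path);
    }
}
```

With overwrite=false, attempt 1 stays the owner even if attempt 2 saves later; with overwrite=true, attempt 2 replaces it, which matches the proposal that the last task attempt to write is the one whose pending uploads job commit reads.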

> Prove the correctness of the new committers, or fix where they are not correct
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-15107
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15107
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> I'm writing about the paper on the committers, one which, being a proper paper, requires
me to show the committers work.
> # define the requirements of a "Correct" committed job (this applies to the FileOutputCommitter
too)
> # show that the Staging committer meets these requirements (most of this is implicit
in that it uses the V1 FileOutputCommitter to marshal .pendingset lists from committed tasks
to the final destination, where they are read and committed).
> # Show the magic committer also works.
> I'm now not sure that the magic committer works.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

