hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6608) Work Preserving AM Restart for MapReduce
Date Sat, 16 Apr 2016 02:13:25 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243948#comment-15243948

Vinod Kumar Vavilapalli commented on MAPREDUCE-6608:

[~srikanth.sampath] / [~djp],

Got around to reading the design doc attached - there are a few important details that aren't
covered in the doc, besides the AM discovery problem itself

h4. Output Commit of previous tasks
The new AM needs to make sure that output of previously running containers can be safely committed.
IIRC, with today's FileOutputCommitter, new AM will only promote task-outputs that are present
in $jobOutput/_temporary/$currentAttemptID/

Similar changes may be needed for other OutputCommitters out there.

h4. Task Output Commit races

It doesn't look like we record task-commit in JobHistory, so it is possible that the previous
AM gave a commit go-ahead to a taskAttempt which is either (a) in the process of committing
output or (b) committed the output but fails to report to either of the AMs. In this case,
two taskAttempts can be committing at the same time!

In the same line, without recording the success of a commit after a task finishes committing,
we will run into issues.

h4. Conflicting TaskAttemptIDs

Today, we launch containers first and then record it in JobHistory. Because of this, if the
previous AM started a TaskAttempt but crashed before recording it in JobHistory, and this
oldTaskAttempt somehow cannot get reconnected to the new AM due to network issues, the new
AM generates the same TaskAttemptID for a newer attempt and they both will collide on HDFS
and/or the local NM output directories if they both happen to run on the same machine.

The above problem will be worse when speculative tasks are involved.

h4. Security
AM should use the same job-token as the previous incarnation otherwise the old running tasks
will get authentication failures. I quickly checked and it seems like the AM itself generates
the token, which means the second AM will generate a different one and all running tasks will
fail to sync back!

h4. Others
bq. In the WP case, upon a loss of connection to the AM the tasks will try and reestablish
the connection with the new AM.
This will not suffice. It is possible even today, but when a network partition occurs and
two AMs end up running at the same time and give commit-go permission to two TaskAttempts
of the same task, they will collide on the output-commit.

h4. General comments
This stuff is hard. Even if we forget about the AM discovery problem, I am sure others will
find a bunch of other design considerations you may be missing now.

I'd suggest spending more time on the design, atleast on some of the areas I pointed above
and then create a branch, create sub-tasks, do some prototypes etc.

> Work Preserving AM Restart for MapReduce
> ----------------------------------------
>                 Key: MAPREDUCE-6608
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6608
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Srikanth Sampath
>            Assignee: Srikanth Sampath
>         Attachments: Patch1.patch, WorkPreservingMRAppMaster-1.pdf, WorkPreservingMRAppMaster-2.pdf,
> Providing a framework for work preserving AM is achieved in [YARN-1489|https://issues.apache.org/jira/browse/YARN-1489].
 We would like to take advantage of this for MapReduce(MR) applications.  There are some challenges
which have been described in the attached document and few options discussed.  We solicit
feedback from the community.

This message was sent by Atlassian JIRA

View raw message