hadoop-mapreduce-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MAPREDUCE-6608) Work Preserving AM Restart for MapReduce
Date Tue, 19 Apr 2016 16:47:25 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248151#comment-15248151
] 

Junping Du edited comment on MAPREDUCE-6608 at 4/19/16 4:46 PM:
----------------------------------------------------------------

[~vinodkv], thanks for the review and comments. I think most of your points here are solid;
however, the comment about "Output Commit of previous tasks" is a bit stale.

bq. The new AM needs to make sure that output of previously running containers can be safely
committed. IIRC, with today's FileOutputCommitter, new AM will only promote task-outputs that
are present in $jobOutput/_temporary/$currentAttemptID/
This was true before MAPREDUCE-4815. After MAPREDUCE-4815, however, most of the work of promoting
task output to the final job output is handled by {{FileOutputCommitter.commitTask()}} instead of
{{FileOutputCommitter.commitJob()}}, so commitJob() is left only with cleaning up $jobOutput/_temporary.
Nothing more needs to be done here, except making sure that
"mapreduce.fileoutputcommitter.algorithm.version" is set to 2.
This setting is also an assumption of the work in MAPREDUCE-5485, which is a prerequisite for the
feature here; without it, the new AM will fail immediately if the previous AM ended during job commit.
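
To make the difference concrete, here is a minimal sketch of the two commit algorithms on a local filesystem. It is a toy model, not Hadoop code: the class {{CommitSketch}} and all of its method names are hypothetical, it uses only java.nio.file, and it deliberately ignores the recovery logic ({{recoverTask()}}) that the real FileOutputCommitter has. It only shows why output that a task committed under a previous AM attempt is invisible to a version 1 commitJob() run by a new attempt, while with version 2 it is already in $jobOutput.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Toy local-filesystem model of the two commit algorithms (hypothetical, not Hadoop code). */
public class CommitSketch {

    /** Task commit: v1 parks output under _temporary/$attemptId/$taskId, v2 moves it straight to $jobOutput. */
    public static void commitTask(Path jobOutput, int appAttemptId, String taskId,
                                  Path taskWorkDir, int algorithmVersion) throws IOException {
        Path target = (algorithmVersion == 2)
                ? jobOutput
                : jobOutput.resolve("_temporary").resolve(String.valueOf(appAttemptId)).resolve(taskId);
        moveFiles(taskWorkDir, target);
    }

    /** Job commit as run by the AM attempt appAttemptId: v1 promotes only its own attempt directory. */
    public static void commitJob(Path jobOutput, int appAttemptId, int algorithmVersion) throws IOException {
        Path pending = jobOutput.resolve("_temporary").resolve(String.valueOf(appAttemptId));
        if (algorithmVersion != 2 && Files.isDirectory(pending)) {
            for (Path taskDir : listDir(pending)) {
                moveFiles(taskDir, jobOutput);
            }
        }
        // In both versions the job commit ends by cleaning up $jobOutput/_temporary.
        deleteTree(jobOutput.resolve("_temporary"));
    }

    /** Scenario: a task commits under AM attempt 1, then a restarted AM (attempt 2) commits the job. */
    public static boolean outputSurvivesRestart(int algorithmVersion) throws IOException {
        Path jobOutput = Files.createTempDirectory("jobOutput");
        Path work = Files.createTempDirectory("taskWork");
        Files.write(work.resolve("part-r-00000"), "data".getBytes());
        commitTask(jobOutput, 1, "task_0", work, algorithmVersion);
        commitJob(jobOutput, 2, algorithmVersion);
        return Files.exists(jobOutput.resolve("part-r-00000"));
    }

    private static List<Path> listDir(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.collect(Collectors.toList());
        }
    }

    private static void moveFiles(Path from, Path to) throws IOException {
        Files.createDirectories(to);
        for (Path f : listDir(from)) {
            Files.move(f, to.resolve(f.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        }
    }

    private static void deleteTree(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order deletes children before their parent directories.
            for (Path p : walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList())) {
                Files.delete(p);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("v1 output survives restart: " + outputSurvivesRestart(1));
        System.out.println("v2 output survives restart: " + outputSurvivesRestart(2));
    }
}
```

In this model the version 1 run loses the part file, because the new attempt only looks under _temporary/2 and then deletes _temporary, while the version 2 run keeps it. In a real job the switch is the "mapreduce.fileoutputcommitter.algorithm.version" property mentioned above.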

I am investigating the rest of the issues and will bring some proposals later.


bq. I'd suggest spending more time on the design, atleast on some of the areas I pointed above
and then create a branch, create sub-tasks, do some prototypes etc.
+1. This feature may take more work than I expected. I agree we may need a separate
branch to develop this in parallel. I will create a branch once we finalize our design work.




was (Author: djp):
[~vinodkv], thanks for review and comments. I think most your points here are solid, however,
the comments about "Output Commit of previous tasks" is a bit stale.

bq. The new AM needs to make sure that output of previously running containers can be safely
committed. IIRC, with today's FileOutputCommitter, new AM will only promote task-outputs that
are present in $jobOutput/_temporary/$currentAttemptID/
This is true before YARN-4815. However, after YARN-4815, most task-output commit to job final
output is handled by {{FileOutputCommitter.commitTask()}} instead of {{FileOutputCommitter.commitJob()}}.
So the commitJob() only left work of cleanup $jobOutput/_temporary. So there is nothing need
to do here unless we make sure "mapreduce.fileoutputcommitter.algorithm.version" is set to
2. 
This is also an assumption setting for work of MAPREDUCE-5485 which is a prerequisite for
feature here - or AM will failed directly in case previous AM ends in job committing.

Investigating on rest of issues and will propose some possible solutions later.  


bq. I'd suggest spending more time on the design, atleast on some of the areas I pointed above
and then create a branch, create sub-tasks, do some prototypes etc.
+1. This feature work could be a bit over my expectation before. I agree we could need a separated
branch for developing this in parallel. Will create a branch once we finalize our design work.



> Work Preserving AM Restart for MapReduce
> ----------------------------------------
>
>                 Key: MAPREDUCE-6608
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6608
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Srikanth Sampath
>            Assignee: Srikanth Sampath
>         Attachments: Patch1.patch, WorkPreservingMRAppMaster-1.pdf, WorkPreservingMRAppMaster-2.pdf,
WorkPreservingMRAppMaster.pdf
>
>
> A framework for work-preserving AM restart was provided in [YARN-1489|https://issues.apache.org/jira/browse/YARN-1489].
 We would like to take advantage of this for MapReduce (MR) applications.  The attached document
describes some of the challenges and discusses a few options.  We solicit
feedback from the community.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
