hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
Date Thu, 09 Jan 2014 03:32:57 GMT

    [ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866264#comment-13866264 ]

Jian He commented on YARN-1490:

Uploaded a new patch.

bq. Change it also transfer running containers too?
This is already there.
bq. Extend the test to explicitly validate that allocated/acquired/reserved are killed ?
The test for acquired containers being killed is already there. Added a test for allocated containers being killed.
bq. That TODO will be fixed in this ticket itself? Or a separate one?
bq. We need to see if the remaining state needs to be transferred to. There is some commented
I intentionally commented those out so that I would not forget them; I think they are not needed for now.
bq. application.setShouldRecover(false); 
Earlier I intended to reset the flag, but since we always pass the correct flag via
AppAttemptRemovedSchedulerEvent, I removed it.
bq. The following code-block can be in the constructor?  if (!appAttempt.submissionContext.getCleanContainersWhenFail())
bq. Similarly the check for unmanaged AM can also be in the constructor?
This flag should apply only when the attempt has failed, not when it is killed or finished, right? Added
test cases for this.
bq. Can you file a ticket for the broken stuff in unmanaged-AM?
Done: YARN-1577.
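
For illustration only, the rule discussed above (the flag applies to a failed attempt, not to a killed or finished one) could be sketched roughly as follows. This is not code from the patch: the class, enum, and method names are hypothetical, the boolean parameter merely mirrors the value returned by getCleanContainersWhenFail() in the quoted snippet, and the handling of an unmanaged AM (never keeping containers) is an assumption.

```java
// Hypothetical sketch; none of these names come from the YARN-1490 patch.
public class ContainerCleanupSketch {

    /** Simplified set of final states for an application attempt. */
    enum FinalState { FAILED, KILLED, FINISHED }

    /**
     * Decides whether running containers should survive the removal of
     * an application attempt.
     *
     * @param finalState              how the attempt ended
     * @param cleanContainersWhenFail value of the submission-context flag
     * @param unmanagedAM             true if the AM is unmanaged
     */
    static boolean shouldKeepContainers(FinalState finalState,
                                        boolean cleanContainersWhenFail,
                                        boolean unmanagedAM) {
        // The flag is honored only for a failed attempt; a killed or
        // finished attempt always has its containers cleaned up.
        if (finalState != FinalState.FAILED) {
            return false;
        }
        // Assumption: an unmanaged AM never keeps its containers.
        if (unmanagedAM) {
            return false;
        }
        return !cleanContainersWhenFail;
    }

    public static void main(String[] args) {
        // Failed attempt, app asked not to clean containers: keep them.
        System.out.println(shouldKeepContainers(FinalState.FAILED, false, false));
        // Killed attempt: the flag is ignored and containers are cleaned.
        System.out.println(shouldKeepContainers(FinalState.KILLED, false, false));
        // Failed attempt, app asked for cleanup: containers are cleaned.
        System.out.println(shouldKeepContainers(FinalState.FAILED, true, false));
    }
}
```

Keeping the decision in one pure function makes the failed versus killed/finished cases easy to cover with unit tests, which is the kind of test the comment above mentions adding.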

> RM should optionally not kill all containers when an ApplicationMaster exits
> ----------------------------------------------------------------------------
>                 Key: YARN-1490
>                 URL: https://issues.apache.org/jira/browse/YARN-1490
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Jian He
>         Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch
> This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers, some may not want to. This should be an option.

This message was sent by Atlassian JIRA
