hadoop-yarn-issues mailing list archives

From "Anubhav Dhoot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1367) After restart NM should resync with the RM without killing containers
Date Wed, 25 Jun 2014 00:13:24 GMT

    [ https://issues.apache.org/jira/browse/YARN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042879#comment-14042879

Anubhav Dhoot commented on YARN-1367:

[~jianhe] uploading a new patch that addressed some of the comments, here are the remaining

> we should have ResourceTrackerService change in this patch also to send resync on non-work-preserving
case and resync_keeping_containers in work-preserving case?
Should we do it in a separate jira? This can keep the NM side changes. I can open one if you

>code should be cleaner if using separate if case 
I am trying to keep the two cases identical except for killing containers. Hence the boolean
flag: it guards only that one line, and the rest stays shared without duplication.
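
To illustrate the flag approach, here is a minimal sketch. It is not the code from the patch: the class, method, and field names below are hypothetical, and the real NM code paths are more involved. It only shows one shared handler where the boolean controls the kill step alone.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the NM resync handling; all names are illustrative.
class ResyncFlagSketch {
    final List<String> runningContainers = new ArrayList<>();
    final List<String> actions = new ArrayList<>();

    // One shared path for both events; the flag guards only the kill step.
    void handleResync(boolean killContainers) {
        if (killContainers) {
            actions.add("killed " + runningContainers.size());
            runningContainers.clear();
        }
        // Everything below is common to Resync and ResyncKeepingContainers.
        actions.add("re-registered with " + runningContainers.size());
    }
}
```

With separate if cases, the re-register step would have to appear in both branches, which is the duplication the flag avoids.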

>testPreserveContainersOnResyncKeepingContainers -> testKeepContainersOnResync
The name explicitly indicates that the ResyncKeepingContainers event behaves differently
from Resync.

Lemme know what you think

> After restart NM should resync with the RM without killing containers
> ---------------------------------------------------------------------
>                 Key: YARN-1367
>                 URL: https://issues.apache.org/jira/browse/YARN-1367
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-1367.001.patch, YARN-1367.002.patch, YARN-1367.prototype.patch
> After RM restart, the RM sends a resync response to NMs that heartbeat to it.  Upon receiving
the resync response, the NM kills all containers and re-registers with the RM. The NM should
be changed to not kill the container and instead inform the RM about all currently running
containers including their allocations etc. After the re-register, the NM should send all
pending container completions to the RM as usual.
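
The behavior requested in the description above can be sketched roughly as follows. This is an assumption-laden illustration, not YARN code: the class name, the string "messages", and the lists are hypothetical placeholders for the real registration and heartbeat protocol.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an NM that keeps its containers across a resync.
class WorkPreservingResyncSketch {
    final List<String> running = new ArrayList<>();
    final List<String> pendingCompletions = new ArrayList<>();
    final List<String> sentToRm = new ArrayList<>();

    void onResyncFromRm() {
        // Re-register and report the containers that are still running,
        // instead of killing them first.
        sentToRm.add("REGISTER running=" + running);
        // After the re-register, send pending completions as usual.
        for (String containerId : pendingCompletions) {
            sentToRm.add("COMPLETED " + containerId);
        }
        pendingCompletions.clear();
    }
}
```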

This message was sent by Atlassian JIRA
