hadoop-yarn-issues mailing list archives

From "MENG DING (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1644) RM-NM protocol changes and NodeStatusUpdater implementation to support container resizing
Date Mon, 27 Jul 2015 17:46:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643099#comment-14643099 ]

MENG DING commented on YARN-1644:
---------------------------------

bq.  NM re-registration can still happen between the time the increase action is accepted,
and the time it's added into increasedContainers. Even startContainer has the same problem,
newly started container may fall into this tiny window that RM won't recover this container.
Yes, you are right that startContainer has the same problem.
To be clear, RM restart or NM re-registration can happen in one of the following scenarios:
* 1. The container resource increase has already completed. In this case, NM re-registration
can send the correct (increased) container size through the containerStatus object for RM recovery.
* 2. The container to be increased has been added to increasedContainers, but its resource has
not yet been updated. In this case, NM re-registration can send the correct container size through
both the containerStatus and increasedContainers objects for RM recovery.
* 3. The increase action has been accepted, but the container to be increased has not yet been
added to increasedContainers. In this case, the NM's and the RM's views of the container's resource
diverge. The same issue applies to startContainers.
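To make the three windows concrete, here is a toy Java model of what the NM could report on re-registration at each stage of an increase. It is a sketch, not YARN code: `ResizeRaceSketch`, `Phase`, `Report`, `reportAt` and `recoveredSizeMb` are all hypothetical names. It assumes the RM recovers a container's size by preferring a pending entry from increasedContainers over the size in containerStatus, and that the NM goes on to complete an increase it has already accepted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the three re-registration scenarios; none of these
// classes or methods are actual YARN APIs.
class ResizeRaceSketch {

    // Stages an increase request passes through on the NM (hypothetical).
    enum Phase { ACCEPTED, QUEUED_IN_INCREASED_CONTAINERS, APPLIED }

    // What the NM reports on re-registration, keyed by container id:
    // the size carried in the container status, plus any pending increase.
    static final class Report {
        final Map<String, Integer> statusSizeMb = new HashMap<>();
        final Map<String, Integer> increasedContainersMb = new HashMap<>();
    }

    // Builds the re-registration report for one container whose size is
    // being raised from oldMb to newMb and has reached the given phase.
    static Report reportAt(Phase phase, String id, int oldMb, int newMb) {
        Report r = new Report();
        switch (phase) {
            case APPLIED:
                // Scenario 1: container status already carries the new size.
                r.statusSizeMb.put(id, newMb);
                break;
            case QUEUED_IN_INCREASED_CONTAINERS:
                // Scenario 2: status still shows the old size, but the pending
                // entry in increasedContainers carries the new one.
                r.statusSizeMb.put(id, oldMb);
                r.increasedContainersMb.put(id, newMb);
                break;
            case ACCEPTED:
                // Scenario 3: accepted but not yet recorded anywhere the
                // report looks, so only the old size is visible.
                r.statusSizeMb.put(id, oldMb);
                break;
        }
        return r;
    }

    // Assumed RM-side recovery rule: prefer a pending increase over the status size.
    static int recoveredSizeMb(Report r, String id) {
        Integer pending = r.increasedContainersMb.get(id);
        return pending != null ? pending : r.statusSizeMb.get(id);
    }

    public static void main(String[] args) {
        String id = "container_1"; // hypothetical container id
        int oldMb = 1024;
        int newMb = 2048;
        for (Phase p : Phase.values()) {
            int recovered = recoveredSizeMb(reportAt(p, id, oldMb, newMb), id);
            // Assumption: once the NM finishes the accepted increase, its real size is newMb.
            System.out.println(p + ": RM recovers " + recovered + " MB, NM ends at "
                    + newMb + " MB" + (recovered == newMb ? "" : "  <-- diverged"));
        }
    }
}
```

Only the ACCEPTED phase (scenario 3) leaves the RM with the old size, because nothing in the report carries the new one yet. The same shape applies to a container accepted by startContainers but not yet recorded anywhere the NM reports from.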

I don't have a solution for scenario 3 yet, but I think the chance of it happening is very
small, especially with the current {{blockNewContainerRequests}} and RM identifier matching
logic. Maybe we can file a separate JIRA for scenario 3, and fix it for both container
increase and container launch?

> RM-NM protocol changes and NodeStatusUpdater implementation to support container resizing
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-1644
>                 URL: https://issues.apache.org/jira/browse/YARN-1644
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Wangda Tan
>            Assignee: MENG DING
>         Attachments: YARN-1644-YARN-1197.4.patch, YARN-1644-YARN-1197.5.patch, YARN-1644.1.patch,
>                      YARN-1644.2.patch, YARN-1644.3.patch, yarn-1644.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
