hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2001) Persist NMs info for RM restart
Date Tue, 06 May 2014 03:51:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990252#comment-13990252 ]

Jian He commented on YARN-2001:

Consider a simple case in which an application is granted 50% of the cluster's resources and
the cluster has 2 nodes. The application has used up its entire resource quota and launched
all of its containers on node1. RM fails over, and node2 is the first to re-sync with RM.
Since node2 has no containers running for this application, when the AM asks for more
containers RM will think this AM hasn’t used any resources and will grant it more resources
on node1. Then node1 comes back to RM, and RM recovers all the containers on node1. The
application ends up exceeding its 50% resource limit.
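
The accounting problem above can be illustrated with a toy model. This is only a sketch in
Python, not YARN code: the ToyRM class, the capacity of 100 units, and the function names are
all made up for illustration, and node placement is ignored so that only the quota
bookkeeping is shown.

```python
# Toy model of the over-allocation scenario; not YARN code.
CLUSTER_CAPACITY = 100              # hypothetical resource units across both nodes
APP_LIMIT = CLUSTER_CAPACITY // 2   # the application's 50% quota


class ToyRM:
    def __init__(self):
        # RM's view of the application's usage; it starts empty after failover.
        self.used = 0

    def recover_node(self, running_units):
        # A re-syncing node reports the resources of containers it still runs.
        self.used += running_units

    def allocate(self, requested_units):
        # Grant only what RM believes is still free within the quota.
        granted = max(0, min(requested_units, APP_LIMIT - self.used))
        self.used += granted
        return granted


def allocate_before_all_nodes_sync():
    rm = ToyRM()
    rm.recover_node(0)    # node2 re-syncs first and reports no containers
    rm.allocate(50)       # RM believes the quota is unused and grants 50
    rm.recover_node(50)   # node1 re-syncs with the original 50 units
    return rm.used


def allocate_after_all_nodes_sync():
    rm = ToyRM()
    rm.recover_node(0)    # node2
    rm.recover_node(50)   # node1
    rm.allocate(50)       # quota is already full, so nothing is granted
    return rm.used
```

In this model, allocating before node1 has re-synced leaves the application holding the whole
cluster, while waiting for both nodes keeps it at its limit.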

Another example: RM needs to generate new container IDs for the new containers requested by
the AM. If RM accepts new requests from the AM before the nodes sync back, the new container
IDs may collide with the IDs of the recovered containers.
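
A similar toy sketch for the ID collision, again in Python and not YARN code. It assumes,
purely for illustration, that container IDs are plain integers handed out by a counter that
restarts at 1 when RM restarts; the real container ID format is not modeled, and
ToyIdAllocator is a hypothetical name.

```python
# Toy model of container ID collisions after an RM restart; not YARN code.
class ToyIdAllocator:
    def __init__(self):
        # Assumption: the counter restarts from 1 after RM restarts.
        self.next_id = 1

    def recover(self, container_ids):
        # Advance the counter past every ID reported by a re-syncing node.
        if container_ids:
            self.next_id = max(self.next_id, max(container_ids) + 1)

    def new_container_id(self):
        cid = self.next_id
        self.next_id += 1
        return cid


def new_id_collides(allocate_before_recovery):
    recovered = {1, 2, 3}   # IDs of containers still running on the nodes
    ids = ToyIdAllocator()
    if allocate_before_recovery:
        new_id = ids.new_container_id()   # handed out before nodes report in
        ids.recover(recovered)
    else:
        ids.recover(recovered)
        new_id = ids.new_container_id()   # handed out after nodes report in
    return new_id in recovered
```

Under these assumptions, new_id_collides(True) reports a collision and new_id_collides(False)
does not.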

> Persist NMs info for RM restart
> -------------------------------
>                 Key: YARN-2001
>                 URL: https://issues.apache.org/jira/browse/YARN-2001
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
> RM should not accept allocate requests from AMs until all the NMs have registered with
> RM. For that, RM needs to remember the previous NMs and wait for all the NMs to register.
> This is also useful for remembering decommissioned nodes across restarts.
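
The behaviour described in the issue could be sketched roughly as follows. This is a Python
illustration only, not the proposed implementation: ToyRegistrationGate and the node names
are hypothetical, and how the previous NMs are persisted is left out.

```python
# Toy gate that holds back allocate requests until all known NMs register.
class ToyRegistrationGate:
    def __init__(self, persisted_nodes):
        # NMs remembered from before the restart (hypothetical store contents).
        self.expected = set(persisted_nodes)
        self.registered = set()

    def register(self, node):
        # Called when an NM registers with the restarted RM.
        self.registered.add(node)

    def accepts_allocate(self):
        # Accept AM allocate requests only once every remembered NM is back.
        return self.expected <= self.registered


gate = ToyRegistrationGate(["node1", "node2"])
gate.register("node2")
accepted_early = gate.accepts_allocate()   # node1 is still missing
gate.register("node1")
accepted_late = gate.accepts_allocate()    # every remembered NM is back
```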
