hadoop-yarn-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2001) Persist NMs info for RM restart
Date Tue, 06 May 2014 03:28:17 GMT

    [ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990241#comment-13990241

Karthik Kambatla commented on YARN-2001:

bq. we may run into a condition where the resource usage and capacity limits (e.g. headroom, queue
capacity, etc.) in the scheduler are not yet correct until all the nodes sync back all the running
containers belonging to the app, so applications/queues can potentially go beyond their limits.
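Here is a minimal Java sketch that makes the quoted concern concrete. It assumes a work-preserving style of recovery, in which the scheduler rebuilds a queue's usage only from containers that NMs report after the restart. The class `RecoveringQueue` and its methods are hypothetical and are not YARN APIs. The sketch shows how headroom computed from partially reported usage can overstate what is actually free.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not YARN code): a queue whose usage is rebuilt only
 * from containers that NMs have reported back since an RM restart.
 */
class RecoveringQueue {
    private final long capacityMb;                                  // queue limit
    private final Map<String, Long> reportedUsageByNode = new HashMap<>();

    RecoveringQueue(long capacityMb) {
        this.capacityMb = capacityMb;
    }

    /** Called when a node re-syncs the running containers it hosts for this queue. */
    void onNodeResync(String nodeId, long runningMb) {
        reportedUsageByNode.put(nodeId, runningMb);
    }

    /** Usage as the scheduler currently knows it: only what has been reported. */
    long reportedUsageMb() {
        long sum = 0;
        for (long mb : reportedUsageByNode.values()) {
            sum += mb;
        }
        return sum;
    }

    /** Headroom as the scheduler sees it: limit minus usage reported so far. */
    long headroomMb() {
        return Math.max(0, capacityMb - reportedUsageMb());
    }
}
```

For example, take a 100 MB queue whose real usage is 80 MB, split as 30 MB on `nodeA` and 50 MB on `nodeB`. After only `nodeA` has resynced, the headroom is 70 MB, so granting all of it would push real usage to 150 MB. Once `nodeB` resyncs, the headroom drops to 20 MB.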

My understanding has been that the RM's scheduler starts from scratch on restart/failover
and rebuilds its state as nodes heartbeat. At any point in time, the cluster's resources correspond
only to the NMs that have registered with the "new" RM. IOW, this should be no different from
a new cluster. Given this, I am not sure how the scheduler can have incorrect information.
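The view described above can be sketched as follows: the scheduler starts from scratch and treats the recovering cluster like a new one. This is a minimal illustration under that assumption only. `RebuiltClusterView` and its methods are hypothetical names, not the RM scheduler's actual classes. Cluster capacity, and therefore any queue limit derived from it, only ever reflects the NMs that have registered with the new RM.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the actual RM scheduler): after restart/failover
 * the scheduler starts empty and only counts capacity from NMs that have
 * registered with the new RM, as it would for a brand-new cluster.
 */
class RebuiltClusterView {
    private final Map<String, Long> nodeCapacityMb = new HashMap<>();

    /** A NodeManager registers (or re-registers) with the new RM. */
    void registerNode(String nodeId, long capacityMb) {
        nodeCapacityMb.put(nodeId, capacityMb);
    }

    /** A node is lost or removed; its capacity leaves the cluster total. */
    void removeNode(String nodeId) {
        nodeCapacityMb.remove(nodeId);
    }

    /** Cluster capacity is derived solely from currently registered nodes. */
    long clusterCapacityMb() {
        long total = 0;
        for (long mb : nodeCapacityMb.values()) {
            total += mb;
        }
        return total;
    }

    /** A queue's limit is a fraction of that (possibly still growing) total. */
    long queueLimitMb(double capacityFraction) {
        return (long) (clusterCapacityMb() * capacityFraction);
    }
}
```

Registering one 8192 MB node gives a cluster capacity of 8192 MB and a 50% queue limit of 4096 MB. After a second 8192 MB node registers, those become 16384 MB and 8192 MB. In this model, limits grow as nodes arrive instead of starting from a stale pre-restart total.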

> Persist NMs info for RM restart
> -------------------------------
>                 Key: YARN-2001
>                 URL: https://issues.apache.org/jira/browse/YARN-2001
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
> RM should not accept allocate requests from AMs until all the NMs have registered with
> RM. For that, RM needs to remember the previous NMs and wait for all the NMs to register.
> This is also useful for remembering decommissioned nodes across restarts.
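The behaviour proposed in the issue description can be sketched roughly like this in Java. This only illustrates the idea and is not the YARN-2001 implementation. `NodeRecoveryGate` is a hypothetical class, and persisting the node sets to a state store is assumed to happen elsewhere. The sketch keeps the previously known NMs and the decommissioned nodes across the restart. It rejects re-registration from decommissioned nodes and accepts allocate requests only once every expected NM is back.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of the behaviour proposed in YARN-2001 (not the
 * committed code): remember the known NMs across a restart and refuse AM
 * allocate requests until every previously known, non-decommissioned NM
 * has re-registered.
 */
class NodeRecoveryGate {
    private final Set<String> expectedNodes = new HashSet<>();   // persisted before restart
    private final Set<String> decommissioned = new HashSet<>();  // also persisted
    private final Set<String> registered = new HashSet<>();

    NodeRecoveryGate(Set<String> persistedNodes, Set<String> persistedDecommissioned) {
        decommissioned.addAll(persistedDecommissioned);
        expectedNodes.addAll(persistedNodes);
        expectedNodes.removeAll(decommissioned); // decommissioned NMs are not waited for
    }

    /** Returns false if the NM was decommissioned before the restart. */
    boolean onNodeRegister(String nodeId) {
        if (decommissioned.contains(nodeId)) {
            return false; // remembered across the restart, so it stays rejected
        }
        registered.add(nodeId);
        return true;
    }

    /** AM allocate calls are accepted only once all expected NMs are back. */
    boolean acceptAllocate() {
        return registered.containsAll(expectedNodes);
    }
}
```

As written, a single NM that never returns blocks allocation forever, so any real version would presumably need a timeout or a similar escape hatch.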

This message was sent by Atlassian JIRA
