hadoop-yarn-issues mailing list archives

From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2641) Decommission nodes on -refreshNodes instead of next NM-RM heartbeat
Date Wed, 15 Oct 2014 18:21:34 GMT

    [ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172718#comment-14172718 ]

zhihai xu commented on YARN-2641:
---------------------------------

Thanks [~kasha], [~jianhe], [~djp] and [~ywskycn] for the review.

> Decommission nodes on -refreshNodes instead of next NM-RM heartbeat
> -------------------------------------------------------------------
>
>                 Key: YARN-2641
>                 URL: https://issues.apache.org/jira/browse/YARN-2641
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>             Fix For: 2.7.0
>
>         Attachments: YARN-2641.000.patch, YARN-2641.001.patch, YARN-2641.002.patch, YARN-2641.003.patch
>
>
> Improve node decommission latency in the RM.
> Currently, a node is decommissioned only after the RM receives a nodeHeartbeat from the
> NodeManager. The node heartbeat interval is configurable; the default value is 1 second.
> It would be better to do the decommission during the RM refresh (NodesListManager) instead
> of during nodeHeartbeat (ResourceTrackerService).
> There is also a much more serious issue:
> After the RM is refreshed (refreshNodes), if the NM to be decommissioned is killed before
> it sends a heartbeat to the RM, the RMNode will never be decommissioned in the RM. The RMNode
> will only expire in the RM after "yarn.nm.liveness-monitor.expiry-interval-ms" (default value
> 10 minutes).
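
The difference between the two decommission points described above can be sketched roughly as follows. This is a simplified, hypothetical model and not the code in the attached patches: the class RefreshDecommissionSketch, its methods, and the host names are all invented for illustration and only stand in for the roles of NodesListManager (refresh path) and ResourceTrackerService (heartbeat path).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model; these are NOT the real YARN classes. */
public class RefreshDecommissionSketch {

    public enum NodeState { RUNNING, DECOMMISSIONED }

    // Stand-in for the RM's map of known nodes (host -> state).
    private final Map<String, NodeState> nodes = new LinkedHashMap<>();

    public void registerNode(String host) {
        nodes.put(host, NodeState.RUNNING);
    }

    public NodeState stateOf(String host) {
        return nodes.get(host);
    }

    /**
     * Heartbeat path (old behavior in this sketch): a node is checked against
     * the exclude list only when it heartbeats. A node that dies before its
     * next heartbeat is never checked here.
     */
    public boolean onHeartbeat(String host, Set<String> excludedHosts) {
        if (nodes.get(host) == NodeState.RUNNING && excludedHosts.contains(host)) {
            nodes.put(host, NodeState.DECOMMISSIONED);
            return true;
        }
        return false;
    }

    /**
     * Refresh path (proposed behavior in this sketch): every running node is
     * checked as soon as the exclude list is refreshed, without waiting for
     * any heartbeat.
     */
    public List<String> refreshNodes(Set<String> excludedHosts) {
        List<String> decommissioned = new ArrayList<>();
        for (Map.Entry<String, NodeState> e : nodes.entrySet()) {
            if (e.getValue() == NodeState.RUNNING && excludedHosts.contains(e.getKey())) {
                e.setValue(NodeState.DECOMMISSIONED); // value update only, safe while iterating
                decommissioned.add(e.getKey());
            }
        }
        return decommissioned;
    }

    public static void main(String[] args) {
        RefreshDecommissionSketch rm = new RefreshDecommissionSketch();
        rm.registerNode("nm1.example.com");
        rm.registerNode("nm2.example.com");

        // nm2 is excluded and is then killed, so it never heartbeats again.
        Set<String> excluded = Collections.singleton("nm2.example.com");
        System.out.println("Decommissioned on refresh: " + rm.refreshNodes(excluded));
        System.out.println("nm1 state: " + rm.stateOf("nm1.example.com"));
    }
}
```

In this model, the killed node is still marked DECOMMISSIONED because the refresh path does not depend on it ever heartbeating again, which is the failure case the description calls out.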



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
