hadoop-yarn-issues mailing list archives

From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2641) Decommission nodes on -refreshNodes instead of next NM-RM heartbeat
Date Mon, 13 Oct 2014 22:47:34 GMT

    [ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170149#comment-14170149 ]

zhihai xu commented on YARN-2641:
---------------------------------

I did not see the TestAMRestart failure in my local build, which is based on the latest code base:
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 119.464 sec - in org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

Results :

Tests run: 6, Failures: 0, Errors: 0, Skipped: 0


> Decommission nodes on -refreshNodes instead of next NM-RM heartbeat
> -------------------------------------------------------------------
>
>                 Key: YARN-2641
>                 URL: https://issues.apache.org/jira/browse/YARN-2641
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-2641.000.patch, YARN-2641.001.patch, YARN-2641.002.patch, YARN-2641.003.patch
>
>
> Improve node decommission latency in the RM.
> Currently, a node is decommissioned only after the RM receives a nodeHeartbeat from the
> Node Manager. The node heartbeat interval is configurable; the default value is 1 second.
> It would be better to do the decommission during RM refresh (NodesListManager) instead
> of during nodeHeartbeat (ResourceTrackerService).
> The current behavior can also cause a much more serious issue:
> After the RM is refreshed (refreshNodes), if the NM to be decommissioned is killed before
> it sends a heartbeat to the RM, the RMNode will never be decommissioned in the RM. The
> RMNode will only expire in the RM after "yarn.nm.liveness-monitor.expiry-interval-ms"
> (default value 10 minutes).
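
To illustrate the difference described above, here is a minimal toy model in Java. It is only a sketch: the class DecommissionSketch, its methods, and the decommissionOnRefresh flag are hypothetical names made up for this example and are not the actual NodesListManager or ResourceTrackerService code, nor the content of the attached patches. It compares decommissioning on heartbeat with decommissioning on refresh when the NM is killed right after the refresh.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only; all names here are hypothetical, not YARN's API. */
public class DecommissionSketch {
    public enum State { RUNNING, DECOMMISSIONED }

    private final Map<String, State> nodes = new HashMap<>();
    private final boolean decommissionOnRefresh;
    private Set<String> excluded = new HashSet<>();

    public DecommissionSketch(boolean decommissionOnRefresh) {
        this.decommissionOnRefresh = decommissionOnRefresh;
    }

    /** A node registers with the toy RM and starts in RUNNING state. */
    public void register(String host) {
        nodes.put(host, State.RUNNING);
    }

    /** Models refreshNodes: reload the exclude list. */
    public void refreshNodes(Set<String> newExcluded) {
        excluded = new HashSet<>(newExcluded);
        if (decommissionOnRefresh) {
            // Proposed behavior: act on known nodes immediately.
            for (String host : excluded) {
                nodes.computeIfPresent(host, (h, s) -> State.DECOMMISSIONED);
            }
        }
    }

    /** Models a heartbeat: an excluded node is decommissioned when it reports in. */
    public void heartbeat(String host) {
        if (excluded.contains(host)) {
            nodes.computeIfPresent(host, (h, s) -> State.DECOMMISSIONED);
        }
    }

    public State stateOf(String host) {
        return nodes.get(host);
    }

    public static void main(String[] args) {
        for (boolean eager : new boolean[] {false, true}) {
            DecommissionSketch rm = new DecommissionSketch(eager);
            rm.register("nm1");
            rm.refreshNodes(Collections.singleton("nm1"));
            // nm1 is killed here, so no further heartbeat ever arrives.
            System.out.println("decommissionOnRefresh=" + eager
                + " -> " + rm.stateOf("nm1"));
        }
    }
}
```

In this toy model, with the flag off the killed node stays RUNNING because its state only changes when a heartbeat arrives; in the real RM it would eventually be removed by the liveness expiry instead. With the flag on, the node is marked DECOMMISSIONED as soon as the refresh runs.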



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
