hadoop-yarn-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2641) Decommission nodes on -refreshNodes instead of next NM-RM heartbeat
Date Tue, 14 Oct 2014 15:09:36 GMT

    [ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171034#comment-14171034 ]

Hudson commented on YARN-2641:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1926 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1926/])
YARN-2641. Decommission nodes on -refreshNodes instead of next NM-RM heartbeat. (Zhihai Xu via kasha) (kasha: rev da709a2eac7110026169ed3fc4d0eaf85488d3ef)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java


> Decommission nodes on -refreshNodes instead of next NM-RM heartbeat
> -------------------------------------------------------------------
>
>                 Key: YARN-2641
>                 URL: https://issues.apache.org/jira/browse/YARN-2641
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>             Fix For: 2.7.0
>
>         Attachments: YARN-2641.000.patch, YARN-2641.001.patch, YARN-2641.002.patch, YARN-2641.003.patch
>
>
> Improve node decommission latency in the RM.
> Currently, a node is decommissioned only after the RM receives a nodeHeartbeat from the
> NodeManager. The node heartbeat interval is configurable; the default value is 1 second.
> It would be better to do the decommission during the RM refresh (NodesListManager) instead
> of in nodeHeartbeat (ResourceTrackerService).
> The current behavior can also cause a much more serious issue:
> after the RM is refreshed (refreshNodes), if the NM to be decommissioned is killed before
> it sends a heartbeat to the RM, the RMNode will never be decommissioned in the RM. The RMNode
> will only expire in the RM after "yarn.nm.liveness-monitor.expiry-interval-ms" (default value
> 10 minutes).
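
The difference between the two behaviors described above can be shown with a minimal Java sketch. It is a simplified model, not the actual patch: the class DecommissionSketch, its methods, and the host name "nm1" are hypothetical names chosen for illustration and do not correspond to the real NodesListManager or ResourceTrackerService code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the RM's node bookkeeping.
// None of these names are real YARN classes or methods.
class DecommissionSketch {
    enum NodeState { RUNNING, DECOMMISSIONED }

    private final Map<String, NodeState> nodes = new HashMap<>();
    private final Set<String> excludedHosts = new HashSet<>();

    void register(String host) {
        nodes.put(host, NodeState.RUNNING);
    }

    // Behavior described as current: the refresh only records the exclude list.
    void refreshNodesLazy(Set<String> excludes) {
        excludedHosts.clear();
        excludedHosts.addAll(excludes);
    }

    // Behavior described as current: the decommission happens on the next heartbeat.
    void heartbeat(String host) {
        if (nodes.containsKey(host) && excludedHosts.contains(host)) {
            nodes.put(host, NodeState.DECOMMISSIONED);
        }
    }

    // Proposed behavior: the refresh itself decommissions excluded nodes.
    void refreshNodesEager(Set<String> excludes) {
        refreshNodesLazy(excludes);
        for (Map.Entry<String, NodeState> entry : nodes.entrySet()) {
            if (excludedHosts.contains(entry.getKey())) {
                entry.setValue(NodeState.DECOMMISSIONED);
            }
        }
    }

    NodeState stateOf(String host) {
        return nodes.get(host);
    }
}
```

In this model, a node excluded through refreshNodesLazy stays RUNNING until heartbeat is called for it. If the NM is killed first, that call never arrives, which corresponds to the case where only the liveness expiry removes the node. With refreshNodesEager the state changes as soon as the refresh runs.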



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
