hadoop-yarn-issues mailing list archives

From "Kuhu Shukla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
Date Mon, 11 Apr 2016 15:35:25 GMT

    [ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235307#comment-15235307 ]

Kuhu Shukla commented on YARN-4311:

The problem does exist. If the node being removed from the list is already in the REBOOTED
or LOST state, the node metrics go out of sync.
There are 2 approaches to solve this:
1. Have the appropriate metrics decremented and let the timer remove such nodes.
2. Have the timer remove only nodes that are DECOMMISSIONED.

Both the LOST and REBOOTED states have self arcs in the finite state machine, just like
the DECOMMISSIONED state, which removes some complexity if we go with option 1.
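
To make the trade-off between the two options concrete, here is a minimal toy sketch. It is not the RM code: `ToyResourceManager`, its method names, and the per-state counters are all hypothetical stand-ins, and `remove_naive` is only an assumed illustration of how the counts could drift, not a description of the actual bug path.

```python
from collections import Counter
from enum import Enum


class NodeState(Enum):
    DECOMMISSIONED = "DECOMMISSIONED"
    LOST = "LOST"
    REBOOTED = "REBOOTED"


class ToyResourceManager:
    """Hypothetical stand-in for the RM's inactive-node bookkeeping."""

    def __init__(self, inactive_nodes):
        # host -> NodeState for nodes that are no longer active
        self.inactive = dict(inactive_nodes)
        # per-state counters, standing in for the node metrics
        self.metrics = Counter(self.inactive.values())

    def remove_naive(self, untracked_hosts):
        """Assumed failure mode: drop every node, adjust only one counter."""
        for host in untracked_hosts:
            state = self.inactive.pop(host, None)
            if state is NodeState.DECOMMISSIONED:
                self.metrics[state] -= 1

    def remove_option1(self, untracked_hosts):
        """Option 1: decrement the matching counter, then drop the node."""
        for host in untracked_hosts:
            state = self.inactive.pop(host, None)
            if state is not None:
                self.metrics[state] -= 1

    def remove_option2(self, untracked_hosts):
        """Option 2: drop only DECOMMISSIONED nodes, leave the rest alone."""
        for host in untracked_hosts:
            if self.inactive.get(host) is NodeState.DECOMMISSIONED:
                del self.inactive[host]
                self.metrics[NodeState.DECOMMISSIONED] -= 1


def metrics_in_sync(rm):
    """True when every counter matches the number of nodes in that state."""
    actual = Counter(rm.inactive.values())
    return all(rm.metrics[state] == actual[state] for state in NodeState)
```

In this toy model both options keep the counters consistent. The difference is that option 1 forgets LOST and REBOOTED nodes as well, while option 2 leaves them listed and only forgets DECOMMISSIONED ones.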

[~jlowe], could you comment on the two approaches and say which one sounds better?

Will open a follow-up JIRA shortly.

> Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
> -------------------------------------------------------------------------------------------------
>                 Key: YARN-4311
>                 URL: https://issues.apache.org/jira/browse/YARN-4311
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.1
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>             Fix For: 2.8.0
>         Attachments: YARN-4311-branch-2.7.001.patch, YARN-4311-branch-2.7.002.patch,
YARN-4311-branch-2.7.003.patch, YARN-4311-branch-2.7.004.patch, YARN-4311-v1.patch, YARN-4311-v10.patch,
YARN-4311-v11.patch, YARN-4311-v11.patch, YARN-4311-v12.patch, YARN-4311-v13.patch, YARN-4311-v13.patch,
YARN-4311-v14.patch, YARN-4311-v2.patch, YARN-4311-v3.patch, YARN-4311-v4.patch, YARN-4311-v5.patch,
YARN-4311-v6.patch, YARN-4311-v7.patch, YARN-4311-v8.patch, YARN-4311-v9.patch
> In order to fully forget about a node, removing the node from the include and exclude lists is not sufficient. The RM still lists it under decommissioned nodes. The tricky part that [~jlowe] pointed out is the case when include lists are not used; in that case we don't want the nodes to fall off if they are not active.

