hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
Date Wed, 13 Jan 2016 22:58:39 GMT

    [ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15097218#comment-15097218

Jason Lowe commented on YARN-4311:

Thanks for updating the patch!

I'd like to see the getTimestamp/setTimestamp methods be a bit more specific, like getInactiveTimestamp
or getUntrackedTimestamp or something similar.  Currently it sounds like a generic timestamp
and therefore is unclear what the context is and when it should and should not be used.

Similarly I think the property names are a bit too vague.  Would like to see "inactive", "untracked"
or some other adjective in the name to qualify when the properties apply.

Nit: interval and timeout sound similar without qualification, e.g.: there's already expiry-interval
properties that are really timeouts.  Should interval be check-interval or something to distinguish
it?  And would it make sense to derive one from the other, similar to what is done for AM/NM
expiry interval?  (e.g: interval could be a fraction of timeout, potentially with a maximum
interval cap of 10 minutes or something.)  That's another way to eliminate the potential for
confusion or misconfig, but it does make it a little more inflexible.

Nit: There should be whitespace between the properties for readability and consistency with
the rest of the file.  Typically the property name and default are clustered together with
gaps between different properties.

Why does isUntrackedNode do a conf lookup and assign includesFile?  It seems unrelated to
what it's doing, and if it is related, why not also excludesFile?  This gets called in a loop,
potentially over many nodes.  conf lookups can be a bit pricey, and we should avoid doing
them if they're not needed.

Time.monotonicNow should be used for the timestamps rather than System.currentTimeMillis to
avoid problems when the system time is adjusted.

The call to Timer.purge after Timer.cancel is redundant.  There are no tasks on the timer
after Timer.cancel is called.

> Removing nodes from include and exclude lists will not remove them from decommissioned
nodes list
> -------------------------------------------------------------------------------------------------
>                 Key: YARN-4311
>                 URL: https://issues.apache.org/jira/browse/YARN-4311
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.1
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: YARN-4311-v1.patch, YARN-4311-v2.patch, YARN-4311-v3.patch, YARN-4311-v4.patch,
YARN-4311-v5.patch, YARN-4311-v6.patch, YARN-4311-v7.patch
> In order to fully forget about a node, removing the node from include and exclude list
is not sufficient. The RM lists it under Decomm-ed nodes. The tricky part that [~jlowe] pointed
out was the case when include lists are not used, in that case we don't want the nodes to
fall off if they are not active.

This message was sent by Atlassian JIRA

View raw message