hadoop-yarn-issues mailing list archives

From "Robert Kanter (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5465) Server-Side NM Graceful Decommissioning subsequent call behavior
Date Tue, 02 Aug 2016 16:42:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404354#comment-15404354 ]

Robert Kanter commented on YARN-5465:
-------------------------------------

I think the second option is better.  Even though updating the timeout of a currently decommissioning
node is harder, it's at least possible to have different sets of decommissioning nodes with
different timeouts, which seems like a common scenario to me.  The first option doesn't allow
you to do this at all.

> Server-Side NM Graceful Decommissioning subsequent call behavior
> ----------------------------------------------------------------
>
>                 Key: YARN-5465
>                 URL: https://issues.apache.org/jira/browse/YARN-5465
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: graceful
>            Reporter: Robert Kanter
>
> The Server-Side NM Graceful Decommissioning feature added by YARN-4676 has the following
behavior when subsequent calls are made:
> # Start a long-running job that has containers running on nodeA
> # Add nodeA to the exclude file
> # Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully decommissioning nodeA
> # Wait 30 seconds
> # Add nodeB to the exclude file
> # Run {{-refreshNodes -g 30 -server}} (30sec)
> # After 30 seconds, both nodeA and nodeB shut down
> In a nutshell, a subsequent graceful-decommission call updates the timeout for all currently decommissioning nodes. This makes it impossible to gracefully decommission different sets of nodes with different timeouts, though it does make it easy to update the timeout of nodes that are already decommissioning.
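> For example, the sequence above could be scripted like this (a minimal sketch; the exclude file path is hypothetical and comes from {{yarn.resourcemanager.nodes.exclude-path}}):
> {code}
> # Current behavior: the second call overwrites nodeA's remaining timeout.
> EXCLUDE=/etc/hadoop/conf/yarn.exclude       # hypothetical path
>
> echo nodeA >> "$EXCLUDE"
> yarn rmadmin -refreshNodes -g 120 -server   # nodeA: 2min graceful timeout
> sleep 30
> echo nodeB >> "$EXCLUDE"
> yarn rmadmin -refreshNodes -g 30 -server    # nodeB: 30sec; nodeA's timeout is also reset to 30sec
> {code}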
> An alternative behavior would be:
> # Start a long-running job that has containers running on nodeA
> # Add nodeA to the exclude file
> # Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully decommissioning nodeA
> # Wait 30 seconds
> # Add nodeB to the exclude file
> # Run {{-refreshNodes -g 30 -server}} (30sec)
> # After 30 seconds, nodeB shuts down
> # After 60 more seconds, nodeA shuts down
> This keeps the set of nodes affected by each graceful-decommission call independent: you can have different sets of decommissioning nodes with different timeouts. However, to update the timeout of a currently decommissioning node, you'd have to recommission it and then decommission it again.
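> Under this alternative, updating a node's timeout would look something like the following sketch (the recommission sequence here is an assumption: remove the node from the exclude file, refresh, then decommission again):
> {code}
> sed -i '/^nodeA$/d' "$EXCLUDE"              # drop nodeA from the exclude file
> yarn rmadmin -refreshNodes                  # recommission nodeA
> echo nodeA >> "$EXCLUDE"
> yarn rmadmin -refreshNodes -g 300 -server   # decommission again with a new 5min timeout
> {code}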



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

