hadoop-yarn-issues mailing list archives

From "Karthik Palaniappan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8118) Better utilize gracefully decommissioning node managers
Date Mon, 09 Apr 2018 23:07:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431472#comment-16431472 ]

Karthik Palaniappan commented on YARN-8118:

Not sure I understand your use cases (@Jason/@Junping). For jobs that produce shuffle data
(i.e. all Hadoop-ecosystem jobs?), killing a container is just as bad as removing the
shuffle it produced. I can imagine a few reasonable scenarios around removing nodes:

1) immediately remove nodes (regular decommissioning)

2) wait for containers to finish, but don't wait until applications finish (scenarios where
shuffle doesn't matter)

3) wait for apps to finish and let in-progress apps use decommissioning nodes

#1 is regular (forceful) decommissioning. #3 is my proposal, aimed at cloud environments
with potentially drastic scaling events. #2 makes sense for non-cloud environments where only
a few nodes are removed at a time. It also makes sense when running jobs that don't produce
shuffle output.
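
As a rough illustration of the three scenarios above, here is a toy Python model of when
a node may be removed under each one. This is not YARN code: `RemovalMode`, `Node`, and
`can_remove` are hypothetical names, and the model assumes a node only tracks its running
containers and the still-active applications that have run containers on it.

```python
from dataclasses import dataclass, field
from enum import Enum


class RemovalMode(Enum):
    IMMEDIATE = 1            # scenario 1: regular (forceful) decommissioning
    WAIT_FOR_CONTAINERS = 2  # scenario 2: wait for running containers only
    WAIT_FOR_APPS = 3        # scenario 3: wait for the applications to finish


@dataclass
class Node:
    # Hypothetical bookkeeping, not a YARN class.
    running_containers: set = field(default_factory=set)
    # Applications still in progress that have run containers on this node
    # (their shuffle output may still live here).
    active_apps: set = field(default_factory=set)


def can_remove(node: Node, mode: RemovalMode) -> bool:
    """Return True if the node may be removed right now under the given mode."""
    if mode is RemovalMode.IMMEDIATE:
        return True
    if mode is RemovalMode.WAIT_FOR_CONTAINERS:
        return not node.running_containers
    # WAIT_FOR_APPS: containers are done and no active app still depends on the node.
    return not node.running_containers and not node.active_apps
```

In this model, a node whose containers have all finished but whose application is still
running can be removed under #2 but not under #3, which is exactly where shuffle output
matters.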

So if you're willing to tolerate a behavioral change, maybe #2 should be the default, and
#3 should be an additional flag (either an XML property or a flag on the graceful decommission

However, as currently implemented, it seems like graceful decommissioning is the worst of
all worlds – wait for apps to finish, but don't let apps use decommissioning nodes. Am
I missing something obvious here? I couldn't find anything in the original design docs discussing
why it was implemented that way.

> Better utilize gracefully decommissioning node managers
> -------------------------------------------------------
>                 Key: YARN-8118
>                 URL: https://issues.apache.org/jira/browse/YARN-8118
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>    Affects Versions: 2.8.2
>         Environment: * Google Compute Engine (Dataproc)
>  * Java 8
>  * Hadoop 2.8.2 using client-mode graceful decommissioning
>            Reporter: Karthik Palaniappan
>            Priority: Major
>         Attachments: YARN-8118-branch-2.001.patch
> Proposal design doc with background + details (please comment directly on doc): [https://docs.google.com/document/d/1hF2Bod_m7rPgSXlunbWGn1cYi3-L61KvQhPlY9Jk9Hk/edit#heading=h.ab4ufqsj47b7]
>
> tl;dr Right now, DECOMMISSIONING nodes must wait for in-progress applications to complete
> before shutting down, but they cannot run new containers from those in-progress applications. This
> is wasteful, particularly in environments where you are billed by resource usage (e.g. EC2).
>
> Proposal: YARN should schedule containers from in-progress applications on DECOMMISSIONING
> nodes, but should still avoid scheduling containers from new applications. That will make in-progress
> applications complete faster and let nodes decommission faster. Overall, this should be cheaper.
>
> I have a working patch without unit tests that's surprisingly just a few real lines of
> code (patch 001). If folks are happy with the proposal, I'll write unit tests and also write
> a patch targeted at trunk.
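
The scheduling rule in the quoted proposal can be sketched as a single predicate. This is
a minimal Python sketch, not the attached patch and not the YARN scheduler API:
`NodeState`, `may_schedule`, and the `allow_in_progress` switch are hypothetical names, and
it assumes "in-progress" means the application was already running when the node entered
DECOMMISSIONING.

```python
from enum import Enum


class NodeState(Enum):
    RUNNING = "RUNNING"
    DECOMMISSIONING = "DECOMMISSIONING"
    DECOMMISSIONED = "DECOMMISSIONED"


def may_schedule(node_state: NodeState,
                 app_id: str,
                 in_progress_apps: set,
                 allow_in_progress: bool) -> bool:
    """Decide whether a new container for app_id may be placed on a node.

    in_progress_apps: apps assumed to have been running when decommissioning began.
    allow_in_progress: False models the behavior described as current,
                       True models the proposed behavior.
    """
    if node_state is NodeState.RUNNING:
        return True
    if node_state is NodeState.DECOMMISSIONING:
        # Proposed: only in-progress apps may keep using the node; new apps may not.
        return allow_in_progress and app_id in in_progress_apps
    return False


in_progress = {"app_1"}
# Behavior described as current: nothing new lands on a DECOMMISSIONING node.
print(may_schedule(NodeState.DECOMMISSIONING, "app_1", in_progress, False))  # False
# Proposed behavior: in-progress app yes, new app no.
print(may_schedule(NodeState.DECOMMISSIONING, "app_1", in_progress, True))   # True
print(may_schedule(NodeState.DECOMMISSIONING, "app_2", in_progress, True))   # False
```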
