hadoop-yarn-issues mailing list archives

From "Robert Kanter (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM
Date Tue, 05 Dec 2017 02:08:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277895#comment-16277895 ]

Robert Kanter commented on YARN-6483:
-------------------------------------

[~asuresh], did you mean to commit this to branch-3.0?  The fix version for this JIRA says 3.1.0.
Also, the {{TestResourceTrackerService#testGracefulDecommissionDefaultTimeoutResolution}} added here
relies on an XML excludes file, which is currently only supported in trunk (YARN-7162), so the test
fails on branch-3.0 because each line of the XML file is read as a separate host
(e.g. {{<?xml}}, {{<name>host1</name>}}, etc.):
{noformat}
Running org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
Tests run: 35, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 52.706 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
testGracefulDecommissionDefaultTimeoutResolution(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)  Time elapsed: 23.913 sec  <<< FAILURE!
java.lang.AssertionError: Node state is not correct (timedout) expected:<DECOMMISSIONING> but was:<RUNNING>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:908)
	at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testGracefulDecommissionDefaultTimeoutResolution(TestResourceTrackerService.java:345)
{noformat}
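
For context, the XML-based excludes file the test relies on looks roughly like the sketch below. The element names and timeout value are illustrative only, inferred from the error above and YARN-7162. A branch-3.0 RM still reads the file with the plain-text hosts reader, so each of these lines gets treated as a separate hostname:
{noformat}
<?xml version="1.0"?>
<hosts>
  <host>
    <name>host1</name>
    <timeout>1800</timeout>
  </host>
</hosts>
{noformat}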

> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6483
>                 URL: https://issues.apache.org/jira/browse/YARN-6483
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Juan Rodríguez Hortalá
>            Assignee: Juan Rodríguez Hortalá
>             Fix For: 3.1.0
>
>         Attachments: YARN-6483-v1.patch, YARN-6483.002.patch, YARN-6483.003.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful decommissioning
> mechanism to give time for tasks to complete in a node that is scheduled for decommission,
> and for reducer tasks to read the shuffle blocks in that node. Also, YARN effectively blacklists
> nodes in DECOMMISSIONING state by assigning them a capacity of 0, to prevent additional
> containers from being launched on those nodes, so no more shuffle blocks are written to the
> node. This blacklisting is not effective for applications like Spark, because a Spark executor
> running in a YARN container will keep receiving more tasks after the corresponding node has
> been blacklisted at the YARN level. We would like to propose a modification of the YARN
> heartbeat mechanism so that nodes transitioning to DECOMMISSIONING are added to the list of
> updated nodes returned by the Resource Manager in response to the Application Master heartbeat.
> This way a Spark application master would be able to blacklist a DECOMMISSIONING node at the
> Spark level.
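
Below is a rough, illustrative AM-side sketch (not part of the attached patches) of how a client could act on this: it pulls the updated-node reports out of the allocate response, collects hosts that are DECOMMISSIONING, and blacklists them through the public {{AMRMClient}} API. The class and method names are made up for illustration.
{code:java}
// Illustrative sketch only: scan the updated-node reports that the RM returns
// in the allocate response and blacklist hosts that are DECOMMISSIONING, so
// no further containers are requested there.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.AMRMClient;

public final class DecommissioningNodeTracker {

  private DecommissioningNodeTracker() {
  }

  /** Returns the hosts in the updated-nodes list that are DECOMMISSIONING. */
  public static List<String> decommissioningHosts(AllocateResponse response) {
    List<String> hosts = new ArrayList<>();
    for (NodeReport report : response.getUpdatedNodes()) {
      if (report.getNodeState() == NodeState.DECOMMISSIONING) {
        hosts.add(report.getNodeId().getHost());
      }
    }
    return hosts;
  }

  /** Blacklists DECOMMISSIONING hosts so no new containers are placed there. */
  public static void blacklistDecommissioning(AMRMClient<?> amRmClient,
      AllocateResponse response) {
    List<String> hosts = decommissioningHosts(response);
    if (!hosts.isEmpty()) {
      amRmClient.updateBlacklist(hosts, Collections.<String>emptyList());
    }
  }
}
{code}
With the DECOMMISSIONING reports available, a Spark application master could then also stop scheduling tasks on its executors on those hosts, which is the blacklisting at the Spark level described above.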



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
