hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
Date Mon, 21 Apr 2014 19:20:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975920#comment-13975920

Bikas Saha commented on YARN-1506:

This sleep is too long for a test. Same for other sleeps in the patch.
{code}+    while (alloc1Response.getAllocatedContainers().size() < 1) {
+      LOG.info("Waiting for containers to be created for app 1...");
+      Thread.sleep(1000);{code}

In the test is will be useful to have 2 outstanding container requests for that machine (while
the machine is fully booked) so we know that the scheduler will be trying to allocate on that
machine whenever the machine heartbeats. One outstanding request should be for 2 GB and the
other for 3GB. Also, after the node resource is changed, the test should do a few node heartbeats
to make sure that the scheduler will try to allocate new container (for the outstanding request)
on that node and fail to do so (+ not hit any NPE). Then after the first container completes,
the test should check that the 2GB outstanding container request is satisfiied but not the
3GB request. Thereafter, complete the second container and verify that the 3GB requests is
still not satisfied (because the NM has only 2GB resource.

Which brings another question to my mind. What happens when this change resource command reduces
the NM size to something less than the max container size allowed. e.g. Lets say that the
NM is 8GB and max allowed container size is 4GB. So the RM accepts a 4GB request. Now the
admin changes NM to 2GB. At this point the previously accepted 4GB request cannot be satisfied
and the application will get stuck. We may need to follow this up in a different jira. There
may be some existing jiras related to max container size and actual NM resource size.

bq. Update transition as only 2 lines of code can be shared and shared a method across different
class seems over-kill in this case
Its probably about personal stylistic choices. The number of lines of code are less important.
What is more important is that having a shared method declares that these pieces of code are
related to each other in a logical way (if such a relation exists). The dependency may be
via a method in RMNodeImpl thats called by both transitions or one transition extending the
other one. The choice depends on how the 2 pieces of code are related to each other.

> Replace set resource change on RMNode/SchedulerNode directly with event notification.
> -------------------------------------------------------------------------------------
>                 Key: YARN-1506
>                 URL: https://issues.apache.org/jira/browse/YARN-1506
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, scheduler
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, YARN-1506-v2.patch, YARN-1506-v3.patch,
YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, YARN-1506-v7.patch, YARN-1506-v8.patch,
> According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
we should replace RMNode.setResourceOption() with some resource change event.

This message was sent by Atlassian JIRA

View raw message