hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
Date Mon, 13 Jan 2014 19:45:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869882#comment-13869882
] 

Bikas Saha commented on YARN-1506:
----------------------------------

Thanks for the change. The code looks a lot cleaner now with a bunch on changes removed.

ADMIN_RESOURCE_UPDATE instead of RESOURCE_UPDATE for the enum would help clarify that its
a forced admin update. 

Why not update the total capability here also (like we do for non-running node). When the
node reports back as healthy then we would probably need the new resource value, right?
{code}
+    public void transition(RMNodeImpl rmNode, RMNodeEvent event) {
+      // The node is not usable, only log a warn message
+      LOG.warn("Try to update resource on a "+ rmNode.getState().toString() +
+          " node: "+rmNode.toString());
{code}

Why are we doing this indirect subtraction via delta instead of simply clusterResource-=old;
clusterResource+=new. Its the same number of operations and less confusing to read.
{code}+    Resource deltaResource = Resources.subtract(newResource, oldResource);
+
+    // update resource to node
+    node.setTotalResource(newResource);
+    
+    // update resource to clusterResource 
+    Resources.addTo(clusterResource, deltaResource);{code}

Newly added synchronization? Sometimes getters are deliberately made lockless. I hope this
was not the case here.
{code}
-  public Resource getTotalResource() {
+  public synchronized Resource getTotalResource() {
{code}

I think its crucial to have a more complete test (maybe using mockRM) that verifies the flow
from admin service to the scheduler. Most interesting would be the case when the node is full
allocated and then an update reduces the capacity. Thus resulting in -ve value of available
resource on the node. I am wary that this case may have bugs in handling the -ve value in
existing scheduler code because its unexpected. Its fine for the test to use the default scheduler.

> Replace set resource change on RMNode/SchedulerNode directly with event notification.
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-1506
>                 URL: https://issues.apache.org/jira/browse/YARN-1506
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, scheduler
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Blocker
>         Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch,
YARN-1506-v5.patch, YARN-1506-v6.patch
>
>
> According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
we should replace RMNode.setResourceOption() with some resource change event.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Mime
View raw message