hadoop-yarn-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration
Date Wed, 08 Jul 2015 15:33:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618776#comment-14618776
] 

Sunil G commented on YARN-3894:
-------------------------------

Thanks [~bibinchundatt] for reporting and providing analysis.

During the {{initScheduler}} call from *CapacityScheduler#serviceInit*, we initialize the
queues. In the same call flow, {{ParentQueue#setChildQueues}} also validates each node
label's capacity against the queue capacity.
{code}
    // check label capacities
    for (String nodeLabel : labelManager.getClusterNodeLabelNames()) {
      float capacityByLabel = queueCapacities.getCapacity(nodeLabel);
      // check children's labels
      float sum = 0;
      for (CSQueue queue : childQueues) {
        sum += queue.getQueueCapacities().getCapacity(nodeLabel);
      }
      if ((capacityByLabel > 0 && Math.abs(1.0f - sum) > PRECISION)
          || (capacityByLabel == 0) && (sum > 0)) {
        throw new IllegalArgumentException("Illegal" + " capacity of "
            + sum + " for children of queue " + queueName
            + " for label=" + nodeLabel);
      }
    }
{code}

As per this code, if a node label's capacity does not match the queue capacity, an
*IllegalArgumentException* should be thrown. But this does not happen when we configure a
wrong capacity for a label in the CS xml and restart the RM.
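
To make the condition concrete, here is a minimal standalone sketch of the same check. It
is not the actual {{ParentQueue}} code: the class {{LabelCapacityCheck}}, its {{validate}}
signature, and the plain maps standing in for {{QueueCapacities}} are all hypothetical, and
the {{PRECISION}} value is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the label capacity check quoted above.
final class LabelCapacityCheck {
  // Assumed tolerance; the value used by the real scheduler may differ.
  static final float PRECISION = 0.0005f;

  // clusterLabels: labels currently known to the label manager.
  // parentCapacity: capacity of the parent queue per label.
  // childCapacities: one map per child queue, capacity per label.
  static void validate(String queueName,
                       Set<String> clusterLabels,
                       Map<String, Float> parentCapacity,
                       List<Map<String, Float>> childCapacities) {
    for (String nodeLabel : clusterLabels) {
      float capacityByLabel = parentCapacity.getOrDefault(nodeLabel, 0f);
      float sum = 0;
      for (Map<String, Float> child : childCapacities) {
        sum += child.getOrDefault(nodeLabel, 0f);
      }
      // Children must sum to 100% when the parent has capacity for the
      // label, and to 0 when the parent has none.
      if ((capacityByLabel > 0 && Math.abs(1.0f - sum) > PRECISION)
          || (capacityByLabel == 0 && sum > 0)) {
        throw new IllegalArgumentException("Illegal capacity of " + sum
            + " for children of queue " + queueName
            + " for label=" + nodeLabel);
      }
    }
  }
}
```

Note that when {{clusterLabels}} is empty the loop body never executes, so even a
misconfigured set of child capacities passes without an exception.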

*Issue:*
During {{CommonNodeLabelsManager#serviceStart}}, labels are re-populated from the old mirror
file. But {{initScheduler}} and the call flow above run from *serviceInit*, not
*serviceStart*.
So {{labelManager.getClusterNodeLabelNames()}} returns an empty set in the code above, the
loop body never runs, and the desired exception won't be thrown.

IMO we can move the node label init and recovery to serviceInit rather than serviceStart.
[~leftnoteasy], could you please share your thoughts?
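
The ordering problem, and the effect of the proposed move, can be illustrated with a toy
model. This is only a sketch under assumptions: {{Service}}, {{LabelStore}} and
{{Scheduler}} are hypothetical stand-ins and not the Hadoop classes, and recovery from the
mirror file is simulated by adding a single label "x". The only behaviour modelled is that
every service is initialized before any service is started.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LifecycleOrderSketch {
  // Hypothetical two-phase lifecycle: init() for all, then start() for all.
  interface Service {
    void init();
    void start();
  }

  // Stand-in for the node labels manager.
  static final class LabelStore implements Service {
    private final Set<String> labels = new HashSet<>();
    private final boolean recoverInInit;

    LabelStore(boolean recoverInInit) {
      this.recoverInInit = recoverInInit;
    }

    // Simulates re-populating labels from the mirror file.
    private void recover() {
      labels.add("x");
    }

    @Override public void init() {
      if (recoverInInit) recover();
    }

    @Override public void start() {
      if (!recoverInInit) recover();
    }

    Set<String> getClusterNodeLabelNames() {
      return Collections.unmodifiableSet(labels);
    }
  }

  // Stand-in for the scheduler: it reads the labels during init().
  static final class Scheduler implements Service {
    private final LabelStore store;
    int labelsSeenAtInit = -1;

    Scheduler(LabelStore store) {
      this.store = store;
    }

    @Override public void init() {
      labelsSeenAtInit = store.getClusterNodeLabelNames().size();
    }

    @Override public void start() { }
  }

  // Returns how many labels the scheduler could see while initializing.
  static int labelsVisibleDuringSchedulerInit(boolean recoverInInit) {
    LabelStore store = new LabelStore(recoverInInit);
    Scheduler scheduler = new Scheduler(store);
    List<Service> services = Arrays.asList(store, scheduler);
    for (Service s : services) s.init();
    for (Service s : services) s.start();
    return scheduler.labelsSeenAtInit;
  }
}
```

With recovery left in {{start()}}, the scheduler sees no labels while initializing, so a
per-label check has nothing to iterate over. With recovery moved into {{init()}}, the
recovered label is visible, provided the label store is initialized before the scheduler.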

> RM startup should fail for wrong CS xml NodeLabel capacity configuration 
> -------------------------------------------------------------------------
>
>                 Key: YARN-3894
>                 URL: https://issues.apache.org/jira/browse/YARN-3894
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: capacity-scheduler.xml
>
>
> Currently in the Capacity Scheduler, the RM shuts down when the capacity configuration
> is wrong, but not in case of a NodeLabels capacity mismatch.
> In {{CapacityScheduler#initializeQueues}}
> {code}
>   private void initializeQueues(CapacitySchedulerConfiguration conf)
>     throws IOException {   
>     root = 
>         parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
>             queues, queues, noop);
>     labelManager.reinitializeQueueLabels(getQueueToLabels());
>     root = 
>         parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
>             queues, queues, noop);
>     LOG.info("Initialized root queue " + root);
>     initializeQueueMappings();
>     setQueueAcls(authorizer, queues);
>   }
> {code}
> {{labelManager}} is initialized from queues, and the calculation for label-level capacity
> mismatch happens in {{parseQueue}}. So during initialization in {{parseQueue}} the labels
> will be empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
