hadoop-yarn-issues mailing list archives

From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
Date Sun, 20 Sep 2015 20:58:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900067#comment-14900067

zhihai xu commented on YARN-3446:

Hi [~kasha], Thanks for the review! I attached a new patch YARN-3446.003.patch, which addresses
your first comment. I also added more test cases to verify {{getHeadroom}} with removal and
addition of blacklisted nodes.
About your second comment: IMHO, if we don't do the optimization, the overhead will be very large
for a big cluster. For example, suppose we have 2000 AMs running on a 5000-node cluster.
For each AM, we would need to go through the 5000-node list to find the blacklisted {{SchedulerNode}}s
in the heartbeat. With 2000 AMs, that is 10,000,000 iterations. Normally the number of blacklisted
nodes is very small for each application, so iterating over the blacklisted nodes should
not be a performance issue. Also, an AM won't change its blacklisted nodes frequently.
About your third comment: it is because currently {{SchedulerNode}}s are stored in {{AbstractYarnScheduler#nodes}}
keyed by {{NodeId}}, but {{AppSchedulingInfo}} stores the blacklisted nodes as {{String}}
node names or rack names. I can't find an easy way to translate a node name or rack name to a {{NodeId}}.
So it looks like we would need to iterate through {{AbstractYarnScheduler#nodes}} to find the blacklisted
{{SchedulerNode}}s if we use {{AppSchedulingInfo#getBlacklist}}. That means for a 5000-node
cluster, we would loop 5000 times, a big overhead. {{AbstractYarnScheduler#nodes}} is defined
as follows:
  protected Map<NodeId, N> nodes = new ConcurrentHashMap<NodeId, N>();
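To illustrate the optimization argued for above, here is a minimal, self-contained sketch of tracking blacklisted capacity incrementally, so the headroom calculation is O(1) per heartbeat instead of O(#nodes). This is not the actual YARN-3446 patch; class and method names are illustrative, resources are simplified to a single MB count, and nodes are keyed by name rather than {{NodeId}} for brevity.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: maintain a running total of available resources on
// blacklisted nodes, updated only when the blacklist changes, and subtract
// it from the cluster-wide headroom. Avoids scanning all nodes per heartbeat.
public class BlacklistHeadroom {
    // cluster nodes keyed by node name (the real code keys by NodeId)
    private final Map<String, Integer> nodeAvailableMb = new ConcurrentHashMap<>();
    private final Set<String> blacklist = new HashSet<>();
    private int blacklistedAvailableMb = 0; // maintained incrementally

    public void addNode(String name, int availableMb) {
        nodeAvailableMb.put(name, availableMb);
    }

    // O(1) per blacklist addition, instead of O(#nodes) per heartbeat
    public void addToBlacklist(String name) {
        Integer avail = nodeAvailableMb.get(name);
        if (avail != null && blacklist.add(name)) {
            blacklistedAvailableMb += avail;
        }
    }

    public void removeFromBlacklist(String name) {
        Integer avail = nodeAvailableMb.get(name);
        if (avail != null && blacklist.remove(name)) {
            blacklistedAvailableMb -= avail;
        }
    }

    // headroom excluding resources on blacklisted nodes
    public int getHeadroomMb(int clusterAvailableMb) {
        return clusterAvailableMb - blacklistedAvailableMb;
    }
}
```

With 2000 AMs on a 5000-node cluster, each blacklist update touches only the changed node, so the total work scales with the (small) number of blacklisted nodes per application rather than 2000 × 5000 node visits per heartbeat round.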

> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -------------------------------------------------------------------------
>                 Key: YARN-3446
>                 URL: https://issues.apache.org/jira/browse/YARN-3446
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-3446.000.patch, YARN-3446.001.patch, YARN-3446.002.patch, YARN-3446.003.patch
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because, for the reducer preemption calculation,
headRoom includes blacklisted nodes. This makes jobs hang forever (the ResourceManager
does not assign any new containers on blacklisted nodes, but the availableResource the AM gets from
the RM includes the available resources of blacklisted nodes).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.

This message was sent by Atlassian JIRA
