hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Craig Welch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
Date Wed, 03 Dec 2014 17:54:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233244#comment-14233244

Craig Welch commented on YARN-2637:

First, the easy parts :-) 

bq. typo for manualy


bq. usedAMResources is not used by sub-class, so suggest to replace it with private


bq. Exception messages here should be more meaningful than "c1", or "c2".

yup - fixed

bq. The log level here should be info or warn level rather than debug level. In most cases,
LOG.debug() should be under block of LOG.isDebugEnabled().

So, I had made this debug rather than something higher because I'm not sure we always care,
and it doesn't represent a failure case - this is normal/expected case, and other similar
cases for not starting the app don't log at all.  But, I can see that it will be helpful to
know this, and I don't think that it will result in excessive logging - so I went ahead and
made it an "info" level, sound good?  BTW, the "isXYZenabled" idiom is to save the cost of
evaluating the argument construction for the log message as these can be very expensive, but
for cheap cases like this (a string literal) it's not necessary as the only cost is going
to be the same evaluation for logging which will happen during the call

Now for the more complicated one:

bq. Looks like maxAMResourcePerQueuePercent is a allowed percent for AM resource in each queue.
So we may should calculate amLimit per queue rather than aggregate all applications together.

So, yes and no - the current behavior actually takes the maxAM... which is set globally and
it apportions it out based on the queue's baseline share of the cluster - so if the maxam
was say, 10%, and a given queue had 50% of the cluster, it would have an effective maxampercent
value of 5% (it's translated into "how many apps can I have running" based on the minallocation
of the cluster rather than actual am usage - which is the problem which prompted the fix -
but the important thing to get here is the way the overall maxampercent is apportioned out
to the queues)  There is also the option to override on a per queue basis, so that, in the
above scenario, if you didn't like the queue getting the 5% based on the overall process,
but you were happy with how other queues were working using the config, you could just override
for the given queue.

When I tried to translate this into something which was actually paying attention to the real
usage of the ams, two approaches seemed reasonable:

1. Just have a global used am resource value, use the global am percent everywhere (not apportioned)
- this way the total cluster level effect is what we want - in this case, the subdivision
of the amresource percent value is replaced with a total summing of the used resource amongst
the queues.  You can still override for a given queue if you want "this queue to be able to
go higher", which has the effective result of allowing one queue to go higher than the others,
this could starve other queues (bad) but that was already possible with the other approach,
albeit in a different way (when the cluster came to be filled with AM's from one particular

2.  We could subdivide the global maxampercent based on the queue share of the baseline (as
before) and then have a per-queue amresource percent (and amused) which are evaluated - this
would not be a difficult change from the current approach, but I think it is problematic for
the reason below 

The main reason I took approach number one over two is that I was concerned that with a complex
queue structure where there was a reasonable level of subdivision in a smallish cluster you
could end up with a queue which can effectively never start anything because the final value
is too small to ever be able to start one of the larger AM's we have these days.  By sharing
it globally this is less likely to happen because that "unused am resource" allocated out
to other queues which have a larger share of the cluster is not potentially sitting idle while
"leaf queue a.b.c" has a derived maxampercent of say 2%, which translates into 512mb, and
so can never start an application master which needs 1G (even though, globally, there's more
than enough ampercent to do so).  It's the "this queue can never start an am over x size"
that concerns me.  There are other possible ways to handle this with option 2, but I'm concerned
that they would add complexity to the behavior and change the behavior more than is needed
to correct the defect.

[~djp]  Make sense?  Thoughts?  I may take a go at option 2 so we can evaluate it, but I'm
concerned about the small cluster/too much subdivision scenario being problematic.

> maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
> ----------------------------------------------------------------------------------------
>                 Key: YARN-2637
>                 URL: https://issues.apache.org/jira/browse/YARN-2637
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Wangda Tan
>            Assignee: Craig Welch
>            Priority: Critical
>         Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch,
YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch
> Currently, number of AM in leaf queue will be calculated in following way:
> {code}
> max_am_resource = queue_max_capacity * maximum_am_resource_percent
> #max_am_number = max_am_resource / minimum_allocation
> #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
> {code}
> And when submit new application to RM, it will check if an app can be activated in following
> {code}
>     for (Iterator<FiCaSchedulerApp> i=pendingApplications.iterator(); 
>          i.hasNext(); ) {
>       FiCaSchedulerApp application = i.next();
>       // Check queue limit
>       if (getNumActiveApplications() >= getMaximumActiveApplications()) {
>         break;
>       }
>       // Check user limit
>       User user = getUser(application.getUser());
>       if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
>         user.activateApplication();
>         activeApplications.add(application);
>         i.remove();
>         LOG.info("Application " + application.getApplicationId() +
>             " from user: " + application.getUser() + 
>             " activated in queue: " + getQueueName());
>       }
>     }
> {code}
> An example is,
> If a queue has capacity = 1G, max_am_resource_percent  = 0.2, the maximum resource that
AM can use is 200M, assuming minimum_allocation=1M, #am can be launched is 200, and if user
uses 5M for each AM (> minimum_allocation). All apps can still be activated, and it will
occupy all resource of a queue instead of only a max_am_resource_percent of a queue.

This message was sent by Atlassian JIRA

View raw message