From: "David Morel" <david.morel@amakuru.net>
To: user@hadoop.apache.org
Subject: Fair Scheduler, minResources and YARN
Date: Fri, 28 Nov 2014 10:47:05 +0100

Hello Hadoop users,

Since I use CDH, I started posting on the CDH list but got no answer.
I know I shouldn't cross-post, but since the audiences might be
slightly different, I'm pasting an edited version of my messages here.
I hope someone knowledgeable enough about the fair scheduler on YARN
can confirm or explain my findings.

First message I posted:

> I have a small CDH 5.1 cluster for testing, and I am trying to
> configure the FairScheduler to handle my use case (and failing).
>
> - I have several queues that should have a high priority and
>   guaranteed resources, so I configured them with high enough
>   minResources.
> - The total number of min vcores is higher than what the cluster
>   has, but the apps/queues never all run at the same time.
> - When I launch apps in other queues, even though no app is running
>   on the cluster at all, they don't even start, as if the priority
>   queues held on to the resources.
>
> My understanding (which is obviously false) was that minResources
> was only taken into account when the cluster was actually busy, not
> when it was completely idle. I would expect applications in queues
> with no minResources specified to grab available resources when
> there are plenty, not to be prevented from running entirely.
>
> Can someone explain what happens, what is supposed to happen, and
> how I can configure the queues so that the full capacity is
> available to my users, except when some priority tasks are started
> that should have a guaranteed number of vcores given to them?
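For reference, here is a trimmed-down sketch of the kind of
fair-scheduler.xml layout I mean. The queue names and numbers below
are invented for the example, not my actual config:

  <?xml version="1.0"?>
  <allocations>
    <!-- "priority" queues: guaranteed capacity via minResources -->
    <queue name="etl">
      <minResources>40000 mb,25vcores</minResources>
    </queue>
    <queue name="reporting">
      <minResources>40000 mb,25vcores</minResources>
    </queue>
    <!-- ad-hoc queue: no minResources at all -->
    <queue name="adhoc"/>
  </allocations>

On a cluster with, say, 40 vcores in total, the two minResources
declarations add up to 50 vcores, more than the cluster has, which is
exactly the situation I described above.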
Followed by a reply to myself:

>> My understanding (which is obviously false) was that minResources
>> was only taken into account when the cluster was actually busy, not
>> when it was completely idle. I would expect applications in queues
>> with no minResources specified to grab available resources when
>> there are plenty, not to be prevented from running entirely.
>
> Which is exactly what happens on a 5.2 VM that I downloaded: the
> queues that have no minResources set start just fine, even though
> they have a fair share of 0.
>
> I checked the configuration differences between the VM and my
> cluster, and nothing stands out. Is there a different behaviour
> between 2.3 and 2.5 that could explain this? I can't really believe
> that such a big change could have happened, so I must have missed
> something (?)

And my conclusion after more digging:

> I think what happens is this:
>
> The base allocation for a queue is its fair share. The fair share is
> the total resources available on the cluster divided by the number
> of queues (weighted). However, the minResources parameter for a
> queue, if present, is taken as its fair share, regardless of cluster
> capacity or of the other shares. This implies that if the total of
> the minResources parameters on the cluster is higher than or equal
> to the total capacity, the queues that have no minResources
> parameter will be given a fair share of zero. Since preemption
> (applications taking cluster resources beyond their fair share) only
> happens in certain cases (the actual share of a queue staying below
> 50% of its fair share for some time), and that condition can never
> be true when the fair share is zero, the applications never obtain
> the resources they need to run, and stay in the pending state
> forever.
>
> The change introduced in 5.2, where fair shares are computed over
> active queues only, seems to fix that.
>
> I think the relevant JIRA ticket is this one:
> https://issues.apache.org/jira/browse/YARN-2026
>
> Comments and corrections would be very welcome :-)

(A worked toy example of this reading is in the P.S. below.)

My conclusion is to move to 5.2 (Hadoop 2.5) to support the
allocations I have, but any confirmation or better explanation would
be awesome.

Thanks a lot

D.Morel
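P.S. Here is the toy arithmetic, using the invented numbers from the
sketch above; this is my reading of the pre-YARN-2026 behaviour, not
actual scheduler output:

  cluster capacity:                40 vcores
  etl       (minResources: 25) -> fair share = 25 vcores
  reporting (minResources: 25) -> fair share = 25 vcores
  adhoc     (no minResources)  -> fair share = 40 - 25 - 25 = -10,
                                  floored at 0 vcores
  preemption trigger for adhoc:   usage below 50% of fair share,
                                  i.e. below 0 vcores: never true
  => apps submitted to adhoc never get containers and stay pending,
     even on a completely idle cluster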