From: "David Morel" <david.morel@amakuru.net>
To: user@hadoop.apache.org
Subject: Fair Scheduler, minResources and YARN
Date: Fri, 28 Nov 2014 10:47:05 +0100

Hello Hadoop users,

Since I use CDH, I started posting on the CDH list but got no answer.
I know I shouldn't cross-post, but since the audiences might be
slightly different, I'm pasting an edited version of my messages here.
I hope someone knowledgeable enough about the fair scheduler on YARN
can confirm or explain my findings.

First message I posted:

> I have a small CDH 5.1 cluster for testing, and I am trying to
> configure the FairScheduler to handle my use case (and failing).
>
> - I have several queues that should have a high priority and
>   guaranteed resources, so I configured them with high enough
>   minResources.
> - The total number of min vcores is higher than what the cluster
>   has, but the apps/queues never all run at the same time.
> - When I launch apps in other queues, even though no app is running
>   on the cluster at all, they don't even start, as if the priority
>   queues held on to the resources.
>
> My understanding (which is obviously false) was that minResources
> was only taken into account when the cluster was actually busy, not
> when it was completely idle. I would expect applications in queues
> with no minResources specified to grab available resources when
> there are plenty, not to be prevented from running entirely.
>
> Can someone explain what happens, what is supposed to happen, and
> how I can configure the queues so that the full capacity is
> available to my users, except when some priority tasks are started
> that should have a guaranteed number of vcores given to them?
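For reference, here is a trimmed-down sketch of the kind of
fair-scheduler.xml layout I mean. The queue names and numbers below
are invented for the example, not my actual config:

  <?xml version="1.0"?>
  <allocations>
    <!-- "priority" queues: guaranteed capacity via minResources -->
    <queue name="etl">
      <minResources>40000 mb,25vcores</minResources>
    </queue>
    <queue name="reporting">
      <minResources>40000 mb,25vcores</minResources>
    </queue>
    <!-- ad-hoc queue: no minResources at all -->
    <queue name="adhoc"/>
  </allocations>

On a cluster with, say, 40 vcores in total, the two minResources
declarations add up to 50 vcores, more than the cluster has, which is
exactly the situation I described above.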
Followed by a reply to myself:

>> My understanding (which is obviously false) was that minResources
>> was only taken into account when the cluster was actually busy, not
>> when it was completely idle. I would expect applications in queues
>> with no minResources specified to grab available resources when
>> there are plenty, not to be prevented from running entirely.
>
> Which is exactly what happens on a 5.2 VM that I downloaded: the
> queues that have no minResources set start just fine, even though
> they have a fair share of 0.
>
> I checked the configuration differences between the VM and my
> cluster, and nothing stands out. Is there a different behaviour
> between 2.3 and 2.5 that could explain this? I can't really believe
> that such a big change could have happened, so I must have missed
> something (?)

And my conclusion after more digging:

> I think what happens is this:
>
> The base allocation for a queue is its fair share. The fair share is
> the total resources available on the cluster divided by the number
> of queues (weighted). However, the minResources parameter for a
> queue, if present, is taken as its fair share, regardless of cluster
> capacity or of the other shares. This implies that if the total of
> the minResources parameters on the cluster is higher than or equal
> to the total capacity, the queues that have no minResources
> parameter will be given a fair share of zero. Since preemption
> (applications taking cluster resources beyond their fair share) only
> happens in certain cases (the actual share of a queue staying below
> 50% of its fair share for some time), and that condition can never
> be true when the fair share is zero, the applications never obtain
> the resources they need to run, and stay in the pending state
> forever.
>
> The change introduced in 5.2, where fair shares are computed over
> active queues only, seems to fix that.
>
> I think the relevant JIRA ticket is this one:
> https://issues.apache.org/jira/browse/YARN-2026
>
> Comments and corrections would be very welcome :-)

(A worked toy example of this reading is in the P.S. below.)

My conclusion is to move to 5.2 (Hadoop 2.5) to support the
allocations I have, but any confirmation or better explanation would
be awesome.

Thanks a lot

D.Morel
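P.S. Here is the toy arithmetic, using the invented numbers from the
sketch above; this is my reading of the pre-YARN-2026 behaviour, not
actual scheduler output:

  cluster capacity:                40 vcores
  etl       (minResources: 25) -> fair share = 25 vcores
  reporting (minResources: 25) -> fair share = 25 vcores
  adhoc     (no minResources)  -> fair share = 40 - 25 - 25 = -10,
                                  floored at 0 vcores
  preemption trigger for adhoc:   usage below 50% of fair share,
                                  i.e. below 0 vcores: never true
  => apps submitted to adhoc never get containers and stay pending,
     even on a completely idle cluster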