From: Marcin Tustin
To: "Naganarasimha G R (Naga)"
Cc: user@hadoop.apache.org
Date: Wed, 2 Mar 2016 23:33:39 -0500
Subject: Re: Yarn with CapacityScheduler will only schedule two applications - open to consultants today

Hi Naga,

You're quite right. Our fix was to balance out the resources in each label more, so that the bug bites us less often. Beyond that, I would direct any readers to your text and my text below.
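
For anyone reading this in the archive, a rough worked example of how I understand YARN-3216, with made-up numbers: say each data node offers 40 GB to YARN, so the data partition has 400 GB, while the default (no-label) partition has 0 GB because all 12 of our nodes carry a label. Before the fix, the CapacityScheduler derives a queue's AM limit from its share of the default partition, and any percentage of 0 GB is still 0 GB, which is smaller than a single AM container. The scheduler then falls back to admitting at least one AM, which works out to one running application per partition: one in data, one in yarn, exactly the two we were seeing, no matter how much labelled capacity sits idle.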

Marcin

On Wednesday, March 2, 2016, Naganarasimha G R (Naga) <garlanaganarasimha@huawei.com> wrote:

> Hi Marcin,
>
> "The behaviour we're seeing is that even though we have 12 machines in our yarn cluster, yarn will only schedule two applications."
> If this is the problem you are facing with the configuration below, then it is most likely YARN-3216 itself.
> I don't think that fix is in any released Hadoop version yet (not sure about HDP), and I am not familiar with the version 2.7.1.2.3 (it may be an HDP version number).
> IIUC you submitted to the two different partitions, "data" and "yarn", hence two applications; otherwise only one would have run per partition (when the AM resource limit comes out below the minimum size, CS still allows at least one AM to run).
>
> Given that you have already identified the issue, what more are you expecting?
>
> Regards,
> + Naga
> ------------------------------
> From: Marcin Tustin [mtustin@handybook.com]
> Sent: Wednesday, March 02, 2016 20:09
> To: user@hadoop.apache.org
> Subject: Yarn with CapacityScheduler will only schedule two applications - open to consultants today
>
> Hi All,
>
> We're hitting this issue. If you're a consultant with capacity today (2 March 2016, EST, in New York), please feel free to contact me on or off list.
>
> In terms of stack, we're using Yarn 2.7.1.2.3 from the latest Hortonworks distribution. It's possible we're hitting this bug:
> https://issues.apache.org/jira/browse/YARN-3216
>
> The behaviour we're seeing is that even though we have 12 machines in our yarn cluster, yarn will only schedule two applications. The breakdown of machines is:
> 10 machines with the data label
> 2 with the yarn label
>
> We have three queues: interactive, noninteractive, and default.
> We expect interactive and noninteractive to split the data label's capacity 20%/80%, and default to take 100% of the yarn label.
>
> We have the following capacity scheduler config (key=value format taken from Ambari):
>
> yarn.scheduler.capacity.maximum-am-resource-percent=100
> yarn.scheduler.capacity.maximum-applications=10000
> yarn.scheduler.capacity.node-locality-delay=40
> yarn.scheduler.capacity.queue-mappings-override.enable=true
> yarn.scheduler.capacity.root.accessible-node-labels=*
> yarn.scheduler.capacity.root.accessible-node-labels.data.capacity=100
> yarn.scheduler.capacity.root.accessible-node-labels.data.maximum-capacity=100
> yarn.scheduler.capacity.root.accessible-node-labels.yarn.capacity=100
> yarn.scheduler.capacity.root.accessible-node-labels.yarn.maximum-capacity=100
> yarn.scheduler.capacity.root.acl_administer_queue=*
> yarn.scheduler.capacity.root.capacity=100
> yarn.scheduler.capacity.root.default-node-label-expression=data
> yarn.scheduler.capacity.root.default.accessible-node-labels=yarn
> yarn.scheduler.capacity.root.default.accessible-node-labels.yarn.capacity=100
> yarn.scheduler.capacity.root.default.accessible-node-labels.yarn.maximum-capacity=100
> yarn.scheduler.capacity.root.default.acl_submit_applications=*
> yarn.scheduler.capacity.root.default.capacity=50
> yarn.scheduler.capacity.root.default.default-node-label-expression=yarn
> yarn.scheduler.capacity.root.default.maximum-am-resource-percent=80
> yarn.scheduler.capacity.root.default.maximum-capacity=100
> yarn.scheduler.capacity.root.default.minimum-user-limit-percent=100
> yarn.scheduler.capacity.root.default.ordering-policy=fair
> yarn.scheduler.capacity.root.default.ordering-policy.fair.enable-size-based-weight=false
> yarn.scheduler.capacity.root.default.state=RUNNING
> yarn.scheduler.capacity.root.default.user-limit-factor=100
> yarn.scheduler.capacity.root.interactive.accessible-node-labels=data
> yarn.scheduler.capacity.root.interactive.accessible-node-labels.data.capacity=20
> yarn.scheduler.capacity.root.interactive.accessible-node-labels.data.maximum-capacity=100
> yarn.scheduler.capacity.root.interactive.acl_administer_queue=*
> yarn.scheduler.capacity.root.interactive.acl_submit_applications=*
> yarn.scheduler.capacity.root.interactive.capacity=10
> yarn.scheduler.capacity.root.interactive.maximum-am-resource-percent=50
> yarn.scheduler.capacity.root.interactive.maximum-applications=2000
> yarn.scheduler.capacity.root.interactive.maximum-capacity=100
> yarn.scheduler.capacity.root.interactive.minimum-user-limit-percent=100
> yarn.scheduler.capacity.root.interactive.ordering-policy=fifo
> yarn.scheduler.capacity.root.interactive.state=RUNNING
> yarn.scheduler.capacity.root.interactive.user-limit-factor=100
> yarn.scheduler.capacity.root.maximum-am-resource-percent=80
> yarn.scheduler.capacity.root.maximum-capacity=100
> yarn.scheduler.capacity.root.noninteractive.accessible-node-labels=data
> yarn.scheduler.capacity.root.noninteractive.accessible-node-labels.data.capacity=80
> yarn.scheduler.capacity.root.noninteractive.accessible-node-labels.data.maximum-am-resource-percent=80
> yarn.scheduler.capacity.root.noninteractive.accessible-node-labels.data.maximum-capacity=80
> yarn.scheduler.capacity.root.noninteractive.acl_submit_applications=*
> yarn.scheduler.capacity.root.noninteractive.capacity=40
> yarn.scheduler.capacity.root.noninteractive.default-node-label-expression=data
> yarn.scheduler.capacity.root.noninteractive.maximum-am-resource-percent=80
> yarn.scheduler.capacity.root.noninteractive.maximum-applications=8000
> yarn.scheduler.capacity.root.noninteractive.maximum-capacity=100
> yarn.scheduler.capacity.root.noninteractive.minimum-user-limit-percent=100
> yarn.scheduler.capacity.root.noninteractive.ordering-policy=fair
> yarn.scheduler.capacity.root.noninteractive.ordering-policy.fair.enable-size-based-weight=false
> yarn.scheduler.capacity.root.noninteractive.state=RUNNING
> yarn.scheduler.capacity.root.noninteractive.user-limit-factor=100
> yarn.scheduler.capacity.root.queues=default,interactive,noninteractive
> yarn.scheduler.capacity.root.user-limit-factor=40
>
> Also, yarn.resourcemanager.scheduler.class is org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.
>
> Any suggestions gratefully received.
>
> Marcin

--
Want to work at Handy? Check out our culture deck and open roles
Latest news at Handy
Handy just raised $50m led by Fidelity
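
P.S. For anyone who would rather mitigate than wait for a release carrying the YARN-3216 fix, rebalancing label resources just means reassigning node labels (and reloading the queue config if you also change capacities). The hostname below is made up and I'm quoting the CLI from memory, so check yarn rmadmin -help on your version first:

    # move a worker from the data partition to the yarn partition
    yarn rmadmin -replaceLabelsOnNode "worker-07.example.internal=yarn"

    # reload capacity-scheduler.xml changes without restarting the ResourceManager
    yarn rmadmin -refreshQueues

Once both partitions have a sensible amount of resource, the per-label AM limits (the accessible-node-labels.<label>.maximum-am-resource-percent keys in the config above) are also worth a look.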