From: Ravi Prakash <ravihadoop@gmail.com>
Date: Thu, 10 Nov 2016 10:26:03 -0800
Subject: Re: Yarn 2.7.3 - capacity scheduler container allocation to nodes?
To: Rafał Radecki <radecki.rafal@gmail.com>
Cc: Bibinchundatt <bibin.chundatt@huawei.com>, user <user@hadoop.apache.org>

Is there a reason you want that behavior? I'm not sure you can get it
easily. Here's a link to the code that may be coming into play (depending
on your configuration):
https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java#L1372

On Thu, Nov 10, 2016 at 1:57 AM, Rafał Radecki <radecki.rafal@gmail.com> wrote:

> I have already used maximum-capacity for both queues (70 and 30) to limit
> their resource usage, but it seems that this mechanism works at the
> cluster level rather than the node level.
> We have Samza tasks on the cluster and they run for a very long time, so
> we cannot depend on the elasticity mechanism.
>
> 2016-11-10 10:31 GMT+01:00 Bibinchundatt <bibin.chundatt@huawei.com>:
>
>> Hi Rafał,
>>
>> Probably the following two options are what you can look into:
>>
>> 1. Elasticity - free resources can be allocated to any queue beyond its
>> capacity. When there is demand for these resources from queues running
>> below capacity at a future point in time, as tasks scheduled on these
>> resources complete, they will be assigned to applications on queues
>> running below capacity (preemption is not supported). This ensures that
>> resources are available to queues in a predictable and elastic manner,
>> preventing artificial silos of resources in the cluster, which helps
>> utilization.
>>
>> http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>>
>> yarn.scheduler.capacity.<queue-path>.maximum-capacity
>> Maximum queue capacity in percentage (%) as a float. This limits the
>> elasticity for applications in the queue. Defaults to -1, which
>> disables it.
>>
>> 2. Preemption of containers.
>>
>> Regards
>> Bibin
>>
>> From: Rafał Radecki [mailto:radecki.rafal@gmail.com]
>> Sent: 10 November 2016 17:26
>> To: Bibinchundatt
>> Cc: Ravi Prakash; user
>> Subject: Re: Yarn 2.7.3 - capacity scheduler container allocation to nodes?
>>
>> We have 4 nodes and 4 large tasks (~30GB each); additionally we have
>> about 25 small tasks (~2GB each). The tasks may be started in any order.
>> On each node we have 50GB for YARN. So if we start all 4 large tasks
>> first, they are correctly scheduled across all 4 nodes.
>> But if we first start all the short tasks, they all go to the first
>> cluster node, leaving no free capacity on it. When we then try to start
>> the 4 large tasks, we only have resources on the remaining 3 nodes
>> available and cannot start one of the large tasks.
>>
>> BR,
>> Rafal.
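For reference, a minimal capacity-scheduler.xml sketch of the two-queue
setup discussed in this thread. The property names are from the Capacity
Scheduler documentation linked above; the queue names ("long", "short") and
the 70/30 split are taken from the thread itself, so treat the values as an
illustration rather than a tested configuration:

    <configuration>
      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>long,short</value>
      </property>
      <!-- Guaranteed share of cluster resources for each queue. -->
      <property>
        <name>yarn.scheduler.capacity.root.long.capacity</name>
        <value>70</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.short.capacity</name>
        <value>30</value>
      </property>
      <!-- maximum-capacity caps elasticity: with these values a queue can
           never grow beyond its guaranteed share, even when the other
           queue is idle. As noted above, the caps are enforced against
           the cluster as a whole, not per node. -->
      <property>
        <name>yarn.scheduler.capacity.root.long.maximum-capacity</name>
        <value>70</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.short.maximum-capacity</name>
        <value>30</value>
      </property>
    </configuration>

Bibin's second option, container preemption, is typically enabled
separately in yarn-site.xml by setting
yarn.resourcemanager.scheduler.monitor.enable to true and configuring the
ProportionalCapacityPreemptionPolicy as the scheduler monitor policy.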
>> 2016-11-10 9:54 GMT+01:00 Bibinchundatt <bibin.chundatt@huawei.com>:
>>
>> Hi Rafał!
>>
>> "Is there a way to force yarn to use the configured thresholds (70% and
>> 30%) per node?"
>>
>> - Currently we can't specify thresholds per node.
>>
>> As per your initial mail, YARN memory per node is ~50GB, which means all
>> nodes' resources are the same. Is there a specific use case for per-node
>> allocation based on percentage?
>>
>> From: Rafał Radecki [mailto:radecki.rafal@gmail.com]
>> Sent: 10 November 2016 14:59
>> To: Ravi Prakash
>> Cc: user
>> Subject: Re: Yarn 2.7.3 - capacity scheduler container allocation to nodes?
>>
>> Hi Ravi.
>>
>> I did not specify labels this time ;) I just created two queues, as is
>> visible in the configuration.
>> Overall the queues work, but the allocation of jobs is different than I
>> expected, as I wrote at the beginning.
>>
>> BR,
>> Rafal.
>>
>> 2016-11-10 2:48 GMT+01:00 Ravi Prakash <ravihadoop@gmail.com>:
>>
>> Hi Rafał!
>>
>> Have you been able to launch the job successfully first without
>> configuring node labels? Do you really need node labels? How much total
>> memory do you have on the cluster? Node labels are usually for
>> specifying special capabilities of the nodes (e.g. some nodes could have
>> GPUs and your application could request to be run only on the nodes
>> which have GPUs).
>>
>> HTH
>> Ravi
>>
>> On Wed, Nov 9, 2016 at 5:37 AM, Rafał Radecki <radecki.rafal@gmail.com> wrote:
>>
>> Hi All.
>>
>> I have a 4-node cluster on which I run YARN. I created 2 queues, "long"
>> and "short", the first with 70% resource allocation, the second with
>> 30%. Both queues are configured on all available nodes by default.
>>
>> My memory for YARN per node is ~50GB. Initially I thought that when I
>> run tasks in the "short" queue, YARN would allocate them across all
>> nodes using 30% of the memory on every node. So for example, if I run 20
>> tasks of 2GB each (40GB in total) in the short queue:
>>
>> - the first ~7 will be scheduled on node1 (14GB total; 30% of the 50GB
>>   available on this node for the "short" queue -> 15GB)
>> - the next ~7 tasks will be scheduled on node2
>> - the remaining ~6 tasks will be scheduled on node3
>> - YARN on node4 will not use any resources assigned to the "short" queue.
>>
>> But this does not seem to be the case. At the moment I see that all
>> tasks are started on node1 and the other nodes have no tasks started.
>>
>> I attached my yarn-site.xml and capacity-scheduler.xml.
>>
>> Is there a way to force YARN to use the thresholds configured above (70%
>> and 30%) per node and not for the cluster as a whole? I would like a
>> configuration in which, on every node, 70% is always available for the
>> "long" queue and 30% for the "short" queue, and any resources that are
>> free for a particular queue are not used by the other queue. Is it
>> possible?
>>
>> BR,
>> Rafal.
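Since per-node queue thresholds are not supported, node labels (which Ravi
asks about above) are the closest mechanism this YARN version offers for
tying a queue to particular machines. A rough sketch of what that could
look like; the label name "shortnodes", the hostname "node1", and the HDFS
store path are all made-up examples, and the rmadmin syntax varies slightly
across 2.x releases (check `yarn rmadmin -help` for your version). First,
in yarn-site.xml:

    <property>
      <name>yarn.node-labels.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.node-labels.fs-store.root-dir</name>
      <value>hdfs://namenode:8020/yarn/node-labels</value>
    </property>

Then register the label and attach it to a node:

    yarn rmadmin -addToClusterNodeLabels shortnodes
    yarn rmadmin -replaceLabelsOnNode "node1=shortnodes"

Finally, give the "short" queue access to the label in
capacity-scheduler.xml:

    <property>
      <name>yarn.scheduler.capacity.root.short.accessible-node-labels</name>
      <value>shortnodes</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.short.accessible-node-labels.shortnodes.capacity</name>
      <value>100</value>
    </property>

Note that an exclusive label (the default) dedicates whole nodes to the
queues that can access it, rather than giving every queue a fixed
percentage on every node, so this only approximates the behavior asked for
in this thread.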