hadoop-user mailing list archives

From Rafał Radecki <radecki.ra...@gmail.com>
Subject Re: Yarn 2.7.3 - capacity scheduler container allocation to nodes?
Date Thu, 10 Nov 2016 09:26:11 GMT
We have 4 nodes and 4 large tasks (~30GB each); additionally we have about
25 small tasks (~2GB each). The tasks can be started in any order.
On each node we have 50GB for YARN. So if we start all 4 large tasks
first, they are correctly scheduled across all 4 nodes.
But if we first start all the short tasks, they all go to the first cluster
node (25 x 2GB = 50GB, which leaves no free capacity on it). When we then
try to start the 4 large tasks, only the remaining 3 nodes have resources
available, so one of the large tasks cannot start.
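
For reference, the per-node allocation can be checked with the YARN CLI
(the node id below is a placeholder):

  yarn node -list -all           # list NodeManagers with running-container counts
  yarn node -status node1:45454  # memory used vs. capacity for a single node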

BR,
Rafal.

2016-11-10 9:54 GMT+01:00 Bibinchundatt <bibin.chundatt@huawei.com>:

> Hi Rafal!
>
> Is there a way to force YARN to apply the thresholds configured above (70%
> and 30%) per node?
>
> - Currently we can't specify thresholds per node.
>
>
>
> As per your initial mail, YARN memory per node is ~50GB, so all nodes'
> resources are the same. Is there any use case that specifically requires
> per-node allocation based on percentage?
>
>
>
>
>
> *From:* Rafał Radecki [mailto:radecki.rafal@gmail.com]
> *Sent:* 10 November 2016 14:59
> *To:* Ravi Prakash
> *Cc:* user
> *Subject:* Re: Yarn 2.7.3 - capacity scheduler container allocation to
> nodes?
>
>
>
> Hi Ravi.
>
>
>
> I did not specify labels this time ;) I just created two queues, as is
> visible in the configuration.
>
> Overall the queues work, but the allocation of jobs differs from what I
> expected, as I wrote at the beginning.
>
>
>
> BR,
>
> Rafal.
>
>
>
> 2016-11-10 2:48 GMT+01:00 Ravi Prakash <ravihadoop@gmail.com>:
>
> Hi Rafal!
>
> Have you been able to launch the job successfully first without
> configuring node-labels? Do you really need node-labels? How much total
> memory do you have on the cluster? Node labels are usually for specifying
> special capabilities of nodes (e.g. some nodes could have GPUs, and your
> application could request to run only on the nodes which have GPUs).
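>
> (For context, in 2.7.x labels are defined on the RM and attached to nodes
> roughly like this; "gpu" and "node1" are illustrative placeholders:
>
>   yarn rmadmin -addToClusterNodeLabels "gpu"
>   yarn rmadmin -replaceLabelsOnNode "node1=gpu"
>
> and a queue is then given access to the label via the
> yarn.scheduler.capacity.<queue-path>.accessible-node-labels property.)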
>
> HTH
>
> Ravi
>
>
>
> On Wed, Nov 9, 2016 at 5:37 AM, Rafał Radecki <radecki.rafal@gmail.com>
> wrote:
>
> Hi All.
>
>
>
> I have a 4-node cluster on which I run YARN. I created 2 queues, "long" and
> "short", the first with 70% resource allocation and the second with 30%.
> Both queues are configured on all available nodes by default.
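>
> (A minimal sketch of this queue setup in capacity-scheduler.xml; the actual
> attached file may contain more, but the 70/30 split maps to these standard
> CapacityScheduler properties:
>
>   <property>
>     <name>yarn.scheduler.capacity.root.queues</name>
>     <value>long,short</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.long.capacity</name>
>     <value>70</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.short.capacity</name>
>     <value>30</value>
>   </property>
> )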
>
>
>
> My memory for YARN per node is ~50GB. Initially I thought that when I run
> tasks in the "short" queue, YARN would spread them across all nodes, using
> up to 30% of the memory on each node. So, for example, if I run 20 tasks of
> 2GB each (40GB in total) in the "short" queue:
>
> - the first ~7 tasks will be scheduled on node1 (14GB total; 30% of the
> 50GB available on this node for the "short" queue -> 15GB)
> - the next ~7 tasks will be scheduled on node2
> - the remaining ~6 tasks will be scheduled on node3
> - YARN on node4 will not use any resources assigned to the "short" queue.
>
> But this does not seem to be the case. At the moment I see that all tasks
> are started on node1 and the other nodes have no tasks started.
>
>
>
> I attached my yarn-site.xml and capacity-scheduler.xml.
>
>
>
> Is there a way to force YARN to apply the thresholds configured above (70%
> and 30%) per node and not per cluster as a whole? I would like a
> configuration in which, on every node, 30% is always reserved for the
> "short" queue and 70% for the "long" queue, and in which resources that are
> free in one queue are not used by the other queue. Is that possible?
>
>
>
> BR,
>
> Rafal.
>
>
>
