hadoop-mapreduce-user mailing list archives

From Rafał Radecki <radecki.ra...@gmail.com>
Subject Re: Yarn 2.7.3 - capacity scheduler container allocation to nodes?
Date Thu, 10 Nov 2016 09:57:55 GMT
I have already set maximum-capacity on both queues (70 and 30) to limit
their resource usage, but this mechanism seems to work at the cluster level
rather than at the node level.
We have Samza tasks on the cluster and they run for a very long time, so we
cannot rely on the elasticity mechanism.
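
For reference, this is roughly what the relevant part of our
capacity-scheduler.xml looks like (a sketch assuming both queues sit directly
under root, not the exact attached file):

  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>long,short</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.long.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.long.maximum-capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.short.capacity</name>
    <value>30</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.short.maximum-capacity</name>
    <value>30</value>
  </property>

With maximum-capacity set equal to capacity, both limits are applied against
the cluster-wide resources of each queue, not against each node individually.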

2016-11-10 10:31 GMT+01:00 Bibinchundatt <bibin.chundatt@huawei.com>:

> Hi Rafal,
>
>
>
> You can probably look into the following two options:
>
> 1.       *Elasticity* - Free resources can be allocated to any queue
> beyond its capacity. When there is demand for these resources from queues
> running below capacity at a future point in time, as tasks scheduled on
> these resources complete, they will be assigned to applications on queues
> running below the capacity (pre-emption is not supported). This ensures
> that resources are available in a predictable and elastic manner to queues,
> thus preventing artificial silos of resources in the cluster which helps
> utilization.
>
>
>
> http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>
>
>
>
>
> yarn.scheduler.capacity.<queue-path>.maximum-capacity
>
> Maximum queue capacity in percentage (%) as a float. This limits the
> *elasticity* for applications in the queue. Defaults to -1 which disables
> it.
>
>
>
> 2.       Preemption of containers.
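>
> For example, preemption for the CapacityScheduler is switched on in
> yarn-site.xml roughly like this (a minimal sketch; the remaining preemption
> tunables are left at their defaults):
>
> <property>
>   <name>yarn.resourcemanager.scheduler.monitor.enable</name>
>   <value>true</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.scheduler.monitor.policies</name>
>   <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
> </property>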
>
>
>
>
>
> Regards
>
> Bibin
>
>
>
> *From:* Rafał Radecki [mailto:radecki.rafal@gmail.com]
> *Sent:* 10 November 2016 17:26
> *To:* Bibinchundatt
> *Cc:* Ravi Prakash; user
>
> *Subject:* Re: Yarn 2.7.3 - capacity scheduler container allocation to
> nodes?
>
>
>
> We have 4 nodes and 4 large tasks (~30GB each); additionally we have about
> 25 small tasks (~2GB each). All tasks can potentially be started in random
> order.
> On each node we have 50GB for YARN. So if we start all 4 large tasks at the
> beginning, they are correctly scheduled across all 4 nodes.
> But if we first start all the short tasks, they all go to the first cluster
> node and there is no free capacity left on it. When we then try to start the
> 4 large tasks, we only have resources from the remaining 3 nodes available
> and cannot start one of the large tasks.
>
>
>
> BR,
>
> Rafal.
>
>
>
> 2016-11-10 9:54 GMT+01:00 Bibinchundatt <bibin.chundatt@huawei.com>:
>
> Hi Rafal!
>
> Is there a way to force YARN to use the thresholds configured above (70% and
> 30%) per node?
>
> - Currently we can't specify thresholds per node.
>
>
>
> As per your initial mail, YARN memory per node is ~50GB, meaning all nodes'
> resources are the same. Is there a specific use case for per-node allocation
> based on percentages?
>
>
>
>
>
> *From:* Rafał Radecki [mailto:radecki.rafal@gmail.com]
> *Sent:* 10 November 2016 14:59
> *To:* Ravi Prakash
> *Cc:* user
> *Subject:* Re: Yarn 2.7.3 - capacity scheduler container allocation to
> nodes?
>
>
>
> Hi Ravi.
>
>
>
> I did not specify labels this time ;) I just created two queues, as is
> visible in the configuration.
>
> Overall the queues work, but the allocation of jobs is different from what I
> expected, as I wrote at the beginning.
>
>
>
> BR,
>
> Rafal.
>
>
>
> 2016-11-10 2:48 GMT+01:00 Ravi Prakash <ravihadoop@gmail.com>:
>
> Hi Rafal!
>
> Have you been able to launch the job successfully first without
> configuring node-labels? Do you really need node-labels? How much total
> memory do you have on the cluster? Node labels are usually for specifying
> special capabilities of the nodes (e.g. some nodes could have GPUs and your
> application could request to run only on the nodes which have GPUs).
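>
> For example, the rough node-label workflow looks like this (the label name
> "gpu" and the host name are just illustrative):
>
> yarn rmadmin -addToClusterNodeLabels "gpu"
> yarn rmadmin -replaceLabelsOnNode "node1.example.com=gpu"
>
> The queues are then given access to the label in capacity-scheduler.xml via
> yarn.scheduler.capacity.<queue-path>.accessible-node-labels and
> yarn.scheduler.capacity.<queue-path>.accessible-node-labels.<label>.capacity.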
>
> HTH
>
> Ravi
>
>
>
> On Wed, Nov 9, 2016 at 5:37 AM, Rafał Radecki <radecki.rafal@gmail.com>
> wrote:
>
> Hi All.
>
>
>
> I have a 4-node cluster on which I run YARN. I created 2 queues, "long" and
> "short", the first with 70% resource allocation and the second with 30%.
> Both queues are configured on all available nodes by default.
>
>
>
> My memory for YARN per node is ~50GB. Initially I thought that when I run
> tasks in the "short" queue, YARN would allocate them across all nodes using
> 30% of the memory on every node. So for example, if I run 20 tasks of 2GB
> each (40GB in total) in the "short" queue:
>
> - the first ~7 will be scheduled on node1 (14GB total; 30% of the 50GB
> available on this node for the "short" queue -> 15GB)
>
> - the next ~7 tasks will be scheduled on node2
>
> - the remaining ~6 tasks will be scheduled on node3
>
> - YARN on node4 will not use any resources assigned to the "short" queue.
>
> But this seems not to be the case. At the moment I see that all tasks are
> started on node1 and the other nodes have no tasks started.
>
>
>
> I attached my yarn-site.xml and capacity-scheduler.xml.
>
>
>
> Is there a way to force YARN to use the thresholds configured above (70%
> and 30%) per node and not for the cluster as a whole? I would like a
> configuration in which, on every node, 70% is always available for the
> "long" queue and 30% for the "short" queue, and if any resources of a
> particular queue are free, they are not used by the other queue. Is that
> possible?
>
>
>
> BR,
>
> Rafal.
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: user-help@hadoop.apache.org
>
