hadoop-mapreduce-user mailing list archives

From "Naganarasimha G R (Naga)" <garlanaganarasi...@huawei.com>
Subject RE: Concurrency control
Date Tue, 29 Sep 2015 09:22:55 GMT
Thanks Rohith for your thoughts,
      But I think this configuration might not completely solve the scenario mentioned
by Laxman. If there is some time gap between the first and the second app, then even though
we have fairness or priority set for apps, starvation will still occur.
IIUC we can think of an approach similar to "yarn.scheduler.capacity.<queue-path>.user-limit-factor",
which could provide functionality like:
"yarn.scheduler.capacity.<queue-path>.app-limit-factor": the multiple of the queue
capacity which can be configured to allow a single app to acquire more resources. Thoughts?
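As a purely hypothetical sketch (app-limit-factor does not exist in Hadoop; it is only the proposal above), configured in capacity-scheduler.xml by analogy with the existing user-limit-factor it might look like:

<!-- capacity-scheduler.xml: "app-limit-factor" is hypothetical, proposed above;
     user-limit-factor is an existing CapacityScheduler property. -->
<property>
  <!-- Existing knob: a single user may consume up to 1.5x the queue capacity. -->
  <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
  <value>1.5</value>
</property>
<property>
  <!-- Proposed knob: a single app could consume at most 0.5x the queue capacity,
       leaving headroom for apps that arrive later in the same queue. -->
  <name>yarn.scheduler.capacity.root.default.app-limit-factor</name>
  <value>0.5</value>
</property>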

+ Naga



________________________________
From: Rohith Sharma K S [rohithsharmaks@huawei.com]
Sent: Tuesday, September 29, 2015 14:07
To: user@hadoop.apache.org
Subject: RE: Concurrency control

Hi Laxman,

In Hadoop 2.8 (not released yet), CapacityScheduler provides a configuration for the
ordering policy. By configuring the FAIR ordering policy in CS, you should probably be able
to achieve your goal, i.e. avoiding starvation of applications waiting for resources.


org.apache.hadoop.yarn.server.resourcemanager.scheduler.policy.FairOrderingPolicy<S>
An OrderingPolicy which orders SchedulableEntities for fairness (see the FairScheduler FairSharePolicy):
generally, entities with less current resource usage are ordered first. If sizeBasedWeight is set to true, then
an application with high demand may be prioritized ahead of an application with less usage.
This is to offset the tendency to favor small apps, which could result in starvation for large
apps if many small ones enter and leave the queue continuously (optional, default false).
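A sketch of the corresponding capacity-scheduler.xml settings (property names as found in the Hadoop 2.8 line, so treat them as subject to change until release):

<!-- capacity-scheduler.xml: enable fair ordering for a queue (Hadoop 2.8+). -->
<property>
  <!-- Order apps in root.default by fairness instead of the default fifo. -->
  <name>yarn.scheduler.capacity.root.default.ordering-policy</name>
  <value>fair</value>
</property>
<property>
  <!-- Optionally weight by demand so large apps are not perpetually
       ordered behind a stream of small ones (default false). -->
  <name>yarn.scheduler.capacity.root.default.ordering-policy.fair.enable-size-based-weight</name>
  <value>true</value>
</property>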


Community Issue Id: https://issues.apache.org/jira/browse/YARN-3463

Thanks & Regards
Rohith Sharma K S

From: Laxman Ch [mailto:laxman.lux@gmail.com]
Sent: 29 September 2015 13:36
To: user@hadoop.apache.org
Subject: Re: Concurrency control

Bouncing this thread again. Any other thoughts please?

On 17 September 2015 at 23:21, Laxman Ch <laxman.lux@gmail.com> wrote:
No Naga. That won't help.

I am running two applications (app1: 100 vcores, app2: 100 vcores) as the same user in the
same queue (capacity = 100 vcores). In this scenario, if app1 starts first, it occupies
all the slots, and if it runs long then app2 will starve for that long.

Let me reiterate my problem statement. I want "to control the amount of resources (vcores,
memory) used by an application SIMULTANEOUSLY".

On 17 September 2015 at 22:28, Naganarasimha Garla <naganarasimha.gr@gmail.com> wrote:
Hi Laxman,
For the example you have stated, maybe we can do the following things (a config sketch follows the list):
1. Create/modify the queue with capacity and max capacity set such that it is equivalent to 100 vcores.
As there is then no elasticity, a given application will not use resources beyond the
configured capacity.
2. Set yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent so that each active
user is assured the minimum guaranteed resources. The default value of 100 implies
no user limits are imposed.
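A minimal capacity-scheduler.xml sketch of both suggestions (the queue name and the percentage values are illustrative assumptions; pick percentages so the queue's share equals 100 vcores on your cluster):

<!-- capacity-scheduler.xml: cap the queue and guarantee per-user shares. -->
<property>
  <name>yarn.scheduler.capacity.root.limited.capacity</name>
  <value>25</value>
</property>
<property>
  <!-- maximum-capacity equal to capacity disables elasticity: the queue
       can never grow beyond its guaranteed share. -->
  <name>yarn.scheduler.capacity.root.limited.maximum-capacity</name>
  <value>25</value>
</property>
<property>
  <!-- Each active user is guaranteed at least 50% of the queue when
       two or more users have demand; the default of 100 imposes no limit. -->
  <name>yarn.scheduler.capacity.root.limited.minimum-user-limit-percent</name>
  <value>50</value>
</property>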

Additionally, we can think of "yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage",
which will enforce a strict CPU limit for a given container if required, as sketched below.
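A sketch of that setting in yarn-site.xml (assumes the LinuxContainerExecutor with cgroups is already enabled on the NodeManagers):

<!-- yarn-site.xml: hard-cap each container's CPU at its vcore allocation,
     even when spare CPU is available on the node (default false). -->
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
  <value>true</value>
</property>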

+ Naga

On Thu, Sep 17, 2015 at 4:42 PM, Laxman Ch <laxman.lux@gmail.com> wrote:
Yes. I'm already using cgroups. Cgroups help in controlling resources at the container level.
But my requirement is more about controlling the concurrent resource usage of an application
at the whole-cluster level.

And yes, we do configure queues properly. But that won't help.

For example, I have an application that requires 1000 vcores. But I want to prevent
this application from going beyond 100 vcores at any point of time in the cluster/queue. This
makes the application run longer even when my cluster is free, but I will be able to meet
the guaranteed SLAs of other applications.

Hope this helps to understand my question.

And thanks Narasimha for quick response.

On 17 September 2015 at 16:17, Naganarasimha Garla <naganarasimha.gr@gmail.com> wrote:
Hi Laxman,
Yes, if cgroups are enabled and "yarn.scheduler.capacity.resource-calculator" is configured to
DominantResourceCalculator, then both CPU and memory can be controlled.
Please refer further to the official documentation:
http://hadoop.apache.org/docs/r1.2.1/capacity_scheduler.html
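A sketch of that setting in capacity-scheduler.xml (the class name below is the one shipped with Hadoop's YARN resource utilities):

<!-- capacity-scheduler.xml: account for CPU as well as memory when computing
     usage; the default DefaultResourceCalculator considers memory only. -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>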

But maybe if you say more about the problem, we can suggest an ideal configuration; it seems like
the capacity configuration and the splitting of the queues is not rightly done. Or you might look at the Fair Scheduler
if you want more fairness in container allocation across different apps.

On Thu, Sep 17, 2015 at 4:10 PM, Laxman Ch <laxman.lux@gmail.com> wrote:
Hi,

In YARN, do we have any way to control the amount of resources (vcores, memory) used by an
application SIMULTANEOUSLY?

- In my cluster, I noticed a large and long-running MR app occupied all the slots of the
queue, blocking other apps from getting started.
- I'm using the Capacity Scheduler (with hierarchical queues and preemption disabled).
- Using Hadoop version 2.6.0.
- Did some googling around this and went through the configuration docs, but I'm not able to find
anything that matches my requirement.

If needed, I can provide more details on the usecase and problem.

--
Thanks,
Laxman




--
Thanks,
Laxman




--
Thanks,
Laxman



--
Thanks,
Laxman
