hadoop-common-user mailing list archives

From iain wright <iainw...@gmail.com>
Subject Re: Applications bottlenecked in ACCEPTED state ..
Date Wed, 26 Oct 2016 22:36:50 GMT
Thanks for contributing your findings back, @Gautam.


Iain Wright

On Wed, Oct 26, 2016 at 2:02 PM, Gautam <gautamkowshik@gmail.com> wrote:

> I figured out what was causing the bottleneck. The following parameters
> turn out to be very important for scheduling in large clusters or clusters
> with beefy nodes.
> These properties in yarn-site.xml improved job throughput:
> - yarn.scheduler.fair.continuous-scheduling-enabled = true: spins off a
> thread dedicated to assigning containers to app attempts.
> - yarn.scheduler.fair.assignmultiple = true: allows multiple containers
> to be assigned on each scheduling attempt.
> This speeds up scheduler performance considerably and, more importantly,
> reduces uncertainty and noise in scheduling frequency. Surprisingly, these
> didn't show up in any Hadoop presentations, docs, or the usual blogs, so
> hopefully this is useful to someone else.
> -Gautam.
> On Tue, Oct 25, 2016 at 8:09 PM Gautam <gautamkowshik@gmail.com> wrote:
>> Hello Mighty Hadoop Users,
>> We've been running into applications (MR/Tez) getting bottlenecked now
>> and then. Apps get stuck in the ACCEPTED state and take random amounts of
>> time to reach RUNNING. Our cluster is not particularly at peak load
>> capacity-wise, but this might be related to sudden bursts of application
>> submissions.
>> The scenario that I'm concerned about and trying to fix/optimize:
>>  - Applications start piling up in the ACCEPTED state. An app gets
>> submitted and transitions from SUBMITTED to ACCEPTED, then remains there
>> for 5, 10, or even 30 minutes in many cases, doing nothing.
>>  - The app's queue has available capacity during this time.
>>  - There is no user limit configured. We use the Fair Scheduler, so I
>> don't think a default user limit is applied. *Please correct me if I'm
>> wrong.*
>>  - The app suddenly gets into RUNNING and finishes as usual.
>> We use Hadoop 2.6.0 (cdh5.7.4); most of the relevant configurations are
>> at their defaults. These are all MapReduce and Tez jobs. I tried setting
>> yarn.resourcemanager.scheduler.client.thread-count=100
>> and yarn.resourcemanager.amlauncher.thread-count=100, but that didn't help.
>> I have attached the RM debug log (filtered by an app that was stuck for 11
>> minutes) and the NM log for the AM of that app. I would like to know what
>> tuning can help with this.
>> Much Appreciated,
>> -Gautam.
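
For reference, the two Fair Scheduler settings described in Gautam's reply could be written in yarn-site.xml roughly as follows. This is only a sketch: the property names and values are the ones given in the thread, the surrounding <configuration> element is the usual Hadoop config file layout, and it assumes the ResourceManager is already running the Fair Scheduler. Changes to yarn-site.xml typically require a ResourceManager restart to take effect.

```xml
<configuration>
  <!-- Run a dedicated thread that assigns containers to app attempts,
       rather than scheduling only on NodeManager heartbeats. -->
  <property>
    <name>yarn.scheduler.fair.continuous-scheduling-enabled</name>
    <value>true</value>
  </property>

  <!-- Allow more than one container to be assigned per scheduling attempt. -->
  <property>
    <name>yarn.scheduler.fair.assignmultiple</name>
    <value>true</value>
  </property>
</configuration>
```

Both properties default to false in the Hadoop version discussed here, so neither behavior is active unless it is set explicitly.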
