hadoop-mapreduce-user mailing list archives

From Ray Chiang <rchi...@cloudera.com>
Subject Re: YARN queues become unusable and jobs are stuck in ACCEPTED state
Date Fri, 29 Apr 2016 23:33:00 GMT
Just because you have sufficient resources doesn't mean another job should
launch an AM.  You might want to check maxAMShare
and queueMaxAMShareDefault.
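
As a rough way to check those two settings, here is a minimal Python sketch
that reads a Fair Scheduler allocation file and reports the maxAMShare that
applies to each top-level queue, falling back to queueMaxAMShareDefault when
a queue does not set its own. The queue names and values in the sample are
made up for illustration, the helper name effective_max_am_share is
hypothetical, and nested queues are ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical allocation file contents; queue names and values are made up.
SAMPLE_ALLOCATIONS = """\
<allocations>
  <queueMaxAMShareDefault>0.5</queueMaxAMShareDefault>
  <queue name="etl">
    <weight>2.0</weight>
    <maxAMShare>0.2</maxAMShare>
  </queue>
  <queue name="adhoc">
    <weight>1.0</weight>
  </queue>
</allocations>
"""


def effective_max_am_share(xml_text):
    """Map each top-level queue name to the maxAMShare written in the file.

    A queue without its own <maxAMShare> falls back to the top-level
    <queueMaxAMShareDefault>; None means the file sets neither, so the
    scheduler's built-in default would apply (not modelled here).
    """
    root = ET.fromstring(xml_text)
    default_text = root.findtext("queueMaxAMShareDefault")
    default = float(default_text) if default_text is not None else None

    shares = {}
    for queue in root.findall("queue"):  # direct children only
        own = queue.findtext("maxAMShare")
        shares[queue.get("name")] = float(own) if own is not None else default
    return shares


if __name__ == "__main__":
    for name, share in effective_max_am_share(SAMPLE_ALLOCATIONS).items():
        print(name, share)
```

To run it against a real cluster, replace SAMPLE_ALLOCATIONS with the
contents of your own allocation file. A small value on the stuck queue would
suggest the AM limit as the cause rather than a scheduler bug.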

Given that you have sufficient resources, you could be running into
YARN-3491.

I don't know whether you have the option, but CDH 5.3.3 is pretty old at
this point.  CDH 5.3.10/5.4.10/5.5.2 have the latest bug fixes.

-Ray

On Thu, Apr 28, 2016 at 12:03 PM, Matt Cheah <mcheah@palantir.com> wrote:

> Hi,
>
> I've been sporadically seeing an issue when using Hadoop YARN. I'm using
> Hadoop 2.5.0, CDH5.3.3.
>
> When I've configured the stack to use the fair scheduler protocol, after
> some period of time of the cluster being alive and running jobs, I'm
> noticing that when I submit a job, the job will be stuck in the ACCEPTED
> state even though the cluster has sufficient resources to spawn an
> application master container as well as the queue I'm submitting to having
> sufficient resources available. Furthermore, all jobs submitted to that
> queue will be stuck in the ACCEPTED state. I can unblock job submission by
> going into the allocation XML file, renaming the queue, and submitting jobs
> to that renamed queue instead. However the queue has only changed name, and
> all of its other settings have been preserved.
>
> It is clearly untenable for me to have to change the queues that I'm using
> sometimes. This appears to happen irrespective of the settings of the
> queue, e.g. its weight or its minimum resource share. The events leading up
> to this occurrence are strictly unpredictable and I have no concrete way to
> reproduce the issue. The logs don't show anything interesting either; the
> resource manager just states that it schedules an attempt for the
> application submitted to the bad queue, but the attempt's application
> master is never allocated to a container anywhere.
>
> I have looked around the YARN bug base and couldn't find any similar
> issues. I've also used jstack to inspect the Resource Manager process, but
> nothing is obviously wrong there. I was wondering if anyone has encountered
> a similar issue before. I apologize that the description is vague, but it's
> the best way I can describe it.
>
> Thanks,
>
> -Matt Cheah
>
>
>
