hadoop-yarn-dev mailing list archives

From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: Resource Manager Queue builds up
Date Wed, 13 Mar 2013 03:19:48 GMT

Okay, I think you ran into some bug/bottleneck, but we need your help to figure this out.

Can you wait until the queue grows beyond some threshold, say 40K events, then stop the load
for some time and see whether the queue drains?
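One way to check whether the queue drains is to pull the AsyncDispatcher's "Size of event-queue is N" lines out of the ResourceManager log and compute the net change per second. Below is a minimal sketch, assuming each log entry sits on a single line in the format quoted further down in this thread (timestamp, level, logger, message). The class QueueTrend and its methods are hypothetical helpers, not part of Hadoop. A positive rate means the queue is still growing; once the load stops, it should turn negative if the dispatcher is keeping up.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts event-queue sizes from RM log lines. */
public class QueueTrend {
    // Assumed single-line format, e.g.
    // "2013-03-12 19:43:09,584 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 25000"
    private static final Pattern LINE = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) .*Size of event-queue is (\\d+)");
    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** One observation: when it was logged and how many events were queued. */
    record Sample(LocalDateTime time, long size) {}

    /** Keeps only lines that report the event-queue size; other lines are skipped. */
    static List<Sample> parse(List<String> lines) {
        List<Sample> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                out.add(new Sample(LocalDateTime.parse(m.group(1), TS),
                                   Long.parseLong(m.group(2))));
            }
        }
        return out;
    }

    /** Net change in queue size per second between the first and last sample. */
    static double netRatePerSecond(List<Sample> samples) {
        if (samples.size() < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        Sample first = samples.get(0);
        Sample last = samples.get(samples.size() - 1);
        double seconds = Duration.between(first.time(), last.time()).toMillis() / 1000.0;
        return (last.size() - first.size()) / seconds;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "2013-03-12 19:43:09,584 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 25000",
            "2013-03-12 21:06:34,099 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 60000");
        System.out.printf("net growth: %.2f events/s%n", netRatePerSecond(parse(lines)));
    }
}
```

Fed the first and last entries of the log quoted below, this prints a net growth of about 6.99 events/s, i.e. roughly 1,000 events every 143 seconds.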

Can you share more info on your benchmark/load so that we can reproduce this? How fast do
your containers finish, i.e. what is their wall time? And how do they exit: normally, by the
process exiting, or some other way, such as the AM releasing them?

There are known issues around having a single dispatcher, but at your scale you shouldn't be
seeing this. If nothing else, you can try applying the patch at
https://issues.apache.org/jira/browse/YARN-365 and see if it helps.
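Why stopping the load is a useful test can be seen with a toy fluid model of a single dispatcher thread: events arrive at some rate, the one handler processes at most a fixed number per second, and the backlog grows by the difference whenever arrivals outpace the handler. This is a hypothetical illustration, not the AsyncDispatcher code; the class BacklogModel and the rates used (107 events arriving vs. 100 handled per second, a surplus of 7/s, close to the growth rate in the quoted log) are made up for the example.

```java
/**
 * Hypothetical toy model, not YARN's AsyncDispatcher: a single handler that can
 * process at most serviceRate events per second, fed arrivalRate events per second.
 */
public class BacklogModel {
    /** Queue length at the end of each simulated second. */
    static long[] simulate(long arrivalRate, long serviceRate, int loadSeconds, int idleSeconds) {
        long[] backlog = new long[loadSeconds + idleSeconds];
        long queued = 0;
        for (int t = 0; t < backlog.length; t++) {
            // Events only arrive while the load is running.
            queued += (t < loadSeconds) ? arrivalRate : 0;
            // The single handler drains at most serviceRate events this second.
            queued -= Math.min(queued, serviceRate);
            backlog[t] = queued;
        }
        return backlog;
    }

    public static void main(String[] args) {
        long[] b = simulate(107, 100, 60, 10);
        System.out.println("after load: " + b[59] + ", after idle: " + b[69]);
    }
}
```

With these numbers the backlog reaches 420 after 60 seconds of load and drains to 0 within 5 seconds of idling. If a real ResourceManager queue does not shrink once the load is removed, that would suggest the handler is stalled rather than merely slower than the arrival rate.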

Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Mar 12, 2013, at 8:08 PM, Muntasir Raihan Rahman wrote:

> Here is the tail of the latest YARN log:
> 
> "2013-03-12 19:43:09,584 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 25000
> 2013-03-12 19:45:32,588 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 26000
> 2013-03-12 19:47:55,634 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 27000
> 2013-03-12 19:50:18,694 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 28000
> 2013-03-12 19:52:41,442 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 29000
> 2013-03-12 19:55:04,548 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 30000
> 2013-03-12 19:57:27,664 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 31000
> 2013-03-12 19:59:50,767 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 32000
> 2013-03-12 20:02:13,496 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 33000
> 2013-03-12 20:04:36,365 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 34000
> 2013-03-12 20:06:59,441 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 35000
> 2013-03-12 20:09:22,550 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 36000
> 2013-03-12 20:11:45,640 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 37000
> 2013-03-12 20:14:08,515 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 38000
> 2013-03-12 20:16:31,626 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 39000
> 2013-03-12 20:18:54,362 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 40000
> 2013-03-12 20:21:17,439 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 41000
> 2013-03-12 20:23:40,606 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 42000
> 2013-03-12 20:26:03,267 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 43000
> 2013-03-12 20:28:26,352 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 44000
> 2013-03-12 20:30:49,380 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 45000
> 2013-03-12 20:33:12,450 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 46000
> 2013-03-12 20:35:35,568 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 47000
> 2013-03-12 20:37:58,686 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 48000
> 2013-03-12 20:40:21,470 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 49000
> 2013-03-12 20:42:44,165 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 50000
> 2013-03-12 20:45:07,242 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 51000
> 2013-03-12 20:47:30,241 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 52000
> 2013-03-12 20:49:53,272 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 53000
> 2013-03-12 20:52:16,370 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 54000
> 2013-03-12 20:54:39,436 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 55000
> 2013-03-12 20:57:02,349 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 56000
> 2013-03-12 20:59:25,471 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 57000
> 2013-03-12 21:01:48,512 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 58000
> 2013-03-12 21:04:11,107 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 59000
> 2013-03-12 21:06:34,099 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Size of event-queue is 60000
> "
> 
> So the queue keeps growing, by roughly 1,000 events every 2 minutes 23 seconds.
> 
> Muntasir.
> 
> On Tue, Mar 12, 2013 at 9:59 PM, Muntasir Raihan Rahman <
> muntasir.raihan@gmail.com> wrote:
> 
>> I am running it on an 8-node Emulab cluster: one master and 7 slaves,
>> each with 3 containers. The capacity scheduler web interface becomes
>> unresponsive, and the YARN logs show that the queue size keeps
>> increasing. There are 2-3 concurrently running applications, but they
>> always occupy all containers in the cluster.
>> 
>> On some runs the queue size grows to 10,000.
>> 
>> Muntasir.
>> 
>> 
>> On Tue, Mar 12, 2013 at 9:55 PM, Vinod Kumar Vavilapalli <
>> vinodkv@hortonworks.com> wrote:
>> 
>>> How many nodes do you have? How many applications run concurrently?
>>> What is the ResourceManager log level?
>>> 
>>> An event-queue size of 1000 isn't really a problem; up to what size
>>> does it keep increasing? And how did you find out that the
>>> ResourceManager is "freezing", i.e. what symptoms were you observing?
>>> 
>>> Thanks,
>>> 
>>> On Mar 11, 2013, at 11:28 PM, Muntasir Raihan Rahman wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I am trying to do some experiments with Hadoop YARN. I am submitting a
>>>> large number of jobs to a queue, but after some time the ResourceManager
>>>> freezes, and I get the following YARN log message: "Size of event-queue
>>>> is 1000", and the size keeps increasing. I tried increasing the
>>>> NodeManager heartbeat interval from 1 s to 3 s, but I still see the same
>>>> problem.
>>>> 
>>>> Can anyone give me a hint about the problem, and how to avoid the
>>>> ResourceManager queue build-up?
>>>> 
>>>> Thanks
>>>> Muntasir.
>>>> 
>>>> --
>>>> Best Regards
>>>> Muntasir Raihan Rahman
>>>> Email: muntasir.raihan@gmail.com
>>>> Phone: 1-217-979-9307
>>>> Department of Computer Science,
>>>> University of Illinois Urbana Champaign,
>>>> 3111 Siebel Center,
>>>> 201 N. Goodwin Avenue,
>>>> Urbana, IL  61801
>>> 
>>> 
>> 
> 

