hadoop-mapreduce-user mailing list archives

From Rohith Sharma <rohithsharm...@gmail.com>
Subject Re: ResouceManager hung: org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1000
Date Fri, 29 May 2015 03:17:53 GMT
Hi

Can you take a thread dump and check it?

jstack <pid> > RM.out
OR
kill -3 <pid>  (Note: the thread dump will be written to the RM's .out file)
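If the dump is large, it can help to look only at the dispatcher threads. The sketch below is a hypothetical helper (the class name ThreadDumpSummary and the default filter "Dispatcher" are assumptions, not Hadoop names). It relies only on the usual jstack layout, where each thread begins with a quoted-name line followed by a `java.lang.Thread.State:` line. A thread WAITING inside take() is just idle; a dispatcher thread that shows the same BLOCKED or RUNNABLE stack across several dumps taken a few seconds apart is the place to look.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: list the state of selected threads in a jstack dump. */
public class ThreadDumpSummary {
    private static final String STATE_PREFIX = "java.lang.Thread.State:";

    /**
     * Maps each thread whose name contains {@code filter} to its Thread.State text.
     * If the input holds several dumps (e.g. repeated kill -3), the last state wins.
     */
    static Map<String, String> summarize(List<String> lines, String filter) {
        Map<String, String> states = new LinkedHashMap<>();
        String current = null; // name of the thread whose stack we are inside
        for (String line : lines) {
            if (line.startsWith("\"")) {
                // Header line: "thread name" #id prio=... ; the name sits between the quotes.
                int end = line.indexOf('"', 1);
                current = end > 0 ? line.substring(1, end) : null;
            } else if (current != null && line.trim().startsWith(STATE_PREFIX)) {
                if (current.contains(filter)) {
                    states.put(current, line.trim().substring(STATE_PREFIX.length()).trim());
                }
                current = null; // only the first State line after a header matters
            }
        }
        return states;
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "RM.out";
        String filter = args.length > 1 ? args[1] : "Dispatcher"; // assumed name fragment
        List<String> lines = Files.readAllLines(Paths.get(file));
        summarize(lines, filter).forEach((name, state) -> System.out.println(name + " -> " + state));
    }
}
```

Usage: `java ThreadDumpSummary RM.out Dispatcher`. Run it against two or three dumps and compare the states.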

Thanks & Regards
Rohith Sharma K S

> On May 29, 2015, at 8:43 AM, jason lu <ljhn1829@gmail.com> wrote:
> 
> 
> Hi,
>     I have hit the same problem as: http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201303.mbox/%3C482C5F6F-6FEB-4552-99F5-07C8B54ACE20@apache.org%3E
> 
>  Any idea about that?
>   It happens roughly every 3 or 4 weeks in my cluster (about 150 nodes).
> I checked the log: no warnings, no errors, no exceptions, but the ResourceManager hung (it did not crash).
> 
> I found the code below, but I have no idea why this happens. Why does the event queue keep growing?
> 
> Thanks.
> 
>    private final class EventProcessor implements Runnable {
>       @Override
>       public void run() {
> 
>         SchedulerEvent event;
> 
>         while (!stopped && !Thread.currentThread().isInterrupted()) {
>           try {
>             event = eventQueue.take();
>           } catch (InterruptedException e) {
>             LOG.error("Returning, interrupted : " + e);
>             return; // TODO: Kill RM.
>           }
> 
>           try {
>             scheduler.handle(event);
>           } catch (Throwable t) {
>             // An error occurred, but we are shutting down anyway.
>             // If it was an InterruptedException, the very act of 
>             // shutdown could have caused it and is probably harmless.
>             if (stopped) {
>               LOG.warn("Exception during shutdown: ", t);
>               break;
>             }
>             LOG.fatal("Error in handling event type " + event.getType()
>                 + " to the scheduler", t);
>             if (shouldExitOnError
>                 && !ShutdownHookManager.get().isShutdownInProgress()) {
>               LOG.info("Exiting, bbye..");
>               System.exit(-1);
>             }
>           }
>         }
>       }
>     }
> 
>     @Override
>     protected void serviceStop() throws Exception {
>       this.stopped = true;
>       this.eventProcessor.interrupt();
>       try {
>         this.eventProcessor.join();
>       } catch (InterruptedException e) {
>         throw new YarnRuntimeException(e);
>       }
>       super.serviceStop();
>     }
> 
>     @Override
>     public void handle(SchedulerEvent event) {
>       try {
>         int qSize = eventQueue.size();
>         if (qSize !=0 && qSize %1000 == 0) {
>           LOG.info("Size of scheduler event-queue is " + qSize);
>         }
>         int remCapacity = eventQueue.remainingCapacity();
>         if (remCapacity < 1000) {
>           LOG.info("Very low remaining capacity on scheduler event queue: "
>               + remCapacity);
>         }
>         this.eventQueue.put(event);
>       } catch (InterruptedException e) {
>         throw new YarnRuntimeException(e);
>       }
>     }
>   }
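To see why the queue can only grow when the consumer stalls, here is a simplified, hypothetical model of the pattern in the quoted code (BacklogDemo, the `unblock` latch and the 3500-event count are illustrative, not Hadoop code). Producers call handle(), which logs at every multiple of 1000 and put()s into a queue assumed here to be an unbounded LinkedBlockingQueue; a single consumer thread is the only thing that takes events off. The consumer is deliberately parked on a latch to mimic a hung scheduler.handle(), so every event just accumulates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical single-consumer dispatcher model; not the Hadoop class itself. */
public class BacklogDemo {
    final BlockingQueue<Integer> eventQueue = new LinkedBlockingQueue<>(); // unbounded
    final List<String> log = new ArrayList<>();          // only touched by the producer
    final CountDownLatch unblock = new CountDownLatch(1); // stands in for a hung handler
    volatile boolean stopped = false;

    // Producer side: same "log at every multiple of 1000" check as handle() above.
    void handle(int event) {
        int qSize = eventQueue.size();
        if (qSize != 0 && qSize % 1000 == 0) {
            log.add("Size of event-queue is " + qSize);
        }
        try {
            eventQueue.put(event); // never blocks on an unbounded queue
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e); // the quoted code wraps in YarnRuntimeException
        }
    }

    // Consumer side: waits until released, then would drain like EventProcessor.run().
    Thread startConsumer() {
        Thread t = new Thread(() -> {
            try {
                unblock.await(); // "stuck": nothing is taken while this waits
                while (!stopped && !Thread.currentThread().isInterrupted()) {
                    eventQueue.take();
                }
            } catch (InterruptedException e) {
                // interrupted during shutdown; just exit
            }
        }, "EventProcessor");
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        BacklogDemo d = new BacklogDemo();
        Thread consumer = d.startConsumer();
        for (int i = 0; i < 3500; i++) {
            d.handle(i);
        }
        d.log.forEach(System.out::println);
        System.out.println("Backlog: " + d.eventQueue.size());
        d.stopped = true;       // same shutdown steps as serviceStop() above
        consumer.interrupt();
        consumer.join();
    }
}
```

Running main prints the 1000, 2000 and 3000 size messages followed by `Backlog: 3500`. A steady climb with no errors is what this model produces whenever the single processing thread is stuck or much slower than the producers. Also, for an unbounded LinkedBlockingQueue, remainingCapacity() stays close to Integer.MAX_VALUE, so a "Very low remaining capacity" warning like the one in the quoted handle() would not fire.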
> 
> Logs:
> 
> grep 'Size of event-queue' yarn-hadoop-resourcemanager-gdc-hm01-formal.i.nease.net.log
> 2015-05-29 00:54:46,985 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1000
> 2015-05-29 00:55:28,850 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2000
> 2015-05-29 00:56:10,204 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 3000
> 2015-05-29 00:56:51,995 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4000
> 2015-05-29 00:57:33,981 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 5000
> 2015-05-29 00:58:15,324 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 6000
> 2015-05-29 00:58:57,111 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 7000
> 2015-05-29 00:59:38,593 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 8000
> 2015-05-29 01:00:20,215 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 9000
> 2015-05-29 01:01:00,559 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 10000
> 2015-05-29 01:01:39,614 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 11000
> 2015-05-29 01:02:21,364 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 12000
> 2015-05-29 01:03:03,233 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 13000
> 2015-05-29 01:03:44,701 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 14000
> 2015-05-29 01:04:26,494 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 15000
> 2015-05-29 01:05:08,180 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 16000
> 2015-05-29 01:05:50,331 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 17000
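The log shows a very regular climb: 1000 more queued events roughly every 40 seconds, i.e. 16000 events between 00:54:46 and 01:05:50, or about 24 events per second of net growth. Note that these lines come from org.apache.hadoop.yarn.event.AsyncDispatcher, whereas the quoted handle() logs "Size of scheduler event-queue is", so the queue growing here appears to be the central dispatcher's queue rather than the scheduler's. The sketch below (a hypothetical helper, QueueGrowth; the regex assumes the default log4j timestamp layout seen in these lines) computes that rate from grep output.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: net event-queue growth rate from "Size of event-queue is N" log lines. */
public class QueueGrowth {
    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    private static final Pattern LINE = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) .*Size of event-queue is (\\d+)");

    /** Events per second added to the queue between the first and last matching lines. */
    static double eventsPerSecond(List<String> lines) {
        LocalDateTime firstTs = null;
        LocalDateTime lastTs = null;
        long firstSize = 0;
        long lastSize = 0;
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) {
                continue; // not a queue-size line
            }
            LocalDateTime ts = LocalDateTime.parse(m.group(1), TS);
            long size = Long.parseLong(m.group(2));
            if (firstTs == null) {
                firstTs = ts;
                firstSize = size;
            }
            lastTs = ts;
            lastSize = size;
        }
        if (firstTs == null || !lastTs.isAfter(firstTs)) {
            throw new IllegalArgumentException("need two queue-size lines at different times");
        }
        double seconds = Duration.between(firstTs, lastTs).toMillis() / 1000.0;
        return (lastSize - firstSize) / seconds;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "2015-05-29 00:54:46,985 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1000",
            "2015-05-29 01:05:50,331 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 17000");
        System.out.printf(Locale.ROOT, "%.1f events/s%n", eventsPerSecond(lines)); // prints 24.1 events/s
    }
}
```

A rate this steady, with no errors logged, fits a dispatcher thread that has stopped draining (or drains far more slowly than events arrive) rather than a sudden burst of events; the thread dump is what would confirm which.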
> 
> 

