hadoop-common-user mailing list archives

From jason lu <ljhn1...@gmail.com>
Subject Re: ResouceManager hung: org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1000
Date Fri, 29 May 2015 04:24:36 GMT
I forgot to do that before restarting the process.
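
For the next occurrence, the dumps could be captured automatically before any restart. The following is only a minimal sketch: it assumes `jstack` is on the PATH and that the ResourceManager pid is passed as the first argument. The class name `ThreadDumpCapture`, the `RM-<n>.out` file names, and the choice of three dumps five seconds apart are all hypothetical. Taking several dumps helps to tell a thread that is truly stuck from one that is merely busy.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class ThreadDumpCapture {

  // Builds the jstack command line; -l adds lock information to the dump.
  static List<String> jstackCommand(long pid) {
    return Arrays.asList("jstack", "-l", Long.toString(pid));
  }

  // Runs jstack `count` times, writing each dump to its own file in `dir`.
  static void capture(long pid, int count, long intervalMillis, File dir)
      throws IOException, InterruptedException {
    for (int i = 1; i <= count; i++) {
      File out = new File(dir, "RM-" + i + ".out"); // hypothetical file name
      Process p = new ProcessBuilder(jstackCommand(pid))
          .redirectErrorStream(true) // keep jstack error text in the same file
          .redirectOutput(out)
          .start();
      p.waitFor();
      if (i < count) {
        Thread.sleep(intervalMillis);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    long pid = Long.parseLong(args[0]); // pid of the hung process
    capture(pid, 3, 5000L, new File("."));
  }
}
```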

> On May 29, 2015, at 11:17, Rohith Sharma <rohithsharmaks@gmail.com> wrote:
> 
> Hi
> 
> Can you take a thread dump and verify it?
> 
> jstack <pid> > RM.out 
> OR
> kill -3 <pid>  (Note: the thread dump will be written to the .out file)
> 
> Thanks & Regards
> Rohith Sharma K S
> 
>> On May 29, 2015, at 8:43 AM, jason lu <ljhn1829@gmail.com> wrote:
>> 
>> 
>> Hi,
>>     I ran into the same problem as: http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201303.mbox/%3C482C5F6F-6FEB-4552-99F5-07C8B54ACE20@apache.org%3E
>> 
>>  Any idea about that?
>>   It happens almost every 3 or 4 weeks in my cluster (about 150 nodes).
>> I checked the log: no warnings, no errors, no exceptions, but the ResourceManager hung; it did not crash.
>> 
>> I found this code, but I have no idea why this happens. Why does the event queue keep getting bigger and bigger?
>> 
>> thanks.
>> 
>>    private final class EventProcessor implements Runnable {
>>       @Override
>>       public void run() {
>> 
>>         SchedulerEvent event;
>> 
>>         while (!stopped && !Thread.currentThread().isInterrupted()) {
>>           try {
>>             event = eventQueue.take();
>>           } catch (InterruptedException e) {
>>             LOG.error("Returning, interrupted : " + e);
>>             return; // TODO: Kill RM.
>>           }
>> 
>>           try {
>>             scheduler.handle(event);
>>           } catch (Throwable t) {
>>             // An error occurred, but we are shutting down anyway.
>>             // If it was an InterruptedException, the very act of 
>>             // shutdown could have caused it and is probably harmless.
>>             if (stopped) {
>>               LOG.warn("Exception during shutdown: ", t);
>>               break;
>>             }
>>             LOG.fatal("Error in handling event type " + event.getType()
>>                 + " to the scheduler", t);
>>             if (shouldExitOnError
>>                 && !ShutdownHookManager.get().isShutdownInProgress()) {
>>               LOG.info("Exiting, bbye..");
>>               System.exit(-1);
>>             }
>>           }
>>         }
>>       }
>>     }
>> 
>>     @Override
>>     protected void serviceStop() throws Exception {
>>       this.stopped = true;
>>       this.eventProcessor.interrupt();
>>       try {
>>         this.eventProcessor.join();
>>       } catch (InterruptedException e) {
>>         throw new YarnRuntimeException(e);
>>       }
>>       super.serviceStop();
>>     }
>> 
>>     @Override
>>     public void handle(SchedulerEvent event) {
>>       try {
>>         int qSize = eventQueue.size();
>>         if (qSize !=0 && qSize %1000 == 0) {
>>           LOG.info("Size of scheduler event-queue is " + qSize);
>>         }
>>         int remCapacity = eventQueue.remainingCapacity();
>>         if (remCapacity < 1000) {
>>           LOG.info("Very low remaining capacity on scheduler event queue: "
>>               + remCapacity);
>>         }
>>         this.eventQueue.put(event);
>>       } catch (InterruptedException e) {
>>         throw new YarnRuntimeException(e);
>>       }
>>     }
>>   }
>> 
>> logs:
>> 
>> grep 'Size of event-queue' yarn-hadoop-resourcemanager-gdc-hm01-formal.i.nease.net.log
>> 2015-05-29 00:54:46,985 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1000
>> 2015-05-29 00:55:28,850 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2000
>> 2015-05-29 00:56:10,204 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 3000
>> 2015-05-29 00:56:51,995 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4000
>> 2015-05-29 00:57:33,981 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 5000
>> 2015-05-29 00:58:15,324 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 6000
>> 2015-05-29 00:58:57,111 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 7000
>> 2015-05-29 00:59:38,593 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 8000
>> 2015-05-29 01:00:20,215 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 9000
>> 2015-05-29 01:01:00,559 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 10000
>> 2015-05-29 01:01:39,614 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 11000
>> 2015-05-29 01:02:21,364 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 12000
>> 2015-05-29 01:03:03,233 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 13000
>> 2015-05-29 01:03:44,701 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 14000
>> 2015-05-29 01:04:26,494 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 15000
>> 2015-05-29 01:05:08,180 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 16000
>> 2015-05-29 01:05:50,331 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 17000
>> 
>> 
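
The timestamps in the grep output are evenly spaced, which suggests a steady net growth of the queue rather than a burst. A small sketch of the arithmetic, using only the first and last log lines quoted above; the class name `QueueGrowthRate` and its helper are hypothetical and not part of Hadoop:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class QueueGrowthRate {

  // Timestamp layout used in the quoted ResourceManager log lines.
  private static final DateTimeFormatter TS =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

  // Net queue growth in events per second between two "Size of event-queue" lines.
  static double eventsPerSecond(String firstTs, int firstSize,
                                String lastTs, int lastSize) {
    LocalDateTime start = LocalDateTime.parse(firstTs, TS);
    LocalDateTime end = LocalDateTime.parse(lastTs, TS);
    double seconds = Duration.between(start, end).toMillis() / 1000.0;
    return (lastSize - firstSize) / seconds;
  }

  public static void main(String[] args) {
    double rate = eventsPerSecond(
        "2015-05-29 00:54:46,985", 1000,
        "2015-05-29 01:05:50,331", 17000);
    // 16000 events over about 663 seconds: roughly 24 events per second.
    System.out.println(String.format(Locale.ROOT, "%.1f events/s", rate));
  }
}
```

This figure is the net growth (arrivals minus events processed). The log alone cannot show whether the consumer had stopped completely or was only slower than the producers.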
> 
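
One way a queue like this can grow without any warning or exception is a consumer thread that is stuck inside a handler while producers keep calling `put`. The sketch below only illustrates that mechanism with a plain `LinkedBlockingQueue`; it is not Hadoop code, the class name `StalledDispatcherDemo` is hypothetical, and the stall is simulated with a latch that is never released. Whether this is what happened in the ResourceManager can only be confirmed from a thread dump. Note also that the grep output comes from `AsyncDispatcher` ("Size of event-queue"), while the quoted code logs "Size of scheduler event-queue", so the quoted class may not be the queue that is growing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

public class StalledDispatcherDemo {

  private final BlockingQueue<Integer> eventQueue = new LinkedBlockingQueue<>();
  private final List<Integer> loggedSizes = new ArrayList<>();

  // Producer side: same size check as the quoted handle() method.
  void handle(int event) throws InterruptedException {
    int qSize = eventQueue.size();
    if (qSize != 0 && qSize % 1000 == 0) {
      loggedSizes.add(qSize);
      System.out.println("Size of event-queue is " + qSize);
    }
    eventQueue.put(event);
  }

  // Enqueues events while the consumer is stuck; returns the sizes that were logged.
  static List<Integer> simulate(int eventsAfterStall) throws InterruptedException {
    StalledDispatcherDemo d = new StalledDispatcherDemo();
    CountDownLatch handlerEntered = new CountDownLatch(1);
    CountDownLatch neverReleased = new CountDownLatch(1);

    Thread consumer = new Thread(() -> {
      try {
        while (true) {
          d.eventQueue.take();
          handlerEntered.countDown();
          neverReleased.await(); // simulated hang inside the event handler
        }
      } catch (InterruptedException e) {
        // Interrupted at the end of the demo; just exit.
      }
    });
    consumer.setDaemon(true);
    consumer.start();

    d.handle(0);            // first event is taken, then the consumer stalls
    handlerEntered.await(); // wait until the consumer is stuck
    for (int i = 1; i <= eventsAfterStall; i++) {
      d.handle(i);          // producers keep going; nothing drains the queue
    }

    List<Integer> result = new ArrayList<>(d.loggedSizes);
    consumer.interrupt();
    consumer.join();
    return result;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(simulate(3001)); // prints [1000, 2000, 3000]
  }
}
```

The process stays alive and logs nothing but the periodic size message, which matches the symptom of a hang rather than a crash.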

