hadoop-hdfs-user mailing list archives

From Hitesh Shah <hit...@apache.org>
Subject Re: ResourceManager shutting down
Date Fri, 14 Mar 2014 15:05:33 GMT
Hi John

Would you mind filing a JIRA with more details? The RM going down just because a host was
not resolvable or DNS timed out is something that should be addressed.

thanks
-- Hitesh

On Mar 13, 2014, at 2:29 PM, John Lilley wrote:

> Never mind… we figured out its DNS entry was going missing.
> john
>  
> From: John Lilley [mailto:john.lilley@redpoint.net] 
> Sent: Thursday, March 13, 2014 2:52 PM
> To: user@hadoop.apache.org
> Subject: ResourceManager shutting down
>  
> We have this erratic behavior where every so often the RM will shut down with an UnknownHostException. The odd thing is, the host it complains about has been in use for days at that point without problem.  Any ideas?
> Thanks,
> John
>  
>  
> 2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State change from ACCEPTED to RUNNING
> 2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: java.net.UnknownHostException: skitzo.office.datalever.com
>         at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
>         at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
>         at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
>         ... 15 more
> 2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager (ResourceManager.java:run(453)) - Exiting, bbye..
> 2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@metallica.office.datalever.com:8088
> 2014-03-13 14:38:16,013 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted
> 2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
> 2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
> 2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8141
> 2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8050
> … and so on, it shuts down
>  
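
For anyone hitting the same trace: the UnknownHostException in the log above surfaces from SecurityUtil.buildTokenService while the scheduler builds a container token, so one quick way to confirm an intermittent DNS problem is to try resolving the node host names from the RM machine. The following is only a minimal sketch using plain JDK lookups; the class name DnsCheck and its methods are hypothetical and are not part of Hadoop.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Hadoop class): reports host names that
// the local resolver cannot currently resolve.
final class DnsCheck {

    // Returns true if the name resolves to an address right now.
    public static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    // Returns the subset of the given hosts that fail to resolve, in input order.
    public static List<String> unresolvable(List<String> hosts) {
        List<String> failed = new ArrayList<String>();
        for (String host : hosts) {
            if (!resolves(host)) {
                failed.add(host);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        for (String host : unresolvable(Arrays.asList(args))) {
            System.err.println("cannot resolve: " + host);
        }
    }
}
```

Run it on the RM host with the node host names as arguments, for example `java DnsCheck skitzo.office.datalever.com`. Since the failure was intermittent, it is probably better to launch it repeatedly as a fresh process (say, from cron) than to loop inside one JVM, because the JVM caches successful lookups for a while and could hide a short DNS outage.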

