hadoop-mapreduce-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: ResourceManager shutting down
Date Thu, 13 Mar 2014 21:29:36 GMT
Never mind... we figured out that the DNS entry for that host was intermittently going missing.
john
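
For anyone who finds this thread with the same trace: the lookup fails inside SecurityUtil.buildTokenService while the RM is creating a container token, so one unresolvable node name is enough to bring the RM down. Below is a minimal sketch of a standalone check that reports which node hostnames the local resolver cannot resolve at the moment. The class name HostResolutionCheck and the method unresolved are made up for illustration and are not part of Hadoop; the only library call used is the standard java.net.InetAddress.getByName.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: reports hostnames that do not resolve.
class HostResolutionCheck {

    // Returns the subset of the given hosts that the local resolver
    // cannot resolve at the time of the call.
    static List<String> unresolved(List<String> hosts) {
        List<String> failed = new ArrayList<String>();
        for (String host : hosts) {
            try {
                // Same kind of lookup that ends in UnknownHostException in the trace.
                InetAddress.getByName(host);
            } catch (UnknownHostException e) {
                failed.add(host);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Pass the cluster's node hostnames as arguments.
        List<String> failed = unresolved(Arrays.asList(args));
        for (String host : failed) {
            System.err.println("cannot resolve: " + host);
        }
        // Non-zero exit status so a cron job or monitor can alert on it.
        System.exit(failed.isEmpty() ? 0 : 1);
    }
}
```

Running it from cron on the RM host as a fresh process each time (for example with skitzo.office.datalever.com as an argument) is one way to catch an entry that only disappears now and then; a long-running JVM may cache lookups and hide the gaps.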

From: John Lilley [mailto:john.lilley@redpoint.net]
Sent: Thursday, March 13, 2014 2:52 PM
To: user@hadoop.apache.org
Subject: ResourceManager shutting down

We are seeing erratic behavior where every so often the RM will shut down with an UnknownHostException.
The odd thing is, the host it complains about has been in use for days at that point without
a problem.  Any ideas?
Thanks,
John


2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(578)) - application_1394204725813_0220
State change from ACCEPTED to RUNNING
2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(449))
- Error in handling event type NODE_UPDATE to the scheduler
java.lang.IllegalArgumentException: java.net.UnknownHostException: skitzo.office.datalever.com
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
        at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
        at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
        ... 15 more
2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager (ResourceManager.java:run(453))
- Exiting, bbye..
2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@metallica.office.datalever.com:8088
2014-03-13 14:38:16,013 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(557))
- InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException:
sleep interrupted
2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) -
Stopping ResourceManager metrics system...
2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) -
ResourceManager metrics system stopped.
2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572))
- ResourceManager metrics system shutdown complete.
2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98))
- org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
interrupted. Returning.
2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8141
2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8050
... and so on, it shuts down

