hadoop-hdfs-user mailing list archives

From Silvina Caíno Lores <silvi.ca...@gmail.com>
Subject Job stuck in running state on Hadoop 2.2.0
Date Tue, 10 Dec 2013 07:37:41 GMT
Hi everyone,

I'm having trouble running the Hadoop examples on a single node. Every
job gets stuck in the RUNNING state at 0% map and 0% reduce, and the logs
don't seem to indicate any issue, apart from the node manager having to be
killed on shutdown:

compute-0-7-3: nodemanager did not stop gracefully after 5 seconds: killing
with kill -9

RM

2013-12-09 11:52:22,466 INFO
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
Command to launch container container_1386585879247_0001_01_000001 :
$JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0
-Dhadoop.root.logger=INFO,CLA -Xmx1024m
org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout
2><LOG_DIR>/stderr
2013-12-09 11:52:22,882 INFO
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done
launching container Container: [ContainerId:
container_1386585879247_0001_01_000001, NodeId: compute-0-7-3:8010,
NodeHttpAddress: compute-0-7-3:8042, Resource: <memory:2000, vCores:1>,
Priority: 0, Token: Token { kind: ContainerToken, service: 10.0.7.3:8010 },
] for AM appattempt_1386585879247_0001_000001
2013-12-09 11:52:22,883 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1386585879247_0001_000001 State change from ALLOCATED to LAUNCHED
2013-12-09 11:52:23,371 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1386585879247_0001_01_000001 Container Transitioned from ACQUIRED
to RUNNING
2013-12-09 11:52:30,922 INFO SecurityLogger.org.apache.hadoop.ipc.Server:
Auth successful for appattempt_1386585879247_0001_000001 (auth:SIMPLE)
2013-12-09 11:52:30,938 INFO
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AM
registration appattempt_1386585879247_0001_000001
2013-12-09 11:52:30,939 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=scaino
IP=10.0.7.3 OPERATION=Register App Master TARGET=ApplicationMasterService
RESULT=SUCCESS APPID=application_1386585879247_0001
APPATTEMPTID=appattempt_1386585879247_0001_000001
2013-12-09 11:52:30,941 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1386585879247_0001_000001 State change from LAUNCHED to RUNNING
2013-12-09 11:52:30,941 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1386585879247_0001 State change from ACCEPTED to RUNNING
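
To make the RM excerpt above easier to follow, the state transitions can be pulled out with a short script. This is only a rough sketch: the regular expression and the function name `state_transitions` are illustrative assumptions, not part of Hadoop, and it covers only the two message shapes that appear in the excerpt ("State change from X to Y" and "Container Transitioned from X to Y"). It first re-joins the lines that the mail client wrapped.

```python
import re

# Excerpt copied from the RM log above; the mail client wrapped the lines.
RM_LOG = """\
2013-12-09 11:52:22,883 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1386585879247_0001_000001 State change from ALLOCATED to LAUNCHED
2013-12-09 11:52:23,371 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1386585879247_0001_01_000001 Container Transitioned from ACQUIRED
to RUNNING
2013-12-09 11:52:30,941 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1386585879247_0001_000001 State change from LAUNCHED to RUNNING
2013-12-09 11:52:30,941 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1386585879247_0001 State change from ACCEPTED to RUNNING
"""

# Illustrative pattern (an assumption): entity id, then one of the two
# transition phrasings seen in the excerpt, then the old and new states.
TRANSITION = re.compile(
    r"(\S+) (?:State change|Container Transitioned) from (\w+) to (\w+)"
)


def state_transitions(log_text):
    """Return (entity_id, old_state, new_state) tuples in log order."""
    # Collapse all whitespace so wrapped log lines become one long line.
    unwrapped = " ".join(log_text.split())
    return TRANSITION.findall(unwrapped)


TRANSITIONS = state_transitions(RM_LOG)
for entity, old, new in TRANSITIONS:
    print(f"{entity}: {old} -> {new}")
```

Read this way, the excerpt shows the application attempt, its AM container, and the application itself all reaching RUNNING, with no later transition logged by the RM.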


NM

2013-12-10 08:26:02,100 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
event CONTAINER_STOP for appId application_1386585879247_0001
2013-12-10 08:26:02,102 INFO
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Deleting absolute path :
/scratch/HDFS-scaino-2/tmp/nm-local-dir/usercache/scaino/appcache/application_1386585879247_0001
2013-12-10 08:26:02,103 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
event APPLICATION_STOP for appId application_1386585879247_0001
2013-12-10 08:26:02,110 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
Application application_1386585879247_0001 transitioned from
APPLICATION_RESOURCES_CLEANINGUP to FINISHED
2013-12-10 08:26:02,157 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler:
Scheduling Log Deletion for application: application_1386585879247_0001,
with delay of 10800 seconds
2013-12-10 08:26:04,688 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Stopping resource-monitoring for container_1386585879247_0001_01_000001
2013-12-10 08:26:05,838 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Done waiting for Applications to be Finished. Still alive:
[application_1386585879247_0001]
2013-12-10 08:26:05,839 INFO org.apache.hadoop.ipc.Server: Stopping server
on 8010
2013-12-10 08:26:05,846 INFO org.apache.hadoop.ipc.Server: Stopping IPC
Server listener on 8010
2013-12-10 08:26:05,847 INFO org.apache.hadoop.ipc.Server: Stopping IPC
Server Responder

I tried the pi and wordcount examples with the same results. Any ideas on
how to debug this?

Thanks in advance.

Regards,
Silvina Caíno
