hadoop-hdfs-user mailing list archives

From Taka Shinagawa <taka.epsi...@gmail.com>
Subject Re: Job stuck in running state on Hadoop 2.2.0
Date Tue, 10 Dec 2013 09:10:13 GMT
I had a similar problem after setting up Hadoop 2.2.0 based on the
instructions at
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html

Although the page does not mention it, I also needed to edit hadoop-env.sh
and yarn-env.sh to set
JAVA_HOME, HADOOP_CONF_DIR, HADOOP_YARN_USER and YARN_CONF_DIR.

Once these variables were set, I was able to run the example successfully.
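
For illustration, here is a rough sketch of what such settings might look like,
followed by a small check that reports which of the four variables are still
unset. Every path and the user name below are placeholders I am assuming for
the example, not values from my actual setup, and check_hadoop_env is just a
hypothetical helper name:

```shell
#!/bin/sh
# Placeholder values (assumptions): adjust each one to your own installation.
# Lines like these would go into hadoop-env.sh and yarn-env.sh.
export JAVA_HOME="/usr/lib/jvm/java-7-openjdk"        # assumed JDK location
export HADOOP_CONF_DIR="/opt/hadoop-2.2.0/etc/hadoop" # assumed install path
export YARN_CONF_DIR="$HADOOP_CONF_DIR"               # assumed: same directory
export HADOOP_YARN_USER="yarn"                        # assumed user name

# Hypothetical helper: prints "all set" and returns 0 when all four
# variables are non-empty, otherwise lists the missing ones and returns 1.
check_hadoop_env() {
    missing=""
    for name in JAVA_HOME HADOOP_CONF_DIR HADOOP_YARN_USER YARN_CONF_DIR; do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "unset:$missing"
        return 1
    fi
    echo "all set"
    return 0
}
```

Calling check_hadoop_env in the same shell before starting the daemons should
show whether any of the four variables was missed.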



On Mon, Dec 9, 2013 at 11:37 PM, Silvina Caíno Lores
<silvi.caino@gmail.com> wrote:

>
> Hi everyone,
>
> I'm having trouble running the Hadoop examples on a single node. All the
> executions get stuck at the running state at 0% map and reduce and the logs
> don't seem to indicate any issue, besides the need to kill the node manager:
>
> compute-0-7-3: nodemanager did not stop gracefully after 5 seconds:
> killing with kill -9
>
> RM
>
> 2013-12-09 11:52:22,466 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
> Command to launch container container_1386585879247_0001_01_000001 :
> $JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties
> -Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0
> -Dhadoop.root.logger=INFO,CLA -Xmx1024m
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout
> 2><LOG_DIR>/stderr
> 2013-12-09 11:52:22,882 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done
> launching container Container: [ContainerId:
> container_1386585879247_0001_01_000001, NodeId: compute-0-7-3:8010,
> NodeHttpAddress: compute-0-7-3:8042, Resource: <memory:2000, vCores:1>,
> Priority: 0, Token: Token { kind: ContainerToken, service: 10.0.7.3:8010}, ] for AM appattempt_1386585879247_0001_000001
> 2013-12-09 11:52:22,883 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1386585879247_0001_000001 State change from ALLOCATED to LAUNCHED
> 2013-12-09 11:52:23,371 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1386585879247_0001_01_000001 Container Transitioned from ACQUIRED
> to RUNNING
> 2013-12-09 11:52:30,922 INFO SecurityLogger.org.apache.hadoop.ipc.Server:
> Auth successful for appattempt_1386585879247_0001_000001 (auth:SIMPLE)
> 2013-12-09 11:52:30,938 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AM
> registration appattempt_1386585879247_0001_000001
> 2013-12-09 11:52:30,939 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=scaino
> IP=10.0.7.3 OPERATION=Register App Master TARGET=ApplicationMasterService
> RESULT=SUCCESS APPID=application_1386585879247_0001
> APPATTEMPTID=appattempt_1386585879247_0001_000001
> 2013-12-09 11:52:30,941 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1386585879247_0001_000001 State change from LAUNCHED to RUNNING
> 2013-12-09 11:52:30,941 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> application_1386585879247_0001 State change from ACCEPTED to RUNNING
>
>
> NM
>
> 2013-12-10 08:26:02,100 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
> event CONTAINER_STOP for appId application_1386585879247_0001
> 2013-12-10 08:26:02,102 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
> Deleting absolute path :
> /scratch/HDFS-scaino-2/tmp/nm-local-dir/usercache/scaino/appcache/application_1386585879247_0001
> 2013-12-10 08:26:02,103 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
> event APPLICATION_STOP for appId application_1386585879247_0001
> 2013-12-10 08:26:02,110 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Application application_1386585879247_0001 transitioned from
> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2013-12-10 08:26:02,157 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler:
> Scheduling Log Deletion for application: application_1386585879247_0001,
> with delay of 10800 seconds
> 2013-12-10 08:26:04,688 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Stopping resource-monitoring for container_1386585879247_0001_01_000001
> 2013-12-10 08:26:05,838 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Done waiting for Applications to be Finished. Still alive:
> [application_1386585879247_0001]
> 2013-12-10 08:26:05,839 INFO org.apache.hadoop.ipc.Server: Stopping server
> on 8010
> 2013-12-10 08:26:05,846 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server listener on 8010
> 2013-12-10 08:26:05,847 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server Responder
>
> I tried the pi and wordcount examples with the same results. Any ideas on how
> to debug this?
>
> Thanks in advance.
>
> Regards,
> Silvina Caíno
>
>
>
>
>
