hadoop-mapreduce-user mailing list archives

From Adam Kawa <kawa.a...@gmail.com>
Subject Re: Job stuck in running state on Hadoop 2.2.0
Date Tue, 10 Dec 2013 23:29:07 GMT
It sounds like the job was successfully submitted to the cluster, but there
was some problem starting or running the AM, so no progress is made. It
happened to me once, when I was playing with YARN on a cluster consisting
of very small machines, and I misconfigured YARN to allocate more memory
to the AM than was actually available on any machine in my cluster. As a
result, the RM was not able to start the AM anywhere, because it could not
find a big enough container.
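
For reference, a minimal sketch of the two properties that have to agree
(the values below are only examples; check them against the actual memory
available on your nodes):

<!-- yarn-site.xml: memory each NodeManager offers to containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>

<!-- mapred-site.xml: memory the MapReduce AM requests; it must fit into a
     single container, i.e. be <= yarn.nodemanager.resource.memory-mb -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>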

Could you show the logs from the job? The link should be available in your
console after you submit a job, e.g.:
13/12/10 10:41:21 INFO mapreduce.Job: The url to track the job:
http://compute-7-2:8088/proxy/application_1386668372725_0001/
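
If log aggregation is enabled (yarn.log-aggregation-enable set to true in
yarn-site.xml), you should also be able to pull the AM logs from the
command line once you have the application id, e.g.:

yarn logs -applicationId application_1386668372725_0001

Otherwise, the AM container's stdout/stderr should be under the
NodeManager's local log directory (yarn.nodemanager.log-dirs) on the node
that ran the container.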


2013/12/10 Silvina Caíno Lores <silvi.caino@gmail.com>

> Thank you! I realized that, although I exported the variables in the
> scripts, there were a few errors and my desired configuration wasn't being
> used (which explained some other strange behavior).
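>
> For what it's worth, I now sanity-check which configuration directory
> actually ends up on the classpath with:
>
> hadoop classpath | tr ':' '\n' | head -1
>
> (if I read the launcher scripts correctly, the first entry should be
> $HADOOP_CONF_DIR)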
>
> However, I'm still getting the same issue with the examples, for instance:
>
> hadoop jar
> ~/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar
> pi 1 100
> Number of Maps = 1
> Samples per Map = 100
> 13/12/10 10:41:18 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> Wrote input for Map #0
> Starting Job
> 13/12/10 10:41:19 INFO client.RMProxy: Connecting to ResourceManager at /
> 0.0.0.0:8032
> 13/12/10 10:41:20 INFO input.FileInputFormat: Total input paths to process
> : 1
> 13/12/10 10:41:20 INFO mapreduce.JobSubmitter: number of splits:1
> 13/12/10 10:41:20 INFO Configuration.deprecation: user.name is
> deprecated. Instead, use mapreduce.job.user.name
> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.jar is
> deprecated. Instead, use mapreduce.job.jar
> 13/12/10 10:41:20 INFO Configuration.deprecation:
> mapred.map.tasks.speculative.execution is deprecated. Instead, use
> mapreduce.map.speculative
> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.reduce.tasks is
> deprecated. Instead, use mapreduce.job.reduces
> 13/12/10 10:41:20 INFO Configuration.deprecation:
> mapred.output.value.class is deprecated. Instead, use
> mapreduce.job.output.value.class
> 13/12/10 10:41:20 INFO Configuration.deprecation:
> mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
> mapreduce.reduce.speculative
> 13/12/10 10:41:20 INFO Configuration.deprecation: mapreduce.map.class is
> deprecated. Instead, use mapreduce.job.map.class
> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.job.name is
> deprecated. Instead, use mapreduce.job.name
> 13/12/10 10:41:20 INFO Configuration.deprecation: mapreduce.reduce.class
> is deprecated. Instead, use mapreduce.job.reduce.class
> 13/12/10 10:41:20 INFO Configuration.deprecation:
> mapreduce.inputformat.class is deprecated. Instead, use
> mapreduce.job.inputformat.class
> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.input.dir is
> deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.output.dir is
> deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
> 13/12/10 10:41:20 INFO Configuration.deprecation:
> mapreduce.outputformat.class is deprecated. Instead, use
> mapreduce.job.outputformat.class
> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.map.tasks is
> deprecated. Instead, use mapreduce.job.maps
> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.output.key.class
> is deprecated. Instead, use mapreduce.job.output.key.class
> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.working.dir is
> deprecated. Instead, use mapreduce.job.working.dir
> 13/12/10 10:41:20 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1386668372725_0001
> 13/12/10 10:41:20 INFO impl.YarnClientImpl: Submitted application
> application_1386668372725_0001 to ResourceManager at /0.0.0.0:8032
> 13/12/10 10:41:21 INFO mapreduce.Job: The url to track the job:
> http://compute-7-2:8088/proxy/application_1386668372725_0001/
> 13/12/10 10:41:21 INFO mapreduce.Job: Running job: job_1386668372725_0001
> 13/12/10 10:41:31 INFO mapreduce.Job: Job job_1386668372725_0001 running
> in uber mode : false
> 13/12/10 10:41:31 INFO mapreduce.Job: map 0% reduce 0%
> ---- stuck here ----
>
>
> I hope the problem is not in the environment files. I have the following
> at the beginning of hadoop-env.sh:
>
> # The java implementation to use.
> export JAVA_HOME=/home/software/jdk1.7.0_25/
>
> # The jsvc implementation to use. Jsvc is required to run secure datanodes.
> #export JSVC_HOME=${JSVC_HOME}
>
> export
> HADOOP_INSTALL=/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT
>
> export HADOOP_HDFS_HOME=$HADOOP_INSTALL
> export HADOOP_COMMON_HOME=$HADOOP_INSTALL
> export HADOOP_CONF_DIR=$HADOOP_INSTALL"/etc/hadoop"
>
>
> and this in yarn-env.sh:
>
> export JAVA_HOME=/home/software/jdk1.7.0_25/
>
> export
> HADOOP_INSTALL=/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT
>
> export HADOOP_HDFS_HOME=$HADOOP_INSTALL
> export HADOOP_COMMON_HOME=$HADOOP_INSTALL
> export HADOOP_CONF_DIR=$HADOOP_INSTALL"/etc/hadoop"
>
>
> Not sure what to do about HADOOP_YARN_USER though, since I don't have a
> dedicated user to run the daemons.
>
> Thanks!
>
>
> On 10 December 2013 10:10, Taka Shinagawa <taka.epsilon@gmail.com> wrote:
>
>> I had a similar problem after setting up Hadoop 2.2.0 based on the
>> instructions at
>> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
>>
>> Although it's not documented on the page, I needed to edit hadoop-env.sh
>> and yarn-env.sh as well to update
>> JAVA_HOME, HADOOP_CONF_DIR, HADOOP_YARN_USER and YARN_CONF_DIR.
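>>
>> For example, something along these lines (the paths are placeholders,
>> not my exact values):
>>
>> # hadoop-env.sh
>> export JAVA_HOME=/usr/lib/jvm/jdk1.7.0
>> export HADOOP_CONF_DIR=/opt/hadoop-2.2.0/etc/hadoop
>>
>> # yarn-env.sh
>> export JAVA_HOME=/usr/lib/jvm/jdk1.7.0
>> export YARN_CONF_DIR=$HADOOP_CONF_DIR
>> export HADOOP_YARN_USER=$USER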
>>
>> Once these variables were set, I was able to run the example successfully.
>>
>>
>>
>> On Mon, Dec 9, 2013 at 11:37 PM, Silvina Caíno Lores <
>> silvi.caino@gmail.com> wrote:
>>
>>>
>>> Hi everyone,
>>>
>>> I'm having trouble running the Hadoop examples on a single node. All the
>>> executions get stuck in the running state at 0% map and 0% reduce, and the
>>> logs don't seem to indicate any issue, apart from the node manager having
>>> to be killed:
>>>
>>> compute-0-7-3: nodemanager did not stop gracefully after 5 seconds:
>>> killing with kill -9
>>>
>>> RM log:
>>>
>>> 2013-12-09 11:52:22,466 INFO
>>> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
>>> Command to launch container container_1386585879247_0001_01_000001 :
>>> $JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties
>>> -Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0
>>> -Dhadoop.root.logger=INFO,CLA -Xmx1024m
>>> org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout
>>> 2><LOG_DIR>/stderr
>>> 2013-12-09 11:52:22,882 INFO
>>> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done
>>> launching container Container: [ContainerId:
>>> container_1386585879247_0001_01_000001, NodeId: compute-0-7-3:8010,
>>> NodeHttpAddress: compute-0-7-3:8042, Resource: <memory:2000, vCores:1>,
>>> Priority: 0, Token: Token { kind: ContainerToken, service: 10.0.7.3:8010}, ]
>>> for AM appattempt_1386585879247_0001_000001
>>> 2013-12-09 11:52:22,883 INFO
>>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>>> appattempt_1386585879247_0001_000001 State change from ALLOCATED to LAUNCHED
>>> 2013-12-09 11:52:23,371 INFO
>>> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
>>> container_1386585879247_0001_01_000001 Container Transitioned from ACQUIRED
>>> to RUNNING
>>> 2013-12-09 11:52:30,922 INFO
>>> SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for
>>> appattempt_1386585879247_0001_000001 (auth:SIMPLE)
>>> 2013-12-09 11:52:30,938 INFO
>>> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AM
>>> registration appattempt_1386585879247_0001_000001
>>> 2013-12-09 11:52:30,939 INFO
>>> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=scaino
>>> IP=10.0.7.3 OPERATION=Register App Master TARGET=ApplicationMasterService
>>> RESULT=SUCCESS APPID=application_1386585879247_0001
>>> APPATTEMPTID=appattempt_1386585879247_0001_000001
>>> 2013-12-09 11:52:30,941 INFO
>>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>>> appattempt_1386585879247_0001_000001 State change from LAUNCHED to RUNNING
>>> 2013-12-09 11:52:30,941 INFO
>>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
>>> application_1386585879247_0001 State change from ACCEPTED to RUNNING
>>>
>>>
>>> NM log:
>>>
>>> 2013-12-10 08:26:02,100 INFO
>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
>>> event CONTAINER_STOP for appId application_1386585879247_0001
>>> 2013-12-10 08:26:02,102 INFO
>>> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
>>> Deleting absolute path :
>>> /scratch/HDFS-scaino-2/tmp/nm-local-dir/usercache/scaino/appcache/application_1386585879247_0001
>>> 2013-12-10 08:26:02,103 INFO
>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
>>> event APPLICATION_STOP for appId application_1386585879247_0001
>>> 2013-12-10 08:26:02,110 INFO
>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>>> Application application_1386585879247_0001 transitioned from
>>> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
>>> 2013-12-10 08:26:02,157 INFO
>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler:
>>> Scheduling Log Deletion for application: application_1386585879247_0001,
>>> with delay of 10800 seconds
>>> 2013-12-10 08:26:04,688 INFO
>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>>> Stopping resource-monitoring for container_1386585879247_0001_01_000001
>>> 2013-12-10 08:26:05,838 INFO
>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>>> Done waiting for Applications to be Finished. Still alive:
>>> [application_1386585879247_0001]
>>> 2013-12-10 08:26:05,839 INFO org.apache.hadoop.ipc.Server: Stopping
>>> server on 8010
>>> 2013-12-10 08:26:05,846 INFO org.apache.hadoop.ipc.Server: Stopping IPC
>>> Server listener on 8010
>>> 2013-12-10 08:26:05,847 INFO org.apache.hadoop.ipc.Server: Stopping IPC
>>> Server Responder
>>>
>>> I tried the pi and wordcount examples with the same results. Any ideas on
>>> how to debug this?
>>>
>>> Thanks in advance.
>>>
>>> Regards,
>>> Silvina Caíno
>>>
