hadoop-user mailing list archives

From Silvina Caíno Lores <silvi.ca...@gmail.com>
Subject Re: Job stuck in running state on Hadoop 2.2.0
Date Wed, 11 Dec 2013 09:01:24 GMT
Actually, it now seems to be running (or at least attempting to run), but I
get further errors:

hadoop jar
~/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar
pi 1 100

INFO mapreduce.Job: Job job_1386751964857_0001 failed with state FAILED due
to: Application application_1386751964857_0001 failed 2 times due to AM
Container for appattempt_1386751964857_0001_000002 exited with exitCode: 1
due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:504)
at org.apache.hadoop.util.Shell.run(Shell.java:417)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:636)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)



I guess it's some sort of classpath issue, judging by this log:

/scratch/HDFS-scaino-2/logs/application_1386751964857_0001/container_1386751964857_0001_01_000001$
cat stderr
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/service/CompositeService
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.service.CompositeService
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 13 more


I haven't found a solution yet, even though the classpath looks fine:

hadoop classpath

/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/common/lib/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/common/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/hdfs:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/hdfs/lib/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/hdfs/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/yarn/lib/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/yarn/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/lib/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar


Could that be related to the previous launch errors?
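
In case it helps to narrow this down, here's a rough way to check which jar
(if any) actually ships the missing class (a sketch: it assumes the directory
layout from the classpath output above and that unzip is installed):

# Look for CompositeService in every jar under the dist's share/hadoop tree.
# No match would mean the build/dist is incomplete; a match while the job
# still fails suggests the container's classpath isn't the one printed above.
HADOOP_PREFIX=~/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT
find "$HADOOP_PREFIX/share/hadoop" -name '*.jar' | while read -r jar; do
  if unzip -l "$jar" 2>/dev/null | grep -q 'org/apache/hadoop/service/CompositeService.class'; then
    echo "found in: $jar"
  fi
done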

Thanks in advance :)




On 11 December 2013 00:29, Adam Kawa <kawa.adam@gmail.com> wrote:

> It sounds like the job was successfully submitted to the cluster, but
> there was some problem starting/running the AM, so no progress was made.
> It happened to me once, when I was playing with YARN on a cluster
> consisting of very small machines, and I misconfigured YARN to allocate
> more memory to the AM than was actually available on any machine in my
> cluster. As a result, the RM was not able to start the AM anywhere,
> because it couldn't find a big enough container.
>
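> For reference, the knobs involved look roughly like this (a minimal sketch;
> the property names are the standard YARN/MapReduce ones, but the values are
> only illustrative). The AM's request must fit within both limits below:
>
> # The heredoc just displays the XML fragments; merge them into the
> # <configuration> element of yarn-site.xml and mapred-site.xml respectively.
> cat <<'EOF'
> <!-- yarn-site.xml: per-NM memory offer and the largest grantable container -->
> <property><name>yarn.nodemanager.resource.memory-mb</name><value>2048</value></property>
> <property><name>yarn.scheduler.maximum-allocation-mb</name><value>2048</value></property>
> <!-- mapred-site.xml: memory requested for the MapReduce AM container -->
> <property><name>yarn.app.mapreduce.am.resource.mb</name><value>1536</value></property>
> EOF
>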
> Could you show the logs from the job? The link should be available on your
> console after you submit a job e.g.
> 13/12/10 10:41:21 INFO mapreduce.Job: The url to track the job:
> http://compute-7-2:8088/proxy/application_1386668372725_0001/
>
>
> 2013/12/10 Silvina Caíno Lores <silvi.caino@gmail.com>
>
>> Thank you! I realized that, although I exported the variables in the
>> scripts, there were a few errors and my desired configuration wasn't being
>> used (which explained other strange behavior).
>>
>> However, I'm still getting the same issue with the examples, for instance:
>>
>> hadoop jar
>> ~/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar
>> pi 1 100
>> Number of Maps = 1
>> Samples per Map = 100
>> 13/12/10 10:41:18 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> Wrote input for Map #0
>> Starting Job
>> 13/12/10 10:41:19 INFO client.RMProxy: Connecting to ResourceManager at /
>> 0.0.0.0:8032
>> 13/12/10 10:41:20 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 13/12/10 10:41:20 INFO mapreduce.JobSubmitter: number of splits:1
>> 13/12/10 10:41:20 INFO Configuration.deprecation: user.name is
>> deprecated. Instead, use mapreduce.job.user.name
>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.jar is
>> deprecated. Instead, use mapreduce.job.jar
>> 13/12/10 10:41:20 INFO Configuration.deprecation:
>> mapred.map.tasks.speculative.execution is deprecated. Instead, use
>> mapreduce.map.speculative
>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.reduce.tasks is
>> deprecated. Instead, use mapreduce.job.reduces
>> 13/12/10 10:41:20 INFO Configuration.deprecation:
>> mapred.output.value.class is deprecated. Instead, use
>> mapreduce.job.output.value.class
>> 13/12/10 10:41:20 INFO Configuration.deprecation:
>> mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
>> mapreduce.reduce.speculative
>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapreduce.map.class is
>> deprecated. Instead, use mapreduce.job.map.class
>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.job.name is
>> deprecated. Instead, use mapreduce.job.name
>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapreduce.reduce.class
>> is deprecated. Instead, use mapreduce.job.reduce.class
>> 13/12/10 10:41:20 INFO Configuration.deprecation:
>> mapreduce.inputformat.class is deprecated. Instead, use
>> mapreduce.job.inputformat.class
>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.input.dir is
>> deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.output.dir is
>> deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
>> 13/12/10 10:41:20 INFO Configuration.deprecation:
>> mapreduce.outputformat.class is deprecated. Instead, use
>> mapreduce.job.outputformat.class
>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.map.tasks is
>> deprecated. Instead, use mapreduce.job.maps
>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.output.key.class
>> is deprecated. Instead, use mapreduce.job.output.key.class
>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.working.dir is
>> deprecated. Instead, use mapreduce.job.working.dir
>> 13/12/10 10:41:20 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1386668372725_0001
>> 13/12/10 10:41:20 INFO impl.YarnClientImpl: Submitted application
>> application_1386668372725_0001 to ResourceManager at /0.0.0.0:8032
>> 13/12/10 10:41:21 INFO mapreduce.Job: The url to track the job:
>> http://compute-7-2:8088/proxy/application_1386668372725_0001/
>> 13/12/10 10:41:21 INFO mapreduce.Job: Running job: job_1386668372725_0001
>> 13/12/10 10:41:31 INFO mapreduce.Job: Job job_1386668372725_0001 running
>> in uber mode : false
>> 13/12/10 10:41:31 INFO mapreduce.Job: map 0% reduce 0%
>> ---- stuck here ----
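>>
>> (Aside, in case someone replays this: when a job sits at 0% like that, it
>> seems worth confirming that a NodeManager is registered at all and that the
>> application has left the ACCEPTED state -- the stock YARN CLI shows both:)
>>
>> # List the NodeManagers the RM currently knows about.
>> yarn node -list
>> # List applications with their current state and tracking URL.
>> yarn application -list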
>>
>>
>> I hope the problem is not in the environment files. I have the following
>> at the beginning of hadoop-env.sh:
>>
>> # The java implementation to use.
>> export JAVA_HOME=/home/software/jdk1.7.0_25/
>>
>> # The jsvc implementation to use. Jsvc is required to run secure
>> datanodes.
>> #export JSVC_HOME=${JSVC_HOME}
>>
>> export HADOOP_INSTALL=/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT
>>
>> export HADOOP_HDFS_HOME=$HADOOP_INSTALL
>> export HADOOP_COMMON_HOME=$HADOOP_INSTALL
>> export HADOOP_CONF_DIR=$HADOOP_INSTALL"/etc/hadoop"
>>
>>
>> and this in yarn-env.sh:
>>
>> export JAVA_HOME=/home/software/jdk1.7.0_25/
>>
>> export HADOOP_INSTALL=/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT
>>
>> export HADOOP_HDFS_HOME=$HADOOP_INSTALL
>> export HADOOP_COMMON_HOME=$HADOOP_INSTALL
>> export HADOOP_CONF_DIR=$HADOOP_INSTALL"/etc/hadoop"
>>
>>
>> Not sure what to do about HADOOP_YARN_USER though, since I don't have a
>> dedicated user to run the daemons.
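>>
>> (Absent a dedicated user, a guarded fallback to the login user seems like
>> the obvious sketch -- an assumption on my part, not something from the docs:)
>>
>> # In yarn-env.sh: use the invoking user unless HADOOP_YARN_USER is already set.
>> export HADOOP_YARN_USER=${HADOOP_YARN_USER:-$USER}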
>>
>> Thanks!
>>
>>
>> On 10 December 2013 10:10, Taka Shinagawa <taka.epsilon@gmail.com> wrote:
>>
>>> I had a similar problem after setting up Hadoop 2.2.0 based on the
>>> instructions at
>>> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
>>>
>>> Although it's not documented on the page, I needed to edit hadoop-env.sh
>>> and yarn-env.sh as well to update
>>> JAVA_HOME, HADOOP_CONF_DIR, HADOOP_YARN_USER and YARN_CONF_DIR.
>>>
>>> Once these variables were set, I was able to run the example successfully.
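>>>
>>> The edits amount to a block like the following in both files (a sketch;
>>> the paths are placeholders for wherever your JDK and Hadoop dist live):
>>>
>>> # The four variables mentioned above; values here are placeholders.
>>> export JAVA_HOME=/path/to/jdk1.7.0
>>> export HADOOP_CONF_DIR=/path/to/hadoop-dist/etc/hadoop
>>> export YARN_CONF_DIR=$HADOOP_CONF_DIR
>>> export HADOOP_YARN_USER=$USER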
>>>
>>>
>>>
>>> On Mon, Dec 9, 2013 at 11:37 PM, Silvina Caíno Lores <
>>> silvi.caino@gmail.com> wrote:
>>>
>>>>
>>>> Hi everyone,
>>>>
>>>> I'm having trouble running the Hadoop examples on a single node. All
>>>> the executions get stuck in the running state at 0% map and reduce, and
>>>> the logs don't seem to indicate any issue, aside from the node manager
>>>> needing to be killed:
>>>>
>>>> compute-0-7-3: nodemanager did not stop gracefully after 5 seconds:
>>>> killing with kill -9
>>>>
>>>> RM
>>>>
>>>> 2013-12-09 11:52:22,466 INFO
>>>> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
>>>> Command to launch container container_1386585879247_0001_01_000001 :
>>>> $JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties
>>>> -Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0
>>>> -Dhadoop.root.logger=INFO,CLA -Xmx1024m
>>>> org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout
>>>> 2><LOG_DIR>/stderr
>>>> 2013-12-09 11:52:22,882 INFO
>>>> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done
>>>> launching container Container: [ContainerId:
>>>> container_1386585879247_0001_01_000001, NodeId: compute-0-7-3:8010,
>>>> NodeHttpAddress: compute-0-7-3:8042, Resource: <memory:2000, vCores:1>,
>>>> Priority: 0, Token: Token { kind: ContainerToken, service:
>>>> 10.0.7.3:8010 }, ] for AM appattempt_1386585879247_0001_000001
>>>> 2013-12-09 11:52:22,883 INFO
>>>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>>>> appattempt_1386585879247_0001_000001 State change from ALLOCATED to LAUNCHED
>>>> 2013-12-09 11:52:23,371 INFO
>>>> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
>>>> container_1386585879247_0001_01_000001 Container Transitioned from ACQUIRED
>>>> to RUNNING
>>>> 2013-12-09 11:52:30,922 INFO
>>>> SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for
>>>> appattempt_1386585879247_0001_000001 (auth:SIMPLE)
>>>> 2013-12-09 11:52:30,938 INFO
>>>> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AM
>>>> registration appattempt_1386585879247_0001_000001
>>>> 2013-12-09 11:52:30,939 INFO
>>>> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=scaino
>>>> IP=10.0.7.3 OPERATION=Register App Master TARGET=ApplicationMasterService
>>>> RESULT=SUCCESS APPID=application_1386585879247_0001
>>>> APPATTEMPTID=appattempt_1386585879247_0001_000001
>>>> 2013-12-09 11:52:30,941 INFO
>>>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>>>> appattempt_1386585879247_0001_000001 State change from LAUNCHED to RUNNING
>>>> 2013-12-09 11:52:30,941 INFO
>>>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
>>>> application_1386585879247_0001 State change from ACCEPTED to RUNNING
>>>>
>>>>
>>>> NM
>>>>
>>>> 2013-12-10 08:26:02,100 INFO
>>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
>>>> event CONTAINER_STOP for appId application_1386585879247_0001
>>>> 2013-12-10 08:26:02,102 INFO
>>>> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
>>>> Deleting absolute path :
>>>> /scratch/HDFS-scaino-2/tmp/nm-local-dir/usercache/scaino/appcache/application_1386585879247_0001
>>>> 2013-12-10 08:26:02,103 INFO
>>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
>>>> event APPLICATION_STOP for appId application_1386585879247_0001
>>>> 2013-12-10 08:26:02,110 INFO
>>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>>>> Application application_1386585879247_0001 transitioned from
>>>> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
>>>> 2013-12-10 08:26:02,157 INFO
>>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler:
>>>> Scheduling Log Deletion for application: application_1386585879247_0001,
>>>> with delay of 10800 seconds
>>>> 2013-12-10 08:26:04,688 INFO
>>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>>>> Stopping resource-monitoring for container_1386585879247_0001_01_000001
>>>> 2013-12-10 08:26:05,838 INFO
>>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>>>> Done waiting for Applications to be Finished. Still alive:
>>>> [application_1386585879247_0001]
>>>> 2013-12-10 08:26:05,839 INFO org.apache.hadoop.ipc.Server: Stopping
>>>> server on 8010
>>>> 2013-12-10 08:26:05,846 INFO org.apache.hadoop.ipc.Server: Stopping IPC
>>>> Server listener on 8010
>>>> 2013-12-10 08:26:05,847 INFO org.apache.hadoop.ipc.Server: Stopping IPC
>>>> Server Responder
>>>>
>>>> I tried the pi and wordcount examples with the same results. Any ideas
>>>> on how to debug this?
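>>>>
>>>> (Two places that seem worth checking, sketched below: the per-container
>>>> stdout/stderr files under the NodeManager's log directory, and -- only if
>>>> log aggregation is enabled -- the aggregated logs via the CLI. The
>>>> application id comes from the RM log above.)
>>>>
>>>> # Aggregated logs, if yarn.log-aggregation-enable is true:
>>>> yarn logs -applicationId application_1386585879247_0001
>>>> # Otherwise, look for container_* directories under the paths configured
>>>> # in yarn.nodemanager.log-dirs on the node that ran the container
>>>> # (compute-0-7-3 per the RM log).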
>>>>
>>>> Thanks in advance.
>>>>
>>>> Regards,
>>>> Silvina Caíno
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
