spark-user mailing list archives

From Tom Graves <tgraves...@yahoo.com>
Subject Re: App master failed to find application jar in the master branch on YARN
Date Tue, 19 Nov 2013 15:55:41 GMT
The property is deprecated but will still work. Either one is fine.
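
As a rough way to confirm which default filesystem a node's config actually resolves to, the check can be sketched in Python. This is only an illustration under stated assumptions: the sample XML, the host and port in it, and the helper names `default_fs` and `is_hdfs` are hypothetical, and the lookup order (current key first, deprecated key second) is a simplification rather than Hadoop's own resolution logic.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical core-site.xml content; the host and port are placeholders.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.host.com:9000</value>
  </property>
</configuration>"""


def default_fs(core_site_xml):
    """Return the configured default filesystem URI, or None if neither key is set."""
    props = {}
    for prop in ET.fromstring(core_site_xml).iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name and value:
            props[name.strip()] = value.strip()
    # Assumption: prefer the current key, then fall back to the deprecated one.
    return props.get("fs.defaultFS") or props.get("fs.default.name")


def is_hdfs(uri):
    """True when the URI uses the hdfs scheme; an unset value counts as non-HDFS."""
    return uri is not None and urlparse(uri).scheme == "hdfs"


fs = default_fs(SAMPLE_CORE_SITE)
print(fs, "->", "hdfs" if is_hdfs(fs) else "not hdfs, jars would stay on the local filesystem")
```

To use it against a real cluster, read the contents of the node's core-site.xml into a string and pass that to `default_fs` instead of the sample.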

Launching the job from the namenode is fine.

I brought up a cluster with 2.0.5-alpha, built the latest Spark master branch, and it runs
fine for me. It looks like the 2.0.5-alpha namenode won't even start with a defaultFS of file:///.
Please make sure your namenode is actually up and running and that you are pointing to it.
You can run some jobs successfully without it on a single-node cluster, but on a multinode
cluster this is the error I get when I run without a namenode up, and it looks very similar
to your error message:

        appDiagnostics: Application application_1384876319080_0001 failed 1 times due to AM Container for appattempt_1384876319080_0001_000001 exited with  exitCode: -1000 due to: java.io.FileNotFoundException: File file:/home/tgravescs/spark-master/assembly/target/scala-2.9.3/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.5-alpha.jar does not exist
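
The telltale sign in diagnostics like these is the `file:` scheme on the missing jar. A minimal Python sketch for pulling the missing URI and its scheme out of an appDiagnostics string follows. The regular expression and the helper name `missing_file_scheme` are assumptions made for illustration, based only on the message format quoted in this thread.

```python
import re

# Assumed message shape: "FileNotFoundException: File <uri> does not exist".
# \s+ also matches newlines, so hard-wrapped diagnostics still match.
MISSING_RE = re.compile(r"FileNotFoundException:\s+File\s+(\S+)\s+does not exist")


def missing_file_scheme(diagnostics):
    """Return (uri, scheme) for the missing file, or None if the pattern is absent."""
    match = MISSING_RE.search(diagnostics)
    if match is None:
        return None
    uri = match.group(1)
    scheme = uri.split(":", 1)[0] if ":" in uri else ""
    return uri, scheme


# Excerpt taken from the hdfs-version error quoted later in this thread.
SAMPLE_DIAGNOSTICS = (
    "java.io.FileNotFoundException: File "
    "file:/home/work/.sparkStaging/application_1384588058297_0056/SparkAUC-assembly-0.1.jar\n"
    "does not exist"
)

result = missing_file_scheme(SAMPLE_DIAGNOSTICS)
if result is not None and result[1] == "file":
    print("Container tried to localize from the local filesystem:", result[0])
```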


When you changed the default fs config did you restart the cluster?


Can you try just running the examples jar:

SPARK_JAR=assembly/target/scala-2.9.3/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.5-alpha.jar \
./spark-class org.apache.spark.deploy.yarn.Client \
  --jar examples/target/scala-2.9.3/spark-examples-assembly-0.9.0-incubating-SNAPSHOT.jar \
  --class org.apache.spark.examples.SparkPi --args yarn-standalone \
  --num-workers 2 --master-memory 2g --worker-memory 2g --worker-cores 1

On the client side you should see messages like this:
13/11/19 15:41:30 INFO yarn.Client: Uploading file:/home/tgravescs/spark-master/examples/target/scala-2.9.3/spark-examples-assembly-0.9.0-incubating-SNAPSHOT.jar to hdfs://namenode.host.com:9000/user/tgravescs/.sparkStaging/application_1384874528558_0003/spark-examples-assembly-0.9.0-incubating-SNAPSHOT.jar
13/11/19 15:41:31 INFO yarn.Client: Uploading file:/home/tgravescs/spark-master/assembly/target/scala-2.9.3/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.5-alpha.jar to hdfs://namenode.host.com:9000/user/tgravescs/.sparkStaging/application_1384874528558_0003/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.5-alpha.jar
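
If the client output is long, one way to spot a bad staging destination is to scan it for upload lines whose target is not on hdfs://. The following Python sketch assumes the "Uploading <source> to <destination>" wording shown above. The helper names `upload_destinations` and `non_hdfs_uploads` are hypothetical and not part of Spark.

```python
import re
from urllib.parse import urlparse

# Assumed log shape: "... Uploading <source> to <destination>".
# \s+ tolerates the line break that mail clients insert before "to".
UPLOAD_RE = re.compile(r"Uploading\s+(\S+)\s+to\s+(\S+)")


def upload_destinations(log_text):
    """Return a list of (source, destination) pairs found in the client log."""
    return UPLOAD_RE.findall(log_text)


def non_hdfs_uploads(log_text):
    """Return the uploads whose destination is not an hdfs:// URI."""
    return [
        (src, dst)
        for src, dst in upload_destinations(log_text)
        if urlparse(dst).scheme != "hdfs"
    ]


# First client message from above, kept in its hard-wrapped form.
SAMPLE_LOG = (
    "13/11/19 15:41:30 INFO yarn.Client: Uploading "
    "file:/home/tgravescs/spark-master/examples/target/scala-2.9.3/"
    "spark-examples-assembly-0.9.0-incubating-SNAPSHOT.jar\n"
    "to hdfs://namenode.host.com:9000/user/tgravescs/.sparkStaging/"
    "application_1384874528558_0003/spark-examples-assembly-0.9.0-incubating-SNAPSHOT.jar"
)

suspicious = non_hdfs_uploads(SAMPLE_LOG)
print("all uploads went to HDFS" if not suspicious else suspicious)
```

Any pair this reports means the jar was staged somewhere other than HDFS, so containers on other nodes would not be able to find it unless that path is shared.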

Tom



On Tuesday, November 19, 2013 5:35 AM, guojc <guojc03@gmail.com> wrote:
 
Hi Tom,
   Thank you for your response. I have double-checked that I uploaded both jars to the
same folder on HDFS. I think the <name>fs.default.name</name> you pointed out
is the old, deprecated name for the fs.defaultFS config, according to http://hadoop.apache.org/docs/r2.0.2-alpha/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
.  Anyway, we have tried setting both fs.default.name and fs.defaultFS to the HDFS namenode,
and the situation remained the same. We have also removed the SPARK_HOME env variable on the
worker nodes. One additional piece of information that might be related is that job submission
is done on the same machine as the HDFS namenode, but I'm not sure whether this would cause the problem.

Thanks,
Jiacheng Guo



On Tue, Nov 19, 2013 at 11:50 AM, Tom Graves <tgraves_cs@yahoo.com> wrote:

Sorry for the delay. What is the default filesystem on your HDFS setup?  It looks like it's
set to file: rather than hdfs://.  That is the only reason I can think of that it's listing the directory
as  file:/home/work/.sparkStaging/application_1384588058297_0056.  It's basically just copying
it locally rather than uploading to HDFS, and it's just trying to use the local  file:/home/work/guojiacheng/spark/assembly/target/scala-2.9.3/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.5-alpha.jar.
It generally would create that in HDFS so it is accessible on all the nodes.  Is your /home/work
NFS-mounted on all the nodes?
>
>
>You can find the default fs by looking at the Hadoop config files, generally in core-site.xml.
 It's specified by:         <name>fs.default.name</name>
>
>
>It's pretty odd that it's erroring with file:// when you specified hdfs://.
>When you tried the hdfs://, did you upload both the Spark jar and your client jar (SparkAUC-assembly-0.1.jar)?
 If not, try that, and make sure to put hdfs:// on them when you export SPARK_JAR and specify
the --jar option.
>
>
>
>I'll try to reproduce the error tomorrow to see if a bug was introduced when I added the
feature to run spark from HDFS.
>
>
>Tom
>
>
>
>On Monday, November 18, 2013 11:13 AM, guojc <guojc03@gmail.com> wrote:
> 
>Hi Tom,
>   I'm on Hadoop 2.0.5.  I can launch applications with the Spark 0.8 release normally. However,
when I switch to the git master branch version, with the application built against it, I get the
jar-not-found exception, and the same happens with the example application. I have tried both the
file:// protocol and the hdfs:// protocol, with the jar in the local file system and in HDFS
respectively, and even tried the jar list parameter when creating the new Spark context.  The
exception is slightly different for the hdfs protocol and the local file path. My application launch command is
>
>
> SPARK_JAR=/home/work/guojiacheng/spark/assembly/target/scala-2.9.3/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.5-alpha.jar \
> /home/work/guojiacheng/spark/spark-class org.apache.spark.deploy.yarn.Client \
>   --jar /home/work/guojiacheng/spark-auc/target/scala-2.9.3/SparkAUC-assembly-0.1.jar \
>   --class myClass.SparkAUC \
>   --args -c --args yarn-standalone \
>   --args -i --args hdfs://{hdfs_host}:9000/user/work/guojiacheng/data \
>   --args -m --args hdfs://{hdfs_host}:9000/user/work/guojiacheng/model_large \
>   --args -o --args hdfs://{hdfs_host}:9000/user/work/guojiacheng/score \
>   --num-workers 60 --master-memory 6g --worker-memory 7g --worker-cores 1
>
>
>And my build command is SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly
>
>
>The only thing I can think of that might be related is that each cluster node has a SPARK_HOME
env variable pointing to a copy of the 0.8 version, and its bin folder is in the PATH environment variable.
The 0.9 version is not there.  It was something left over from when the cluster was set up.  But
I don't know whether it is related, as my understanding is that the YARN version tries to distribute
Spark through YARN.
>
>
>hdfs version error message:
>
>
>         appDiagnostics: Application application_1384588058297_0056 failed 1 times due to AM Container for appattempt_1384588058297_0056_000001 exited with  exitCode: -1000 due to: RemoteTrace:
>java.io.FileNotFoundException: File file:/home/work/.sparkStaging/application_1384588058297_0056/SparkAUC-assembly-0.1.jar does not exist
>   
>local version error message:
>appDiagnostics: Application application_1384588058297_0066 failed 1 times due to AM Container for appattempt_1384588058297_0066_000001 exited with  exitCode: -1000 due to: java.io.FileNotFoundException: File file:/home/work/guojiacheng/spark/assembly/target/scala-2.9.3/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.5-alpha.jar does not exist
>
>
>
>Best Regards,
>Jiacheng Guo
>
>
>
>
>
>On Mon, Nov 18, 2013 at 10:34 PM, Tom Graves <tgraves_cs@yahoo.com> wrote:
>
>Hey Jiacheng Guo,
>>
>>
>>Do you have the SPARK_EXAMPLES_JAR env variable set?  If you do, you have to add the
--addJars parameter to the yarn client and point to the spark examples jar.  Or just unset
the SPARK_EXAMPLES_JAR env variable.
>>
>>
>>You should only have to set SPARK_JAR env variable.  
>>
>>
>>If that isn't the issue, let me know the build command you used, your Hadoop version,
and your defaultFS for Hadoop.
>>
>>
>>Tom
>>
>>
>>
>>On Saturday, November 16, 2013 2:32 AM, guojc <guojc03@gmail.com> wrote:
>> 
>>hi,
>>   After reading about the exciting progress in consolidating shuffle, I'm eager
to try out the latest master branch. However, on launching the example application, the job
failed with a message that the app master failed to find the target jar:
>>appDiagnostics: Application application_1384588058297_0017 failed 1 times due to AM Container for appattempt_1384588058297_0017_000001 exited with  exitCode: -1000 due to: java.io.FileNotFoundException: File file:/${my_work_dir}/spark/examples/target/scala-2.9.3/spark-examples-assembly-0.9.0-incubating-SNAPSHOT.jar does not exist.
>>
>>
>>  Has anything changed in how to launch a YARN job now?
>>
>>
>>Best Regards,
>>Jiacheng Guo
>>
>>
>>
>>
>
>
>