spark-user mailing list archives

From Philip Ogren <philip.og...@oracle.com>
Subject Re: Anyone know how to submit spark job to yarn in java code?
Date Wed, 15 Jan 2014 18:38:17 GMT
Great question!  I was writing up a similar question this morning and 
decided to investigate some more before sending.  Here's what I'm 
trying: I have created a new Scala project with only 
spark-examples-assembly-0.8.1-incubating.jar and 
spark-assembly-0.8.1-incubating-hadoop2.2.0-cdh5.0.0-beta-1.jar on the 
classpath, and I am trying to create a yarn-client SparkContext with the 
following:

val spark = new SparkContext("yarn-client", "my-app")
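
For reference, here is a slightly fuller sketch of what I am attempting.  
The paths are placeholders for wherever your copies of the assemblies 
live; the 0.8.x constructor also takes a Spark home and a list of jars 
to ship to the cluster:

import org.apache.spark.SparkContext

object SparkYarnClientExperiment {
  def main(args: Array[String]) {
    // Placeholder paths -- substitute your own locations.
    val sparkHome = "/opt/spark-0.8.1-incubating"
    val appJar = "/path/to/spark-examples-assembly-0.8.1-incubating.jar"

    // master, app name, Spark home, jars to distribute to the cluster
    val spark = new SparkContext("yarn-client", "my-app", sparkHome, Seq(appJar))

    // Trivial sanity check that the context is actually usable.
    println(spark.parallelize(1 to 100).count())
    spark.stop()
  }
}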

My hope is to run this on my laptop and have it connect to a YARN 
application master on the cluster.  The hope is that if I can get this 
to work, then I can do the same from a web application.  I'm trying to 
unpack run-example.sh, compute-classpath, SparkPi, and *.yarn.Client to 
figure out what environment variables I need to set up, etc.
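
One avenue I am considering: since spark-class just launches a main 
class, the submission John's script does from the shell ought to be 
reachable from code by calling that same main method.  A minimal 
sketch, with placeholder paths, assuming the 0.8.1 yarn Client accepts 
the same arguments it takes on the command line:

import org.apache.spark.deploy.yarn.Client

object ProgrammaticSubmit {
  def main(args: Array[String]) {
    // Mirrors the spark-class invocation quoted below; paths are placeholders.
    // Note: the 0.8.x Client may call System.exit when the application
    // finishes, so a web app would want to run this in its own process.
    Client.main(Array(
      "--jar", "/path/to/spark-examples-assembly-0.8.1-incubating.jar",
      "--class", "org.apache.spark.examples.SparkPi",
      "--args", "yarn-standalone",
      "--num-workers", "3",
      "--master-memory", "1g",
      "--worker-memory", "512m",
      "--worker-cores", "1"))
  }
}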

I grabbed all the .xml files out of my cluster's conf directory (in my 
case /etc/hadoop/conf.cloudera.yarn), e.g. yarn-site.xml, and put them 
on my classpath.  I also set up the environment variables SPARK_JAR, 
SPARK_YARN_APP_JAR, SPARK_YARN_USER_ENV, and SPARK_HOME.
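
To rule out classpath and environment mistakes before constructing the 
context, I added a quick self-check along these lines (my own sanity 
check, not anything Spark requires):

object PreflightCheck {
  def main(args: Array[String]) {
    // The config files copied from /etc/hadoop/conf.cloudera.yarn must
    // be visible as classpath resources.
    for (resource <- Seq("yarn-site.xml", "core-site.xml")) {
      val url = getClass.getClassLoader.getResource(resource)
      println(resource + " -> " + (if (url == null) "NOT FOUND on classpath" else url))
    }
    // The environment variables mentioned above; <unset> flags a missing one.
    for (name <- Seq("SPARK_JAR", "SPARK_YARN_APP_JAR", "SPARK_YARN_USER_ENV", "SPARK_HOME")) {
      println(name + "=" + sys.env.getOrElse(name, "<unset>"))
    }
  }
}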

When I run my simple Scala script, I get the following error:

Exception in thread "main" org.apache.spark.SparkException: Yarn 
application already ended,might be killed or not able to launch 
application master.
     at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApp(YarnClientSchedulerBackend.scala:95)
     at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:72)
     at org.apache.spark.scheduler.cluster.ClusterScheduler.start(ClusterScheduler.scala:119)
     at org.apache.spark.SparkContext.<init>(SparkContext.scala:273)
     at SparkYarnClientExperiment$.main(SparkYarnClientExperiment.scala:14)
     at SparkYarnClientExperiment.main(SparkYarnClientExperiment.scala)

I can look at my YARN UI and see that it registers a failed application, 
so I take this as incremental progress.  However, I'm not sure how to 
troubleshoot from here, or whether what I'm trying to do is even 
sensible/possible.  Any advice is appreciated.

Thanks,
Philip

On 1/15/2014 11:25 AM, John Zhao wrote:
> Now I am working on a web application and I want to submit a Spark job to Hadoop YARN.
> I have already done my own assembly and can run it from the command line with the following script:
>
> export YARN_CONF_DIR=/home/gpadmin/clusterConfDir/yarn
> export SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar
> ./spark-class org.apache.spark.deploy.yarn.Client --jar ./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar \
>   --class org.apache.spark.examples.SparkPi --args yarn-standalone --num-workers 3 \
>   --master-memory 1g --worker-memory 512m --worker-cores 1
>
> It works fine.
> Then I realized that it is hard to submit the job from a web application.  It looks like
> spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar or spark-examples-assembly-0.8.1-incubating.jar
> is a really big jar; I believe it contains everything.
> So my questions are:
> 1) When I run the above script, which jar is being submitted to the YARN server?
> 2) It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the role of the
> client side, and spark-examples-assembly-0.8.1-incubating.jar carries the Spark runtime and
> examples that will run in YARN. Am I right?
> 3) Does anyone have similar experience?  I did lots of Hadoop MR work and want to follow
> the same logic to submit a Spark job.  For now I can only find the command-line way to submit
> a Spark job to YARN; I believe there is an easier way to integrate Spark into a web application.
>
>
> Thanks.
> John.

