spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-9515) Creating JavaSparkContext with yarn-cluster mode throws NPE
Date Thu, 06 Aug 2015 16:54:06 GMT

    [ https://issues.apache.org/jira/browse/SPARK-9515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660321#comment-14660321 ]

Sean Owen commented on SPARK-9515:
----------------------------------

If you're not going to use spark-submit, you need to emulate its initialization. The NPE
occurs here because that initialization did not happen, which is a consequence of how you're
calling Spark code.
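
For illustration only: in yarn-cluster mode the driver runs inside the YARN ApplicationMaster,
so the context cannot simply be constructed in the web application's own JVM. The application
has to be launched through spark-submit or something equivalent to it. The Java sketch below
only assembles a spark-submit command line using the `--master`, `--class` and `--conf` options.
It does not run anything. The class name `SparkSubmitCommand`, the main class
`com.myapp.SparkJob`, and the use of `/spark/spark-1.3.1` as the Spark home directory are
assumptions made for the example and are not taken from the report.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds (but does not execute) a spark-submit command line.
class SparkSubmitCommand {

    static List<String> build(String sparkHome, String master, String mainClass,
                              String appJar, Map<String, String> conf) {
        List<String> cmd = new ArrayList<>();
        cmd.add(sparkHome + "/bin/spark-submit");
        cmd.add("--master");
        cmd.add(master);
        cmd.add("--class");
        cmd.add(mainClass);
        // Each Spark property is passed as a separate --conf key=value pair.
        for (Map.Entry<String, String> e : conf.entrySet()) {
            cmd.add("--conf");
            cmd.add(e.getKey() + "=" + e.getValue());
        }
        // The application jar comes after all spark-submit options.
        cmd.add(appJar);
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.eventLog.enabled", "true");
        // Assumed values: Spark home, main class and jar name are placeholders.
        List<String> cmd = build("/spark/spark-1.3.1", "yarn-cluster",
                                 "com.myapp.SparkJob", "myApp.jar", conf);
        System.out.println(String.join(" ", cmd));
    }
}
```

Passing the returned list to `new ProcessBuilder(cmd).inheritIO().start()` would launch the job
as a separate process. That gives up the long-lived in-process SparkContext described in the
report, which is the trade-off that makes yarn-client mode the closer fit for that design.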

> Creating JavaSparkContext with yarn-cluster mode throws NPE
> -----------------------------------------------------------
>
>                 Key: SPARK-9515
>                 URL: https://issues.apache.org/jira/browse/SPARK-9515
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 1.3.1
>            Reporter: nirav patel
>
> I have a Spark application that runs against a YARN cluster. I run the Spark application as part
of my web application. I can't use the spark-submit script. The way I run it is `java -cp myApp.jar
com.myapp.Application`, which in turn initiates a JavaSparkContext. It used to work with Spark
1.0.2 and a standalone cluster, but now with 1.3.1 and YARN it's failing.
> Caused by: java.lang.NullPointerException
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$.sparkContextInitialized(ApplicationMaster.scala:580)
> 	at org.apache.spark.scheduler.cluster.YarnClusterScheduler.postStartHook(YarnClusterScheduler.scala:32)
> 	at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
> 	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
> EDIT:
> I got it working with yarn-client mode; however, I want to test it with yarn-cluster
mode as well.
> The application design is as follows: we create a singleton SparkContext object and preload a few RDDs in
memory when our spring-boot application (Tomcat container) starts. That allows us to submit
subsequent Spark jobs without the overhead of creating a new SparkContext and RDDs. It performs
excellently for our SLA. We are serving real-time GLM in ms with that. I hope this is reason
enough why we can't use the spark-submit script to submit a job.
> The code is pretty simple. This is how we create the SparkContext:
> SparkConf conf = new SparkConf().setAppName(appName.toString()).setMaster("yarn-client");
> conf.set("spark.eventLog.enabled", "true");
> conf.set("spark.executor.extraClassPath", "/opt/mapr/hbase/hbase-0.98.12/lib/*");
> conf.set("spark.cores.max", sparkCoreMax);
> conf.set("spark.executor.memory", sparkExecMem);
> conf.set("spark.executor.extraJavaOptions", executorJavaOPts);
> conf.set("spark.akka.threads", sparkDriverThreads);
> JavaSparkContext sparkContext = new JavaSparkContext(conf);
> This is how we actually run the spring-boot app:
> java -Dloader.path=myspringbootapp.jar,/spark/spark-1.3.1/lib,/opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop,/opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/yarn
-XX:PermSize=512m -XX:MaxPermSize=512m -Xms1024m -jar myspringbootapp.jar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

