spark-user mailing list archives

From: Guru Medasani <gdm...@gmail.com>
Subject: Re: Is there programmatic way running Spark job on Yarn cluster without using spark-submit script ?
Date: Thu, 18 Jun 2015 02:44:10 GMT
Hi Elkhan,

There are a couple of ways to do this.

1) spark-jobserver is a popular REST job server used to submit and manage Spark jobs.

https://github.com/spark-jobserver/spark-jobserver
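
For example, once a job jar is uploaded to the server, jobs can be triggered over plain HTTP. Here is a minimal sketch, assuming jobserver's default port 8090 and the /jars and /jobs endpoints from its README; the app name "myapp", the jar path, and the job class "my.spark.app.MyJob" are placeholders:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

public class JobServerSubmit {
    public static void main(String[] args) throws Exception {
        // Upload the application jar under the app name "myapp".
        byte[] jar = Files.readAllBytes(Paths.get("/my/app.jar"));
        HttpURLConnection upload = (HttpURLConnection)
            new URL("http://localhost:8090/jars/myapp").openConnection();
        upload.setRequestMethod("POST");
        upload.setDoOutput(true);
        try (OutputStream out = upload.getOutputStream()) {
            out.write(jar);
        }
        System.out.println("upload: HTTP " + upload.getResponseCode());

        // Trigger a run of one job class contained in that jar.
        HttpURLConnection run = (HttpURLConnection) new URL(
            "http://localhost:8090/jobs?appName=myapp&classPath=my.spark.app.MyJob")
            .openConnection();
        run.setRequestMethod("POST");
        run.setDoOutput(true);
        run.getOutputStream().close();
        System.out.println("submit: HTTP " + run.getResponseCode());
    }
}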

2) The spark-submit script sets up the classpath for the job. Bypassing spark-submit means your program has to manage some of that work itself.

Here is a link with some discussion of how to handle this scenario.

http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/What-dependencies-to-submit-Spark-jobs-programmatically-not-via/td-p/24721
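
For yarn-client mode specifically, here is a hedged sketch of what managing this yourself can look like: create the SparkContext directly and point it at the cluster. All paths below are placeholders; HADOOP_CONF_DIR must point at your cluster configuration, and in Spark 1.x the spark.yarn.jar setting tells YARN where to find the Spark assembly. Note this does not cover yarn-cluster mode, which needs a separate submission step (e.g. the SparkLauncher discussed below).

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class YarnClientDirect {
    public static void main(String[] args) {
        // Create the SparkContext directly instead of going through spark-submit.
        SparkConf conf = new SparkConf()
            .setAppName("ProgrammaticYarnClient")
            .setMaster("yarn-client")
            // Placeholder: location of the Spark assembly on HDFS.
            .set("spark.yarn.jar", "hdfs:///libs/spark-assembly-1.3.1-hadoop2.3.0.jar")
            // Ship the application jar so executors can load our classes.
            .setJars(new String[] {"/my/app.jar"});

        JavaSparkContext sc = new JavaSparkContext(conf);
        long count = sc.parallelize(Arrays.asList(1, 2, 3, 4)).count();
        System.out.println("count = " + count);
        sc.stop();
    }
}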


Guru Medasani
gdmeda@gmail.com



> On Jun 17, 2015, at 6:01 PM, Elkhan Dadashov <elkhan8502@gmail.com> wrote:
> 
> This is not an independent, programmatic way of running a Spark job on a YARN cluster.
> 
> That example demonstrates running in yarn-client mode, and it also depends on Jetty. Users writing Spark programs do not want to depend on that.
> 
> I found the SparkLauncher class introduced in Spark 1.4 (https://github.com/apache/spark/tree/master/launcher), which allows running Spark jobs programmatically.
> 
> SparkLauncher exists in the Java and Scala APIs, but I could not find it in the Python API.
> 
> I have not tried it yet, but it seems promising.
> 
> Example:
> 
> import org.apache.spark.launcher.SparkLauncher;
> 
> public class MyLauncher {
>     public static void main(String[] args) throws Exception {
>         // launch() spawns a child process that runs the application
>         // and returns a java.lang.Process handle to it.
>         Process spark = new SparkLauncher()
>             .setAppResource("/my/app.jar")
>             .setMainClass("my.spark.app.Main")
>             .setMaster("local")
>             .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
>             .launch();
>         spark.waitFor();
>     }
> }
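> 
> Presumably the same launcher can point at a YARN cluster instead of local mode. Here is a sketch of what I expect that to look like (untested; SPARK_HOME and the paths are placeholders for my environment, and as far as I can tell the launcher still runs spark-submit under the hood, just without invoking the script by hand):
> 
> import org.apache.spark.launcher.SparkLauncher;
> 
> public class MyYarnLauncher {
>     public static void main(String[] args) throws Exception {
>         Process spark = new SparkLauncher()
>             // Placeholder: an unpacked Spark distribution on the client machine.
>             .setSparkHome("/opt/spark")
>             .setAppResource("/my/app.jar")
>             .setMainClass("my.spark.app.Main")
>             // Run the driver inside the YARN cluster.
>             .setMaster("yarn-cluster")
>             .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
>             .launch();
>         int exitCode = spark.waitFor();
>         System.out.println("Launcher finished with exit code " + exitCode);
>     }
> }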
> 
> 
> 
> On Wed, Jun 17, 2015 at 5:51 PM, Corey Nolet <cjnolet@gmail.com> wrote:
> An example of how to do this is provided in the Spark Jetty Server project [1].
> 
> [1] https://github.com/calrissian/spark-jetty-server
> 
> On Wed, Jun 17, 2015 at 8:29 PM, Elkhan Dadashov <elkhan8502@gmail.com> wrote:
> Hi all,
> 
> Is there any way to run a Spark job programmatically on a YARN cluster without using the spark-submit script?
> 
> I cannot include the Spark jars in my Java application (due to dependency conflicts and other reasons), so I will be shipping the Spark assembly uber jar (spark-assembly-1.3.1-hadoop2.3.0.jar) to the YARN cluster and then executing the job (Python or Java) in yarn-cluster mode.
> 
> So is there any way to run a Spark job implemented in a Python file or Java class without calling it through the spark-submit script?
> 
> Thanks.
> 
> -- 
> 
> Best regards,
> Elkhan Dadashov

