spark-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: trying to understand yarn-client mode
Date Thu, 19 Jun 2014 19:45:09 GMT
okay. since for us the main purpose is to retrieve (small) data i guess i
will stick to yarn client mode. thx


On Thu, Jun 19, 2014 at 3:19 PM, DB Tsai <dbtsai@stanford.edu> wrote:

> Currently, we save the result in HDFS and read it back in our
> application. Since Client.run is a blocking call, it's easy to do it
> this way.
>
> We are now working on using akka to bring the result back to the app
> without going through HDFS, and we can also use akka to track the logs,
> stack traces, etc.
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Thu, Jun 19, 2014 at 12:08 PM, Koert Kuipers <koert@tresata.com> wrote:
> > db tsai,
> > if in yarn-cluster mode the driver runs inside yarn, how can you do a
> > rdd.collect and bring the results back to your application?
> >
> >
> > On Thu, Jun 19, 2014 at 2:33 PM, DB Tsai <dbtsai@stanford.edu> wrote:
> >>
> >> We are submitting spark jobs from our tomcat application using
> >> yarn-cluster mode with great success. As Kevin said, yarn-client mode
> >> runs the driver in your local JVM, and it will have really bad network
> >> overhead when one does a reduce action, which will pull all the results
> >> from the executors to your local JVM. Also, since you can only have one
> >> spark context object in one JVM, it will be tricky to run multiple spark
> >> jobs concurrently in yarn-client mode.
> >>
> >> This is how we submit a spark job in yarn-cluster mode. Please use the
> >> current master code; otherwise, after the job is finished, spark will
> >> kill the JVM and exit your app.
> >>
> >> We set up the spark configuration in a scala map.
> >>
> >>   def getArgsFromConf(conf: Map[String, String]): Array[String] = {
> >>     Array[String](
> >>       "--jar", conf.get("app.jar").getOrElse(""),
> >>       "--addJars", conf.get("spark.addJars").getOrElse(""),
> >>       "--class", conf.get("spark.mainClass").getOrElse(""),
> >>       "--num-executors", conf.get("spark.numWorkers").getOrElse("1"),
> >>       "--driver-memory", conf.get("spark.masterMemory").getOrElse("1g"),
> >>       "--executor-memory", conf.get("spark.workerMemory").getOrElse("1g"),
> >>       "--executor-cores", conf.get("spark.workerCores").getOrElse("1"))
> >>   }
> >>
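> >>       import org.apache.spark.SparkConf
> >>       import org.apache.spark.deploy.yarn.{Client, ClientArguments}
> >>
> >>       // hadoopConfig is a Hadoop Configuration created elsewhere in the app
> >>       System.setProperty("SPARK_YARN_MODE", "true")
> >>       val sparkConf = new SparkConf
> >>       val args = getArgsFromConf(conf)
> >>       new Client(new ClientArguments(args, sparkConf), hadoopConfig, sparkConf).run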
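
A minimal sketch of how the helper above turns a configuration map into client arguments, written so that it runs without Spark on the classpath. The object name and the map values (jar path, main class) are hypothetical and only for illustration; the keys and defaults are the ones used in the message.

```scala
object SubmitArgsSketch {
  // Same mapping as getArgsFromConf in the message; conf.getOrElse(k, d)
  // is equivalent to conf.get(k).getOrElse(d).
  def getArgsFromConf(conf: Map[String, String]): Array[String] =
    Array(
      "--jar", conf.getOrElse("app.jar", ""),
      "--addJars", conf.getOrElse("spark.addJars", ""),
      "--class", conf.getOrElse("spark.mainClass", ""),
      "--num-executors", conf.getOrElse("spark.numWorkers", "1"),
      "--driver-memory", conf.getOrElse("spark.masterMemory", "1g"),
      "--executor-memory", conf.getOrElse("spark.workerMemory", "1g"),
      "--executor-cores", conf.getOrElse("spark.workerCores", "1"))

  def main(args: Array[String]): Unit = {
    // Hypothetical values; substitute your own jar location and main class.
    val conf = Map(
      "app.jar" -> "hdfs:///apps/example/app.jar",
      "spark.mainClass" -> "com.example.Main",
      "spark.numWorkers" -> "4")
    // Keys missing from the map fall back to the defaults above.
    println(getArgsFromConf(conf).mkString(" "))
  }
}
```

Note that a missing "app.jar" or "spark.mainClass" key yields an empty string argument, so it may be worth validating the map before handing the arguments to the client.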
> >>
> >> Sincerely,
> >>
> >> DB Tsai
> >> -------------------------------------------------------
> >> My Blog: https://www.dbtsai.com
> >> LinkedIn: https://www.linkedin.com/in/dbtsai
> >>
> >>
> >> On Thu, Jun 19, 2014 at 11:22 AM, Kevin Markey <kevin.markey@oracle.com> wrote:
> >> > Yarn client is much like Spark client mode, except that the executors
> >> > are running in Yarn containers managed by the Yarn resource manager on
> >> > the cluster instead of as Spark workers managed by the Spark master.
> >> > The driver executes as a local client in your local JVM.  It
> >> > communicates with the workers on the cluster.  Transformations are
> >> > scheduled on the cluster by the driver's logic.  Actions involve
> >> > communication between the local driver and the remote cluster
> >> > executors.  So, there is some additional network overhead, especially
> >> > if the driver is not co-located on the cluster.  In yarn-cluster mode,
> >> > in contrast, the driver is executed as a thread in a Yarn application
> >> > master on the cluster.
> >> >
> >> > In either case, the assembly JAR must be available to the application
> >> > on the cluster.  Best to copy it to HDFS and specify its location by
> >> > exporting its location as SPARK_JAR.
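
A small sketch of the distinction behind this advice, assuming the failure comes from the jar being referenced with a file: URI that each cluster node resolves against its own local disk. This is only an illustration using java.net.URI, not YARN's actual localization logic; the object name, the helper name, and the hdfs: path are hypothetical.

```scala
import java.net.URI

object JarLocationSketch {
  // True when the location has no scheme or a file: scheme, i.e. it refers
  // to the local filesystem of whichever machine tries to read it.
  def isNodeLocal(location: String): Boolean = {
    val scheme = new URI(location).getScheme
    scheme == null || scheme == "file"
  }

  def main(args: Array[String]): Unit = {
    // The path from the error message: only present on the submitting machine.
    println(isNodeLocal("file:/home/koert/test-assembly-0.1-SNAPSHOT.jar"))
    // A hypothetical HDFS copy: reachable from any node in the cluster.
    println(isNodeLocal("hdfs:///user/koert/test-assembly-0.1-SNAPSHOT.jar"))
  }
}
```

A check like this could be run before submission to catch a jar location that the cluster nodes cannot see.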
> >> >
> >> > Kevin Markey
> >> >
> >> >
> >> > On 06/19/2014 11:22 AM, Koert Kuipers wrote:
> >> >
> >> > i am trying to understand how yarn-client mode works. i am not using
> >> > spark-submit, but instead launching a spark job from within my own
> >> > application.
> >> >
> >> > i can see my application contacting yarn successfully, but then in
> >> > yarn i get an immediate error:
> >> >
> >> > Application application_1403117970283_0014 failed 2 times due to AM
> >> > Container for appattempt_1403117970283_0014_000002 exited with
> >> > exitCode: -1000 due to: File
> >> > file:/home/koert/test-assembly-0.1-SNAPSHOT.jar does not exist
> >> > .Failing this attempt.. Failing the application.
> >> >
> >> > why is yarn trying to fetch my jar, and why as a local file? i would
> >> > expect the jar to be sent to yarn over the wire upon job submission?
> >> >
> >> >
> >
> >
>
