spark-user mailing list archives

From "Fernando O." <fot...@gmail.com>
Subject Re: Trying to make spark-jobserver work with yarn
Date Wed, 31 Dec 2014 17:24:19 GMT
Before jumping into a sea of dependencies and bash files: does anyone have an
example of how to run a Spark job without using spark-submit or the shell?
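For reference, a minimal sketch of launching a job programmatically in Spark 1.1, by building a `SparkContext` directly instead of going through spark-submit. The application name, jar location, and input path below are illustrative assumptions, not taken from the thread; it assumes spark-core and spark-yarn 1.1.x are on the driver's classpath and `HADOOP_CONF_DIR` points at the cluster configuration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object EmbeddedJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("yarn-client")        // driver runs locally, executors on YARN
      .setAppName("EmbeddedWordCount") // hypothetical app name
      // Assumed HDFS location of the Spark assembly, so NodeManagers can fetch it:
      .set("spark.yarn.jar", "hdfs:///libs/spark-assembly-1.1.1.jar")

    val sc = new SparkContext(conf)
    try {
      // Hypothetical input path; any HDFS-visible file works.
      val counts = sc.textFile("hdfs:///data/input.txt")
        .flatMap(_.split("\\s+"))
        .map(w => (w, 1))
        .reduceByKey(_ + _)
      counts.take(10).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

This is essentially what spark-jobserver does internally; the critical part for YARN is that every file referenced in the conf must be reachable from the cluster nodes, not just from the driver machine.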

On Tue, Dec 30, 2014 at 3:23 PM, Fernando O. <fotero@gmail.com> wrote:

> Hi all,
>     I'm investigating Spark for a new project and I'm trying to use
> spark-jobserver because I need to reuse and share RDDs, and from what I
> read in the forum that's the "standard" :D
>
> Turns out that spark-jobserver doesn't seem to work on YARN, or at least
> it does not on 1.1.1.
>
> My config is Spark 1.1.1 (moving to 1.2.0 soon) and Hadoop 2.6 (which
> seems compatible with 2.4 from Spark's point of view; at least I was able
> to run spark-submit and shell tasks in both yarn-client and yarn-cluster
> modes).
>
> Going back to my original point: I made some changes in spark-jobserver
> so that I can submit a job, but I get:
>
> ....
> [2014-12-30 18:20:19,769] INFO  e.spark.deploy.yarn.Client []
> [akka://JobServer/user/context-supervisor/f983d86e-spark.jobserver.WordCountExample]
> - Max mem capabililty of a single resource in this cluster 15000
> [2014-12-30 18:20:19,770] INFO  e.spark.deploy.yarn.Client []
> [akka://JobServer/user/context-supervisor/f983d86e-spark.jobserver.WordCountExample]
> - Preparing Local resources
> [2014-12-30 18:20:20,041] INFO  e.spark.deploy.yarn.Client []
> [akka://JobServer/user/context-supervisor/f983d86e-spark.jobserver.WordCountExample]
> - Prepared Local resources Map(__spark__.jar -> resource { scheme: "file"
> port: -1 file:
> "/home/ec2-user/.ivy2/cache/org.apache.spark/spark-yarn_2.10/jars/spark-yarn_2.10-1.1.1.jar"
> } size: 343226 timestamp: 1416429031000 type: FILE visibility: PRIVATE)
>
> [...]
>
> [2014-12-30 18:20:20,139] INFO  e.spark.deploy.yarn.Client []
> [akka://JobServer/user/context-supervisor/f983d86e-spark.jobserver.WordCountExample]
> - Yarn AM launch context:
> [2014-12-30 18:20:20,140] INFO  e.spark.deploy.yarn.Client []
> [akka://JobServer/user/context-supervisor/f983d86e-spark.jobserver.WordCountExample]
> -   class:   org.apache.spark.deploy.yarn.ExecutorLauncher
> [2014-12-30 18:20:20,140] INFO  e.spark.deploy.yarn.Client []
> [akka://JobServer/user/context-supervisor/f983d86e-spark.jobserver.WordCountExample]
> -   env:     Map(CLASSPATH ->
> $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$PWD/__app__.jar:$PWD/*,
> SPARK_YARN_CACHE_FILES_FILE_SIZES -> 343226, SPARK_YARN_STAGING_DIR ->
> .sparkStaging/application_1419963137232_0001/,
> SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE, SPARK_USER -> ec2-user,
> SPARK_YARN_MODE -> true, SPARK_YARN_CACHE_FILES_TIME_STAMPS ->
> 1416429031000, SPARK_YARN_CACHE_FILES ->
> file:/home/ec2-user/.ivy2/cache/org.apache.spark/spark-yarn_2.10/jars/spark-yarn_2.10-1.1.1.jar#__spark__.jar)
>
> [...]
>
> [2014-12-30 18:03:04,474] INFO  YarnClientSchedulerBackend []
> [akka://JobServer/user/context-supervisor/ebac0153-spark.jobserver.WordCountExample]
> - Application report from ASM:
>  appMasterRpcPort: -1
>  appStartTime: 1419962580444
>  yarnAppState: FAILED
>
> [2014-12-30 18:03:04,475] ERROR .jobserver.JobManagerActor []
> [akka://JobServer/user/context-supervisor/ebac0153-spark.jobserver.WordCountExample]
> - Failed to create context ebac0153-spark.jobserver.WordCountExample,
> shutting down actor
> org.apache.spark.SparkException: Yarn application already ended,might be
> killed or not able to launch application master.
> at
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApp(YarnClientSchedulerBackend.scala:117)
> at
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:93)
>
>
>
> In the Hadoop console I can see the detailed error:
>
> Diagnostics: File
> file:/home/ec2-user/.ivy2/cache/org.apache.spark/spark-yarn_2.10/jars/spark-yarn_2.10-1.1.1.jar
> does not exist
> java.io.FileNotFoundException: File
> file:/home/ec2-user/.ivy2/cache/org.apache.spark/spark-yarn_2.10/jars/spark-yarn_2.10-1.1.1.jar
> does not exist
>
> Now... it seems like Spark is trying to use, on the other nodes, a file
> that only exists on the machine I launched the task from (my local Ivy
> cache).
>
> Can anyone point me in the right direction of where that path might be
> being set?
>
>
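The diagnostics above suggest the YARN client is distributing a `file:` URI pointing into the local Ivy cache, which only exists on the launching machine, so NodeManagers fail to localize it. A sketch of a likely fix under that assumption (all paths below are illustrative): put the Spark assembly on HDFS and point the YARN client at it, either via the `SPARK_JAR` environment variable used by Spark 1.1 or via the `spark.yarn.jar` conf entry.

```shell
# Upload the Spark assembly to a cluster-visible location (path is an assumption).
hadoop fs -mkdir -p /libs
hadoop fs -put spark-assembly-1.1.1-hadoop2.4.0.jar /libs/

# Option 1: environment variable read by the Spark 1.1 YARN client.
export SPARK_JAR=hdfs:///libs/spark-assembly-1.1.1-hadoop2.4.0.jar

# Option 2: equivalent Spark conf entry (e.g. in spark-defaults.conf):
#   spark.yarn.jar  hdfs:///libs/spark-assembly-1.1.1-hadoop2.4.0.jar
```

With either setting, the staging step should upload (or reference) a jar every container can reach, instead of the driver-local Ivy cache path shown in the log.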
