livy-user mailing list archives

From Graham Hukill <ghuk...@gmail.com>
Subject including jar file in Livy in YARN
Date Wed, 06 Sep 2017 20:29:13 GMT
I've got a local jar file that I would like to use for Spark jobs; let's
call it *foo.jar*.

I've successfully used it with pyspark, spark-submit, and through Livy's
REST API and Python HttpClient.  However, now that I'm trying to get Livy
running through YARN, I can't figure out how to include this jar file in a
way that Spark running in YARN will see it.

Formerly, I would include the path of this jar file in the config that I
sent with Livy session creation, like:

LIVY_DEFAULT_SESSION_CONFIG = {
    'kind': 'pyspark',
    'jars': ['/path/to/foo.jar']
}
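
For context, here's roughly how I've been sending that config to Livy's
REST API (a minimal sketch; the localhost URL is just my local setup):

import json
import requests

LIVY_URL = 'http://localhost:8998'  # default Livy port on my machine

LIVY_DEFAULT_SESSION_CONFIG = {
    'kind': 'pyspark',
    'jars': ['/path/to/foo.jar']
}

# POST /sessions creates a new interactive session with this config
response = requests.post(
    LIVY_URL + '/sessions',
    data=json.dumps(LIVY_DEFAULT_SESSION_CONFIG),
    headers={'Content-Type': 'application/json'}
)
print(response.json())  # includes the new session's id and state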

However, now that I'm trying the Livy Python HttpClient, I don't have that
option.  The client *does* have *client.add_jar()*, and that works if I'm
not running behind YARN.  But with YARN, I just keep getting
*ClassNotFoundException* related to this jar being missing.
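
Concretely, my usage looks roughly like this (a sketch; the URL is just my
local Livy server):

from livy.client import HttpClient

client = HttpClient('http://localhost:8998')

# Fine when Spark runs locally; under YARN, classes from foo.jar
# come back as ClassNotFoundException
client.add_jar('file:///path/to/foo.jar')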

I've tried uploading this jar file to HDFS, and then pointing to it in my
Spark conf file at SPARK_HOME/conf/spark-defaults.conf:

# spark yarn
spark.yarn.jars hdfs://localhost/user/USERACCOUNT/foo.jar
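
For reference, I copied the jar up with something like the following
(paths approximate):

hdfs dfs -put /path/to/foo.jar /user/USERACCOUNT/foo.jar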

But I get a somewhat confusing message when Livy starts a session:
INFO Client: Source and destination file systems are the same. Not copying
hdfs://localhost/user/USERACCOUNT/foo.jar

Is there a preferred way to include external jar files for Spark jobs in
Livy running in YARN?  I'd even be okay just copying this file to a
directory that Livy uploads with each session.

With spark-submit, it was pretty cut and dried: I'd include *--jars* with
the command.  But I don't feel as though I have a similar option with Livy.
Whereas I could pass the jar in the session configuration (see above), or
via the HttpClient method *client.add_jar()*, neither of those works with
YARN.
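
For reference, the spark-submit invocation that works is essentially this,
where my_job.py is just a stand-in for whatever script I'm submitting:

spark-submit --jars /path/to/foo.jar my_job.py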

It kind of makes sense... since it's running in the YARN context, the jar
would need to be uploaded when the session is created, and perhaps YARN
cannot see outside of HDFS.

I've seen some say that pointing to the wrong Hadoop conf directory was the
problem, but as far as I can tell, I'm pointing to the correct place by
setting this in Livy's livy-env.sh:
HADOOP_CONF_DIR=/Users/USERACCOUNT/opt/hadoop-2.7.4/etc/hadoop

And here's my livy.conf related to deployment:



# What spark master Livy sessions should use.
livy.spark.master = yarn

# What spark deploy mode Livy sessions should use.
livy.spark.deployMode = client

# If livy should impersonate the requesting users when creating a new session.
livy.impersonation.enabled = true

I have a feeling this must be somewhat simple, but I'm quite stumped.  Any
suggestions would be much appreciated.

thanks,
Graham
