spark-user mailing list archives

From Burak Yavuz <brk...@gmail.com>
Subject Re: Strange behavior of pyspark with --jars option
Date Wed, 15 Jul 2015 06:38:45 GMT
Hi,
I believe the HiveContext uses a different class loader: it falls back to
the system class loader when it can't find a class in the context class
loader. The system class loader contains the classpath passed through
--driver-class-path and spark.executor.extraClassPath. By the time the jars
in --jars are resolved, the JVM is already running, so they can't be added
to the system class loader. Instead they live in a separate context class
loader, which the HiveContext doesn't use, and that is why the dependencies
appear lost.
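As a rough sketch of the loader relationship described above (plain JDK, no
Spark; the loader names and the empty URL list are illustrative, standing in
for the jars that --jars would add):

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ContextLoaderDemo {
    public static void main(String[] args) {
        ClassLoader system = ClassLoader.getSystemClassLoader();

        // Roughly what --jars does: the extra jar URLs end up in a child
        // loader, which is installed as the thread's context class loader
        // after the JVM has already started.
        URLClassLoader context = new URLClassLoader(new URL[0], system);
        Thread.currentThread().setContextClassLoader(context);

        // The child delegates up to the system loader...
        System.out.println(system == context.getParent());  // true

        // ...but code that resolves classes only through the system loader
        // (as the HiveContext effectively does here) never consults the
        // child, so jars added via --jars are invisible to it.
        System.out.println(
            Thread.currentThread().getContextClassLoader() == system);  // false
    }
}
```

Anything put on the classpath via --driver-class-path, by contrast, is part
of the system loader from JVM startup, which is why that route works.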

I know this may be a little complicated; please let me know if you have any
questions. HTH.

Best,
Burak

On Tue, Jul 14, 2015 at 11:15 PM, gen tang <gen.tang86@gmail.com> wrote:

> Hi,
>
> I ran into an interesting problem with the --jars option.
> To use a third-party dependency, elasticsearch-spark, I pass the jar with
> the following command:
> ./bin/spark-submit --jars path-to-dependencies ...
> It works well.
> However, if I use HiveContext.sql, Spark loses the dependencies that I
> passed. It seems that executing the HiveContext overrides the
> configuration (though if we check sparkContext._conf, the configuration
> is unchanged).
>
> But if I pass the dependencies with --driver-class-path
> and spark.executor.extraClassPath, the problem disappears.
>
> Does anyone know why this interesting problem happens?
>
> Thanks a lot for your help in advance.
>
> Cheers
> Gen
>
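For anyone hitting the same issue, the two invocation styles Gen compares
might look roughly like this (the jar path and application name are
placeholders, not from the thread):

```shell
# Jars passed via --jars land in a context class loader created after JVM
# startup; code resolving classes only through the system loader misses them.
./bin/spark-submit \
  --jars /path/to/elasticsearch-spark.jar \
  my_app.py

# The workaround from the thread: put the jar on the driver's classpath at
# JVM startup and on the executors' extra classpath instead.
./bin/spark-submit \
  --driver-class-path /path/to/elasticsearch-spark.jar \
  --conf spark.executor.extraClassPath=/path/to/elasticsearch-spark.jar \
  my_app.py
```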
