spark-issues mailing list archives

From "Sean Owen (JIRA)" <>
Subject [jira] [Commented] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config
Date Mon, 21 Aug 2017 12:04:00 GMT


Sean Owen commented on SPARK-21752:

Docs can't hurt, especially if they can be applied consistently. I think we'd want to say
that spark.jars.packages and spark.jars (and spark.jars.ivy? and others?) can't be set programmatically
if they're meant to be available to the driver. My only hesitation is that I'm honestly not
sure which configs that applies to, but documenting most of them wouldn't hurt.

> Config spark.jars.packages is ignored in SparkSession config
> ------------------------------------------------------------
>                 Key: SPARK-21752
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Jakub Nowacki
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
>     .appName('test-mongo')\
>     .master('local[*]')\
>     .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
>     .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
>     .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
>     .getOrCreate()
> {code}
> the SparkSession gets created, but no package download logs are printed, and when I use
> the loaded classes (the Mongo connector in this case, but it's the same for other packages),
> I get {{java.lang.ClassNotFoundException}} for the missing classes.
> If I use the config file {{conf/spark-defaults.conf}}, the command-line option {{--packages}},
> or the environment variable {{PYSPARK_SUBMIT_ARGS}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0'
> {code}
> it works fine. Interestingly, using a {{SparkConf}} object works fine as well, e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
>     .appName('test-mongo')\
>     .master('local[*]')\
>     .config(conf=conf)\
>     .getOrCreate()
> {code}
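Both working routes share one constraint: the packages list has to reach {{spark-submit}} before the driver JVM is launched, which is why setting it on the builder of an already-running session is too late. A minimal sketch of the environment-variable route (plain Python, outside the shells; note the trailing {{pyspark-shell}} token, which is needed so that {{spark-submit}} starts the Py4J gateway rather than expecting an application script):

```python
import os

# Maven coordinates (groupId:artifactId:version), as in the report.
packages = "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0"

# PYSPARK_SUBMIT_ARGS is read once, when the driver JVM is first launched,
# so it must be set before the first SparkSession/SparkContext is created.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--packages " + packages + " pyspark-shell"

print(os.environ["PYSPARK_SUBMIT_ARGS"])
# --packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell
```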
> The above is in Python, but I've seen the same behavior in other languages, though I didn't
> check R.
> I have also seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the {{SparkSession}}
> builder config.
> Note that this is related to creating a new {{SparkSession}}, as pulling new packages into
> an existing {{SparkSession}} doesn't really make sense. Thus this will only work with bare
> Python, Scala, or Java, and not in {{pyspark}} or {{spark-shell}}, as they create the session
> automatically; in that case one would need to use the {{--packages}} option.

This message was sent by Atlassian JIRA
