spark-issues mailing list archives

From "Jakub Nowacki (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config
Date Wed, 16 Aug 2017 21:47:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129475#comment-16129475 ]

Jakub Nowacki edited comment on SPARK-21752 at 8/16/17 9:46 PM:
----------------------------------------------------------------

I'm aware you cannot do it with the pyspark command, as a session is created automatically there.

We use this SparkSession creation with Jupyter notebooks or with workflow scripts (e.g. run from Airflow), so this is pretty much bare Python with pyspark imported as a module; much like creating a SparkSession in a Scala object's main function. I'm assuming no SparkSession is running beforehand.
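
For context, a minimal sketch of that kind of setup (illustrative only; the app name is a placeholder):

{code}
# Bare-Python script, analogous to a Scala object's main function;
# assumes no SparkSession exists beforehand.
import pyspark.sql

if __name__ == '__main__':
    spark = pyspark.sql.SparkSession.builder \
        .appName('standalone-example') \
        .master('local[*]') \
        .getOrCreate()
{code}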

As for the double parentheses in the first one, yes, true, sorry. But it doesn't work nonetheless, as the extra parenthesis just gives you a syntax error.


was (Author: jsnowacki):
OK, so you don't need session creation with the pyspark command line. We use this SparkSession creation with Jupyter notebooks, so this is pretty much bare Python with pyspark imported as a module; much like creating a SparkSession in a Scala object's main function. I'm assuming no SparkSession is running beforehand.

As for the double parentheses in the first one, yes, true, sorry. But it doesn't work nonetheless, as the extra parenthesis just gives you a syntax error.

> Config spark.jars.packages is ignored in SparkSession config
> ------------------------------------------------------------
>
>                 Key: SPARK-21752
>                 URL: https://issues.apache.org/jira/browse/SPARK-21752
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Jakub Nowacki
>
> If I set the config key {{spark.jars.packages}} using the {{SparkSession}} builder as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
>     .appName('test-mongo')\
>     .master('local[*]')\
>     .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
>     .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
>     .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
>     .getOrCreate()
> {code}
> the SparkSession gets created, but no package download logs are printed, and if I use the loaded classes (the Mongo connector in this case, but it's the same for other packages), I get {{java.lang.ClassNotFoundException}} for the missing classes.
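> For illustration, any read through the connector then fails along these lines (a sketch; {{com.mongodb.spark.sql.DefaultSource}} is the connector's long-form source name):
> {code}
> # Sketch: without the package jars on the classpath, the data source lookup fails, e.g.:
> df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
> # java.lang.ClassNotFoundException: Failed to find data source: com.mongodb.spark.sql.DefaultSource
> {code}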
> If I use the config file {{conf/spark-defaults.conf}} or the command line option {{--packages}}, e.g.:
> {code}
> import os
> # Must be set before the JVM starts, i.e. before any SparkSession is created.
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using a {{SparkConf}} object works as well, e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
>     .appName('test-mongo')\
>     .master('local[*]')\
>     .config(conf=conf)\
>     .getOrCreate()
> {code}
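> One way to check which value the running session actually picked up (a quick sketch; {{getConf}} returns a copy of the effective configuration):
> {code}
> # "<not set>" is just a placeholder default for the lookup.
> print(spark.sparkContext.getConf().get("spark.jars.packages", "<not set>"))
> {code}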
> The above is in Python, but I've seen the same behavior in other languages, though I didn't check R.
> I've also seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the {{SparkSession}} builder config.



