spark-dev mailing list archives

From Jey Kottalam <>
Subject Re: PySpark on PyPi
Date Fri, 05 Jun 2015 20:12:46 GMT
Couldn't we have a pip-installable "pyspark" package that just serves as a
shim to an existing Spark installation? Or it could even download the
latest Spark binary if SPARK_HOME isn't set during installation. Right now,
Spark doesn't play very well with the usual Python ecosystem. For example,
why do I need to use a strange incantation when booting up IPython if I
want to use PySpark in a notebook with MASTER="local[4]"? It would be much
nicer to just type `from pyspark import SparkContext; sc =
SparkContext("local[4]")` in my notebook.
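To make the shim idea concrete, here is a minimal sketch of what such a hypothetical package could do (this is not the real pyspark distribution; the function name and the py4j zip layout under python/lib are assumptions based on how Spark ships its Python sources):

```python
# Hypothetical shim, not the real pyspark package: locate an existing
# Spark installation via SPARK_HOME and put its Python sources (and the
# bundled Py4J zip) on sys.path so `import pyspark` works anywhere.
import glob
import os
import sys


def add_pyspark_to_path(spark_home=None):
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if spark_home is None:
        raise RuntimeError("Set SPARK_HOME to an existing Spark installation")
    python_dir = os.path.join(spark_home, "python")
    if python_dir not in sys.path:
        sys.path.insert(0, python_dir)
    # Spark bundles Py4J as a source zip under python/lib
    for zip_path in glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")):
        if zip_path not in sys.path:
            sys.path.insert(0, zip_path)
```

With something like this installed, `from pyspark import SparkContext` would work in a plain IPython session without any special launcher.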

I did a quick test, and it seems that PySpark's basic unit tests pass when
SPARK_HOME is set and Py4J is on the PYTHONPATH:

python $SPARK_HOME/python/pyspark/
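For reference, the environment assumed above can be set up along these lines (the paths and the py4j zip version are illustrative; check $SPARK_HOME/python/lib for the exact file name in your Spark build):

```shell
# Illustrative paths only: point SPARK_HOME at your Spark installation
# and use the actual py4j zip name found under python/lib.
export SPARK_HOME=/path/to/spark
export PYTHONPATH="$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
echo "$PYTHONPATH"
```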


On Fri, Jun 5, 2015 at 10:57 AM, Josh Rosen <> wrote:

> This has been proposed before:
> There's currently tighter coupling between the Python and Java halves of
> PySpark than just requiring SPARK_HOME to be set; if we did this, I bet
> we'd run into tons of issues when users try to run a newer version of the
> Python half of PySpark against an older set of Java components or
> vice-versa.
> On Thu, Jun 4, 2015 at 10:45 PM, Olivier Girardot <> wrote:
>> Hi everyone,
>> Considering that the Python API is just a front-end that needs
>> SPARK_HOME defined anyway, I think it would be worthwhile to publish
>> the Python part of Spark on PyPI, so that a Python project depending
>> on PySpark could manage the dependency via pip.
>> For now I just symlink python/pyspark into my Python installation's
>> site-packages/ so that PyCharm and other lint tools work properly.
>> I'm happy to do the work, or help in any way.
>> What do you think?
>> Regards,
>> Olivier.
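Olivier's symlink workaround can be sketched as follows (shown here with throwaway directories so it is self-contained; in practice SPARK_HOME points at a real Spark installation and the link target is your interpreter's site-packages directory):

```shell
# Throwaway directories for illustration; substitute your real
# SPARK_HOME and site-packages path, e.g. from
# `python -c "import site; print(site.getsitepackages()[0])"`.
SPARK_HOME=$(mktemp -d)
mkdir -p "$SPARK_HOME/python/pyspark"
SITE_PACKAGES=$(mktemp -d)   # stand-in for site-packages
ln -s "$SPARK_HOME/python/pyspark" "$SITE_PACKAGES/pyspark"
ls -l "$SITE_PACKAGES/pyspark"
```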
