spark-user mailing list archives

From prabeesh k <>
Subject Re: Packaging Java + Python library
Date Mon, 13 Apr 2015 14:18:49 GMT
Refer to this post

On 13 April 2015 at 17:41, Punya Biswal <> wrote:

> Dear Spark users,
> My team is working on a small library that builds on PySpark and is
> organized like PySpark as well -- it has a JVM component (that runs in the
> Spark driver and executor) and a Python component (that runs in the PySpark
> driver and executor processes). What's a good approach for packaging such a
> library?
> Some ideas we've considered:
>    - Package up the JVM component as a Jar and the Python component as a
>    binary egg. This is reasonable but it means that there are two separate
>    artifacts that people have to manage and keep in sync.
>    - Include Python files in the Jar and add it to the PYTHONPATH. This
>    follows the example of the Spark assembly jar, but deviates from the Python
>    community's standards.
> We'd really appreciate hearing experiences from other people who have
> built libraries on top of PySpark.
> Punya
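
One way to keep the two artifacts in sync (the first option above) is to ship the jar as package data inside the single Python distribution, so users install one artifact and the Python side locates the jar at runtime. A minimal sketch of that idea follows; the names (`mylib`, `mylib-assembly.jar`, the `jars/` subdirectory) are hypothetical, not anything from Spark or the original library.

```python
import os


def find_bundled_jar(package_dir, jar_name="mylib-assembly.jar"):
    """Return the absolute path of a jar shipped as package data.

    The returned path can then be handed to Spark, e.g. via the
    spark.jars configuration property or spark-submit's --jars flag.
    (Names here are illustrative; adapt to your package layout.)
    """
    path = os.path.join(package_dir, "jars", jar_name)
    if not os.path.isfile(path):
        raise FileNotFoundError("bundled jar not found: %s" % path)
    return path


def spark_submit_args(jar_path):
    """Build the --jars argument list one might pass to spark-submit."""
    return ["--jars", jar_path]
```

With this layout the jar is declared as `package_data` in `setup.py`, so a single egg/wheel carries both components and they can never drift out of sync.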
