spark-issues mailing list archives

From "Dan Sanduleac (JIRA)" <>
Subject [jira] [Created] (SPARK-20001) Support PythonRunner executing inside a Conda env
Date Fri, 17 Mar 2017 16:47:41 GMT
Dan Sanduleac created SPARK-20001:

             Summary: Support PythonRunner executing inside a Conda env
                 Key: SPARK-20001
             Project: Spark
          Issue Type: New Feature
          Components: PySpark, Spark Core
    Affects Versions: 2.2.0
            Reporter: Dan Sanduleac

Similar to SPARK-13587, I'm trying to allow the user to configure a Conda environment that
PythonRunner will run from. 
This change records the conda environment found on the driver and installs the same packages
on the executor side, only once per PythonWorkerFactory. The list of requested conda packages
is added to the PythonWorkerFactory cache, so two collects using the same environment (including
packages) can reuse the same running executors.
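
The caching idea above can be illustrated with a small sketch. This is not Spark's
PythonWorkerFactory code. The class WorkerFactoryCache, the setup_env callback and the
assumption that the package list forms part of the cache key are all hypothetical, chosen
only to show how one environment setup could be shared by requests naming the same packages.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical cache key: the Python executable plus the set of requested conda packages.
CacheKey = Tuple[str, FrozenSet[str]]


class WorkerFactoryCache:
    """Illustrative stand-in for a per-executor cache of worker factories."""

    def __init__(self, setup_env: Callable[[FrozenSet[str]], str]) -> None:
        # setup_env is assumed to install the packages and return an environment handle.
        self._setup_env = setup_env
        self._factories: Dict[CacheKey, str] = {}

    def get_or_create(self, python_exec: str, packages: Iterable[str]) -> str:
        # frozenset makes the key independent of the order the packages were listed in.
        key: CacheKey = (python_exec, frozenset(packages))
        if key not in self._factories:
            # The (expensive) environment setup runs only on the first request for a key.
            self._factories[key] = self._setup_env(key[1])
        return self._factories[key]


# Usage: a fake setup function that only records how often it is called.
install_calls: List[FrozenSet[str]] = []


def fake_setup(packages: FrozenSet[str]) -> str:
    install_calls.append(packages)
    return "env-%d" % len(install_calls)


cache = WorkerFactoryCache(fake_setup)
first = cache.get_or_create("python", ["numpy", "pandas"])
second = cache.get_or_create("python", ["pandas", "numpy"])  # same set, reuses the entry
third = cache.get_or_create("python", ["numpy"])             # different set, new setup
```

In this sketch the first two requests share one environment and only the third triggers a
second setup, which mirrors the "only once per PythonWorkerFactory" behaviour described above.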

This issue requires that the conda binary already be available on the driver as well as on the
executors; you only have to specify where it can be found.
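
As a rough illustration of that precondition, a configured location could be checked before
any environment setup is attempted. The helper name resolve_conda_binary is hypothetical, and
how the path actually reaches Spark (a configuration key, for instance) is not stated here, so
the sketch simply takes the path as an argument.

```python
import os
import sys


def resolve_conda_binary(configured_path: str) -> str:
    """Return the absolute path of a user-supplied conda binary, or raise if unusable."""
    path = os.path.abspath(os.path.expanduser(configured_path))
    if not os.path.isfile(path):
        raise FileNotFoundError("conda binary not found at %s" % path)
    if not os.access(path, os.X_OK):
        raise PermissionError("conda binary at %s is not executable" % path)
    return path


# Usage: the running Python interpreter stands in for a conda binary here,
# because it is an executable file that is known to exist.
checked = resolve_conda_binary(sys.executable)
```

The same check would have to pass on the driver and on every executor, since the issue assumes
the binary is present on both sides.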

Please see the attached issue on palantir/spark for additional details.
