spark-issues mailing list archives

From "Dmitiry (JIRA)" <>
Subject [jira] [Created] (SPARK-23460) PySpark concurrency python egg cache directory
Date Sun, 18 Feb 2018 06:25:00 GMT
Dmitiry created SPARK-23460:

             Summary: PySpark concurrency python egg cache directory
                 Key: SPARK-23460
             Project: Spark
          Issue Type: Question
          Components: PySpark
    Affects Versions: 2.1.2
         Environment: YARN last
            Reporter: Dmitiry

We are experiencing intermittent failures when running tasks on PySpark with dependencies installed
through --py-files as a Python egg. We set the following (otherwise we get a permission-denied error on the egg cache):
--conf "spark.executorEnv.PYTHON_EGG_CACHE=./.python-eggs"
{noformat}

INFO - File "build/bdist.linux-x86_64/egg/ua_parser/", line 409, in <module>
INFO - File "/usr/lib/python2.7/dist-packages/", line 904, in resource_filename
INFO - self, resource_name
INFO - File "/usr/lib/python2.7/dist-packages/", line 1380, in get_resource_filename
INFO - return self._extract_resource(manager, zip_path)
INFO - File "/usr/lib/python2.7/dist-packages/", line 1405, in _extract_resource
INFO - self.egg_name, self._parts(zip_path)
INFO - File "/usr/lib/python2.7/dist-packages/", line 984, in get_cache_path
INFO - self.extraction_error()
INFO - File "/usr/lib/python2.7/dist-packages/", line 950, in extraction_error
INFO - raise err
INFO - ExtractionError: Can't extract file(s) to egg cache
INFO - The following error occurred while trying to extract file(s) to the Python egg
INFO - cache:
INFO - [Errno 17] File exists: './.python-eggs'
INFO - The Python egg cache directory is currently set to:
INFO - ./.python-eggs/
INFO - Perhaps your account does not have write access to this directory? You can
INFO - change the cache directory by setting the PYTHON_EGG_CACHE environment
INFO - variable to point to an accessible directory.
{noformat}

We build the package with the option `zip_safe=False`, but PySpark still uses the egg cache directory.

Is there any way around this?
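
One possible reading of the `[Errno 17] File exists` line in the traceback is a check-then-create race: several Python workers on the same executor start at once, each sees that ./.python-eggs is missing, and all but the first fail when they try to create it. This is an assumption and has not been confirmed against the pkg_resources source. A minimal Python sketch of that pattern, and of a creation routine that tolerates it, might look like this; all function names are hypothetical and are not part of PySpark or setuptools.

```python
import errno
import os
import tempfile
import threading


def ensure_dir_racy(path):
    # Check-then-create: another process can create `path` between the
    # isdir() check and makedirs(), in which case makedirs() raises
    # OSError with errno 17 (EEXIST). Shown only to illustrate the
    # suspected failure mode.
    if not os.path.isdir(path):
        os.makedirs(path)


def ensure_dir_tolerant(path):
    # Attempt the creation directly and treat "already exists" as success,
    # as long as what exists is really a directory.
    try:
        os.makedirs(path)
    except OSError as exc:
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise


def create_concurrently(path, workers=8):
    # Call ensure_dir_tolerant() from several threads at once and collect
    # any OSError that escapes. Threads stand in for the concurrent
    # Python workers assumed in the lead-in.
    errors = []

    def worker():
        try:
            ensure_dir_tolerant(path)
        except OSError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    cache_dir = os.path.join(base, ".python-eggs")
    print(create_concurrently(cache_dir))
```

If this reading is right, anything that removes the simultaneous first-time creation of the shared cache directory should avoid the collision, but that has not been verified here.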
