spark-issues mailing list archives

From "Hyukjin Kwon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-19019) PySpark does not work with Python 3.6.0
Date Tue, 02 May 2017 10:50:04 GMT

    [ https://issues.apache.org/jira/browse/SPARK-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992699#comment-15992699 ]

Hyukjin Kwon commented on SPARK-19019:
--------------------------------------

To solve this problem fully, I had to port the cloudpickle change in the PR too. Fixing only the
hijacked one described above does not fully solve this issue. Please refer to the discussion
and the change in the PR.

> PySpark does not work with Python 3.6.0
> ---------------------------------------
>
>                 Key: SPARK-19019
>                 URL: https://issues.apache.org/jira/browse/SPARK-19019
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Critical
>             Fix For: 1.6.4, 2.0.3, 2.1.1, 2.2.0
>
>
> Currently, PySpark does not work with Python 3.6.0.
> Running {{./bin/pyspark}} simply throws the error as below:
> {code}
> Traceback (most recent call last):
>   File ".../spark/python/pyspark/shell.py", line 30, in <module>
>     import pyspark
>   File ".../spark/python/pyspark/__init__.py", line 46, in <module>
>     from pyspark.context import SparkContext
>   File ".../spark/python/pyspark/context.py", line 36, in <module>
>     from pyspark.java_gateway import launch_gateway
>   File ".../spark/python/pyspark/java_gateway.py", line 31, in <module>
>     from py4j.java_gateway import java_import, JavaGateway, GatewayClient
>   File "<frozen importlib._bootstrap>", line 961, in _find_and_load
>   File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
>   File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
>   File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
>   File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 18, in <module>
>   File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pydoc.py", line 62, in <module>
>     import pkgutil
>   File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pkgutil.py", line 22, in <module>
>     ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg')
>   File ".../spark/python/pyspark/serializers.py", line 394, in namedtuple
>     cls = _old_namedtuple(*args, **kwargs)
> TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
> {code}
> The problem is in https://github.com/apache/spark/blob/3c68944b229aaaeeaee3efcbae3e3be9a2914855/python/pyspark/serializers.py#L386-L394
> as the error says, and the cause seems to be that the optional arguments of {{namedtuple}}
> ({{verbose}}, {{rename}} and {{module}}) became keyword-only in Python 3.6.0 (see https://bugs.python.org/issue25628).
> We currently copy this function via {{types.FunctionType}}, which does not set the default
> values of keyword-only arguments (meaning {{namedtuple.__kwdefaults__}}), and this seems to leave
> values missing internally in the function (non-bound arguments).
> This ends up as below:
> {code}
> import types
> import collections
> def _copy_func(f):
>     return types.FunctionType(f.__code__, f.__globals__, f.__name__,
>         f.__defaults__, f.__closure__)
> _old_namedtuple = _copy_func(collections.namedtuple)
> {code}
> If we call as below:
> {code}
> >>> _old_namedtuple("a", "b")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
> {code}
> It throws an exception as above because {{__kwdefaults__}} for the required keyword-only
> arguments seems to be unset in the copied function. So, if we give explicit values for these,
> {code}
> >>> _old_namedtuple("a", "b", verbose=False, rename=False, module=None)
> <class '__main__.a'>
> {code}
> It works fine.
> It seems we should now properly set these on the hijacked one as well.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

