spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-21796) pyspark count failed in python3.5.2
Date Mon, 21 Aug 2017 09:47:01 GMT

    [ https://issues.apache.org/jira/browse/SPARK-21796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16134960#comment-16134960 ]

Sean Owen commented on SPARK-21796:
-----------------------------------

OK, but you still likely have a problem if your Python worker processes are failing to read
data from a socket when unpickling. It could be mismatched Spark versions or packages, machines
that are not all updated as you think they are, a config setting that isn't taking effect, etc.
You seem to have narrowed it down to an environment problem, right?
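
One way to check for the kind of mismatch described above is to print the interpreter details on the driver and on each worker machine and compare them. The following is only a minimal sketch under assumptions: the helper names `describe_env` and `find_mismatches` are hypothetical, the host names in the usage note are made up, and the chosen fields are just a plausible starting set, not a list taken from this ticket.

```python
import json
import pickle
import platform
import sys


def describe_env():
    """Collect interpreter details that should agree on the driver and every worker."""
    try:
        import pyspark  # may be missing on a misconfigured machine
        pyspark_version = getattr(pyspark, "__version__", None)
    except ImportError:
        pyspark_version = None
    return {
        "python_version": platform.python_version(),
        "executable": sys.executable,
        "pickle_highest_protocol": pickle.HIGHEST_PROTOCOL,
        "pyspark_version": pyspark_version,
    }


def find_mismatches(reference, others,
                    keys=("python_version", "pickle_highest_protocol")):
    """Return (host, key, expected, actual) for every value that differs.

    `reference` is the driver's dict; `others` maps a host label to that
    host's dict. Both are assumed to come from describe_env().
    """
    mismatches = []
    for host, env in sorted(others.items()):
        for key in keys:
            if env.get(key) != reference.get(key):
                mismatches.append((host, key, reference.get(key), env.get(key)))
    return mismatches


if __name__ == "__main__":
    # Run this on each machine with the interpreter Spark is configured
    # to use there, then compare the printed JSON lines.
    print(json.dumps(describe_env(), sort_keys=True))
```

For example, `find_mismatches(driver_env, {"worker-1": env1, "worker-2": env2})` returns an empty list when the compared fields agree. Since the failure here happens before any task runs, collecting the dicts by running the script directly on each host is likely more practical than gathering them through a Spark job.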

> pyspark count failed in python3.5.2
> -----------------------------------
>
>                 Key: SPARK-21796
>                 URL: https://issues.apache.org/jira/browse/SPARK-21796
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.1
>         Environment: Python 3.5.2  Anaconda3 4.2.0
>            Reporter: cen yuhai
>         Attachments: user
>
>
> steps:
> {code}
> pyspark
> user_data = sc.textFile("/data/external_table/ods/table/dt=2017-08-17/hour=01/*.txt")
> user_data.count()
> {code}
> Exceptions:
> {code}
> Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>   File "/home/master/platform/spark/python/pyspark/worker.py", line 98, in main
>     command = pickleSer._read_with_length(infile)
>   File "/home/master/platform/spark/python/pyspark/serializers.py", line 164, in _read_with_length
>     return self.loads(obj)
>   File "/home/master/platform/spark/python/pyspark/serializers.py", line 419, in loads
>     return pickle.loads(obj, encoding=encoding)
> EOFError: Ran out of input
>         at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
> {code}
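
For reference, the `EOFError` at the bottom of the traceback is what `pickle.loads` raises when it is handed no bytes at all. The sketch below reproduces that with the standard library only; it does not involve Spark, the helper name `try_loads` and the sample payload are hypothetical, and it illustrates the error itself rather than the cause in this ticket.

```python
import pickle


def try_loads(data):
    """Unpickle `data`, returning a (status, value) pair instead of raising EOFError."""
    try:
        return ("ok", pickle.loads(data))
    except EOFError as exc:
        return ("EOFError", str(exc))


# A complete pickle round-trips normally.
payload = pickle.dumps({"command": "count"})
print(try_loads(payload))

# An empty byte string, as when the sender wrote nothing, raises EOFError.
print(try_loads(b""))
```

An empty read on the worker side would be consistent with the driver and worker disagreeing about what is sent over the socket, which is why the comment above points at the environment rather than at the input files.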



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

