spark-issues mailing list archives

From "Josh Rosen (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-2580) broken pipe collecting schemardd results
Date Tue, 29 Jul 2014 07:34:38 GMT

     [ https://issues.apache.org/jira/browse/SPARK-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen resolved SPARK-2580.
-------------------------------

          Resolution: Fixed
       Fix Version/s: 1.0.3
                      1.1.0
    Target Version/s: 1.0.2

> broken pipe collecting schemardd results
> ----------------------------------------
>
>                 Key: SPARK-2580
>                 URL: https://issues.apache.org/jira/browse/SPARK-2580
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.0.0
>         Environment: fedora 21 local and rhel 7 clustered (standalone)
>            Reporter: Matthew Farrellee
>            Assignee: Davies Liu
>              Labels: py4j, pyspark
>             Fix For: 1.1.0, 1.0.3
>
>
> {code}
> from pyspark.sql import SQLContext
> sqlCtx = SQLContext(sc)
> # size of cluster impacts where this breaks (i.e. 2**15 vs 2**2)
> data = sc.parallelize([{'name': 'index', 'value': 0}] * 2**20)
> sdata = sqlCtx.inferSchema(data)
> sdata.first()
> {code}
> Result (note: the correct result is returned as well as the error):
> {code}
> >>> sdata.first()
> 14/07/18 12:10:25 INFO SparkContext: Starting job: runJob at PythonRDD.scala:290
> 14/07/18 12:10:25 INFO DAGScheduler: Got job 43 (runJob at PythonRDD.scala:290) with 1 output partitions (allowLocal=true)
> 14/07/18 12:10:25 INFO DAGScheduler: Final stage: Stage 52(runJob at PythonRDD.scala:290)
> 14/07/18 12:10:25 INFO DAGScheduler: Parents of final stage: List()
> 14/07/18 12:10:25 INFO DAGScheduler: Missing parents: List()
> 14/07/18 12:10:25 INFO DAGScheduler: Computing the requested partition locally
> 14/07/18 12:10:25 INFO PythonRDD: Times: total = 45, boot = 3, init = 40, finish = 2
> 14/07/18 12:10:25 INFO SparkContext: Job finished: runJob at PythonRDD.scala:290, took 0.048348426 s
> {u'name': u'index', u'value': 0}
> >>> PySpark worker failed with exception:
> Traceback (most recent call last):
>   File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/worker.py", line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/serializers.py", line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/serializers.py", line 124, in dump_stream
>     self._write_with_length(obj, stream)
>   File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/serializers.py", line 139, in _write_with_length
>     stream.write(serialized)
> IOError: [Errno 32] Broken pipe
> Traceback (most recent call last):
>   File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/daemon.py", line 130, in launch_worker
>     worker(listen_sock)
>   File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/daemon.py", line 119, in worker
>     outfile.flush()
> IOError: [Errno 32] Broken pipe
> {code}
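
The quoted tracebacks show both the worker and the daemon hitting errno 32 (EPIPE): the reading side of the connection goes away while the Python worker is still writing serialized rows, consistent with `first()` needing only one element of a 2**20-row RDD. As a rough illustration of that failure mode alone (outside Spark, with a plain OS pipe; the names `r`, `w`, and `err` are illustrative, not part of PySpark):

```python
import errno
import os

# Create a pipe and close the read end, mimicking a consumer that
# stops reading before the producer has finished writing.
r, w = os.pipe()
os.close(r)

err = None
try:
    # Writing to a pipe with no reader raises OSError/IOError
    # with errno 32 (EPIPE), since Python ignores SIGPIPE by default.
    os.write(w, b"data")
except OSError as e:
    err = e.errno  # err is now errno.EPIPE (32)
finally:
    os.close(w)

print(err == errno.EPIPE)
```

On POSIX systems the CPython interpreter installs SIG_IGN for SIGPIPE at startup, so the dead pipe surfaces as the `IOError: [Errno 32] Broken pipe` seen in the worker's `stream.write(serialized)` rather than silently killing the process.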



--
This message was sent by Atlassian JIRA
(v6.2#6252)
