spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-27548) PySpark toLocalIterator does not raise errors from worker
Date Tue, 30 Apr 2019 21:08:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-27548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-27548:
------------------------------------

    Assignee: Apache Spark

> PySpark toLocalIterator does not raise errors from worker
> ---------------------------------------------------------
>
>                 Key: SPARK-27548
>                 URL: https://issues.apache.org/jira/browse/SPARK-27548
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.1
>            Reporter: Bryan Cutler
>            Assignee: Apache Spark
>            Priority: Major
>
> When iterating over a PySpark RDD with toLocalIterator() and an error occurs on the worker,
> the error is not picked up by Py4J and is not raised in the Python driver process. Unless it
> inspects the logs, the application has no way to know that the worker failed. The following
> test should pass if the error is raised in the driver:
> {code}
> def test_to_local_iterator_error(self):
>     def fail(_):
>         raise RuntimeError("local iterator error")
>     rdd = self.sc.parallelize(range(10)).map(fail)
>     with self.assertRaisesRegexp(Exception, "local iterator error"):
>         for _ in rdd.toLocalIterator():
>             pass
> {code}
> but no exception is raised in the driver. The error appears only in the worker traceback, and the test fails:
> {noformat}
> Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>   File "/home/bryan/git/spark/python/lib/pyspark.zip/pyspark/worker.py", line 428, in main
>     process()
>   File "/home/bryan/git/spark/python/lib/pyspark.zip/pyspark/worker.py", line 423, in process
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/home/bryan/git/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 505, in dump_stream
>     vs = list(itertools.islice(iterator, batch))
>   File "/home/bryan/git/spark/python/lib/pyspark.zip/pyspark/util.py", line 99, in wrapper
>     return f(*args, **kwargs)
>   File "/home/bryan/git/spark/python/pyspark/tests/test_rdd.py", line 742, in fail
>     raise RuntimeError("local iterator error")
> RuntimeError: local iterator error
>     at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:453)
> ...
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> FAIL
> ======================================================================
> FAIL: test_to_local_iterator_error (pyspark.tests.test_rdd.RDDTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/bryan/git/spark/python/pyspark/tests/test_rdd.py", line 748, in test_to_local_iterator_error
>     pass
> AssertionError: Exception not raised
> {noformat}
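
The behaviour the report asks for can be illustrated without Spark. The following is a minimal pure-Python sketch under stated assumptions, not Spark's implementation: the names `serve_partitions` and `local_iterator` and the status markers are hypothetical, and a producer thread stands in for the side that computes and serves partitions. The producer tags every message with a status, so the consumer can re-raise a failure instead of ending the iteration silently.

```python
import queue
import threading

# Hypothetical status markers sent ahead of each payload.
_OK, _ERROR, _DONE = "ok", "error", "done"


def serve_partitions(partitions, out):
    """Producer: evaluate each partition and push (status, payload) onto `out`."""
    try:
        for part in partitions:
            # list() forces evaluation, so a failing partition raises here.
            out.put((_OK, list(part)))
        out.put((_DONE, None))
    except Exception as exc:
        # Report the failure to the consumer instead of dropping it.
        out.put((_ERROR, repr(exc)))


def local_iterator(partitions):
    """Consumer: yield items one partition at a time, re-raising producer errors."""
    out = queue.Queue()
    worker = threading.Thread(
        target=serve_partitions, args=(partitions, out), daemon=True
    )
    worker.start()
    while True:
        status, payload = out.get()
        if status == _DONE:
            return
        if status == _ERROR:
            raise RuntimeError("error while serving iterator: " + payload)
        yield from payload


def fail(_):
    # Same failing function as in the test from the report.
    raise RuntimeError("local iterator error")


if __name__ == "__main__":
    partitions = [iter(range(3)), map(fail, range(3))]
    try:
        for item in local_iterator(partitions):
            print(item)
    except RuntimeError as exc:
        print("raised in consumer:", exc)
```

In this sketch the consumer still receives the items of the partitions that succeeded before the failing one; only then does the error surface. Whether a real fix should behave the same way is a design choice this example does not settle.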



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

