spark-issues mailing list archives

From "Bruce Robbins (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-23240) PythonWorkerFactory issues unhelpful message when pyspark.daemon produces bogus stdout
Date Sat, 10 Feb 2018 18:27:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359568#comment-16359568
] 

Bruce Robbins commented on SPARK-23240:
---------------------------------------

A little background. A Spark installation had a Python sitecustomize.py like this:

 
{code:python}
try:
    import flotilla
except ImportError as e:
    # Anything printed here lands on stdout before pyspark.daemon writes its port
    print(e)
{code}
 

(flotilla is not the real Python module; I just use it as an example.)

Because flotilla was not installed on the user's cluster, the first output on the daemon's stdout
was:

 
{noformat}
No module named flotilla{noformat}
 

In fact, this is what I get when I run pyspark.daemon with this sitecustomize.py installed:
{noformat}
bash-3.2$ python -m pyspark.daemon
python -m pyspark.daemon
No module named flotilla
^@^@\325{noformat}
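Therefore, PythonWorkerFactory.startDaemon reads the first four bytes of stdout, 'No m' (0x4e6f206d,
or 1315905645 in decimal), as the port number. The rest of that line, 'odule named flotilla',
is left on the stream.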
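
The arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes the JVM side reads the port as a big-endian signed 32-bit integer, and the helper names bytes_as_port and port_as_text are made up for illustration. Going in the other direction, the bogus port from the issue description (1819239265) decodes back to four ASCII characters, which is a quick way to guess what stray text reached stdout.

```python
import struct


def bytes_as_port(raw: bytes) -> int:
    """Interpret the first four bytes as a big-endian signed 32-bit integer."""
    return struct.unpack(">i", raw[:4])[0]


def port_as_text(port: int) -> str:
    """Turn a bogus 'port' back into the four bytes that produced it."""
    return struct.pack(">i", port).decode("ascii", errors="replace")


print(bytes_as_port(b"No module named flotilla\n"))  # 1315905645
print(hex(bytes_as_port(b"No m")))                   # 0x4e6f206d
print(repr(port_as_text(1315905645)))                # 'No m'
print(repr(port_as_text(1819239265)))                # 'loca'
```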

Here's what happens when I run a pyspark action with the above sitecustomize.py installed:
{noformat}
>>> text_file = sc.textFile("/Users/bruce/ncdc_gsod").count()
odule named flotilla
18/02/10 09:44:27 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.IllegalArgumentException: port out of range:1315905645
 at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
 at java.net.InetSocketAddress.<init>(InetSocketAddress.java:188)
 at java.net.Socket.<init>(Socket.java:244){noformat}
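
For illustration only, here is a rough Python sketch of the kind of check that would give a more helpful message. It is not Spark's code: read_daemon_port is a hypothetical helper, and the big-endian 32-bit encoding and the 1 to 65535 range are assumptions. It fails early if stdout is empty or the value is not a plausible port, and it includes the start of the unexpected output in the error.

```python
import io
import struct


def read_daemon_port(stdout) -> int:
    """Read a 4-byte big-endian port from a binary stream, validating it."""
    header = stdout.read(4)
    if len(header) < 4:
        # Corresponds to the "no data in stdout" case
        raise RuntimeError(
            "No port number in pyspark.daemon's stdout (got %d bytes)" % len(header)
        )
    port = struct.unpack(">i", header)[0]
    if not 1 <= port <= 65535:
        # Corresponds to the "extraneous data in stdout" case:
        # show what was actually read instead of only the number.
        text = (header + stdout.read(200)).decode("utf-8", errors="replace")
        raise RuntimeError(
            "Bad data in pyspark.daemon's stdout: expected a port, got %d; "
            "stdout began with %r" % (port, text)
        )
    return port


# A well-formed stream yields the port
print(read_daemon_port(io.BytesIO(struct.pack(">i", 54321))))

# Stray output from sitecustomize.py is reported as such
try:
    read_daemon_port(io.BytesIO(b"No module named flotilla\n"))
except RuntimeError as err:
    print(err)
```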




 

> PythonWorkerFactory issues unhelpful message when pyspark.daemon produces bogus stdout
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-23240
>                 URL: https://issues.apache.org/jira/browse/SPARK-23240
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.1
>            Reporter: Bruce Robbins
>            Priority: Minor
>
> Environmental issues or site-local customizations (e.g., a sitecustomize.py present in
the python install directory) can interfere with daemon.py’s output to stdout. PythonWorkerFactory
produces unhelpful messages when this happens, causing some head scratching before the actual
issue is determined.
> Case #1: Extraneous data in pyspark.daemon’s stdout. In this case, PythonWorkerFactory
uses the output as the daemon’s port number and ends up throwing an exception when creating
the socket:
> {noformat}
> java.lang.IllegalArgumentException: port out of range:1819239265
> 	at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
> 	at java.net.InetSocketAddress.<init>(InetSocketAddress.java:188)
> 	at java.net.Socket.<init>(Socket.java:244)
> 	at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:78)
> {noformat}
> Case #2: No data in pyspark.daemon’s stdout. In this case, PythonWorkerFactory throws
an EOFException while reading from the Process input stream.
> The second case is somewhat less mysterious than the first, because PythonWorkerFactory
also displays the stderr from the python process.
> When there is unexpected or missing output in pyspark.daemon’s stdout, PythonWorkerFactory
should say so.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

