spark-issues mailing list archives

From "Bruce Robbins (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-23240) PythonWorkerFactory issues unhelpful message when pyspark.daemon produces bogus stdout
Date Sun, 28 Jan 2018 19:39:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342688#comment-16342688
] 

Bruce Robbins commented on SPARK-23240:
---------------------------------------

Hi [~hyukjin.kwon],

I am not sure this update covers the case where Python site-local customizations put arbitrary
data into stdout before the daemon module (whatever it is) writes its port number to stdout.
In that case, PythonWorkerFactory ends up reading the arbitrary data as the port number,
since it sits in the Python process's stdout ahead of the actual port number. My proposed
pull request will not fix that, but it will produce an error message that explicitly implicates
the daemon's Python process as the source of the bad port number.
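For what it's worth, the bogus port in the stack trace below decodes exactly to ASCII text: PythonWorkerFactory reads the daemon's port as a 4-byte big-endian int (DataInputStream.readInt on the process's stdout), so the first four bytes of any stray output become the "port". A minimal sketch in plain Python (not Spark code; the stray text is a made-up example):

```python
import struct

# PythonWorkerFactory reads the daemon's port as a 4-byte big-endian int.
# If site-local code (e.g. sitecustomize.py) printed text first, those
# bytes get consumed as the "port" instead of the real one.
stray = b"local site banner\n"            # hypothetical stray stdout text
port = struct.unpack(">i", stray[:4])[0]  # same byte order as Java's readInt()
print(port)  # 1819239265, the exact value in the stack trace below
```

So "port out of range:1819239265" is just the bytes `loca` read as an integer.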

> PythonWorkerFactory issues unhelpful message when pyspark.daemon produces bogus stdout
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-23240
>                 URL: https://issues.apache.org/jira/browse/SPARK-23240
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.1
>            Reporter: Bruce Robbins
>            Priority: Minor
>
> Environmental issues or site-local customizations (e.g., a sitecustomize.py present in
the Python install directory) can interfere with daemon.py’s output to stdout. PythonWorkerFactory
produces unhelpful messages when this happens, causing some head scratching before the actual
issue is determined.
> Case #1: Extraneous data in pyspark.daemon’s stdout. In this case, PythonWorkerFactory
uses the output as the daemon’s port number and ends up throwing an exception when creating
the socket:
> {noformat}
> java.lang.IllegalArgumentException: port out of range:1819239265
> 	at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
> 	at java.net.InetSocketAddress.<init>(InetSocketAddress.java:188)
> 	at java.net.Socket.<init>(Socket.java:244)
> 	at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:78)
> {noformat}
> Case #2: No data in pyspark.daemon’s stdout. In this case, PythonWorkerFactory throws
an EOFException reading from the Process's input stream.
> The second case is somewhat less mysterious than the first, because PythonWorkerFactory
also displays the stderr from the Python process.
> When there is unexpected or missing output in pyspark.daemon’s stdout, PythonWorkerFactory
should say so.
>  
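Both failure modes described above can be mimicked outside Spark with a toy version of the stdout handshake. This is a sketch, not Spark code: the `read_port` helper is hypothetical, standing in for the 4-byte big-endian read that PythonWorkerFactory performs on the daemon's stdout.

```python
import struct
import subprocess
import sys

def read_port(child_stdout):
    """Mimic the handshake: read a 4-byte big-endian port from the child's stdout."""
    data = child_stdout.read(4)
    if len(data) < 4:                       # case #2: no output at all
        raise EOFError("daemon wrote no port to stdout")
    return struct.unpack(">i", data)[0]     # case #1: stray text becomes the "port"

# Case #1: a child that prints text before (or instead of) the port.
noisy = subprocess.Popen([sys.executable, "-c", "print('local site banner')"],
                         stdout=subprocess.PIPE)
print(read_port(noisy.stdout))  # 1819239265 (the bytes b"loca" read as an int)
noisy.wait()

# Case #2: a child that writes nothing to stdout.
silent = subprocess.Popen([sys.executable, "-c", "pass"],
                          stdout=subprocess.PIPE)
try:
    read_port(silent.stdout)
except EOFError as e:
    print("EOF:", e)
silent.wait()
```

In neither case does the error by itself point back at the daemon's stdout, which is the gap this issue asks PythonWorkerFactory to close.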



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


