spark-issues mailing list archives

From "Yuanjian Li (JIRA)" <>
Subject [jira] [Created] (SPARK-26549) PySpark worker reuse take no effect for Python3
Date Sat, 05 Jan 2019 18:49:00 GMT
Yuanjian Li created SPARK-26549:

             Summary: PySpark worker reuse take no effect for Python3
                 Key: SPARK-26549
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.0.0
            Reporter: Yuanjian Li

During [the follow-up work|]
for the PySpark worker reuse scenario, we found that worker reuse takes no effect for Python3,
while it works properly for Python2 and PyPy.
It happens because, when the Python worker checks for the end of the stream under Python3, it
reads an unexpected value -1, which is SpecialLengths.END_OF_DATA_SECTION. See the code in
# check end of stream
if read_int(infile) == SpecialLengths.END_OF_STREAM:
    write_int(SpecialLengths.END_OF_STREAM, outfile)
else:
    # write a different value to tell JVM to not reuse this worker
    write_int(SpecialLengths.END_OF_DATA_SECTION, outfile)
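To make the handshake concrete, here is a minimal self-contained sketch of the framing the worker and the JVM use: 32-bit big-endian integers written to the stream, with negative values as special markers. The marker values below mirror PySpark's SpecialLengths (END_OF_DATA_SECTION = -1, END_OF_STREAM = -4); the simulation shows how an unconsumed END_OF_DATA_SECTION marker makes the end-of-stream check read -1:

```python
import io
import struct

# Marker values mirroring PySpark's SpecialLengths.
END_OF_DATA_SECTION = -1
END_OF_STREAM = -4

def write_int(value, stream):
    # Big-endian signed 32-bit int, matching the JVM side.
    stream.write(struct.pack("!i", value))

def read_int(stream):
    data = stream.read(4)
    if len(data) < 4:
        raise EOFError
    return struct.unpack("!i", data)[0]

# Simulate the worker's view of the socket after the data section:
# if END_OF_DATA_SECTION was never consumed while reading the data,
# the end-of-stream check sees -1 instead of END_OF_STREAM, so the
# worker tells the JVM not to reuse it.
buf = io.BytesIO()
write_int(END_OF_DATA_SECTION, buf)
write_int(END_OF_STREAM, buf)
buf.seek(0)

first = read_int(buf)   # -1: the leftover END_OF_DATA_SECTION marker
second = read_int(buf)  # -4: only now do we reach END_OF_STREAM
print(first, second)
```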
The code works well for Python2 and PyPy because the END_OF_DATA_SECTION has already been handled
while loading the iterator from the socket stream; see the code in FramedSerializer:

    def load_stream(self, stream):
        while True:
            try:
                yield self._read_with_length(stream)
            except EOFError:
                return

    def _read_with_length(self, stream):
        length = read_int(stream)
        if length == SpecialLengths.END_OF_DATA_SECTION:
            raise EOFError  # END_OF_DATA_SECTION raises EOF here, caught in load_stream
        elif length == SpecialLengths.NULL:
            return None
        obj = stream.read(length)
        if len(obj) < length:
            raise EOFError
        return self.loads(obj)
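The following standalone sketch (not the PySpark code itself, just an illustration under the same assumed END_OF_DATA_SECTION = -1 marker) shows why fully draining the load_stream iterator matters: the EOFError path is what actually consumes the -1 marker from the stream, leaving the subsequent end-of-stream check positioned correctly:

```python
import io
import struct

# Assumed marker value mirroring PySpark's SpecialLengths.
END_OF_DATA_SECTION = -1

def write_int(value, stream):
    stream.write(struct.pack("!i", value))

def read_int(stream):
    data = stream.read(4)
    if len(data) < 4:
        raise EOFError
    return struct.unpack("!i", data)[0]

def write_with_length(obj, stream):
    # Length-prefixed record, as in FramedSerializer.
    write_int(len(obj), stream)
    stream.write(obj)

def read_with_length(stream):
    length = read_int(stream)
    if length == END_OF_DATA_SECTION:
        raise EOFError  # consumes the marker, then ends the generator
    obj = stream.read(length)
    if len(obj) < length:
        raise EOFError
    return obj

def load_stream(stream):
    while True:
        try:
            yield read_with_length(stream)
        except EOFError:
            return

# A data section of two records followed by the -1 marker.
buf = io.BytesIO()
write_with_length(b"a", buf)
write_with_length(b"b", buf)
write_int(END_OF_DATA_SECTION, buf)
buf.seek(0)

# Draining the iterator reads past the marker, which is the behaviour
# described for Python2 and PyPy; if the iterator were only partially
# consumed, the -1 would still be sitting in the stream.
records = list(load_stream(buf))
leftover = buf.read()
print(records, leftover)
```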

This message was sent by Atlassian JIRA
