spark-issues mailing list archives

From "Imran Rashid (JIRA)" <>
Subject [jira] [Commented] (SPARK-17931) taskScheduler has some unneeded serialization
Date Wed, 01 Mar 2017 18:29:45 GMT


Imran Rashid commented on SPARK-17931:

[~gbloisi] Thanks for reporting the issue.  Can you go ahead and open another JIRA for this?
 Please ping me on it.  Since you have a reproduction, it would also be helpful if you could
tell us which property is going over 64KB.  E.g., you could do something
like this (untested):

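import java.io.{File, PrintWriter}

if (value.size > 16*1024) {
  // createTempFile takes the suffix verbatim, so it needs the leading dot
  val f = File.createTempFile(s"long_property_$key", ".txt")
  logWarning(s"Value for $key has length ${value.size}, writing to $f")
  val out = new PrintWriter(f)
  try out.write(value) finally out.close()
}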

and then attach the generated file.

Your workaround looks pretty dangerous -- if that property were actually important, then just
randomly truncating it would be a big problem.  This method should be safe for long strings,
but we might also want to find the source of that long string and avoid it.
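
For illustration only, here is a minimal sketch of one way a 64KB limit on a string can arise and be avoided. It assumes the limit comes from java.io.DataOutputStream.writeUTF, which rejects strings whose encoded form exceeds 65535 bytes; the comment above does not say which call is actually involved. LongPropertyDemo and its method names are hypothetical and are not Spark code.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream}
import java.io.{DataOutputStream, UTFDataFormatException}
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of Spark.
object LongPropertyDemo {
  // True if writeUTF accepts the value, false if it rejects it as too long.
  def fitsInWriteUTF(value: String): Boolean = {
    val out = new DataOutputStream(new ByteArrayOutputStream())
    try { out.writeUTF(value); true }
    catch { case _: UTFDataFormatException => false }
  }

  // Length-prefixed encoding: a 4-byte length followed by the UTF-8 bytes,
  // so the value is never truncated and is not bound by the 2-byte length
  // field that writeUTF uses.
  def writeLong(value: String): Array[Byte] = {
    val bytes = value.getBytes(StandardCharsets.UTF_8)
    val buf = new ByteArrayOutputStream()
    val out = new DataOutputStream(buf)
    out.writeInt(bytes.length)
    out.write(bytes)
    out.flush()
    buf.toByteArray
  }

  // Reads back a value produced by writeLong.
  def readLong(data: Array[Byte]): String = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    val bytes = new Array[Byte](in.readInt())
    in.readFully(bytes)
    new String(bytes, StandardCharsets.UTF_8)
  }
}
```

Under that assumption, a length-prefixed write keeps the whole value intact, whereas truncating the property to fit would silently change it.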

> taskScheduler has some unneeded serialization
> ---------------------------------------------
>                 Key: SPARK-17931
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>            Reporter: Guoqiang Li
>            Assignee: Kay Ousterhout
>             Fix For: 2.2.0
> In the existing code, there are three layers of serialization
> involved in sending a task from the scheduler to an executor:
> - A Task object is serialized
> - The Task object is copied to a byte buffer that also
> contains serialized information about any additional JARs,
> files, and Properties needed for the task to execute. This
> byte buffer is stored as the member variable serializedTask
> in the TaskDescription class.
> - The TaskDescription is serialized (in addition to the serialized
> task + JARs, the TaskDescription class contains the task ID and
> other metadata) and sent in a LaunchTask message.
> While it is necessary to have two layers of serialization, so that
> the JAR, file, and Property info can be deserialized prior to
> deserializing the Task object, the third layer of serialization is
> unnecessary (this is as a result of SPARK-2521). We should
> eliminate a layer of serialization by moving the JARs, files, and Properties
> into the TaskDescription class.
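
The proposal in the description above can be sketched roughly as follows. This is a simplified, hypothetical model rather than Spark's actual TaskDescription: the class name SimpleTaskDescription, its fields, and the wire layout are all assumptions made for illustration. The metadata (task ID, JARs, properties) is written once, next to the already-serialized Task bytes, so the receiver can read the metadata before it deserializes the Task itself.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.io.{DataInputStream, DataOutputStream}
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical, simplified stand-in; not Spark's real TaskDescription.
final case class SimpleTaskDescription(
    taskId: Long,
    jars: Map[String, Long],          // JAR name -> timestamp
    properties: Map[String, String],
    serializedTask: Array[Byte])      // the Task object, serialized once already

object SimpleTaskDescription {
  // Strings are written as a 4-byte length plus UTF-8 bytes.
  private def writeString(out: DataOutputStream, s: String): Unit = {
    val b = s.getBytes(UTF_8)
    out.writeInt(b.length)
    out.write(b)
  }

  private def readString(in: DataInputStream): String = {
    val b = new Array[Byte](in.readInt())
    in.readFully(b)
    new String(b, UTF_8)
  }

  // One encoding pass: metadata first, then the opaque task bytes.
  def encode(d: SimpleTaskDescription): Array[Byte] = {
    val buf = new ByteArrayOutputStream()
    val out = new DataOutputStream(buf)
    out.writeLong(d.taskId)
    out.writeInt(d.jars.size)
    d.jars.foreach { case (name, ts) => writeString(out, name); out.writeLong(ts) }
    out.writeInt(d.properties.size)
    d.properties.foreach { case (k, v) => writeString(out, k); writeString(out, v) }
    out.writeInt(d.serializedTask.length)
    out.write(d.serializedTask)
    out.flush()
    buf.toByteArray
  }

  // Metadata is available here without touching the Task bytes themselves.
  def decode(data: Array[Byte]): SimpleTaskDescription = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    val taskId = in.readLong()
    val jars = (0 until in.readInt()).map(_ => readString(in) -> in.readLong()).toMap
    val props = (0 until in.readInt()).map(_ => readString(in) -> readString(in)).toMap
    val task = new Array[Byte](in.readInt())
    in.readFully(task)
    SimpleTaskDescription(taskId, jars, props, task)
  }
}
```

In this sketch the Task is still serialized separately, which preserves the two layers the description calls necessary, while the wrapper around it is encoded only once.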

This message was sent by Atlassian JIRA
