spark-issues mailing list archives

From "Imran Rashid (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-17931) taskScheduler has some unneeded serialization
Date Wed, 01 Mar 2017 18:29:45 GMT

    [ https://issues.apache.org/jira/browse/SPARK-17931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890749#comment-15890749 ]

Imran Rashid commented on SPARK-17931:
--------------------------------------

[~gbloisi] Thanks for reporting the issue.  Can you go ahead and open another JIRA for this,
and ping me on it?  Since you have a reproduction, it would also be helpful if you could
tell us which property is going over 64KB.  E.g., you could do something like this
(untested):

{code}
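import java.io.{File, PrintWriter}

// Dump any suspiciously large property value to a temp file for inspection.
if (value.size > 16 * 1024) {
  // The suffix needs its leading dot, otherwise the name ends in "...txt" with no extension.
  val f = File.createTempFile(s"long_property_$key", ".txt")
  logWarning(s"Value for $key has length ${value.size}, writing to $f")
  val out = new PrintWriter(f)
  // Close the writer even if the write fails.
  try out.println(value) finally out.close()
}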
{code}

and then attach the generated file.
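
If it is easier to inspect the properties outside the scheduler code, the same check can be
sketched as a standalone helper. This is only an illustration: `LargePropertyFinder` is a
hypothetical name, not part of Spark, and it assumes you can get hold of the task's
`java.util.Properties` object. It measures the UTF-8 byte length rather than the character
count, on the assumption that the encoded size is what hits the limit.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Properties

// Hypothetical helper, not part of Spark.
object LargePropertyFinder {
  // Returns (key, UTF-8 size in bytes) for each value larger than
  // thresholdBytes, largest first.
  def find(props: Properties, thresholdBytes: Int): Seq[(String, Int)] = {
    val found = Seq.newBuilder[(String, Int)]
    val keys = props.stringPropertyNames().iterator()
    while (keys.hasNext) {
      val key = keys.next()
      val size = props.getProperty(key).getBytes(UTF_8).length
      if (size > thresholdBytes) found += (key -> size)
    }
    found.result().sortBy { case (_, size) => -size }
  }
}
```

Calling `LargePropertyFinder.find(props, 16 * 1024)` and printing the result would list the
candidate keys without writing any files.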

Your workaround looks pretty dangerous -- if that property were actually important, then
silently truncating it would be a big problem.  This method should be safe for long strings,
but we might also want to find the source of that long string and avoid it.
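
For reference, the 64KB figure matches the limit of `java.io.DataOutputStream.writeUTF`,
which throws `UTFDataFormatException` when the encoded string exceeds 65535 bytes. Assuming
that is the call involved (this comment does not confirm it), one way to make the write safe
for long strings without truncating anything is to write a length prefix followed by the raw
UTF-8 bytes. A minimal sketch, with `LongStringCodec` as a hypothetical name:

```scala
import java.io.{ByteArrayOutputStream, DataInputStream, DataOutputStream, UTFDataFormatException}
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical helper, not part of Spark.
object LongStringCodec {
  // Writes a 4-byte length followed by the raw UTF-8 bytes, so the value
  // is not subject to writeUTF's 65535-byte limit.
  def write(out: DataOutputStream, s: String): Unit = {
    val bytes = s.getBytes(UTF_8)
    out.writeInt(bytes.length)
    out.write(bytes)
  }

  // Reads back a string written by write().
  def read(in: DataInputStream): String = {
    val bytes = new Array[Byte](in.readInt())
    in.readFully(bytes)
    new String(bytes, UTF_8)
  }

  // True if DataOutputStream.writeUTF rejects the string as too long.
  def writeUTFFails(s: String): Boolean =
    try {
      new DataOutputStream(new ByteArrayOutputStream()).writeUTF(s)
      false
    } catch {
      case _: UTFDataFormatException => true
    }
}
```

The reader and writer have to change together, since the two encodings are not compatible
on the wire.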

> taskScheduler has some unneeded serialization
> ---------------------------------------------
>
>                 Key: SPARK-17931
>                 URL: https://issues.apache.org/jira/browse/SPARK-17931
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>            Reporter: Guoqiang Li
>            Assignee: Kay Ousterhout
>             Fix For: 2.2.0
>
>
> In the existing code, there are three layers of serialization
> involved in sending a task from the scheduler to an executor:
> - A Task object is serialized
> - The Task object is copied to a byte buffer that also
> contains serialized information about any additional JARs,
> files, and Properties needed for the task to execute. This
> byte buffer is stored as the member variable serializedTask
> in the TaskDescription class.
> - The TaskDescription is serialized (in addition to the serialized
> task + JARs, the TaskDescription class contains the task ID and
> other metadata) and sent in a LaunchTask message.
> While it is necessary to have two layers of serialization, so that
> the JAR, file, and Property info can be deserialized prior to
> deserializing the Task object, the third layer of serialization is
> unnecessary (it is a result of SPARK-2521). We should
> eliminate a layer of serialization by moving the JARs, files, and Properties
> into the TaskDescription class.
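
The proposal in the description above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the actual Spark code: `SketchTaskDescription`,
its field names, and the path-to-timestamp maps are all hypothetical. The point is that the
metadata is written as plain fields next to the already-serialized task bytes, so the
receiver can read JARs, files, and properties first and deserialize the Task later, with no
extra wrapping layer.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical stand-in for TaskDescription; field names are illustrative.
case class SketchTaskDescription(
    taskId: Long,
    jars: Map[String, Long],          // assumed shape: path -> timestamp
    files: Map[String, Long],         // assumed shape: path -> timestamp
    properties: Map[String, String],
    serializedTask: Array[Byte])      // the Task, already serialized once

object SketchTaskDescription {
  // Length-prefixed UTF-8, so long values are not limited to 64KB.
  private def writeString(out: DataOutputStream, s: String): Unit = {
    val bytes = s.getBytes(UTF_8)
    out.writeInt(bytes.length)
    out.write(bytes)
  }

  private def readString(in: DataInputStream): String = {
    val bytes = new Array[Byte](in.readInt())
    in.readFully(bytes)
    new String(bytes, UTF_8)
  }

  private def writeTimestamps(out: DataOutputStream, m: Map[String, Long]): Unit = {
    out.writeInt(m.size)
    m.foreach { case (path, ts) => writeString(out, path); out.writeLong(ts) }
  }

  private def readTimestamps(in: DataInputStream): Map[String, Long] =
    Seq.fill(in.readInt())((readString(in), in.readLong())).toMap

  // Metadata is written field by field; the task bytes are appended as-is.
  def encode(d: SketchTaskDescription): Array[Byte] = {
    val buf = new ByteArrayOutputStream()
    val out = new DataOutputStream(buf)
    out.writeLong(d.taskId)
    writeTimestamps(out, d.jars)
    writeTimestamps(out, d.files)
    out.writeInt(d.properties.size)
    d.properties.foreach { case (k, v) => writeString(out, k); writeString(out, v) }
    out.writeInt(d.serializedTask.length)
    out.write(d.serializedTask)
    out.flush()
    buf.toByteArray
  }

  // Reads the metadata and leaves the task bytes untouched for later deserialization.
  def decode(bytes: Array[Byte]): SketchTaskDescription = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    val taskId = in.readLong()
    val jars = readTimestamps(in)
    val files = readTimestamps(in)
    val properties = Seq.fill(in.readInt())((readString(in), readString(in))).toMap
    val task = new Array[Byte](in.readInt())
    in.readFully(task)
    SketchTaskDescription(taskId, jars, files, properties, task)
  }
}
```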


