spark-issues mailing list archives

From "Josh Rosen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-7708) Incorrect task serialization with Kryo closure serializer
Date Thu, 21 May 2015 18:13:19 GMT

    [ https://issues.apache.org/jira/browse/SPARK-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554779#comment-14554779 ]

Josh Rosen commented on SPARK-7708:
-----------------------------------

It doesn't surprise me that Kryo doesn't work for closure serialization, since I don't think
we have any end-to-end tests that run real jobs with the Kryo closure serializer (we do have
tests for using it for data serialization). I'd really like to fix this, so a pull request
with a failing regression test and a fix would be very welcome. Perhaps we need to register
SerializableBuffer with Kryo or do something similar so that Kryo handles this case properly.

> Incorrect task serialization with Kryo closure serializer
> ---------------------------------------------------------
>
>                 Key: SPARK-7708
>                 URL: https://issues.apache.org/jira/browse/SPARK-7708
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.2
>            Reporter: Akshat Aranya
>
> I've been investigating the use of Kryo for closure serialization with Spark 1.2, and
> it seems like I've hit upon a bug:
>
> When a task is serialized before scheduling, the following log message is generated:
>
> [info] o.a.s.s.TaskSetManager - Starting task 124.1 in stage 0.0 (TID 342, <host>, PROCESS_LOCAL, 302 bytes)
>
> This message comes from TaskSetManager, which serializes the task using the closure
> serializer. Before the message is sent out, the TaskDescription (which includes the
> original task as a byte array) is serialized again into a byte array with the closure
> serializer. I added a log message for this in CoarseGrainedSchedulerBackend, which
> produces the following output:
>
> [info] o.a.s.s.c.CoarseGrainedSchedulerBackend - 124.1 size=132
>
> The serialized size of TaskDescription (132 bytes) turns out to be _smaller_ than the
> serialized task that it contains (302 bytes). This implies that TaskDescription.buffer
> is not getting serialized correctly.
>
> On the executor side, the deserialization produces a null value for TaskDescription.buffer.
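
The size argument in the report (an envelope of 132 bytes cannot contain a 302-byte
payload) can be turned into a small round-trip check of the kind a regression test might
use. The sketch below is only an illustration in Python with pickle; it does not use Spark
or Kryo. The TaskDescription class here is a simplified stand-in, and serialize_full,
serialize_lossy and check_envelope are hypothetical names. serialize_lossy merely imitates
a serializer that silently skips the payload field; it is not a claim about what Kryo
actually does.

```python
import pickle


class TaskDescription:
    """Simplified stand-in (assumption): an envelope carrying an already-serialized task."""

    def __init__(self, task_id, buffer):
        self.task_id = task_id
        self.buffer = buffer  # bytes of the serialized task


def serialize_full(desc):
    # Writes every field, including the payload bytes.
    return pickle.dumps({"task_id": desc.task_id, "buffer": desc.buffer})


def serialize_lossy(desc):
    # Imitates a serializer that silently drops the payload field.
    return pickle.dumps({"task_id": desc.task_id, "buffer": None})


def check_envelope(desc, serialize):
    """Serialize the envelope, read it back, and report two sanity checks."""
    wire = serialize(desc)
    restored = pickle.loads(wire)
    return {
        "payload_size": len(desc.buffer),
        "envelope_size": len(wire),
        # An envelope that really contains the payload cannot be smaller than it.
        "size_ok": len(wire) >= len(desc.buffer),
        # The payload must survive the round trip unchanged.
        "round_trip_ok": restored["buffer"] == desc.buffer,
    }


if __name__ == "__main__":
    desc = TaskDescription(342, bytes(302))  # 302-byte dummy payload
    print(check_envelope(desc, serialize_full))
    print(check_envelope(desc, serialize_lossy))
```

With the lossy serializer both checks fail, which matches the two symptoms described in
the report: an envelope smaller than the task it should contain, and a null buffer after
deserialization.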



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
