spark-issues mailing list archives

From "Josh Rosen (JIRA)" <>
Subject [jira] [Commented] (SPARK-7708) Incorrect task serialization with Kryo closure serializer
Date Thu, 21 May 2015 18:13:19 GMT


Josh Rosen commented on SPARK-7708:

It doesn't surprise me that Kryo doesn't work for closure serialization, since I don't know
of any end-to-end tests that run real jobs with the Kryo closure serializer (we do have
tests that use it for data serialization). I'd really like to fix this, so a pull request
with a failing regression test and a fix would be very welcome. Perhaps we need to register
SerializableBuffer with Kryo, or do something similar, so that Kryo handles this case
properly.

> Incorrect task serialization with Kryo closure serializer
> ---------------------------------------------------------
>                 Key: SPARK-7708
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.2
>            Reporter: Akshat Aranya
> I've been investigating the use of Kryo for closure serialization with Spark 1.2, and
> it seems like I've hit upon a bug:
> When a task is serialized before scheduling, the following log message is generated:
> [info] o.a.s.s.TaskSetManager - Starting task 124.1 in stage 0.0 (TID 342, <host>, PROCESS_LOCAL, 302 bytes)
> This message comes from TaskSetManager, which serializes the task using the closure
> serializer. Before the message is sent out, the TaskDescription (which includes the
> original task as a byte array) is serialized again into a byte array with the closure
> serializer. I added a log message for this in CoarseGrainedSchedulerBackend, which
> produces the following output:
> [info] o.a.s.s.c.CoarseGrainedSchedulerBackend - 124.1 size=132
> The serialized size of the TaskDescription (132 bytes) turns out to be _smaller_ than the
> serialized task that it contains (302 bytes). This implies that TaskDescription.buffer is
> not getting serialized correctly.
> On the executor side, the deserialization produces a null value for TaskDescription.buffer.
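
The size argument in the report can be sketched as a small, self-contained check. This is
only an illustration in Python, not Spark or Kryo code: the wire format, the function names
(serialize, deserialize, envelope_is_plausible), and the way the dropped field is simulated
are all assumptions made for the example. The thread itself does not establish why the
buffer is lost.

```python
import struct

# Hypothetical wire format, invented for this sketch (not Spark's or Kryo's):
# 8-byte task id, 4-byte payload length (-1 means the buffer is null), payload.
HEADER = ">qi"


def serialize(task_id, buffer):
    """Encode a stand-in for a TaskDescription that wraps already-serialized task bytes."""
    if buffer is None:
        return struct.pack(HEADER, task_id, -1)
    return struct.pack(HEADER, task_id, len(buffer)) + buffer


def deserialize(data):
    """Decode the envelope back into (task_id, buffer); buffer may be None."""
    task_id, length = struct.unpack_from(HEADER, data, 0)
    if length < 0:
        return task_id, None
    start = struct.calcsize(HEADER)
    return task_id, data[start:start + length]


def envelope_is_plausible(envelope, payload):
    """An envelope that embeds the payload uncompressed cannot be smaller than it."""
    return len(envelope) >= len(payload)


task_bytes = bytes(302)            # stands in for the 302-byte serialized task
good = serialize(342, task_bytes)  # buffer written as expected: 314 bytes
bad = serialize(342, None)         # simulates a serializer that drops the buffer: 12 bytes

print(len(good), envelope_is_plausible(good, task_bytes))
print(len(bad), envelope_is_plausible(bad, task_bytes))
print(deserialize(bad))            # the receiver sees a null buffer
```

A regression test along the lines requested in the comment could take the same shape:
round-trip a task description through the closure serializer and assert that the buffer
comes back non-null and byte-for-byte equal to the original.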
