beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Amit Sela (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (BEAM-626) AvroCoder not deserializing correctly in Kryo
Date Mon, 31 Oct 2016 17:55:58 GMT

    [ https://issues.apache.org/jira/browse/BEAM-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622842#comment-15622842
] 

Amit Sela edited comment on BEAM-626 at 10/31/16 5:54 PM:
----------------------------------------------------------

Those objects rely on Java serialization which is outperformed by Kryo serialization.
This may not be an issue for other frameworks as they serialize tasks (The DoFn and it's content)
to the worker because it happens every time a new worker is deployed/utilized but Spark streaming
has to do this every batch interval (sometimes as frequent as 500 msec) so it is better to
use Kryo. 

So far {{AvroCoder}} is the only issue, and the entire Spark community (which is one of the
largest Apache communities) deals with Kryo just fine.

As discussed before, if this becomes a pain I could expose an API to register serializers
(falling back to {{JavaSerialization}} is one of them).

What I really don't understand is the stand on this ticket - "AvroCoder not deserializing
correctly in Kryo" - as in making the coder work with Kryo..
If the proposed solution will degrade the quality/performance of the current state of the
implementation I'm for simply registering with Java but if this is not the case, what's the
harm with the proposed Kryo fix ?

>From my point of view, enabling the coder to work with Kryo while NOT making it worse
in any way, is a good thing.


was (Author: amitsela):
Those objects rely on Java serialization which is outperformed by Kryo serialization.
This may not be an issue for other frameworks as they serialize tasks (The DoFn and it's content)
to the worker because it happens every time a new worker is deployed/utilized but Spark streaming
has to do this every batch interval (sometimes as frequent as 500 msec) so it is better to
use Kryo. 

So far {{AvroCoder}} is the only issue, and the entire Spark community (which is one of the
largest Apache communities) deals with Kryo just fine.

As discussed before, if this becomes a pain I could expose an API to register serializers
(falling back to {{JavaSerialization}} is one of them).

What I really don't understand is the stand on this ticket - "AvroCoder not deserializing
correctly in Kryo" - as in making the coder work with Kryo..
If the proposed solution was to degrade the quality/performance of the current state of the
implementation I would be the first to suggest we simply register with Java but this is not
the case. First proposal wasn't threadsafe, which is not different from the current state,
and adding synchronization might heart performance (regardless of Kryo).

>From my point of view, enabling the coder to work with Kryo while NOT making it worse
in any way, is a good thing.

> AvroCoder not deserializing correctly in Kryo
> ---------------------------------------------
>
>                 Key: BEAM-626
>                 URL: https://issues.apache.org/jira/browse/BEAM-626
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Aviem Zur
>            Assignee: Aviem Zur
>            Priority: Minor
>
> Unlike with Java serialization, when deserializing AvroCoder using Kryo, the resulting
AvroCoder is missing all of its transient fields.
> The reason it works with Java serialization is because of the usage of writeReplace and
readResolve, which Kryo does not adhere to.
> In ProtoCoder for example there are also unserializable members, the way it is solved
there is lazy initializing these members via their getters, so they are initialized in the
deserialized object on first call to the member.
> It seems AvroCoder is the only class in Beam to use writeReplace convention.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message