apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vlad Rozov <v.ro...@datatorrent.com>
Subject Re: Serialization in Apex
Date Tue, 17 May 2016 20:17:39 GMT
+1. Java serialization is much slower than Kryo serialization and using 
Java serialization must be an explicit application designer choice.

Thank you,
Vlad

On 5/17/16 11:50, Thomas Weise wrote:
> IMO automatically picking a serialializer conflicts with predictable system
> behavior. If the serialization does not work I would want to know that
> instead of the system doing some trick and arrive at suboptimal or faulty
> behavior.
>
> That does not mean we cannot have optimizations though, as long as there is
> explicit user control.
>
> Thomas
>
>
> On Tue, May 17, 2016 at 11:34 AM, Bhupesh Chawda <bhupesh@datatorrent.com>
> wrote:
>
>> As Ram ans Sandesh pointed out, we do have @Bind and @DefaultSerializer
>> annotations. However, these are tightly coupled with the field in question
>> and do require modifying external code. Additionally it may also break
>> other systems, if we are binding it to a JavaSerializer and perhaps there
>> are systems which have other means of serializing the field.
>>
>> My point was more to do with user having to worry about what serializer to
>> use and how to serialize objects.
>> For example, I liked the approach that Storm takes by falling back to Java
>> serialization automatically in case the target class does not have a
>> default constructor.
>>
>> Of course, we can explore type based serialization. But this email was more
>> about the usability aspect; to handle classes not having default
>> constructors in general, not just POJO tuples.
>>
>> ~Bhupesh
>>
>>
>>
>> On Tue, May 17, 2016 at 9:53 AM, Pramod Immaneni <pramod@datatorrent.com>
>> wrote:
>>
>>> Can we do a test where we hard code a codec for a POJO and compare
>>> performance against kryo. Thereafter we can dynamically compose a
>>> codec via pojoutils and inject it.
>>>
>>> Thanks
>>>
>>>> On May 17, 2016, at 8:16 AM, Vlad Rozov <v.rozov@datatorrent.com>
>> wrote:
>>>> +1 for type based serialization. Tuples in most cases are flat
>>> records/pojo and it should be possible programmatically construct a codec
>>> that will significantly outperform Kryo. It should also reduce amount of
>>> data passed over the wire. I started to look in that direction as well as
>>> Kryo serialization is one of bottlenecks that limits Apex throughput when
>>> operators are deployed into different containers including NODE_LOCAL
>> case.
>>>> Thank you,
>>>> Vlad
>>>>
>>>>> On 5/17/16 07:13, Sandesh Hegde wrote:
>>>>> If it is possible to serialize, platform should do it automatically,
>> it
>>>>> reduces the tribal knowledge requirement to use the platform. Couples
>> of
>>>>> month back, I also sent out the similar email.
>>>>>
>>>>> Type based serialization may improve the performance.
>>>>>
>>>>>> On Tue, May 17, 2016, 6:06 AM Munagala Ramanath <ram@datatorrent.com
>>> wrote:
>>>>>> Traditionally, we've recommended using
>>>>>> "@DefaultSerializer(JavaSerializer.class)" or
>>>>>> "@FieldSerializer.Bind(CustomSerializer.class)" as outlined at
>>>>>>
>>>>>>
>> http://docs.datatorrent.com/troubleshooting/#application-throwing-following-kryo-exception
>>>>>> Can you describe why those approaches are not adequate ?
>>>>>>
>>>>>> Ram
>>>>>>
>>>>>> On Mon, May 16, 2016 at 11:46 PM, Bhupesh Chawda <
>>> bhupesh@datatorrent.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> While working on the integration of Apex with Apache Samoa, I
am
>>> coming
>>>>>>> across some scenarios where I have to add default constructors
in
>> some
>>>>>>> external classes to make them Kryo serializable. Although this
>> should
>>> be
>>>>>>> okay, we would like to avoid modifying external classes as far
as
>>>>>> possible.
>>>>>>> Some other streaming engines have taken different approaches
towards
>>>>>>> serialization.
>>>>>>>
>>>>>>> I looked at Flink and Storm serialization mechanisms.
>>>>>>>
>>>>>>> Storm has a fall back mechanism on Java serialization. It does
use
>>> Kryo
>>>>>> for
>>>>>>> serialization due to performance. But, if the class is not
>>> serializable
>>>>>>> using Kryo, then it will try to serialize it using Java
>>> serialization. If
>>>>>>> even then it cannot serialize, then it throws an error. [1]
>>>>>>>
>>>>>>> Flink has its own serialization stack where it uses a serializer
>>> based on
>>>>>>> the type information known about the data. [2]
>>>>>>>
>>>>>>> What does the community think about the current state of
>>> serialization in
>>>>>>> Apex. Is there a need to explore some approaches which could
avoid
>>>>>>> serialization issues such as the one described above? Are there
any
>>> other
>>>>>>> approaches one could use?
>>>>>>>
>>>>>>> 1.
>> http://storm.apache.org/releases/current/Serialization.html#java-serialization
>>>>>>> 2.
>> https://cwiki.apache.org/confluence/display/FLINK/Type+System,+Type+Extraction,+Serialization
>>>>>>> ~Bhupesh


Mime
View raw message