apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bhupesh Chawda <bhup...@datatorrent.com>
Subject Re: Serialization in Apex
Date Tue, 17 May 2016 18:34:23 GMT
As Ram ans Sandesh pointed out, we do have @Bind and @DefaultSerializer
annotations. However, these are tightly coupled with the field in question
and do require modifying external code. Additionally it may also break
other systems, if we are binding it to a JavaSerializer and perhaps there
are systems which have other means of serializing the field.

My point was more to do with user having to worry about what serializer to
use and how to serialize objects.
For example, I liked the approach that Storm takes by falling back to Java
serialization automatically in case the target class does not have a
default constructor.

Of course, we can explore type based serialization. But this email was more
about the usability aspect; to handle classes not having default
constructors in general, not just POJO tuples.

~Bhupesh



On Tue, May 17, 2016 at 9:53 AM, Pramod Immaneni <pramod@datatorrent.com>
wrote:

> Can we do a test where we hard code a codec for a POJO and compare
> performance against kryo. Thereafter we can dynamically compose a
> codec via pojoutils and inject it.
>
> Thanks
>
> > On May 17, 2016, at 8:16 AM, Vlad Rozov <v.rozov@datatorrent.com> wrote:
> >
> > +1 for type based serialization. Tuples in most cases are flat
> records/pojo and it should be possible programmatically construct a codec
> that will significantly outperform Kryo. It should also reduce amount of
> data passed over the wire. I started to look in that direction as well as
> Kryo serialization is one of bottlenecks that limits Apex throughput when
> operators are deployed into different containers including NODE_LOCAL case.
> >
> > Thank you,
> > Vlad
> >
> >> On 5/17/16 07:13, Sandesh Hegde wrote:
> >> If it is possible to serialize, platform should do it automatically, it
> >> reduces the tribal knowledge requirement to use the platform. Couples of
> >> month back, I also sent out the similar email.
> >>
> >> Type based serialization may improve the performance.
> >>
> >>> On Tue, May 17, 2016, 6:06 AM Munagala Ramanath <ram@datatorrent.com>
> wrote:
> >>>
> >>> Traditionally, we've recommended using
> >>> "@DefaultSerializer(JavaSerializer.class)" or
> >>> "@FieldSerializer.Bind(CustomSerializer.class)" as outlined at
> >>>
> >>>
> http://docs.datatorrent.com/troubleshooting/#application-throwing-following-kryo-exception
> >>>
> >>> Can you describe why those approaches are not adequate ?
> >>>
> >>> Ram
> >>>
> >>> On Mon, May 16, 2016 at 11:46 PM, Bhupesh Chawda <
> bhupesh@datatorrent.com>
> >>> wrote:
> >>>
> >>>> Hi All,
> >>>>
> >>>> While working on the integration of Apex with Apache Samoa, I am
> coming
> >>>> across some scenarios where I have to add default constructors in some
> >>>> external classes to make them Kryo serializable. Although this should
> be
> >>>> okay, we would like to avoid modifying external classes as far as
> >>> possible.
> >>>> Some other streaming engines have taken different approaches towards
> >>>> serialization.
> >>>>
> >>>> I looked at Flink and Storm serialization mechanisms.
> >>>>
> >>>> Storm has a fall back mechanism on Java serialization. It does use
> Kryo
> >>> for
> >>>> serialization due to performance. But, if the class is not
> serializable
> >>>> using Kryo, then it will try to serialize it using Java
> serialization. If
> >>>> even then it cannot serialize, then it throws an error. [1]
> >>>>
> >>>> Flink has its own serialization stack where it uses a serializer
> based on
> >>>> the type information known about the data. [2]
> >>>>
> >>>> What does the community think about the current state of
> serialization in
> >>>> Apex. Is there a need to explore some approaches which could avoid
> >>>> serialization issues such as the one described above? Are there any
> other
> >>>> approaches one could use?
> >>>>
> >>>> 1.
> >>>
> http://storm.apache.org/releases/current/Serialization.html#java-serialization
> >>>> 2.
> >>>
> https://cwiki.apache.org/confluence/display/FLINK/Type+System,+Type+Extraction,+Serialization
> >>>>
> >>>> ~Bhupesh
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message