apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Yan <da...@datatorrent.com>
Subject Re: Serialization in Apex
Date Wed, 18 May 2016 17:36:32 GMT
I think having a fallback to Java serialization is a good thing.
I can imagine a user having trouble with Kryo serialization of their
operator and unable to figure out then give up totally without us even
knowing.

David

On Tue, May 17, 2016 at 11:50 AM, Thomas Weise <thomas@datatorrent.com>
wrote:

> IMO automatically picking a serialializer conflicts with predictable system
> behavior. If the serialization does not work I would want to know that
> instead of the system doing some trick and arrive at suboptimal or faulty
> behavior.
>
> That does not mean we cannot have optimizations though, as long as there is
> explicit user control.
>
> Thomas
>
>
> On Tue, May 17, 2016 at 11:34 AM, Bhupesh Chawda <bhupesh@datatorrent.com>
> wrote:
>
> > As Ram ans Sandesh pointed out, we do have @Bind and @DefaultSerializer
> > annotations. However, these are tightly coupled with the field in
> question
> > and do require modifying external code. Additionally it may also break
> > other systems, if we are binding it to a JavaSerializer and perhaps there
> > are systems which have other means of serializing the field.
> >
> > My point was more to do with user having to worry about what serializer
> to
> > use and how to serialize objects.
> > For example, I liked the approach that Storm takes by falling back to
> Java
> > serialization automatically in case the target class does not have a
> > default constructor.
> >
> > Of course, we can explore type based serialization. But this email was
> more
> > about the usability aspect; to handle classes not having default
> > constructors in general, not just POJO tuples.
> >
> > ~Bhupesh
> >
> >
> >
> > On Tue, May 17, 2016 at 9:53 AM, Pramod Immaneni <pramod@datatorrent.com
> >
> > wrote:
> >
> > > Can we do a test where we hard code a codec for a POJO and compare
> > > performance against kryo. Thereafter we can dynamically compose a
> > > codec via pojoutils and inject it.
> > >
> > > Thanks
> > >
> > > > On May 17, 2016, at 8:16 AM, Vlad Rozov <v.rozov@datatorrent.com>
> > wrote:
> > > >
> > > > +1 for type based serialization. Tuples in most cases are flat
> > > records/pojo and it should be possible programmatically construct a
> codec
> > > that will significantly outperform Kryo. It should also reduce amount
> of
> > > data passed over the wire. I started to look in that direction as well
> as
> > > Kryo serialization is one of bottlenecks that limits Apex throughput
> when
> > > operators are deployed into different containers including NODE_LOCAL
> > case.
> > > >
> > > > Thank you,
> > > > Vlad
> > > >
> > > >> On 5/17/16 07:13, Sandesh Hegde wrote:
> > > >> If it is possible to serialize, platform should do it automatically,
> > it
> > > >> reduces the tribal knowledge requirement to use the platform.
> Couples
> > of
> > > >> month back, I also sent out the similar email.
> > > >>
> > > >> Type based serialization may improve the performance.
> > > >>
> > > >>> On Tue, May 17, 2016, 6:06 AM Munagala Ramanath <
> ram@datatorrent.com
> > >
> > > wrote:
> > > >>>
> > > >>> Traditionally, we've recommended using
> > > >>> "@DefaultSerializer(JavaSerializer.class)" or
> > > >>> "@FieldSerializer.Bind(CustomSerializer.class)" as outlined at
> > > >>>
> > > >>>
> > >
> >
> http://docs.datatorrent.com/troubleshooting/#application-throwing-following-kryo-exception
> > > >>>
> > > >>> Can you describe why those approaches are not adequate ?
> > > >>>
> > > >>> Ram
> > > >>>
> > > >>> On Mon, May 16, 2016 at 11:46 PM, Bhupesh Chawda <
> > > bhupesh@datatorrent.com>
> > > >>> wrote:
> > > >>>
> > > >>>> Hi All,
> > > >>>>
> > > >>>> While working on the integration of Apex with Apache Samoa,
I am
> > > coming
> > > >>>> across some scenarios where I have to add default constructors
in
> > some
> > > >>>> external classes to make them Kryo serializable. Although
this
> > should
> > > be
> > > >>>> okay, we would like to avoid modifying external classes as
far as
> > > >>> possible.
> > > >>>> Some other streaming engines have taken different approaches
> towards
> > > >>>> serialization.
> > > >>>>
> > > >>>> I looked at Flink and Storm serialization mechanisms.
> > > >>>>
> > > >>>> Storm has a fall back mechanism on Java serialization. It
does use
> > > Kryo
> > > >>> for
> > > >>>> serialization due to performance. But, if the class is not
> > > serializable
> > > >>>> using Kryo, then it will try to serialize it using Java
> > > serialization. If
> > > >>>> even then it cannot serialize, then it throws an error. [1]
> > > >>>>
> > > >>>> Flink has its own serialization stack where it uses a serializer
> > > based on
> > > >>>> the type information known about the data. [2]
> > > >>>>
> > > >>>> What does the community think about the current state of
> > > serialization in
> > > >>>> Apex. Is there a need to explore some approaches which could
avoid
> > > >>>> serialization issues such as the one described above? Are
there
> any
> > > other
> > > >>>> approaches one could use?
> > > >>>>
> > > >>>> 1.
> > > >>>
> > >
> >
> http://storm.apache.org/releases/current/Serialization.html#java-serialization
> > > >>>> 2.
> > > >>>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Type+System,+Type+Extraction,+Serialization
> > > >>>>
> > > >>>> ~Bhupesh
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message