apex-dev mailing list archives

From Timothy Farkas <timothytiborfar...@gmail.com>
Subject Re: Serialization in Apex
Date Wed, 18 May 2016 17:52:32 GMT
It will help a new user get something up and running quickly, but it may
leave people scratching their heads as to why performance is so bad. If we
move in the direction of an automatic fallback I think we should also
devise a way to explicitly warn the users with something more than just an
obscure warning message. Perhaps there can be a REST endpoint in the app
master that a UI can tap into which keeps a log of all the tuning decisions
made by the platform, and a corresponding dtcli command? That way we can
tell newcomers to check that log if they are having any performance issues.
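
To make the idea concrete, here is a rough sketch of what such a decision log
could look like. Everything in it is hypothetical: TuningDecision and
TuningDecisionLog are illustrative names, not existing Apex classes, and the
JSON rendering only stands in for whatever a REST endpoint or a dtcli command
would actually return.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names exist in Apex.
final class TuningDecision {
    final Instant timestamp;
    final String component; // operator or stream the decision applies to
    final String decision;  // what the platform chose on the user's behalf
    final String reason;    // why it made that choice

    TuningDecision(Instant timestamp, String component, String decision, String reason) {
        this.timestamp = timestamp;
        this.component = component;
        this.decision = decision;
        this.reason = reason;
    }

    String toJson() {
        return "{\"timestamp\":" + quote(timestamp.toString())
            + ",\"component\":" + quote(component)
            + ",\"decision\":" + quote(decision)
            + ",\"reason\":" + quote(reason) + "}";
    }

    // Minimal escaping (backslash and double quote only); control characters
    // are not handled, so this is not a general JSON encoder.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}

// In-memory log that the app master could expose to a UI or a CLI command.
final class TuningDecisionLog {
    private final List<TuningDecision> decisions = new ArrayList<>();

    synchronized void record(String component, String decision, String reason) {
        decisions.add(new TuningDecision(Instant.now(), component, decision, reason));
    }

    // Returns a copy so callers cannot modify the log.
    synchronized List<TuningDecision> snapshot() {
        return new ArrayList<>(decisions);
    }

    // Renders all decisions as a JSON array, oldest first.
    synchronized String toJson() {
        StringBuilder sb = new StringBuilder("[");
        for (TuningDecision d : decisions) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append(d.toJson());
        }
        return sb.append(']').toString();
    }
}
```

The platform would call record(...) at the point where it picks a fallback, so
the reason is captured at the moment the decision is made instead of being
reconstructed later from log files.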

Thanks,
Tim

On Wed, May 18, 2016 at 10:36 AM, David Yan <david@datatorrent.com> wrote:

> I think having a fallback to Java serialization is a good thing.
> I can imagine a user having trouble with Kryo serialization of their
> operator, being unable to figure it out, and then giving up entirely
> without us even knowing.
>
> David
>
> On Tue, May 17, 2016 at 11:50 AM, Thomas Weise <thomas@datatorrent.com>
> wrote:
>
> > IMO automatically picking a serializer conflicts with predictable system
> > behavior. If the serialization does not work I would want to know that
> > instead of the system doing some trick and arriving at suboptimal or
> > faulty behavior.
> >
> > That does not mean we cannot have optimizations though, as long as there
> > is explicit user control.
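
One way to combine a fallback with explicit user control is a policy switch
that defaults to failing fast. The sketch below is only an illustration under
stated assumptions: TupleCodec, FallbackPolicy and FallbackCodec are
hypothetical names, and the primary codec merely stands in for Kryo without
using Kryo's API. Each payload carries a one-byte tag so a reader could tell
which serializer produced it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not the Apex or Kryo API.
interface TupleCodec {
    byte[] toBytes(Object tuple) throws IOException;
}

// The user chooses the behavior explicitly; nothing is decided silently.
enum FallbackPolicy { FAIL_FAST, FALLBACK_TO_JAVA }

final class FallbackCodec implements TupleCodec {
    static final byte TAG_PRIMARY = 0;
    static final byte TAG_JAVA = 1;

    private final TupleCodec primary;
    private final FallbackPolicy policy;
    private final List<String> warnings = new ArrayList<>();

    FallbackCodec(TupleCodec primary, FallbackPolicy policy) {
        this.primary = primary;
        this.policy = policy;
    }

    @Override
    public byte[] toBytes(Object tuple) throws IOException {
        try {
            return tagged(TAG_PRIMARY, primary.toBytes(tuple));
        } catch (IOException | RuntimeException e) {
            // Surface the failure unless the user opted in to the fallback
            // and the tuple can actually be Java-serialized.
            if (policy == FallbackPolicy.FAIL_FAST || !(tuple instanceof Serializable)) {
                throw new IOException(
                    "primary codec failed for " + tuple.getClass().getName(), e);
            }
            warnings.add("fell back to Java serialization for "
                + tuple.getClass().getName() + ": " + e.getMessage());
            return tagged(TAG_JAVA, javaSerialize(tuple));
        }
    }

    // Every fallback is recorded so it can be reported to the user.
    List<String> warnings() {
        return new ArrayList<>(warnings);
    }

    private static byte[] javaSerialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    // Prefixes the payload with a tag identifying the serializer used.
    private static byte[] tagged(byte tag, byte[] payload) {
        byte[] out = new byte[payload.length + 1];
        out[0] = tag;
        System.arraycopy(payload, 0, out, 1, payload.length);
        return out;
    }
}
```

With FAIL_FAST the original error reaches the user unchanged in spirit; with
FALLBACK_TO_JAVA the slower path is taken only because the user asked for it,
and each occurrence is recorded.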
> >
> > Thomas
> >
> >
> > On Tue, May 17, 2016 at 11:34 AM, Bhupesh Chawda <bhupesh@datatorrent.com>
> > wrote:
> > >
> > > As Ram and Sandesh pointed out, we do have @Bind and @DefaultSerializer
> > > annotations. However, these are tightly coupled with the field in
> > > question and do require modifying external code. Additionally it may
> > > also break other systems, if we are binding it to a JavaSerializer and
> > > perhaps there are systems which have other means of serializing the
> > > field.
> > >
> > > My point was more to do with the user having to worry about what
> > > serializer to use and how to serialize objects.
> > > For example, I liked the approach that Storm takes by falling back to
> > > Java serialization automatically in case the target class does not have
> > > a default constructor.
> > >
> > > Of course, we can explore type based serialization. But this email was
> > > more about the usability aspect; to handle classes not having default
> > > constructors in general, not just POJO tuples.
> > >
> > > ~Bhupesh
> > >
> > >
> > >
> > > On Tue, May 17, 2016 at 9:53 AM, Pramod Immaneni <pramod@datatorrent.com>
> > > wrote:
> > > >
> > > > Can we do a test where we hard code a codec for a POJO and compare
> > > > performance against Kryo? Thereafter we can dynamically compose a
> > > > codec via pojoutils and inject it.
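
As a starting point for such a test, a hard-coded codec for a flat POJO might
look like the sketch below. Quote and QuoteCodec are made-up names for
illustration only, and the sketch covers just the encoding side; an actual
comparison against Kryo would still need a proper benchmark harness. Note that
the POJO has no default constructor, since the codec calls its constructor
directly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical flat tuple type used only for this sketch.
final class Quote {
    final String symbol;
    final long timestamp;
    final double price;

    Quote(String symbol, long timestamp, double price) {
        this.symbol = symbol;
        this.timestamp = timestamp;
        this.price = price;
    }
}

// Hand-written codec: fields are written in a fixed order with no class
// names or field metadata on the wire.
final class QuoteCodec {
    static byte[] encode(Quote q) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(q.symbol);     // 2-byte length + modified UTF-8 bytes
            out.writeLong(q.timestamp); // 8 bytes
            out.writeDouble(q.price);   // 8 bytes
        }
        return bos.toByteArray();
    }

    // Reads the fields back in the same order they were written.
    static Quote decode(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String symbol = in.readUTF();
            long timestamp = in.readLong();
            double price = in.readDouble();
            return new Quote(symbol, timestamp, price);
        }
    }
}
```

A dynamically composed codec would generate the equivalent of encode and
decode from the POJO's field list instead of having them written by hand.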
> > > >
> > > > Thanks
> > > >
> > > > > On May 17, 2016, at 8:16 AM, Vlad Rozov <v.rozov@datatorrent.com>
> > > > > wrote:
> > > > > >
> > > > > > +1 for type based serialization. Tuples in most cases are flat
> > > > > > records/POJOs and it should be possible to programmatically
> > > > > > construct a codec that will significantly outperform Kryo. It
> > > > > > should also reduce the amount of data passed over the wire. I
> > > > > > started to look in that direction as well, as Kryo serialization
> > > > > > is one of the bottlenecks that limits Apex throughput when
> > > > > > operators are deployed into different containers, including the
> > > > > > NODE_LOCAL case.
> > > > >
> > > > > Thank you,
> > > > > Vlad
> > > > >
> > > > >> On 5/17/16 07:13, Sandesh Hegde wrote:
> > > > > >> If it is possible to serialize, the platform should do it
> > > > > >> automatically; it reduces the tribal knowledge required to use
> > > > > >> the platform. A couple of months back, I also sent out a similar
> > > > > >> email.
> > > > >>
> > > > >> Type based serialization may improve the performance.
> > > > >>
> > > > > >>> On Tue, May 17, 2016, 6:06 AM Munagala Ramanath <ram@datatorrent.com>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>> Traditionally, we've recommended using
> > > > > >>> "@DefaultSerializer(JavaSerializer.class)" or
> > > > > >>> "@FieldSerializer.Bind(CustomSerializer.class)" as outlined at
> > > > > >>>
> > > > > >>> http://docs.datatorrent.com/troubleshooting/#application-throwing-following-kryo-exception
> > > > > >>>
> > > > > >>> Can you describe why those approaches are not adequate?
> > > > >>>
> > > > >>> Ram
> > > > >>>
> > > > > >>> On Mon, May 16, 2016 at 11:46 PM, Bhupesh Chawda <bhupesh@datatorrent.com>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>>> Hi All,
> > > > > >>>>
> > > > > >>>> While working on the integration of Apex with Apache Samoa, I am
> > > > > >>>> coming across some scenarios where I have to add default
> > > > > >>>> constructors in some external classes to make them Kryo
> > > > > >>>> serializable. Although this should be okay, we would like to
> > > > > >>>> avoid modifying external classes as far as possible.
> > > > > >>>> Some other streaming engines have taken different approaches
> > > > > >>>> towards serialization.
> > > > > >>>>
> > > > > >>>> I looked at Flink and Storm serialization mechanisms.
> > > > > >>>>
> > > > > >>>> Storm has a fallback mechanism on Java serialization. It does use
> > > > > >>>> Kryo for serialization due to performance. But, if the class is
> > > > > >>>> not serializable using Kryo, then it will try to serialize it
> > > > > >>>> using Java serialization. If even then it cannot serialize, then
> > > > > >>>> it throws an error. [1]
> > > > > >>>>
> > > > > >>>> Flink has its own serialization stack where it uses a serializer
> > > > > >>>> based on the type information known about the data. [2]
> > > > > >>>>
> > > > > >>>> What does the community think about the current state of
> > > > > >>>> serialization in Apex? Is there a need to explore some approaches
> > > > > >>>> which could avoid serialization issues such as the one described
> > > > > >>>> above? Are there any other approaches one could use?
> > > > > >>>>
> > > > > >>>> 1. http://storm.apache.org/releases/current/Serialization.html#java-serialization
> > > > > >>>> 2. https://cwiki.apache.org/confluence/display/FLINK/Type+System,+Type+Extraction,+Serialization
> > > > > >>>>
> > > > > >>>> ~Bhupesh
> > > > >
> > > >
> > >
> >
>
