apex-dev mailing list archives

From Sandesh Hegde <sand...@datatorrent.com>
Subject Re: Serialization in Apex
Date Wed, 18 May 2016 17:58:59 GMT
Users should explicitly set this attribute, as it affects performance
greatly. If Kryo fails to deserialize, then instead of showing a stack
trace we can show a message telling them to set the attribute.
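
A minimal sketch of that idea, assuming nothing about Apex internals: the class `SerializationHint`, its methods, and the message wording are all hypothetical, and Java's built-in serialization stands in for Kryo so the example runs with the standard library alone. The low-level failure is kept as the cause, but the message the user sees first names the tuple type and suggests the fix.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// Hypothetical helper, not an Apex or Kryo class.
class SerializationHint {

    // Builds an actionable message instead of surfacing only a raw stack trace.
    static String describeFailure(Object tuple, Exception cause) {
        return "Could not serialize tuple of type " + tuple.getClass().getName()
            + " (" + cause.getClass().getSimpleName() + "). "
            + "Consider setting the serialization attribute explicitly for this stream.";
    }

    // Java serialization is used here only as a stand-in for the real serializer.
    static byte[] serializeOrExplain(Object tuple) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(tuple);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            // Keep the original exception as the cause for anyone who needs the details.
            throw new IllegalStateException(describeFailure(tuple, e), e);
        }
    }
}
```

Calling `SerializationHint.serializeOrExplain(new Object())` fails because `java.lang.Object` does not implement `Serializable`, and the resulting exception message names that type.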


On Wed, May 18, 2016 at 10:52 AM Timothy Farkas <timothytiborfarkas@gmail.com> wrote:

> It will help a new user get something up and running quickly, but it may
> leave people scratching their heads as to why performance is so bad. If we
> move in the direction of an automatic fallback I think we should also
> devise a way to explicitly warn the users with something more than just an
> obscure warning message. Perhaps there can be a REST endpoint in the app
> master that a UI can tap into which keeps a log of all the tuning decisions
> made by the platform, and a corresponding dtcli command? That way we can
> tell newcomers to check that log if they are having any performance issues.
>
> Thanks,
> Tim
>
> On Wed, May 18, 2016 at 10:36 AM, David Yan <david@datatorrent.com> wrote:
>
> > I think having a fallback to Java serialization is a good thing.
> > I can imagine a user having trouble with Kryo serialization of their
> > operator, being unable to figure it out, and then giving up entirely
> > without us even knowing.
> >
> > David
> >
> > On Tue, May 17, 2016 at 11:50 AM, Thomas Weise <thomas@datatorrent.com> wrote:
> >
> > > IMO automatically picking a serializer conflicts with predictable
> > > system behavior. If the serialization does not work I would want to know
> > > that instead of the system doing some trick and arriving at suboptimal
> > > or faulty behavior.
> > >
> > > That does not mean we cannot have optimizations though, as long as there
> > > is explicit user control.
> > >
> > > Thomas
> > >
> > >
> > > On Tue, May 17, 2016 at 11:34 AM, Bhupesh Chawda <bhupesh@datatorrent.com> wrote:
> > >
> > > > As Ram and Sandesh pointed out, we do have @Bind and @DefaultSerializer
> > > > annotations. However, these are tightly coupled with the field in
> > > > question and do require modifying external code. Additionally, it may
> > > > also break other systems if we are binding it to a JavaSerializer and
> > > > there are systems which have other means of serializing the field.
> > > >
> > > > My point was more to do with the user having to worry about what
> > > > serializer to use and how to serialize objects.
> > > > For example, I liked the approach that Storm takes by falling back to
> > > > Java serialization automatically in case the target class does not have
> > > > a default constructor.
> > > >
> > > > Of course, we can explore type based serialization. But this email was
> > > > more about the usability aspect: to handle classes not having default
> > > > constructors in general, not just POJO tuples.
> > > >
> > > > ~Bhupesh
> > > >
> > > >
> > > >
> > > > On Tue, May 17, 2016 at 9:53 AM, Pramod Immaneni <pramod@datatorrent.com> wrote:
> > > >
> > > > > Can we do a test where we hard-code a codec for a POJO and compare
> > > > > performance against Kryo? Thereafter we can dynamically compose a
> > > > > codec via pojoutils and inject it.
> > > > >
> > > > > Thanks
> > > > >
> > > > > > > On May 17, 2016, at 8:16 AM, Vlad Rozov <v.rozov@datatorrent.com> wrote:
> > > > > >
> > > > > > > +1 for type based serialization. Tuples in most cases are flat
> > > > > > > records/POJOs, and it should be possible to programmatically
> > > > > > > construct a codec that will significantly outperform Kryo. It
> > > > > > > should also reduce the amount of data passed over the wire. I
> > > > > > > started to look in that direction as well, as Kryo serialization
> > > > > > > is one of the bottlenecks that limit Apex throughput when
> > > > > > > operators are deployed into different containers, including the
> > > > > > > NODE_LOCAL case.
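
To make the type-based idea concrete, here is a minimal sketch of a hand-written codec for a flat record, using only the Java standard library. `Reading` and `ReadingCodec` are hypothetical names invented for this illustration and are not part of Apex. Because the codec knows the field types in advance, it writes only the field values, with no class metadata. The size comparison is against default Java serialization, so it shows the wire-size effect in general and says nothing about how such a codec would compare with Kryo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical flat tuple type. Note that it has no default constructor.
class Reading implements Serializable {
    final long timestamp;
    final int sensorId;
    final double value;

    Reading(long timestamp, int sensorId, double value) {
        this.timestamp = timestamp;
        this.sensorId = sensorId;
        this.value = value;
    }
}

// Hypothetical hand-coded codec for Reading.
class ReadingCodec {

    // Writes the fields in a fixed order: 8 + 4 + 8 = 20 bytes per tuple.
    static byte[] encode(Reading r) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream(20);
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(r.timestamp);
            out.writeInt(r.sensorId);
            out.writeDouble(r.value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the fields back in the same order and calls the real constructor.
    static Reading decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            long timestamp = in.readLong();
            int sensorId = in.readInt();
            double value = in.readDouble();
            return new Reading(timestamp, sensorId, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Baseline for the size comparison: default Java serialization of the same object.
    static byte[] javaSerialize(Reading r) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(r);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Since `decode` calls the constructor directly, a codec of this kind also sidesteps the default-constructor requirement that started this thread.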
> > > > > >
> > > > > > Thank you,
> > > > > > Vlad
> > > > > >
> > > > > >> On 5/17/16 07:13, Sandesh Hegde wrote:
> > > > > > >> If it is possible to serialize, the platform should do it
> > > > > > >> automatically; it reduces the tribal knowledge required to use
> > > > > > >> the platform. A couple of months back, I also sent out a similar
> > > > > > >> email.
> > > > > >>
> > > > > >> Type based serialization may improve the performance.
> > > > > >>
> > > > > > >>> On Tue, May 17, 2016, 6:06 AM Munagala Ramanath <ram@datatorrent.com> wrote:
> > > > > >>>
> > > > > > >>> Traditionally, we've recommended using
> > > > > > >>> "@DefaultSerializer(JavaSerializer.class)" or
> > > > > > >>> "@FieldSerializer.Bind(CustomSerializer.class)" as outlined at
> > > > > > >>>
> > > > > > >>> http://docs.datatorrent.com/troubleshooting/#application-throwing-following-kryo-exception
> > > > > > >>>
> > > > > > >>> Can you describe why those approaches are not adequate?
> > > > > >>>
> > > > > >>> Ram
> > > > > >>>
> > > > > > >>> On Mon, May 16, 2016 at 11:46 PM, Bhupesh Chawda <bhupesh@datatorrent.com> wrote:
> > > > > >>>
> > > > > >>>> Hi All,
> > > > > >>>>
> > > > > > >>>> While working on the integration of Apex with Apache Samoa, I am
> > > > > > >>>> coming across some scenarios where I have to add default
> > > > > > >>>> constructors in some external classes to make them Kryo
> > > > > > >>>> serializable. Although this should be okay, we would like to
> > > > > > >>>> avoid modifying external classes as far as possible. Some other
> > > > > > >>>> streaming engines have taken different approaches towards
> > > > > > >>>> serialization.
> > > > > >>>>
> > > > > >>>> I looked at Flink and Storm serialization mechanisms.
> > > > > >>>>
> > > > > > >>>> Storm has a fallback mechanism on Java serialization. It does use
> > > > > > >>>> Kryo for serialization due to performance. But if the class is
> > > > > > >>>> not serializable using Kryo, then it will try to serialize it
> > > > > > >>>> using Java serialization. If even then it cannot serialize, it
> > > > > > >>>> throws an error. [1]
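
The control flow described here can be sketched as follows. This is an illustration of the fallback pattern as summarized in this message, not Storm's or Apex's actual code: `FallbackSerializer`, `Path`, and `Result` are hypothetical names, and the fast path is passed in as a plain function so that the example needs only the Java standard library (a Kryo-backed function would take its place in practice). The result records which path was taken, so a silent fallback can still be reported to the user.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.function.Function;

// Hypothetical class illustrating "try the fast serializer, then Java serialization, then fail".
class FallbackSerializer {

    enum Path { PRIMARY, JAVA_FALLBACK }

    static final class Result {
        final Path path;
        final byte[] bytes;

        Result(Path path, byte[] bytes) {
            this.path = path;
            this.bytes = bytes;
        }
    }

    // Stand-in for the fast serializer; expected to throw a RuntimeException on failure.
    private final Function<Object, byte[]> primary;

    FallbackSerializer(Function<Object, byte[]> primary) {
        this.primary = primary;
    }

    Result serialize(Object tuple) {
        try {
            return new Result(Path.PRIMARY, primary.apply(tuple));
        } catch (RuntimeException primaryFailure) {
            try {
                return new Result(Path.JAVA_FALLBACK, javaSerialize(tuple));
            } catch (IOException fallbackFailure) {
                // Both attempts failed: raise one error that carries both causes.
                IllegalStateException error = new IllegalStateException(
                    "Cannot serialize " + tuple.getClass().getName() + " with either serializer",
                    fallbackFailure);
                error.addSuppressed(primaryFailure);
                throw error;
            }
        }
    }

    private static byte[] javaSerialize(Object tuple) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(tuple);
        }
        return bytes.toByteArray();
    }
}
```

A caller could log a warning whenever `result.path` is `JAVA_FALLBACK`, which keeps the convenience of the fallback without hiding the slower path.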
> > > > > >>>>
> > > > > > >>>> Flink has its own serialization stack where it uses a serializer
> > > > > > >>>> based on the type information known about the data. [2]
> > > > > >>>>
> > > > > > >>>> What does the community think about the current state of
> > > > > > >>>> serialization in Apex? Is there a need to explore some approaches
> > > > > > >>>> which could avoid serialization issues such as the one described
> > > > > > >>>> above? Are there any other approaches one could use?
> > > > > >>>>
> > > > > >>>> 1.
> > > > > >>>
> > > > >
> > > >
> > >
> >
> http://storm.apache.org/releases/current/Serialization.html#java-serialization
> > > > > >>>> 2.
> > > > > >>>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Type+System,+Type+Extraction,+Serialization
> > > > > >>>>
> > > > > >>>> ~Bhupesh
> > > > > >
> > > > >
> > > >
> > >
> >
>
