flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: Handling custom types with Kryo
Date Tue, 20 Jan 2015 05:38:59 GMT
Yes, I agree that the Avro serializer should be available by default. That
is one case of a typical type that should work out of the box, given that
we support Avro file formats.

Let me summarize how I understood that suggestion:

 - We make Avro available by default by registering a default serializer
for SpecificRecordBase

 - We create a library of serializers. We do not register them by default.

 - Via FLINK-1417, we analyze the types. For any (nested) type we
encounter for which we have a serializer in the library, we register that
serializer as the default serializer. Also, for every (nested) type we
encounter, we register a tag with Kryo.
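
A minimal sketch of that third point, to make the flow concrete. It is not the
FLINK-1417 implementation: the class names, the LIBRARY map, and the Plan
holder are all hypothetical, and serializers are represented by plain strings
so the example runs without Kryo on the classpath. In a real setup the two
result maps would presumably feed Kryo's addDefaultSerializer(...) and
register(type, id) calls.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names are Flink or Kryo APIs. */
public class TypeRegistrationSketch {

    // Stand-ins for a third-party base class and for user data types.
    static class AvroLikeRecord {}
    static class UserEvent extends AvroLikeRecord { long timestamp; }
    static class Order { UserEvent event; String customer; int amount; }

    /** The "library": base type -> serializer name. Nothing is registered up front. */
    static final Map<Class<?>, String> LIBRARY = new LinkedHashMap<>();
    static {
        LIBRARY.put(AvroLikeRecord.class, "AvroLikeSerializer");
    }

    /** Outcome of the analysis: default serializers to register, and one tag per type. */
    static class Plan {
        final Map<Class<?>, String> defaultSerializers = new LinkedHashMap<>();
        final Map<Class<?>, Integer> tags = new LinkedHashMap<>();
    }

    /** Collects the root type and every type reachable through non-static fields. */
    static Set<Class<?>> collectNestedTypes(Class<?> root) {
        Set<Class<?>> seen = new LinkedHashSet<>();
        walk(root, seen);
        return seen;
    }

    private static void walk(Class<?> type, Set<Class<?>> seen) {
        // Look through array types to their element type.
        while (type != null && type.isArray()) {
            type = type.getComponentType();
        }
        // For brevity, primitives and JDK classes are skipped entirely.
        if (type == null || type.isPrimitive() || type.getName().startsWith("java.")) {
            return;
        }
        // Already visited; this also guards against cyclic type graphs.
        if (!seen.add(type)) {
            return;
        }
        // Visit fields declared on the type and on its superclasses.
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (!Modifier.isStatic(f.getModifiers())) {
                    walk(f.getType(), seen);
                }
            }
        }
    }

    /** Picks library serializers for the types found and assigns each type a tag. */
    static Plan analyze(Class<?> root) {
        Plan plan = new Plan();
        int nextTag = 0;
        for (Class<?> type : collectNestedTypes(root)) {
            for (Map.Entry<Class<?>, String> entry : LIBRARY.entrySet()) {
                if (entry.getKey().isAssignableFrom(type)) {
                    plan.defaultSerializers.put(entry.getKey(), entry.getValue());
                }
            }
            plan.tags.put(type, nextTag++);
        }
        return plan;
    }

    public static void main(String[] args) {
        Plan plan = analyze(Order.class);
        System.out.println("default serializers: " + plan.defaultSerializers);
        System.out.println("tags: " + plan.tags);
    }
}
```

Only the library entries that match a type actually seen end up in the plan,
which is what keeps the number of registered serializers small.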

I like that, it should give a nice and smooth user experience.

Greetings,
Stephan




On Mon, Jan 19, 2015 at 12:32 PM, Robert Metzger <rmetzger@apache.org>
wrote:

> Hi,
>
> thank you for putting our discussion to the mailing list. This is indeed
> where such discussions belong. For the others, we started discussing here:
> https://github.com/apache/flink/pull/304
>
> I think there is one additional approach, which is probably close to (1):
> We only register those serializers by default which we don't see in the
> pre-flight phase (I think right now that's only GenericData.Array from
> Avro).
> We would come across all the other classes (Jodatime, Protobuf, Avro,
> Thrift, ...) when traversing the class hierarchy, as proposed in
> FLINK-1417. With this approach, users get the best out-of-the-box
> experience and the number of registered classes / serializers is kept at a
> minimum.
> We can still offer means to register additional serializers (I think that's
> already merged to master).
>
> My main concern with this particular issue is a good out-of-the-box user
> experience. If there is an issue with type serialization, users will notice
> it very early. (In my experience, people often have existing datatypes
> that they use with other systems, and they want to continue using them.)
> Therefore, I want to put some effort into making it as good as possible. I
> would actually sacrifice performance for stability/usability here. Our
> system is flexible enough to replace it later with a more efficient
> serialization if that becomes an issue. But maybe my suggestion above is
> already sufficient.
>
> We could also think about introducing a configuration variable which allows
> users to disable the default serializers.
>
>
> Regarding the second question: Is there a downside to registering all types
> for tagging? It reduces the overall I/O, which is good for performance.
>
> Best,
> Robert
>
>
>
> On Mon, Jan 19, 2015 at 8:24 PM, Stephan Ewen <sewen@apache.org> wrote:
>
> > Hi all!
> >
> > We have various pending pull requests that add support for certain
> > types by adding extra Kryo serializers.
> >
> > I think we need to decide how we want to handle the support for extra
> > types, because more are certainly to come.
> >
> > As I understand it, we have three broad options:
> >
> > (1)
> > Add as many serializers to Kryo by default as possible.
> >  Pro:
> >     - Many types work out of the box
> >  Contra:
> >     - We may eventually overload the Kryo registry with serializers
> >       that are not needed for most cases and suffer in performance
> >     - It is hard to guess which types work out of the box (not transparent)
> >
> >
> > (2)
> > We create a collection of serializers and a registration util.
> > --------
> > val env = ExecutionEnvironment.getExecutionEnvironment()
> >
> > Serializers.registerProtoBufSerializers(env);
> > Serializers.registerJavaUtilSerializers(env);
> > ---------
> > Pro:
> >   - Easy for users
> >   - We can grow the set of supported types very large without overloading
> > Kryo
> >   - It is transparent what gets registered
> >
> > Contra:
> >   - Not quite as convenient as if things just run
> >
> >
> > (3)
> > We do nothing and let the user create and register whatever is needed.
> >
> > We could have a library and utility for serializers for certain
> > libraries. Users could use this to conveniently add serializers for
> > the libraries they use.
> > Pro:
> >   - Simple for us ;-)
> > Contra:
> >   - More repeated work for users
> >
> >
> > ========================
> >
> > For approaches (1) and (2), there is an orthogonal question of whether
> > we want to simply register default serializers (which make the types
> > work) or also register types for tags, to speed up the serialization of
> > those types.
> >
> >
> > Greetings,
> > Stephan
> >
>
