flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephan Ewen <se...@apache.org>
Subject Re: Changing how TypeComparators Work
Date Wed, 27 Aug 2014 19:24:32 GMT
A lot of comparisons (and that is what makes it fast) can happen on the
binary data, without turning them into a comparable.

The latest pull request in the String representation for example improved
String-keyed sorting quite a bit:

https://github.com/apache/incubator-flink/pull/4





On Wed, Aug 27, 2014 at 9:00 PM, Aljoscha Krettek <aljoscha@apache.org>
wrote:

> It would work on binary data, for example for tuples it would become:
>
> public Comparable extractKeys(DataInputView in) {
>     Object[] fields = ...
>     // extract only relevant fields from binary input and save in fields
>     return Tuple(fields) // something like that
> }
>
> And for normalized keys something similar can be done.
>
> Aljoscha
>
> On Wed, Aug 27, 2014 at 8:39 PM, Stephan Ewen <sewen@apache.org> wrote:
> > The design of the comparators so far was to make them work on the binary
> > data. That we need to retain, in my opinion, otherwise there is no
> > way to get good performance out of working on serialized data.
> >
> > I personally think that creating a tuple2 (key/value pair) when using
> > selector functions is actually good:
> > The key type (being treated by its dedicated comparator) benefits from
> all
> > the optimizations implemented for that type (bin copying, normalized
> keys,
> > ...)
> > That would be very hard to incorporate into any comparator that just
> > deserializes some comparable.
> >
> > Also, the key extractor can contain sort of heavy magic (such as to block
> > keys), whatever a user put in there. If we put that into the comparator,
> it
> > gets called for
> > every comparison!
> >
> > I do agree, though, that we need to come up with a better interface that
> > seamlessly allows working on binary versions and on objects, without
> > duplicating too much code.
> >
> > From your suggestion, I am not sure I got everything. Could you post a
> > concrete example or code?
> >
> > Stephan
> >
> >
> >
> >
> > On Wed, Aug 27, 2014 at 5:02 PM, Aljoscha Krettek <aljoscha@apache.org>
> > wrote:
> >
> >> Hi Guys,
> >> while porting the Java API to Scala I'm noticing how complicated
> >> things are because of how our TypeComparators work: 1) There is only
> >> one type of comparator per TypeInformation which is created by the
> >> TypeInformation. Therefore, our KeySelectors are not actually
> >> implemented as comparators but as generated mappers that emit a
> >> Tuple2, because you wouldn't for example be able to generate a
> >> SelectorFunctionComparator for a TupleTypeInfo.  (There's also a lot
> >> of magic going on with wrapping and unwrapping those tuples in Reduce,
> >> Join, and CoGroup.) 2) Comparators cannot really interoperate, there
> >> is special case code for the combinations that work. This will only
> >> get worse when we properly introduce POJO types, which should work
> >> well with tuple comparators and the other comparators.
> >>
> >> My proposal is this: No more TypeComparator on a per type basis. Just
> >> a generic comparator and PairComparator that work on Comparable. What
> >> used to be TypeComparators become SelectionExtractors that return a
> >> Comparable. Make Tuple comparable or add new ComparableTuple.  The
> >> TupleSelectionExtractor would return a comparable tuple of the
> >> appropriate length (same for POJOs). For Tuple extractors that operate
> >> on only one field they would immediately return that field, without
> >> wrapping it in a tuple. This would directly support our existing
> >> KeySelector functions since the already return Comparable, when
> >> returning a tuple in a key selector function this would be compatible
> >> with a TupleSelectionExtractor (on the other join side, for example).
> >>
> >> That's my idea. What do you think? I think the current state is not
> >> maintainable, so we should do something quickly. :D
> >>
> >> Cheers,
> >> Aljoscha
> >>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message