flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Changing how TypeComparators Work
Date Wed, 27 Aug 2014 20:10:23 GMT
Ok, then we have to find some other solution for interop between
comparators. What's the performance increase from that pull request?

On Wed, Aug 27, 2014 at 9:24 PM, Stephan Ewen <sewen@apache.org> wrote:
> A lot of comparisons (and that is what makes it fast) can happen on the
> binary data, without turning them into a comparable.
>
> The latest pull request in the String representation for example improved
> String-keyed sorting quite a bit:
>
> https://github.com/apache/incubator-flink/pull/4
>
>
>
>
>
> On Wed, Aug 27, 2014 at 9:00 PM, Aljoscha Krettek <aljoscha@apache.org>
> wrote:
>
>> It would work on binary data, for example for tuples it would become:
>>
>> public Comparable extractKeys(DataInputView in) {
>>     Object[] fields = ...
>>     // extract only relevant fields from binary input and save in fields
>>     return Tuple(fields) // something like that
>> }
>>
>> And for normalized keys something similar can be done.
>>
>> Aljoscha
>>
>> On Wed, Aug 27, 2014 at 8:39 PM, Stephan Ewen <sewen@apache.org> wrote:
>> > The design of the comparators so far was to make them work on the binary
>> > data. That we need to retain, in my opinion, otherwise there is no
>> > way to get good performance out of working on serialized data.
>> >
>> > I personally think that creating a tuple2 (key/value pair) when using
>> > selector functions is actually good:
>> > The key type (being treated by its dedicated comparator) benefits from
>> all
>> > the optimizations implemented for that type (bin copying, normalized
>> keys,
>> > ...)
>> > That would be very hard to incorporate into any comparator that just
>> > deserializes some comparable.
>> >
>> > Also, the key extractor can contain sort of heavy magic (such as to block
>> > keys), whatever a user put in there. If we put that into the comparator,
>> it
>> > gets called for
>> > every comparison!
>> >
>> > I do agree, though, that we need to come up with a better interface that
>> > seamlessly allows working on binary versions and on objects, without
>> > duplicating too much code.
>> >
>> > From your suggestion, I am not sure I got everything. Could you post a
>> > concrete example or code?
>> >
>> > Stephan
>> >
>> >
>> >
>> >
>> > On Wed, Aug 27, 2014 at 5:02 PM, Aljoscha Krettek <aljoscha@apache.org>
>> > wrote:
>> >
>> >> Hi Guys,
>> >> while porting the Java API to Scala I'm noticing how complicated
>> >> things are because of how our TypeComparators work: 1) There is only
>> >> one type of comparator per TypeInformation which is created by the
>> >> TypeInformation. Therefore, our KeySelectors are not actually
>> >> implemented as comparators but as generated mappers that emit a
>> >> Tuple2, because you wouldn't for example be able to generate a
>> >> SelectorFunctionComparator for a TupleTypeInfo.  (There's also a lot
>> >> of magic going on with wrapping and unwrapping those tuples in Reduce,
>> >> Join, and CoGroup.) 2) Comparators cannot really interoperate, there
>> >> is special case code for the combinations that work. This will only
>> >> get worse when we properly introduce POJO types, which should work
>> >> well with tuple comparators and the other comparators.
>> >>
>> >> My proposal is this: No more TypeComparator on a per type basis. Just
>> >> a generic comparator and PairComparator that work on Comparable. What
>> >> used to be TypeComparators become SelectionExtractors that return a
>> >> Comparable. Make Tuple comparable or add new ComparableTuple.  The
>> >> TupleSelectionExtractor would return a comparable tuple of the
>> >> appropriate length (same for POJOs). For Tuple extractors that operate
>> >> on only one field they would immediately return that field, without
>> >> wrapping it in a tuple. This would directly support our existing
>> >> KeySelector functions since the already return Comparable, when
>> >> returning a tuple in a key selector function this would be compatible
>> >> with a TupleSelectionExtractor (on the other join side, for example).
>> >>
>> >> That's my idea. What do you think? I think the current state is not
>> >> maintainable, so we should do something quickly. :D
>> >>
>> >> Cheers,
>> >> Aljoscha
>> >>
>>

Mime
View raw message