flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Changing how TypeComparators Work
Date Wed, 27 Aug 2014 19:00:52 GMT
It would work on binary data, for example for tuples it would become:

public Comparable extractKeys(DataInputView in) {
    Object[] fields = ...
    // extract only relevant fields from binary input and save in fields
    return Tuple(fields) // something like that

And for normalized keys something similar can be done.


On Wed, Aug 27, 2014 at 8:39 PM, Stephan Ewen <sewen@apache.org> wrote:
> The design of the comparators so far was to make them work on the binary
> data. That we need to retain, in my opinion, otherwise there is no
> way to get good performance out of working on serialized data.
> I personally think that creating a tuple2 (key/value pair) when using
> selector functions is actually good:
> The key type (being treated by its dedicated comparator) benefits from all
> the optimizations implemented for that type (bin copying, normalized keys,
> ...)
> That would be very hard to incorporate into any comparator that just
> deserializes some comparable.
> Also, the key extractor can contain sort of heavy magic (such as to block
> keys), whatever a user put in there. If we put that into the comparator, it
> gets called for
> every comparison!
> I do agree, though, that we need to come up with a better interface that
> seamlessly allows working on binary versions and on objects, without
> duplicating too much code.
> From your suggestion, I am not sure I got everything. Could you post a
> concrete example or code?
> Stephan
> On Wed, Aug 27, 2014 at 5:02 PM, Aljoscha Krettek <aljoscha@apache.org>
> wrote:
>> Hi Guys,
>> while porting the Java API to Scala I'm noticing how complicated
>> things are because of how our TypeComparators work: 1) There is only
>> one type of comparator per TypeInformation which is created by the
>> TypeInformation. Therefore, our KeySelectors are not actually
>> implemented as comparators but as generated mappers that emit a
>> Tuple2, because you wouldn't for example be able to generate a
>> SelectorFunctionComparator for a TupleTypeInfo.  (There's also a lot
>> of magic going on with wrapping and unwrapping those tuples in Reduce,
>> Join, and CoGroup.) 2) Comparators cannot really interoperate, there
>> is special case code for the combinations that work. This will only
>> get worse when we properly introduce POJO types, which should work
>> well with tuple comparators and the other comparators.
>> My proposal is this: No more TypeComparator on a per type basis. Just
>> a generic comparator and PairComparator that work on Comparable. What
>> used to be TypeComparators become SelectionExtractors that return a
>> Comparable. Make Tuple comparable or add new ComparableTuple.  The
>> TupleSelectionExtractor would return a comparable tuple of the
>> appropriate length (same for POJOs). For Tuple extractors that operate
>> on only one field they would immediately return that field, without
>> wrapping it in a tuple. This would directly support our existing
>> KeySelector functions since the already return Comparable, when
>> returning a tuple in a key selector function this would be compatible
>> with a TupleSelectionExtractor (on the other join side, for example).
>> That's my idea. What do you think? I think the current state is not
>> maintainable, so we should do something quickly. :D
>> Cheers,
>> Aljoscha

View raw message