From: Stephan Ewen
To: dev@flink.incubator.apache.org
Date: Wed, 27 Aug 2014 21:24:32 +0200
Subject: Re: Changing how TypeComparators Work

A lot of comparisons can happen directly on the binary data, without
turning it into a Comparable first, and that is what makes them fast.

The latest pull request for the String representation, for example,
improved String-keyed sorting quite a bit:
https://github.com/apache/incubator-flink/pull/4

On Wed, Aug 27, 2014 at 9:00 PM, Aljoscha Krettek wrote:

> It would work on binary data, for example for tuples it would become:
>
> public Comparable extractKeys(DataInputView in) {
>     Object[] fields = ...
>     // extract only relevant fields from binary input and save in fields
>     return Tuple(fields) // something like that
> }
>
> And for normalized keys something similar can be done.
>
> Aljoscha
>
> On Wed, Aug 27, 2014 at 8:39 PM, Stephan Ewen wrote:
> > The design of the comparators so far was to make them work on the binary
> > data. That we need to retain, in my opinion, otherwise there is no
> > way to get good performance out of working on serialized data.
> >
> > I personally think that creating a Tuple2 (key/value pair) when using
> > selector functions is actually good:
> > The key type (being treated by its dedicated comparator) benefits from all
> > the optimizations implemented for that type (bin copying, normalized keys,
> > ...)
> > That would be very hard to incorporate into any comparator that just
> > deserializes some comparable.
> >
> > Also, the key extractor can contain sort of heavy magic (such as to block
> > keys), whatever a user put in there. If we put that into the comparator, it
> > gets called for every comparison!
> >
> > I do agree, though, that we need to come up with a better interface that
> > seamlessly allows working on binary versions and on objects, without
> > duplicating too much code.
> >
> > From your suggestion, I am not sure I got everything. Could you post a
> > concrete example or code?
> >
> > Stephan
> >
> > On Wed, Aug 27, 2014 at 5:02 PM, Aljoscha Krettek wrote:
> >
> >> Hi Guys,
> >> while porting the Java API to Scala I'm noticing how complicated
> >> things are because of how our TypeComparators work: 1) There is only
> >> one type of comparator per TypeInformation which is created by the
> >> TypeInformation. Therefore, our KeySelectors are not actually
> >> implemented as comparators but as generated mappers that emit a
> >> Tuple2, because you wouldn't for example be able to generate a
> >> SelectorFunctionComparator for a TupleTypeInfo. (There's also a lot
> >> of magic going on with wrapping and unwrapping those tuples in Reduce,
> >> Join, and CoGroup.) 2) Comparators cannot really interoperate, there
> >> is special case code for the combinations that work. This will only
> >> get worse when we properly introduce POJO types, which should work
> >> well with tuple comparators and the other comparators.
> >>
> >> My proposal is this: No more TypeComparator on a per type basis. Just
> >> a generic comparator and PairComparator that work on Comparable. What
> >> used to be TypeComparators become SelectionExtractors that return a
> >> Comparable. Make Tuple comparable or add new ComparableTuple. The
> >> TupleSelectionExtractor would return a comparable tuple of the
> >> appropriate length (same for POJOs).
> >> For Tuple extractors that operate
> >> on only one field they would immediately return that field, without
> >> wrapping it in a tuple. This would directly support our existing
> >> KeySelector functions since they already return Comparable. When
> >> returning a tuple in a key selector function, this would be compatible
> >> with a TupleSelectionExtractor (on the other join side, for example).
> >>
> >> That's my idea. What do you think? I think the current state is not
> >> maintainable, so we should do something quickly. :D
> >>
> >> Cheers,
> >> Aljoscha
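
To make the "compare on the binary data" point from the thread concrete, here is a minimal sketch of the normalized-key idea for a single int key. It is only an illustration under stated assumptions: the class and method names (NormalizedKeySketch, toNormalizedKey, compareNormalized) are hypothetical and are not Flink APIs, and the actual comparators may encode keys differently. The sketch shows that, once the key is encoded suitably, two records can be ordered by looking at bytes only, with no object deserialized per comparison.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration; none of these names come from Flink.
final class NormalizedKeySketch {

    // Encode an int so that unsigned, byte-wise order equals numeric order:
    // flip the sign bit, then write big-endian (ByteBuffer's default order).
    static byte[] toNormalizedKey(int value) {
        return ByteBuffer.allocate(4).putInt(value ^ 0x80000000).array();
    }

    // Compare two normalized keys byte by byte, treating bytes as unsigned.
    // No key object is deserialized here.
    static int compareNormalized(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -7, 0, 3, Integer.MAX_VALUE};
        for (int x : samples) {
            for (int y : samples) {
                int binary = Integer.signum(
                        compareNormalized(toNormalizedKey(x), toNormalizedKey(y)));
                int object = Integer.signum(Integer.compare(x, y));
                if (binary != object) {
                    throw new AssertionError(x + " vs " + y);
                }
            }
        }
        System.out.println("byte-wise order matches Integer.compare on all samples");
    }
}
```

A comparator that first extracts a Comparable from the input would pay for object creation on every comparison, whereas the byte-wise loop above touches only the serialized bytes. That is the trade-off the reply argues should be preserved.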