Return-Path: X-Original-To: apmail-calcite-dev-archive@www.apache.org Delivered-To: apmail-calcite-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id C0C53189C4 for ; Fri, 31 Jul 2015 16:23:54 +0000 (UTC) Received: (qmail 33680 invoked by uid 500); 31 Jul 2015 16:23:38 -0000 Delivered-To: apmail-calcite-dev-archive@calcite.apache.org Received: (qmail 33616 invoked by uid 500); 31 Jul 2015 16:23:38 -0000 Mailing-List: contact dev-help@calcite.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@calcite.incubator.apache.org Delivered-To: mailing list dev@calcite.incubator.apache.org Received: (qmail 33604 invoked by uid 99); 31 Jul 2015 16:23:38 -0000 Received: from Unknown (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 31 Jul 2015 16:23:38 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 189CE196471 for ; Fri, 31 Jul 2015 16:23:38 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.901 X-Spam-Level: ** X-Spam-Status: No, score=2.901 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=3, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (1024-bit key) header.d=maprtech.com Received: from mx1-us-east.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id 5LQ7W-lvzrJZ for ; Fri, 31 Jul 2015 16:23:27 +0000 (UTC) Received: from mail-wi0-f170.google.com (mail-wi0-f170.google.com [209.85.212.170]) by mx1-us-east.apache.org (ASF Mail Server at mx1-us-east.apache.org) with ESMTPS id 231EC428DC for ; Fri, 31 Jul 2015 16:23:27 +0000 (UTC) Received: by wibud3 with SMTP id ud3so38547701wib.0 for ; Fri, 31 Jul 2015 09:22:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=maprtech.com; s=google; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=lSqCQ+76wF3al294mSX4OBdFJ0DB0yTmDi16TNeb1hw=; b=XDdL1ssXF+xgJsXq1HM5EvcpZnGw4+UDCTOOb3xeP5jdbS0zEosW3OcJv2ockhCELt gmtN1fPoLJ1knEUOIgz0ZnzoHfe645zAKidlSJoYHK2VIDT1IYOp5IDO0wjGTjiolgAt v91pziKSFgwcn3wrsyboGVX1YkqMMf2kGL4TU= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=lSqCQ+76wF3al294mSX4OBdFJ0DB0yTmDi16TNeb1hw=; b=aNb0v7F3J635CNBAnlZbfwZjo6mFMst8UUNKWjXa8gGO95+Ie+W7WU+LWEFDzV27Vs 4mYfWEfjy33AFJLs6JkJYn1KaL+AJ62LDpcya84xUpVUiCrlKdFn75uGmO5lUxxW2CvU 5YqYSZpZ7OPOLALSaZWgjmKe+mxNa8vBAkx50wlVgZfwxWgsnO/0bx1JA1z4Sh7+6q+Z L1vcCD1wZ5+KMZl0MJdfeEuSfrJaDe3VESJWbQ5TFo5MixzyotrVNMzhp7gPkNPTqrEs uFQH8Hp4SToumHeUBUV2XJwv5MSoAtMYpjVPAOdB2wKbctHlEEXbxUPtYYHOzEf+Vo9D ftUQ== X-Gm-Message-State: ALoCoQmFYmWme2RzwdPi4i2VWmR/Egv20uqhkNTypbL2hfXhPHp/vk3/x8VuFRmU58ptF3u4QW4E MIME-Version: 1.0 X-Received: by 10.180.77.193 with SMTP id u1mr8571141wiw.50.1438359761159; Fri, 31 Jul 2015 09:22:41 -0700 (PDT) Received: by 10.194.81.2 with HTTP; Fri, 31 Jul 2015 09:22:41 -0700 (PDT) In-Reply-To: References: Date: Fri, 31 Jul 2015 09:22:41 -0700 Message-ID: Subject: Re: Collation meets relational algebra From: Aman Sinha To: dev@calcite.incubator.apache.org Cc: Maryann Xue , Julian Hyde , Maryann Xue Content-Type: multipart/alternative; boundary=f46d043d67599f9ae9051c2e3ab7 --f46d043d67599f9ae9051c2e3ab7 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Yes, in general collation is a better fit as a physical property rather than logical property of a plan node. With regard to places where it makes sense to treat it as logical property, agree with the ORDER-BY comments and these should be extended to window functions too: SELECT b, RANK() OVER (ORDER BY b) FROM table; I would think the LogicalWindow should have collation on b. Jinfeng, the subquery's ORDER-BY can be dropped in some cases but not all.. for instance in the following query: SELECT a1 FROM (SELECT a1 FROM t1 WHERE .... ORDER BY a1) LIMIT 10; The OB should not be dropped. There are other cases, this is one example. Aman On Fri, Jul 31, 2015 at 9:09 AM, Jinfeng Ni wrote: > I think it makes sense that LogicalAggregate does not have collation, sin= ce > a LogicalAggregate could be implemented with different physical operator, > either hash-based aggregation, or sort-based aggregation. Only when > LogicalAggregate is converted into physical aggregator, it makes sense t= o > have collation, depending on the which physical operator is used. > > Same thing could be applied to LogicalJoin, which could be implemented > either as hash-join, or sort-based join. > > At logical level, the only collation will come from the top level ORDER B= Y > clause. In that sense, I feel that the ORDER BY clause in a SUBQUERY, or > VIEW probably should be removed in logical planning, since semantically i= t > does not impact query result. > > SELECT S.C1, T2.C4 > FROM (SELECT C1, C2, C3 > FROM T1 ORDER BY C1) AS S JOIN > T2 > ON S ... > ORDER BY T2.C4; > > In Drill, we separate logical planning from physical planning, where the > collation (together with distribution trait) will matter in physical > planing. > > > > > On Fri, Jul 31, 2015 at 7:27 AM, Milinda Pathirage > wrote: > > > Thanks Julian for looking in to this. Thanks Maryann for detecting the > > issue in CALCITE-783 patch. > > > > As I understand we only need input's (input to aggregate) order related > > metadata at the level of aggregate. I think I was wrong saying that > > LogicalAggregate discards collation metadata from input in CALCITE-784 > > given that input is accessible from LogicalAggregate. We will only need > to > > do some calculations on input's collation metadata (or something simila= r) > > if we need to infer something about LogicalAggregate to be use by > operators > > which take aggregate as an input. > > > > Thanks > > Milinda > > > > On Thu, Jul 30, 2015 at 11:32 PM, Maryann Xue > > wrote: > > > > > Thanks Julian for taking time to sort out all these requirements and > > > rethink about the model! > > > Thank you Milinda! Really appreciate your quick response to the issue= . > > > > > > On Thu, Jul 30, 2015 at 4:57 PM, Julian Hyde wrote= : > > > > > >> There are a few issues in play regarding collations (783, 784, 793; > see > > >> links below) and they seem to be overlapping. Maryann and Milinda ha= ve > > been > > >> at odds with each other (in the politest possible way!) > > >> > > >> The cause is that they are both doing very interesting new work usin= g > > >> collation: > > >> * Maryann is optimizing Phoenix plans to use secondary indexes. Thes= e > > are > > >> tables that are project-sort materializations of a base table, itsel= f > > >> sorted. > > >> * Milinda is planning Samza streaming-aggregation queries. A plan ca= n > > >> only be found if you know that the stream is sorted on one of the > > >> aggregation keys, usually a time column. > > >> > > >> I spoke with Maryann about this today. I think that logical plans > should > > >> not have a sort order: > > >> * In 783 and 784, I think I was wrong to allow logical RelNodes > > >> (LogicalProject and LogicalAggregate) to have collations. Because th= ey > > are > > >> logical, they are inherently un-sorted. (But they may be based on a > > table, > > >> say an ArrayTable, that does have a sort order.) > > >> * In 793, Maryann was right so say that we should not bake in the > > >> collation that a plan *happens to have* when the SQL is first > > translated, > > >> because trying to find a physical plan with the same collation > restricts > > >> our options. > > >> > > >> But SQL ASTs should have a sort order (if the top node is an ORDER B= Y > > >> clause, or if a table referenced in the FROM clause is a stream) and > > >> physical RelNodes should also have a sort order. > > >> > > >> And Milinda=E2=80=99s logical plans need a concept similar to sortin= g. Maybe a > > >> piece of metadata that this RelNode *could be sorted by X, Y if > > desired*. > > >> Any table can, of course, be re-sorted into any order you like, but = a > > >> stream, which is infinite, can only be re-sorted to an order that do= es > > not > > >> conflict with the order of the incoming data. > > >> > > >> I still need to roll up my sleeves and help these patient developers > > >> (especially Milinda) get something working, but I hope it helps to > have > > a > > >> general direction. > > >> > > >> Julian > > >> > > >> * https://issues.apache.org/jira/browse/CALCITE-783 Infer collation > of > > >> Project using monotonicity > > >> * https://issues.apache.org/jira/browse/CALCITE-784 > LogicalAggregate's > > >> create method discards any collation traits from input > > >> * https://issues.apache.org/jira/browse/CALCITE-793 The compiler ask= s > > >> for unnecessary collation trait on plan with materialized view > > >> * https://issues.apache.org/jira/browse/CALCITE-825 Allow user to > > >> specify sort order of an ArrayTable > > >> > > > > > > > > > > > > -- > > Milinda Pathirage > > > > PhD Student | Research Assistant > > School of Informatics and Computing | Data to Insight Center > > Indiana University > > > > twitter: milindalakmal > > skype: milinda.pathirage > > blog: http://milinda.pathirage.org > > > --f46d043d67599f9ae9051c2e3ab7--