calcite-dev mailing list archives

From Aman Sinha <asi...@maprtech.com>
Subject Re: Collation meets relational algebra
Date Fri, 31 Jul 2015 16:22:41 GMT
Yes, in general collation is a better fit as a physical property than as a
logical property of a plan node. As for the places where it does make sense
to treat it as a logical property, I agree with the ORDER BY comments, and
these should be extended to window functions too:
   SELECT b, RANK() OVER (ORDER BY b) FROM t;
I would think the LogicalWindow should have a collation on b.
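For illustration only, here is a minimal Java sketch of why a window like the one above tends to end up ordered on b. It is not Calcite code, and the class and method names are hypothetical. In this sketch, evaluating RANK() OVER (ORDER BY b) starts by putting the rows in b order, so the output is sorted on b as a by-product.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration, not Calcite code: evaluating
//   SELECT b, RANK() OVER (ORDER BY b) FROM t
// for a list of b values. Each output row is [b, rank].
final class RankOverOrderBy {

    static List<List<Integer>> rankOverB(List<Integer> bValues) {
        // Putting the rows in b order is an inherent step of this window.
        List<Integer> sorted = new ArrayList<>(bValues);
        sorted.sort(Comparator.naturalOrder());

        List<List<Integer>> out = new ArrayList<>();
        int rank = 0;
        for (int i = 0; i < sorted.size(); i++) {
            // RANK(): ties share a rank; the next distinct value gets its
            // 1-based position, leaving a gap after the ties.
            if (i == 0 || !sorted.get(i).equals(sorted.get(i - 1))) {
                rank = i + 1;
            }
            out.add(List.of(sorted.get(i), rank));
        }
        return out; // sorted on b: the window's output has a collation on b
    }

    public static void main(String[] args) {
        // Prints [[10, 1], [10, 1], [20, 3], [30, 4]]
        System.out.println(rankOverB(List.of(30, 10, 20, 10)));
    }
}
```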

Jinfeng, the subquery's ORDER BY can be dropped in some cases, but not all;
for instance, in the following query:
  SELECT a1 FROM (SELECT a1 FROM t1 WHERE .... ORDER BY a1) LIMIT 10;
the ORDER BY should not be dropped, because the LIMIT depends on it to decide
which 10 rows are returned. There are other cases; this is just one example.
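A minimal Java sketch of the difference follows. It uses a hypothetical class, not Calcite or Drill code. A plain list of integers stands in for the a1 values of t1 in arrival order, and the elided WHERE clause is ignored. With the ORDER BY, LIMIT returns the smallest n values. Without it, LIMIT returns whichever n rows happen to arrive first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration, not Calcite or Drill code. Models
//   SELECT a1 FROM (SELECT a1 FROM t1 ... ORDER BY a1) LIMIT n
// with a list standing in for the a1 values of t1 in arrival order.
final class SubqueryOrderByLimit {

    // Keeps the subquery's ORDER BY: sort on a1, then take the first n rows.
    static List<Integer> withOrderBy(List<Integer> a1, int n) {
        List<Integer> sorted = new ArrayList<>(a1);
        sorted.sort(Comparator.naturalOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    // Drops the ORDER BY: LIMIT keeps whichever n rows arrive first.
    static List<Integer> withoutOrderBy(List<Integer> a1, int n) {
        return new ArrayList<>(a1.subList(0, Math.min(n, a1.size())));
    }

    public static void main(String[] args) {
        List<Integer> t1 = List.of(5, 3, 9, 1, 7);
        System.out.println(withOrderBy(t1, 3));    // [1, 3, 5]
        System.out.println(withoutOrderBy(t1, 3)); // [5, 3, 9]
    }
}
```

The two results differ, so removing the inner ORDER BY would change the query's answer.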

Aman

On Fri, Jul 31, 2015 at 9:09 AM, Jinfeng Ni <jinfengni99@gmail.com> wrote:

> I think it makes sense that LogicalAggregate does not have a collation,
> since a LogicalAggregate could be implemented by different physical
> operators: either hash-based aggregation or sort-based aggregation. Only
> when a LogicalAggregate is converted into a physical aggregate does it make
> sense for it to have a collation, depending on which physical operator is
> used.
>
> The same applies to LogicalJoin, which could be implemented either as a
> hash join or as a sort-based join.
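To make the hash-versus-sort point concrete, here is a hypothetical Java sketch. It is not Calcite or Drill code. It implements one logical aggregate, `SELECT k, COUNT(*) FROM t GROUP BY k`, in two ways. Only the sort-based version produces output ordered on k. This is why the collation belongs to the physical operator rather than to LogicalAggregate.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Calcite or Drill code: one logical aggregate,
//   SELECT k, COUNT(*) FROM t GROUP BY k
// implemented by two different physical strategies.
final class AggregateCollation {

    // Hash-based: groups come back in hash-table order, so the output has no
    // collation that a later operator could rely on.
    static Map<String, Integer> hashAggregate(List<String> keys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String k : keys) {
            counts.merge(k, 1, Integer::sum);
        }
        return counts;
    }

    // Sort-based: sort on k, then count each run of equal keys. The output is
    // ordered on k as a by-product of the algorithm.
    static List<Map.Entry<String, Integer>> sortAggregate(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.naturalOrder());
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            int j = i;
            while (j < sorted.size() && sorted.get(j).equals(sorted.get(i))) {
                j++;
            }
            out.add(Map.entry(sorted.get(i), j - i));
            i = j;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> t = List.of("b", "a", "c", "a");
        System.out.println(hashAggregate(t)); // same counts, order not guaranteed
        System.out.println(sortAggregate(t)); // [a=2, b=1, c=1]
    }
}
```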
>
> At the logical level, the only collation comes from the top-level ORDER BY
> clause. In that sense, I feel that an ORDER BY clause in a subquery or view
> should probably be removed during logical planning, since semantically it
> does not affect the query result. For example:
>
>  SELECT S.C1, T2.C4
>  FROM (SELECT C1, C2, C3
>        FROM T1 ORDER BY C1) AS S
>  JOIN T2
>  ON S ...
>  ORDER BY T2.C4;
>
> In Drill, we separate logical planning from physical planning; collation
> (together with the distribution trait) matters in physical planning.
>
>
>
>
> On Fri, Jul 31, 2015 at 7:27 AM, Milinda Pathirage <mpathira@umail.iu.edu> wrote:
>
> > Thanks, Julian, for looking into this. Thanks, Maryann, for detecting the
> > issue in the CALCITE-783 patch.
> >
> > As I understand it, at the level of the aggregate we only need the
> > order-related metadata of its input. I think I was wrong to say in
> > CALCITE-784 that LogicalAggregate discards collation metadata from its
> > input, given that the input is accessible from the LogicalAggregate. We
> > will only need to do some calculations on the input's collation metadata
> > (or something similar) if we need to infer something about the
> > LogicalAggregate for use by operators that take the aggregate as an input.
> >
> > Thanks
> > Milinda
> >
> > On Thu, Jul 30, 2015 at 11:32 PM, Maryann Xue <maryann.xue@gmail.com> wrote:
> >
> > > Thanks, Julian, for taking the time to sort out all these requirements
> > > and rethink the model!
> > > Thank you, Milinda! I really appreciate your quick response to the issue.
> > >
> > > On Thu, Jul 30, 2015 at 4:57 PM, Julian Hyde <jhyde@apache.org> wrote:
> > >
> > >> There are a few issues in play regarding collations (783, 784, 793; see
> > >> links below), and they seem to overlap. Maryann and Milinda have been at
> > >> odds with each other (in the politest possible way!).
> > >>
> > >> The cause is that they are both doing very interesting new work using
> > >> collation:
> > >> * Maryann is optimizing Phoenix plans to use secondary indexes. These are
> > >> tables that are project-sort materializations of a base table, itself
> > >> sorted.
> > >> * Milinda is planning Samza streaming-aggregation queries. A plan can
> > >> only be found if you know that the stream is sorted on one of the
> > >> aggregation keys, usually a time column.
> > >>
> > >> I spoke with Maryann about this today. I think that logical plans should
> > >> not have a sort order:
> > >> * In 783 and 784, I think I was wrong to allow logical RelNodes
> > >> (LogicalProject and LogicalAggregate) to have collations. Because they are
> > >> logical, they are inherently unsorted. (But they may be based on a table,
> > >> say an ArrayTable, that does have a sort order.)
> > >> * In 793, Maryann was right to say that we should not bake in the
> > >> collation that a plan *happens to have* when the SQL is first translated,
> > >> because trying to find a physical plan with the same collation restricts
> > >> our options.
> > >>
> > >> But SQL ASTs should have a sort order (if the top node is an ORDER BY
> > >> clause, or if a table referenced in the FROM clause is a stream) and
> > >> physical RelNodes should also have a sort order.
> > >>
> > >> And Milinda’s logical plans need a concept similar to sorting: maybe a
> > >> piece of metadata saying that this RelNode *could be sorted by X, Y if
> > >> desired*. Any table can, of course, be re-sorted into any order you like,
> > >> but a stream, which is infinite, can only be re-sorted to an order that
> > >> does not conflict with the order of the incoming data.
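Here is one possible reading of the stream rule above, as a hypothetical Java sketch. It is not Calcite code. It assumes that the stream arrives sorted on a non-decreasing time column. It also assumes that a requested order is achievable only if it starts with that column. The column names `rowtime` and `productId` are made up for the example. Under those assumptions, rows can be re-sorted on (time, productId) with bounded buffering: each group of equal time values is sorted and emitted once a later time value arrives.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Calcite code. Rows are {time, productId} pairs that
// arrive sorted on a non-decreasing time column.
final class StreamResort {

    // Assumed rule: a re-sort is possible only if the requested order starts
    // with the column the stream is already sorted on. Sorting on productId
    // alone would require seeing the whole (infinite) stream first.
    static boolean canResort(List<String> incoming, List<String> requested) {
        return !incoming.isEmpty() && !requested.isEmpty()
                && requested.get(0).equals(incoming.get(0));
    }

    // Re-sorts to (time, productId), buffering only the rows that share the
    // current time value and emitting them once a later time value arrives.
    static List<List<Integer>> sortWithinTime(List<List<Integer>> rows) {
        List<List<Integer>> out = new ArrayList<>();
        List<List<Integer>> buffer = new ArrayList<>();
        for (List<Integer> row : rows) {
            if (!buffer.isEmpty() && !buffer.get(0).get(0).equals(row.get(0))) {
                flush(buffer, out); // time advanced, so this group is complete
            }
            buffer.add(row);
        }
        flush(buffer, out); // end of this finite sample of the stream
        return out;
    }

    private static void flush(List<List<Integer>> buffer, List<List<Integer>> out) {
        buffer.sort(Comparator.comparing((List<Integer> r) -> r.get(1)));
        out.addAll(buffer);
        buffer.clear();
    }

    public static void main(String[] args) {
        System.out.println(canResort(List.of("rowtime"), List.of("rowtime", "productId"))); // true
        System.out.println(canResort(List.of("rowtime"), List.of("productId")));            // false
        // Prints [[1, 10], [1, 30], [2, 5], [2, 20], [3, 1]]
        System.out.println(sortWithinTime(List.of(
                List.of(1, 30), List.of(1, 10), List.of(2, 20), List.of(2, 5), List.of(3, 1))));
    }
}
```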
> > >>
> > >> I still need to roll up my sleeves and help these patient developers
> > >> (especially Milinda) get something working, but I hope it helps to have a
> > >> general direction.
> > >>
> > >> Julian
> > >>
> > >> * https://issues.apache.org/jira/browse/CALCITE-783 Infer collation of Project using monotonicity
> > >> * https://issues.apache.org/jira/browse/CALCITE-784 LogicalAggregate's create method discards any collation traits from input
> > >> * https://issues.apache.org/jira/browse/CALCITE-793 The compiler asks for unnecessary collation trait on plan with materialized view
> > >> * https://issues.apache.org/jira/browse/CALCITE-825 Allow user to specify sort order of an ArrayTable
> > >>
> > >
> > >
> >
> >
> > --
> > Milinda Pathirage
> >
> > PhD Student | Research Assistant
> > School of Informatics and Computing | Data to Insight Center
> > Indiana University
> >
> > twitter: milindalakmal
> > skype: milinda.pathirage
> > blog: http://milinda.pathirage.org
> >
>
