calcite-dev mailing list archives

From Jinfeng Ni <jinfengn...@gmail.com>
Subject Re: Collation meets relational algebra
Date Fri, 31 Jul 2015 16:09:53 GMT
I think it makes sense that LogicalAggregate does not have a collation, since
a LogicalAggregate could be implemented with different physical operators,
either hash-based or sort-based aggregation. Only when a LogicalAggregate is
converted into a physical aggregate does it make sense to have a collation,
depending on which physical operator is used.

The same applies to LogicalJoin, which could be implemented either as a
hash join or as a sort-based join.
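
As a minimal sketch of that point (a toy Python model, not Calcite or Drill
code; the function names hash_aggregate and sort_aggregate are hypothetical),
the same logical SUM ... GROUP BY can be computed two ways. Both return the
same groups, but only the sort-based version yields output ordered by the
grouping key:

```python
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows):
    """Hash-based SUM per key. Output order is an accident of the hash table."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    # Python dicts keep insertion order, so groups appear in first-seen order,
    # which is not a collation on the key.
    return list(totals.items())


def sort_aggregate(rows):
    """Sort-based SUM per key. Output is sorted on the grouping key."""
    ordered = sorted(rows, key=itemgetter(0))
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


rows = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]
hash_result = hash_aggregate(rows)
sort_result = sort_aggregate(rows)
```

Here the two results agree as sets of groups, while only sort_result can
claim a collation on the key, which is why the trait belongs to the physical
operator rather than to the logical one.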

At the logical level, the only collation will come from the top-level ORDER BY
clause. In that sense, I feel that an ORDER BY clause in a subquery or view
should probably be removed in logical planning, since semantically it does not
affect the query result.

 SELECT S.C1, T2.C4
 FROM (SELECT C1, C2, C3
       FROM T1
       ORDER BY C1) AS S
 JOIN T2
 ON S  ...
 ORDER BY T2.C4;
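
A toy check of this claim, as a sketch only: the tables below are made-up
data, and because the join condition in the query above is elided, the
equi-join on a hypothetical column K of T2 is purely an assumption for
illustration. The query is evaluated once with the inner ORDER BY and once
without it:

```python
from collections import Counter

# Made-up sample data. Column K on T2 is a hypothetical join key.
T1 = [
    {"C1": 3, "C2": "x", "C3": 0},
    {"C1": 1, "C2": "y", "C3": 0},
    {"C1": 2, "C2": "x", "C3": 0},
]
T2 = [{"K": "x", "C4": 20}, {"K": "y", "C4": 10}]


def run_query(sort_subquery):
    # Subquery S: SELECT C1, C2, C3 FROM T1 [ORDER BY C1]
    s = [{"C1": r["C1"], "C2": r["C2"], "C3": r["C3"]} for r in T1]
    if sort_subquery:
        s.sort(key=lambda r: r["C1"])
    # Join with an assumed predicate S.C2 = T2.K, projecting (S.C1, T2.C4).
    joined = [(a["C1"], b["C4"]) for a in s for b in T2 if a["C2"] == b["K"]]
    # Top-level ORDER BY T2.C4
    joined.sort(key=lambda row: row[1])
    return joined


with_sort = run_query(sort_subquery=True)
without_sort = run_query(sort_subquery=False)
```

Both runs return the same rows, ordered by C4. Rows that tie on C4 may appear
in a different relative order, which the top-level ORDER BY does not
constrain anyway.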

In Drill, we separate logical planning from physical planning, where the
collation (together with the distribution trait) will matter in physical
planning.




On Fri, Jul 31, 2015 at 7:27 AM, Milinda Pathirage <mpathira@umail.iu.edu>
wrote:

> Thanks Julian for looking into this. Thanks Maryann for detecting the
> issue in the CALCITE-783 patch.
>
> As I understand it, we only need the order-related metadata of the input
> (the input to the aggregate) at the level of the aggregate. I think I was
> wrong to say in CALCITE-784 that LogicalAggregate discards collation
> metadata from its input, given that the input is accessible from
> LogicalAggregate. We will only need to do some calculations on the input's
> collation metadata (or something similar) if we need to infer something
> about LogicalAggregate to be used by operators which take the aggregate as
> an input.
>
> Thanks
> Milinda
>
> On Thu, Jul 30, 2015 at 11:32 PM, Maryann Xue <maryann.xue@gmail.com>
> wrote:
>
> > Thanks Julian for taking the time to sort out all these requirements and
> > rethink the model!
> > Thank you Milinda! Really appreciate your quick response to the issue.
> >
> > On Thu, Jul 30, 2015 at 4:57 PM, Julian Hyde <jhyde@apache.org> wrote:
> >
> >> There are a few issues in play regarding collations (783, 784, 793; see
> >> links below) and they seem to be overlapping. Maryann and Milinda have
> >> been at odds with each other (in the politest possible way!)
> >>
> >> The cause is that they are both doing very interesting new work using
> >> collation:
> >> * Maryann is optimizing Phoenix plans to use secondary indexes. These
> >> are tables that are project-sort materializations of a base table,
> >> itself sorted.
> >> * Milinda is planning Samza streaming-aggregation queries. A plan can
> >> only be found if you know that the stream is sorted on one of the
> >> aggregation keys, usually a time column.
> >>
> >> I spoke with Maryann about this today. I think that logical plans should
> >> not have a sort order:
> >> * In 783 and 784, I think I was wrong to allow logical RelNodes
> >> (LogicalProject and LogicalAggregate) to have collations. Because they
> >> are logical, they are inherently un-sorted. (But they may be based on a
> >> table, say an ArrayTable, that does have a sort order.)
> >> * In 793, Maryann was right to say that we should not bake in the
> >> collation that a plan *happens to have* when the SQL is first
> >> translated, because trying to find a physical plan with the same
> >> collation restricts our options.
> >>
> >> But SQL ASTs should have a sort order (if the top node is an ORDER BY
> >> clause, or if a table referenced in the FROM clause is a stream) and
> >> physical RelNodes should also have a sort order.
> >>
> >> And Milinda’s logical plans need a concept similar to sorting. Maybe a
> >> piece of metadata that this RelNode *could be sorted by X, Y if
> >> desired*. Any table can, of course, be re-sorted into any order you
> >> like, but a stream, which is infinite, can only be re-sorted to an
> >> order that does not conflict with the order of the incoming data.
> >>
> >> I still need to roll up my sleeves and help these patient developers
> >> (especially Milinda) get something working, but I hope it helps to have
> >> a general direction.
> >>
> >> Julian
> >>
> >> * https://issues.apache.org/jira/browse/CALCITE-783 Infer collation of
> >> Project using monotonicity
> >> * https://issues.apache.org/jira/browse/CALCITE-784 LogicalAggregate's
> >> create method discards any collation traits from input
> >> * https://issues.apache.org/jira/browse/CALCITE-793 The compiler asks
> >> for unnecessary collation trait on plan with materialized view
> >> * https://issues.apache.org/jira/browse/CALCITE-825 Allow user to
> >> specify sort order of an ArrayTable
> >>
> >
> >
>
>
> --
> Milinda Pathirage
>
> PhD Student | Research Assistant
> School of Informatics and Computing | Data to Insight Center
> Indiana University
>
> twitter: milindalakmal
> skype: milinda.pathirage
> blog: http://milinda.pathirage.org
>
