calcite-dev mailing list archives

From Julian Hyde <jh...@apache.org>
Subject Collation meets relational algebra
Date Thu, 30 Jul 2015 20:57:23 GMT
There are a few issues in play regarding collations (783, 784, 793; see links below) and they
seem to be overlapping. Maryann and Milinda have been at odds with each other (in the politest
possible way!).

The cause is that they are both doing very interesting new work using collation:
* Maryann is optimizing Phoenix plans to use secondary indexes. These are tables that are
project-sort materializations of a base table, itself sorted.
* Milinda is planning Samza streaming-aggregation queries. A plan can only be found if you
know that the stream is sorted on one of the aggregation keys, usually a time column.

I spoke with Maryann about this today. I think that logical plans should not have a sort order:
* In 783 and 784, I think I was wrong to allow logical RelNodes (LogicalProject and LogicalAggregate)
to have collations. Because they are logical, they are inherently un-sorted. (But they may
be based on a table, say an ArrayTable, that does have a sort order.)
* In 793, Maryann was right to say that we should not bake in the collation that a plan *happens
to have* when the SQL is first translated, because trying to find a physical plan with the
same collation restricts our options.
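To make the point about 793 concrete, here is a minimal sketch in plain Java. It does not use Calcite's classes; `CollationDemo`, `satisfies`, `viable`, and the candidate plan names are hypothetical, and "a required collation is satisfied when it is a prefix of the delivered one" is an assumed simplification. It only shows that requiring the collation the first translation happened to have leaves fewer candidate physical plans than requiring none.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; these are not Calcite classes.
class CollationDemo {
    /** Assumed rule: a required collation is satisfied if it is a prefix of the delivered one. */
    static boolean satisfies(List<String> delivered, List<String> required) {
        return required.size() <= delivered.size()
            && delivered.subList(0, required.size()).equals(required);
    }

    /** Names of the candidate plans whose delivered collation satisfies the requirement. */
    static List<String> viable(Map<String, List<String>> candidates, List<String> required) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : candidates.entrySet()) {
            if (satisfies(e.getValue(), required)) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Candidate physical plans and the sort order each one delivers (empty = unsorted).
        Map<String, List<String>> candidates = new LinkedHashMap<>();
        candidates.put("scan base table", List.of("k"));
        candidates.put("scan secondary index", List.of("v", "k"));
        candidates.put("unsorted scan", List.of());

        // Baking in the collation the first translation happened to have:
        System.out.println(viable(candidates, List.of("k")));
        // Requiring no collation at the logical level:
        System.out.println(viable(candidates, List.of()));
    }
}
```

With the baked-in requirement on `k`, only the base-table scan qualifies; with no requirement, all three candidates remain available to the planner.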

But SQL ASTs should have a sort order (if the top node is an ORDER BY clause, or if a table
referenced in the FROM clause is a stream) and physical RelNodes should also have a sort order.

And Milinda’s logical plans need a concept similar to sorting. Maybe a piece of metadata
that this RelNode *could be sorted by X, Y if desired*. Any table can, of course, be re-sorted
into any order you like, but a stream, which is infinite, can only be re-sorted to an order
that does not conflict with the order of the incoming data.
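One possible reading of that "could be sorted by X, Y if desired" metadata, sketched in plain Java. Everything here is an assumption for illustration: `StreamSortSketch` and `couldBeSortedBy` are hypothetical names, not Calcite API, and "does not conflict" is interpreted as "the desired order is already implied by the incoming order, or it refines a non-empty incoming order by appending keys". The refinement case further assumes that rows sharing the leading keys form finite groups.

```java
import java.util.List;

// Hypothetical sketch; not part of Calcite.
class StreamSortSketch {
    static boolean isPrefix(List<String> prefix, List<String> full) {
        return prefix.size() <= full.size()
            && full.subList(0, prefix.size()).equals(prefix);
    }

    /**
     * Could a relation whose data arrives in order `incoming` be sorted by `desired`?
     * A finite table can be re-sorted into any order. A stream is assumed to allow
     * only orders that are prefixes of, or refinements of, its incoming order.
     */
    static boolean couldBeSortedBy(boolean isStream, List<String> incoming, List<String> desired) {
        if (!isStream) {
            return true; // a table is finite, so any order is reachable
        }
        if (isPrefix(desired, incoming)) {
            return true; // already sorted that way
        }
        // Refinement: keep the incoming order and sort within each group of equal keys.
        return !incoming.isEmpty() && isPrefix(incoming, desired);
    }

    public static void main(String[] args) {
        List<String> byTime = List.of("rowtime"); // hypothetical column names
        System.out.println(couldBeSortedBy(true, byTime, List.of("rowtime", "productId")));
        System.out.println(couldBeSortedBy(true, byTime, List.of("productId")));
        System.out.println(couldBeSortedBy(false, List.of(), List.of("productId")));
    }
}
```

Under these assumptions a stream ordered on `rowtime` could be sorted by `rowtime, productId` but not by `productId` alone, while a table accepts either.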

I still need to roll up my sleeves and help these patient developers (especially Milinda)
get something working, but I hope it helps to have a general direction.

Julian

* https://issues.apache.org/jira/browse/CALCITE-783
Infer collation of Project using monotonicity
* https://issues.apache.org/jira/browse/CALCITE-784
LogicalAggregate's create method discards any collation traits from input
* https://issues.apache.org/jira/browse/CALCITE-793
The compiler asks for unnecessary collation trait on plan with materialized view
* https://issues.apache.org/jira/browse/CALCITE-825
Allow user to specify sort order of an ArrayTable