drill-issues mailing list archives

From "Jacques Nadeau (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3910) Leverage Calcite's Clustered Collation
Date Wed, 07 Oct 2015 20:37:26 GMT
Jacques Nadeau created DRILL-3910:

             Summary: Leverage Calcite's Clustered Collation
                 Key: DRILL-3910
                 URL: https://issues.apache.org/jira/browse/DRILL-3910
             Project: Apache Drill
          Issue Type: Improvement
          Components: Query Planning & Optimization
            Reporter: Jacques Nadeau

Right now, streaming aggregate requires a full collation. I was just talking to [~julianhyde],
and he pointed out that Calcite has a clustered variant of collation (similar to what
MSSQL calls Segment). Realistically, streaming aggregate only requires a clustered collation,
so we should switch to requiring that instead. We should also go through the existing operators
and make sure each one tracks whether or not it maintains a clustered collation. We should then
be able to have flatten produce output that is clustered on the carry-through fields, which
will let us take better advantage of the clustered-ness of data in downstream operations.
Flatten should also produce data that exposes the distribution trait on the carry-through
fields. This means that a query like this:

select a, count(b) from (
  select a, flatten(x) as b from t
)
group by a

should be executed without redistribution of data.
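
To illustrate why a clustered collation is enough here, below is a minimal Python sketch. It is not Drill's or Calcite's implementation; the names flatten, streaming_count and the sample rows in t are hypothetical. It assumes "clustered" means that rows with equal values of a are adjacent, while the groups themselves appear in no particular order. Under that assumption, flatten keeps its output clustered on the carry-through field, and a streaming count only has to detect when the key changes.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, Any]


def flatten(rows: Iterable[Row], carry: str, array_field: str) -> Iterator[Tuple[Any, Any]]:
    """Emit one (carry, element) pair per array element.

    All pairs produced from one input row are adjacent, so if the input is
    clustered on `carry`, the output is clustered on `carry` as well.
    """
    for row in rows:
        for element in row[array_field]:
            yield (row[carry], element)


def streaming_count(pairs: Iterable[Tuple[Any, Optional[Any]]]) -> Iterator[Tuple[Any, int]]:
    """Count non-null values per key in a single pass.

    Assumes the input is clustered on the key: equal keys are adjacent.
    No sort order between groups is needed, only a key-change check.
    """
    started = False
    current_key = None
    count = 0
    for key, value in pairs:
        if started and key != current_key:
            # Key changed: the previous group is complete.
            yield (current_key, count)
            count = 0
        current_key = key
        started = True
        if value is not None:
            count += 1
    if started:
        yield (current_key, count)


# Hypothetical table t: clustered on a (3, 3, 1, 2) but not sorted on a.
t: List[Row] = [
    {"a": 3, "x": [10, 11]},
    {"a": 3, "x": [12]},
    {"a": 1, "x": [20]},
    {"a": 2, "x": [30, None]},
]

result = list(streaming_count(flatten(t, "a", "x")))
print(result)  # [(3, 3), (1, 1), (2, 1)]
```

The groups come out in input order (3, 1, 2) rather than sorted order, which is all that group by a needs. If the input were not clustered on a, this sketch would emit the same key more than once, which is why the planner has to track whether each operator maintains the trait.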
