drill-dev mailing list archives

From Julian Hyde <julianh...@gmail.com>
Subject Re: CUBE and ROLLUP?
Date Sun, 09 Nov 2014 16:15:08 GMT
FYI, I've started work on CALCITE-370 in branch https://github.com/julianhyde/incubator-calcite/tree/calcite-370.
I've only done the parser changes so far; I haven't changed the relational algebra.

My plan is to generalize the Aggregate class to have a list of grouping sets. If you use a simple
GROUP BY x, y you will get just one grouping set, as if you had written GROUP BY GROUPING SETS
((x, y)). The output row type will include indicator columns, one for each distinct grouping
expression (see the GROUPING function, https://docs.oracle.com/cd/B19306_01/server.102/b14223/aggreg.htm#i1007434).

import java.util.BitSet;
import java.util.List;

public class Aggregate extends SingleRel {
  public final BitSet groupKey;
  public final List<BitSet> groupSets;
  public final GroupingType groupingType;

  enum GroupingType {
    SINGLE, // one grouping set
    ROLLUP, // roll up leading edge: (x, y, z), (x, y), (x), ()
    CUBE,   // the full 2^n grouping sets
    OTHER   // not one of the above
  }
  // constructor and other members omitted
}
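
To make the ROLLUP and CUBE comments concrete, here is a minimal sketch of how each
could be expanded into an explicit list of grouping sets. GroupingSetExpansion and its
methods are hypothetical names for illustration, not part of Calcite, and the sketch
assumes the grouping columns are numbered 0..n-1 with n below 63.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not part of Calcite: expands ROLLUP and CUBE over
// grouping columns 0..n-1 into explicit grouping sets.
class GroupingSetExpansion {
  /** ROLLUP: the leading-edge prefixes (0..n-1), (0..n-2), ..., (0), (). */
  static List<BitSet> rollup(int n) {
    List<BitSet> sets = new ArrayList<>();
    for (int len = n; len >= 0; len--) {
      BitSet set = new BitSet();
      set.set(0, len); // sets bits [0, len); a no-op when len == 0
      sets.add(set);
    }
    return sets;
  }

  /** CUBE: every subset of the columns, 2^n grouping sets in total. */
  static List<BitSet> cube(int n) {
    List<BitSet> sets = new ArrayList<>();
    for (long mask = (1L << n) - 1; mask >= 0; mask--) {
      sets.add(BitSet.valueOf(new long[] {mask}));
    }
    return sets;
  }

  public static void main(String[] args) {
    System.out.println(rollup(3)); // [{0, 1, 2}, {0, 1}, {0}, {}]
    System.out.println(cube(2));   // [{0, 1}, {1}, {0}, {}]
  }
}
```

With three columns, ROLLUP yields 4 grouping sets and CUBE yields 8, which is why CUBE
can ask for many more levels of aggregation than a query needs.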

The row type for 'select k0, k1, sum(c) as a0, sum(d) as a1, sum(e) as a2 from t group by k0,
k1' would be (k0, g0, k1, g1, a0, a1, a2). Note the indicator columns g0, g1. g0 evaluates
GROUPING(k0), saying whether this row is a roll up over all k0 values.
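
The layout above can be sketched as follows. This is only an illustration under stated
assumptions: IndicatorRowType and its methods are hypothetical names, the indicator for
key i is assumed to be named "g" + i and to sit directly after its key, and the GROUPING
value is assumed to be 1 when the key is rolled up and 0 otherwise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not part of Calcite: lays out the output fields of an
// aggregate whose row type interleaves keys with indicator columns.
class IndicatorRowType {
  /** Field names: each key followed by its indicator, then the aggregates. */
  static List<String> fieldNames(List<String> keys, List<String> aggs) {
    List<String> fields = new ArrayList<>();
    for (int i = 0; i < keys.size(); i++) {
      fields.add(keys.get(i));
      fields.add("g" + i); // indicator column for key i
    }
    fields.addAll(aggs);
    return fields;
  }

  /** GROUPING(key) for a row produced by groupSet: 1 if the key is rolled up. */
  static int grouping(BitSet groupSet, int key) {
    return groupSet.get(key) ? 0 : 1;
  }

  /** Output ordinal of key i once indicator columns are skipped over. */
  static int keyOrdinal(int i) {
    return 2 * i;
  }

  public static void main(String[] args) {
    System.out.println(fieldNames(
        Arrays.asList("k0", "k1"), Arrays.asList("a0", "a1", "a2")));
    // [k0, g0, k1, g1, a0, a1, a2]
  }
}
```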

Existing rules will have to be changed to only fire if groupingType == SINGLE, and skip over
the indicator columns.
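
As a rough sketch of the guard a rule might use, the grouping type could be derived from
the grouping sets themselves. GroupingTypes, classify and ruleApplies are hypothetical
names, not existing Calcite API. The sketch assumes the roll-up order follows ascending
bit position in groupKey and that there are fewer than 63 key columns. With a single key
column ROLLUP and CUBE coincide, and this sketch reports ROLLUP.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Calcite.
class GroupingTypes {
  enum GroupingType { SINGLE, ROLLUP, CUBE, OTHER }

  /** Builds a BitSet from bit positions (convenience for examples). */
  static BitSet bits(int... positions) {
    BitSet set = new BitSet();
    for (int p : positions) {
      set.set(p);
    }
    return set;
  }

  /** True if every bit of a is also set in b. */
  static boolean isSubset(BitSet a, BitSet b) {
    BitSet copy = (BitSet) a.clone();
    copy.andNot(b);
    return copy.isEmpty();
  }

  static GroupingType classify(BitSet groupKey, List<BitSet> groupSets) {
    Set<BitSet> distinct = new HashSet<>(groupSets);
    if (distinct.size() == 1 && distinct.contains(groupKey)) {
      return GroupingType.SINGLE;
    }
    // ROLLUP: exactly the prefixes of the key columns, including ().
    Set<BitSet> prefixes = new HashSet<>();
    BitSet prefix = new BitSet();
    prefixes.add((BitSet) prefix.clone());
    for (int b = groupKey.nextSetBit(0); b >= 0; b = groupKey.nextSetBit(b + 1)) {
      prefix.set(b);
      prefixes.add((BitSet) prefix.clone());
    }
    if (distinct.equals(prefixes)) {
      return GroupingType.ROLLUP;
    }
    // CUBE: 2^n distinct sets, each a subset of the key columns.
    boolean allSubsets = true;
    for (BitSet set : distinct) {
      allSubsets &= isSubset(set, groupKey);
    }
    if (allSubsets && distinct.size() == (1L << groupKey.cardinality())) {
      return GroupingType.CUBE;
    }
    return GroupingType.OTHER;
  }

  /** Guard for existing rules: fire only on a single grouping set. */
  static boolean ruleApplies(BitSet groupKey, List<BitSet> groupSets) {
    return classify(groupKey, groupSets) == GroupingType.SINGLE;
  }
}
```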

Julian


> On Nov 8, 2014, at 6:26 PM, Michael Johnson <mjjohnson.byu@gmail.com> wrote:
> 
> I've spent the last little bit researching different ways to implement the
> CUBE and ROLLUP operators efficiently, and I've been looking through the
> code base to get started. I'm having a little difficulty pinpointing the
> most relevant portions of the code.
> 
> Where would you all suggest would be the best place to get going, both in
> the Optiq/Calcite code and in the actual Drill code? (I've looked at the
> AggregateRelBase class in Optiq/Calcite based on Julian's tip, but wasn't
> sure where to proceed there. And in Drill, I definitely don't know.)
> 
> (I don't want to sound like I'm asking for a lot of hand-holding, since I'm
> trying to help rather than take up everyone's time, but any pointers to
> help me know which code to look at would be much appreciated!)
> 
> Thanks,
> Michael
> 
> 
> On Sun, Sep 14, 2014 at 8:40 PM, Julian Hyde <julianhyde@gmail.com> wrote:
> 
>> I agree with Ted that this would be a great feature.
>> 
>> You might need support from Optiq for parsing the SQL and representing the
>> relational algebra before it is translated to the Drill physical algebra
>> that you build. Unfortunately Optiq doesn't have that support yet (see
>> https://issues.apache.org/jira/browse/OPTIQ-370) but we could expedite it.
>> 
>> (You're welcome to come to the Optiq hackathon on Wednesday and work on it
>> there!)
>> 
>> CUBE and ROLLUP have a related feature, GROUPING SETS. GROUPING SETS allows
>> you to specify exactly which levels of aggregation you want. In my view,
>> CUBE and ROLLUP are just syntactic sugar to allow you to ask for a lot of
>> grouping sets at the same time (most of which you may not need).
>> 
>> But to keep things simple, just implement CUBE at first. Add a 'boolean
>> cube' field to the AggregateRelBase operator, so that GROUP BY CUBE(x, y,
>> z) passes through the parser, validator, translator very similarly to GROUP
>> BY x, y, z.
>> 
>> When you have made the changes to the physical operator and you have some
>> cube queries working correctly, circle back and implement GROUPING SETS,
>> specifying exactly which grouping sets you want.
>> 
>> Julian
>> 
>> 
>> 
>>>>>> 
>>>>>> 
>>>>>>> On 13/09/2014 18:57, Michael Johnson wrote:
>>>>>>> 
>>>>>>> For an advanced databases class project, I'm looking at adding CUBE and
>>>>>>> ROLLUP operators to Drill. (I'll be working up to that by trying out some
>>>>>>> smaller changes first to get a better understanding of Drill's code.)
>>>>>>> 
>>>>>>> Does this sound like a feature that you might want to incorporate into
>>>>>>> Drill? Any other thoughts about this idea?
>>>>>>> 
>>>>>>> Michael
>>>>>> 
>>> 
>> 

