db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bryan Pendleton (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3002) Add support for GROUP BY ROLLUP
Date Tue, 21 Aug 2007 03:07:39 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521308
] 

Bryan Pendleton commented on DERBY-3002:
----------------------------------------

Hi Manish thanks for reading the diff. You are absolutely right that it is possible to compute
the rollups with a single sort, but I haven't made that optimization yet. I shall work on
improving the wiki to make this clearer.

As I see it, there are basically 3 ways to perform grouped aggregation, and numerous ways
to combine these three variants.
1) If the records to be grouped are already in sorted order, perhaps due to a scan of an index,
or a previous merge-sort-join operation, or a duplicate-eliminating distinct sort, then the
grouping operation can be implemented by comparing each pair of records on the grouping attributes;
each time a new value of the grouping attributes is encountered, the computation of the previous
group can be finalized and emitted. This is the algorithm discussed in the wiki page, and
is currently implemented by Derby for a single set of grouping attributes; I have not yet
constructed a patch to extend this logic for a ROLLUP set of groups.
2) An unsorted set of records can be grouped by feeding them into the sorter, configured for
duplicate elimination, with a sort observer which aggregates each row's values into the retained
row's columns during duplicate processing. Each such sort groups the records by a specific
set of grouping attributes. The patch that I constructed uses N sorts to group the records
by N different sets of grouping attributes. The input rows are read only once, but then are
added to each of the N sorts. This is dramatically more efficient than the alternate query-rewriting
algorithm of constructing N GROUP BY statements and then UNION'ing their results together,
since in this implementation the input rows are materialized and read only once.
3) An unsorted set of records can also be grouped by feeding them into a hash table, whose
keys are the grouping attributes, with a hash-collision observer which aggregates each row's
values into the retained row's columns during duplicate key elimination. This is fundamentally
similar to the sort-based technique, with the important difference that at the end of the
processing, the rows will NOT be emitted in sorted order.

Derby currently implements techniques (1) and (2) for grouped aggregate computation; I am
not aware of any code which implements a hash-based technique for grouped aggregates in Derby.

As Manish notes, techniques (1) and (2) can be combined into an algorithm which computes the
N ROLLUP groups using a single sort, by first sorting (and aggregating) the rows on their
finest level of grouping detail. Then, as the rows are being retrieved in sorted order, the
code could detect when a new value of grouping attributes is being retrieved, and could automatically
compute that higher level group at that time. Thus if we are computing the GROUP BY ABCD,
ABC, AB, and A, we can:
 - sort (and group and aggregate) the data on ABCD
 - For each row we retrieve, we can propagate it up the pipeline to compute the appropriate
ABC group, the appropriate AB group, the appropriate A group, and the appropriate all-rows
group.

This algorithm seems to hold the potential for being quite efficient, but I haven't investigated
the actual coding yet, and for the time being as an experiment I investigated the easier-to-code
but less efficient multiple-sorts algorithm.


> Add support for GROUP BY ROLLUP
> -------------------------------
>
>                 Key: DERBY-3002
>                 URL: https://issues.apache.org/jira/browse/DERBY-3002
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 10.4.0.0
>            Reporter: Bryan Pendleton
>            Assignee: Bryan Pendleton
>            Priority: Minor
>         Attachments: prototypeChangeNoTests.diff
>
>
> Provide an implementation of the ROLLUP form of multi-dimensional grouping according
to the SQL standard.
> See http://wiki.apache.org/db-derby/OLAPRollupLists for some more detailed information
about this aspect of the SQL standard.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message