Return-Path: Delivered-To: apmail-db-derby-commits-archive@www.apache.org Received: (qmail 97739 invoked from network); 4 Aug 2007 16:30:40 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 4 Aug 2007 16:30:40 -0000 Received: (qmail 57139 invoked by uid 500); 4 Aug 2007 16:30:40 -0000 Delivered-To: apmail-db-derby-commits-archive@db.apache.org Received: (qmail 57117 invoked by uid 500); 4 Aug 2007 16:30:40 -0000 Mailing-List: contact derby-commits-help@db.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: List-Post: Reply-To: "Derby Development" List-Id: Delivered-To: mailing list derby-commits@db.apache.org Received: (qmail 57106 invoked by uid 99); 4 Aug 2007 16:30:40 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 04 Aug 2007 09:30:40 -0700 X-ASF-Spam-Status: No, hits=-100.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.130] (HELO eos.apache.org) (140.211.11.130) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 04 Aug 2007 16:30:24 +0000 Received: from eos.apache.org (localhost [127.0.0.1]) by eos.apache.org (Postfix) with ESMTP id E7FD659A07 for ; Sat, 4 Aug 2007 16:30:16 +0000 (GMT) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: Apache Wiki To: derby-commits@db.apache.org Date: Sat, 04 Aug 2007 16:30:16 -0000 Message-ID: <20070804163016.24741.47510@eos.apache.org> Subject: [Db-derby Wiki] Update of "OLAPRollupLists" by BryanPendleton X-Virus-Checked: Checked by ClamAV on apache.org Dear Wiki user, You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification. The following page has been changed by BryanPendleton: http://wiki.apache.org/db-derby/OLAPRollupLists The comment on the change is: Start new page for ROLLUP New page: As part of the ["OLAPOperations"] effort, we'd like to add support for ROLLUP multi-level grouping. == Background == Consider a table of sales transaction data, which tracks information about particular sales: {{{ create table sales_history (region varchar(10), state char(2), product varchar(10), date_of_sale timestamp, sales int); }}} Such a table has one row for each sales transaction, recording the region and state where the sale occurred, the product that was sold, the date on which it was sold, and the amount of the sale. Now suppose that we'd like to analyze our sales data, to study the amount of sales that is occurring for different products, in different states and regions. Using the ROLLUP feature of SQL 2003, we could issue the query: {{{ select region, state, product, sum(sales) total_sales from sales_history group by rollup (region, state, product) }}} Semantically, the above query is equivalent to {{{ select region, state, product, sum(sales) total_sales from sales_history group by region, state, product union select region, state, null, sum(sales) total_sales from sales_history group by region, state union select region, null, null, sum(sales) total_sales from sales_history group by region union select null, null, null, sum(sales) total_sales from sales_history }}} The query might produce results that looked something like: {{{ REGION STATE PRODUCT TOTAL_SALES ------ ----- ------- ----------- null null null 6200 EAST MA BOATS 100 EAST MA CARS 1500 EAST MA null 1600 EAST NY BOATS 150 EAST NY CARS 1000 EAST NY null 1150 EAST null null 2750 WEST CA BOATS 750 WEST CA CARS 500 WEST CA null 1250 WEST AZ BOATS 2000 WEST AZ CARS 200 WEST AZ null 2200 WEST null null 3450 }}} We could actually compute the ROLLUP query by transforming it into the equivalent UNION query. However, one of the things that's very interesting about this query is that it can be computed using a '''single''' sort of the data, and a '''single''' pass over that sorted data. So the expense of computing this query is roughly equivalent to the expense of computing the single sub-query {{{ select region, state, product, sum(sales) total_sales from sales_history group by region, state, product }}} because the other aggregation levels can be computed at the same time. All we have to do is to extend GroupedAggregateResultSet so that instead of computing a single result row at a time, it maintains N pending result rows, where N is the number of levels of ROLLUP in the query (N == 4 in the query above). So, when GroupedAggregateResultSet is processing a row of data for (region=EAST,state=MA,product=BOATS), it should not only sum the sales for that level of aggregation, but should also at the same time sum the sales for the aggregation levels (region=EAST,state=MA), and (region=EAST), and (). Then, when GroupedAggregateResultSet encounters the end of the (EAST,MA,BOATS) data, and starts to encounter (EAST,MA,CARS) data, it should finalize the result row for (EAST,MA,BOATS) and start working on computing the result row for (EAST,MA,CARS), but it should simply continue aggregating the result rows for (EAST,MA), (EAST), and (). And when GroupedAggregateResultSet encounters the end of the (EAST,MA,CARS) data, and starts to encounter (EAST,NY,BOATS) data,it should finalize the result row for (EAST,MA,CARS), and it should also finalize the result row for (EAST,MA), and it should start working on computing the result row for (EAST,NY,BOATS), but it should continue aggregating the result rows for (EAST) and for (). == Syntax == In the SQL 2003 standard, the ROLLUP keyword impacts the following portions of the SQL grammar: {{{ ::= | | | | ::= ROLLUP ::= | | | | }}} The above grammar is just a tiny subset of the overall GROUP BY grammar, namely those parts that involve ROLLUP.