Return-Path: Delivered-To: apmail-db-derby-dev-archive@www.apache.org Received: (qmail 49088 invoked from network); 18 Aug 2005 17:25:32 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur.apache.org with SMTP; 18 Aug 2005 17:25:32 -0000 Received: (qmail 16299 invoked by uid 500); 18 Aug 2005 17:25:30 -0000 Delivered-To: apmail-db-derby-dev-archive@db.apache.org Received: (qmail 16250 invoked by uid 500); 18 Aug 2005 17:25:30 -0000 Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: List-Post: List-Id: Reply-To: "Derby Development" Delivered-To: mailing list derby-dev@db.apache.org Received: (qmail 16230 invoked by uid 99); 18 Aug 2005 17:25:29 -0000 Received: from asf.osuosl.org (HELO asf.osuosl.org) (140.211.166.49) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 18 Aug 2005 10:25:29 -0700 X-ASF-Spam-Status: No, hits=0.0 required=10.0 tests= X-Spam-Check-By: apache.org Received-SPF: pass (asf.osuosl.org: local policy) Received: from [192.18.98.36] (HELO brmea-mail-4.sun.com) (192.18.98.36) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 18 Aug 2005 10:25:48 -0700 Received: from phys-d3-ha21sca-2 ([129.145.155.165]) by brmea-mail-4.sun.com (8.12.10/8.12.9) with ESMTP id j7IHPRTW001593 for ; Thu, 18 Aug 2005 11:25:27 -0600 (MDT) Received: from conversion-daemon.ha21sca-mail1.sfbay.sun.com by ha21sca-mail1.sfbay.sun.com (iPlanet Messaging Server 5.2 HotFix 1.24 (built Dec 19 2003)) id <0ILF00B01HMI8G@ha21sca-mail1.sfbay.sun.com> (original mail from Richard.Hillegas@Sun.COM) for derby-dev@db.apache.org; Thu, 18 Aug 2005 10:25:27 -0700 (PDT) Received: from [129.150.34.12] (vpn-129-150-34-12.Central.Sun.COM [129.150.34.12]) by ha21sca-mail1.sfbay.sun.com (iPlanet Messaging Server 5.2 HotFix 1.24 (built Dec 19 2003)) with ESMTP id <0ILF001HAHQF2Y@ha21sca-mail1.sfbay.sun.com> for derby-dev@db.apache.org; Thu, 18 Aug 2005 10:25:27 -0700 (PDT) Date: Thu, 18 Aug 2005 10:25:42 -0700 From: Rick Hillegas Subject: Re: What is the correct way to solve DERBY-504? In-reply-to: To: Derby Development Message-id: <4304C496.7000202@sun.com> MIME-version: 1.0 Content-type: text/plain; charset=ISO-8859-1; format=flowed Content-transfer-encoding: 7BIT X-Accept-Language: en-us, en User-Agent: Mozilla Thunderbird 1.0.6 (Windows/20050716) References: X-Virus-Checked: Checked by ClamAV on apache.org X-Spam-Rating: minotaur.apache.org 1.6.2 0/1000/N Hi Knut, Hm, I'm not sure I understand what you've done. Pardon me if I have garbled your description. It sounds as though the optimizer has done two things: 1) Rewritten the original query by flattening the inner subquery into the outer SELECT. 2) Then tried to push the DISTINCT into the Store. Perhaps the optimzer has made a flattening mistake here. It may have rewritten the query to be SELECT DISTINCT name, id from names when it meant to rewrite the query as SELECT DISTINCT name from names What I'm saying is, it't not clear to me that what's broken is DISTINCT push-down. The bug may really be in the subquery flattening logic. I agree with your misgivings that your proposed solution is overbroad. I think the bug has to do with the agreement between the outer DISTINCT list and the inner SELECT list. They need to be identical. In this case the outer DISTINCT list has one column but the inner SELECT list has two. I think that the number of tables in the inner FROM list should not be relevant. Regards, -Rick Knut Anders Hatlen wrote: >Hi, > >I have found a solution to DERBY-504 >(http://issues.apache.org/jira/browse/DERBY-504), but I would like to >get some input, as I'm not sure the solution is quite optimal. > >The short version of the bug is that in queries like > > SELECT DISTINCT name FROM (SELECT name, id FROM names) AS n > >duplicates are not removed from the result. (I don't think anyone >would/should ever write such a query, but it's still a bug...) > >After some investigation I found out that the bug is caused by the >optimizer passing the distinct from the top level query down to the >subquery, thereby scanning the names table for DISTINCT(name, id) >instead of DISTINCT(name). > >This particular optimization is meant for cases where the duplicate >elimination could be done during the scanning of the table, and a >comment in org/apache/derby/impl/sql/compile/SelectNode.java mentions >the criteria for using this optimization: > > /* See if we can push duplicate elimination into the store > * via a hash scan. This is possible iff: > * o A single table query > * o We haven't merged the order by and distinct sorts. > * (Results do not have to be in a particular order.) > * o All entries in the select's RCL are ColumnReferences. > * o No predicates (This is because we currently do not > * differentiate between columns referenced in the select > * list and columns referenced in other clauses. In other > * words, the store will do duplicate elimination based on > * all referenced columns.) > >For the single-table-query criterion, the code checks whether the >query has only one source in its FROM clause, and then uses the same >test recursively on that source until the criterion is not met or the >source is a table. > >My bug fix works by disabling the recursive check. Instead of running >the test recursively, it just checks whether there is exactly one >source in the FROM clause and that the source is a table, not a >subquery. > >Now, this fix works, which is probably most important, but I'm not >sure it is the best solution. At least, I still have some questions. > > 1. The fix removes optimization in some cases where it could be > advantageous. It makes sure that the top-level distinct is never > pushed down to a subquery. Is this an acceptable trade-off? An > alternative to this would be to make the push more > intelligent. Right now the distinct scan is pushed to the subquery > through ResultSetNode.markForDistinctScan(), a method which takes > no arguments. Perhaps markForDistinctScan() could have the list of > distinct columns as an argument? > > 2. The query that exposed this bug is really a distinct select on a > projection of a projection. A projection of a projection could > easily be rewritten as a single projection. Should we extend the > optimizer to handle such cases? > >I would really like to hear your comments on these questions! > >Thanks! > > >