Return-Path: X-Original-To: apmail-db-derby-dev-archive@www.apache.org Delivered-To: apmail-db-derby-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 20608D2DD for ; Thu, 16 May 2013 06:53:20 +0000 (UTC) Received: (qmail 48816 invoked by uid 500); 16 May 2013 06:53:19 -0000 Delivered-To: apmail-db-derby-dev-archive@db.apache.org Received: (qmail 48467 invoked by uid 500); 16 May 2013 06:53:18 -0000 Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: Delivered-To: mailing list derby-dev@db.apache.org Received: (qmail 47868 invoked by uid 99); 16 May 2013 06:53:17 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 16 May 2013 06:53:17 +0000 Date: Thu, 16 May 2013 06:53:17 +0000 (UTC) From: "Dag H. Wanvik (JIRA)" To: derby-dev@db.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Comment Edited] (DERBY-6148) Peculiar sorting behaviour MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/DERBY-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659298#comment-13659298 ] Dag H. Wanvik edited comment on DERBY-6148 at 5/16/13 6:52 AM: --------------------------------------------------------------- Uploading a patch which makes the failing query work and regressions passed. It is not for commit; I need add new regression tests at the minimum. The previous hypothesis about lacking reset/forgetting info from the first (discarded) join permutation proved to not be the reason for the error. I believe the root cause is an incorrect assumption in OrderByList#sortRequired(RowOrdering, JBitSet, OptimizableList). DERBY-3926 added the OptimizableList argument to this method. In the logic, that argument is treated as the join ordering under consideration, cf. the Javadoc of RequiredRowOrdering#sortRequired; the interface OrderByList's method implements (quote): * "@param optimizableList The current join order being considered by * the optimizer. We need to look into this to determine if the outer * optimizables are single row resultset if the order by column is * on an inner optimizable and that inner optimizable is not a one * row resultset. DERBY-3926 However, debugging, I see that the argument passed in from OptimizerImpl is always OptimizerImpl#optimizableList which is not permuted when join orders are considered. The permutation is described by OptimizerImpl#proposedJoinOrder, which is a map into OptimizerImpl#optimizableList. See for example usage ca line 556: ....optimizableList.getOptimizable(proposedJoinOrder[joinPosition]).getBestAccessPath().... Now in OrderByList#sortRequired, there is this piece of code: // If we have come across the optimizable for the order // by column in the join order, then we do not need to // look at the inner optimizables in the join order. As // long as the outer optimizables are one row // resultset, or is ordered on the order by column (see // below check), we are fine to consider sort // avoidance. if (considerOptimizable.getTableNumber() == cr.getTableNumber()) break; This test is meant to determine that the column reference cr (the current order by column under consideration) references the current (outer) optimizable. It will only get this this test for inner tables if the outer has only one row. And in the query that works as intended so when I textually reordered the FROM tables to match the chosen join order, that's exactly what happens: ITEMS_USAGE is the outer table and the cr's table number doesn't match it (both colums reference TESTS), so the break is not performed. However, in the failing case, this test actually checks against the *inner* table (TESTS), since the proposedJoinOrder map isn't used, so the break is executed. Then later in the same method, ca line 622, we check whether the order by columns are ordered against the rowOrdering of the current access path (the index on the unique constraint TESTS_1 which does contain both the order by columns), so sort avoidance is accepted. After I introduce the indirection map into the method, the check shown above will always fail for outer ITEMS_USAGE (no matter its textual order in the join): execution proceeds to the check for single row in the outer optimizable (ca. line 605), which fails, since the outer table (ITEMS_USAGE) may have more than one row with even with equality predicate on user ('TAMMY'), so sort avoidance is not accepted. was (Author: dagw): Uploading a patch which makes the failing query work and regressions passed. It is not for commit; I need add new regression tests at the minimum. The previous hypothesis about lacking reset/forgetting info from the first (discarded) join permutation proved to not be the reason for the error. I believe the root cause is an incorrect assumption in OrderByList#sortRequired(RowOrdering, JBitSet, OptimizableList). DERBY-3926 added the OptimizableList argument to this method. In the logic, that argument is treated as the join ordering under consideration, cf. the Javadoc of RequiredRowOrdering#sortRequired; the interface OrderByList's method implements (quote): * "@param optimizableList The current join order being considered by * the optimizer. We need to look into this to determine if the outer * optimizables are single row resultset if the order by column is * on an inner optimizable and that inner optimizable is not a one * row resultset. DERBY-3926 However, debugging, I see that the argument passed in from OptimizerImpl is always OptimizerImpl#optimizableList which is not permuted when join orders are considered. The permutation is described by OptimizerImpl#proposedJoinOrder, which is a map into OptimizerImpl#optimizableList. See for example usage ca line 556: ....optimizableList.getOptimizable(proposedJoinOrder[joinPosition]).getBestAccessPath().... Now in OrderByList#sortRequired, there is this piece of code: // If we have come across the optimizable for the order // by column in the join order, then we do not need to // look at the inner optimizables in the join order. As // long as the outer optimizables are one row // resultset, or is ordered on the order by column (see // below check), we are fine to consider sort // avoidance. if (considerOptimizable.getTableNumber() == cr.getTableNumber()) break; This test is meant to determine that the column reference cr (the current order by column under consideration) references the current (outer) optimizable. And in the query that works, when I reordered the FROM tables to match the chosen join order, that's exactly what happens: ITEMS_USAGE is the outer table and the cr's table number doesn't match it, so the break is not performed. However, in the failing case, this test actually checks against the *inner* table (TESTS), since the proposedJoinOrder map isn't used, so the break is executed. Then later in the same method, ca line 622, we check whether the order by columns are ordered against the rowOrdering of the current access path (the index on the unique constraint TESTS_1 which does contain both the order by column), so sort avoidance is accepted. After I introduce the indirection map into the method, the check shown above will always fail for outer ITEMS_USAGE (no matter its textual order in the join): execution proceeds to the check for single row in the outer optimizable (ca. line 605), which fails, since the outer table (ITEMS_USAGE) may have more than one row with even with equality predicate on user ('TAMMY'), so sort avoidance is not accepted. > Peculiar sorting behaviour > -------------------------- > > Key: DERBY-6148 > URL: https://issues.apache.org/jira/browse/DERBY-6148 > Project: Derby > Issue Type: Bug > Components: SQL > Affects Versions: 10.9.1.0 > Environment: Windows 7 64-bit, Ubuntu 32-bit; JRE 1.7.0_11+ > Reporter: John English > Attachments: bug.zip, bug.zip, derby-6148-1.diff, derby-6148-1.status, preprocessed1.txt, preprocessed2.txt, q1.txt, q2.txt > > > I have a query that looks like this: > SELECT tests.id,tests.item,title FROM tests,item_usage > WHERE username=? AND user_role>? > AND item_usage.item=tests.item > ORDER BY tests.item,title > The result ordering is by item code followed by title, but the item codes are listed in the order in which they appear in the ITEMS table where they are the primary key rather than in ascending order as expected. If however I change the ORDER BY clause to sort by item_usage.item rather than tests.item, it works correctly, even though the two values are the same! > The same thing happens in another unrelated query involving item_usage, and the same workaround cures it. > The relevant tables are defined like so: > CREATE TABLE item_usage ( > username VARCHAR(15) NOT NULL, > item VARCHAR(15) NOT NULL, > value SMALLINT DEFAULT 0, > CONSTRAINT item_usage_pk PRIMARY KEY (username,item), > CONSTRAINT item_usage_1 FOREIGN KEY (username) > REFERENCES users(username) > ON DELETE CASCADE, > CONSTRAINT item_usage_2 FOREIGN KEY (item) > REFERENCES items(item) > ON DELETE CASCADE, > CONSTRAINT item_usage_3 CHECK (value BETWEEN 0 AND 4) > ); > CREATE TABLE tests ( > id INTEGER GENERATED ALWAYS AS IDENTITY, > item VARCHAR(15) NOT NULL, > title VARCHAR(255) NOT NULL, > disp SMALLINT NOT NULL DEFAULT 0, > starttime TIMESTAMP DEFAULT NULL, > endtime TIMESTAMP DEFAULT NULL, > offsetx INTEGER NOT NULL DEFAULT 0, > offsety INTEGER NOT NULL DEFAULT 0, > rate INTEGER NOT NULL DEFAULT 0, > duration INTEGER NOT NULL DEFAULT 0, > calibrate INTEGER NOT NULL DEFAULT 0, > deadline TIMESTAMP DEFAULT NULL, > stepsize INTEGER NOT NULL DEFAULT 0, > interval INTEGER NOT NULL DEFAULT 0, > stand CHAR(1) DEFAULT NULL, > hidden CHAR(1) DEFAULT NULL, > repeated CHAR(1) DEFAULT NULL, > private CHAR(1) DEFAULT NULL, > sequential CHAR(1) DEFAULT NULL, > final CHAR(1) DEFAULT NULL, > notes CLOB DEFAULT NULL, > testxml CLOB NOT NULL, > author VARCHAR(15) NOT NULL, > time TIMESTAMP NOT NULL, > CONSTRAINT tests_pk PRIMARY KEY (id), > CONSTRAINT tests_1 UNIQUE (item, title), > CONSTRAINT tests_2 FOREIGN KEY (item) > REFERENCES items(item) > ON DELETE CASCADE, > CONSTRAINT tests_3 CHECK (disp BETWEEN 0 AND 100), > CONSTRAINT tests_4 CHECK (rate BETWEEN 0 AND 100), > CONSTRAINT tests_5 CHECK (stepsize BETWEEN 0 AND 100) > ); > If I run the query manually I get this, as expected: > ID ITEM TITLE > 37 60001 Test 1 > 42 60001 Test 2 > 51 60001 Test 3 > 17 61303 Test 2a > 16 61303 Test 2b > 7 7205731 Test 2a > 8 7205731 Test 2b > Now, this is actually part of a web app that should turn this into a list of options in a