Return-Path: Delivered-To: apmail-db-derby-dev-archive@www.apache.org Received: (qmail 94218 invoked from network); 14 Apr 2008 17:58:14 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 14 Apr 2008 17:58:14 -0000 Received: (qmail 9312 invoked by uid 500); 14 Apr 2008 17:58:09 -0000 Delivered-To: apmail-db-derby-dev-archive@db.apache.org Received: (qmail 9284 invoked by uid 500); 14 Apr 2008 17:58:09 -0000 Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: Delivered-To: mailing list derby-dev@db.apache.org Received: (qmail 9270 invoked by uid 99); 14 Apr 2008 17:58:09 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 14 Apr 2008 10:58:09 -0700 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 14 Apr 2008 17:57:25 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 1093D234C0CC for ; Mon, 14 Apr 2008 10:55:05 -0700 (PDT) Message-ID: <1786682183.1208195705066.JavaMail.jira@brutus> Date: Mon, 14 Apr 2008 10:55:05 -0700 (PDT) From: "A B (JIRA)" To: derby-dev@db.apache.org Subject: [jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected In-Reply-To: <1214723879.1207620925545.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588661#action_12588661 ] A B commented on DERBY-3603: ---------------------------- Thank you very much for picking this one up, Bryan. I greatly appreciate it. On the subqueryFlattening.diff: To see why the diff occurs, I stepped through the following exercise. I did not actually trace through the code, but I walked through what I *think* should be happening and it seems to have explained the diff... At the time of the statements in question the data in the DOCS and COLLS tables is as follows: DOCS: ID (VARCHAR) => ('124'), ('24'), ('25'), ('27'), ('36'), ('567') COLLS: ID (VARCHAR), COLLID (SMALLINT) => ('123', 2), ('124', -5), ('24', 1), ('26', -2), ('36', 1), ('37', 8) Now assume the query in question is: SELECT count(ID) FROM DOCS WHERE ID <> ANY (SELECT ID FROM COLLS WHERE COLLID IN (-2,1)) The ANY will be flattened giving us something to the effect of: SELECT count(ID) FROM DOCS, COLLS WHERE DOCS.ID <> COLLS.ID AND COLLS.COLLID IN (-2, 1) The IN list for the queries that we're talking about is (-2, 1). So per the explanation I posted on Friday, we'd end up with something like the following withOUT the change. Note: Lines preceded by "+" indicate that we read a row from the inner table, COLLS. Lines preced by "*' indicate that what we do for the line changes with the proposed patch for this issue. Lines preceded by an "x" indicate that we "filter" a row from COLLS, meaning we read it but then we discard it. - Get DOCS row ('124'). - Get first IN list value, -2. + Get row from COLLS where COLLID is -2, i.e. ('26', -2). - See if ID column of ('26', -2) is NOT EQUAL to '124'. They are not equal so count the '124' row from DOCS and quit the scan of COLLS. - Get DOCS row ('24'). * Get next IN list value, 1. + Get row from COLLS where COLLID is 1, i.e. ('24', 1). x See if ID column of ('24', 1) is NOT EQUAL to '24'. They are EQUAL so we skip the row ('24', 1) and continuing our scan of COLLS. + Get next row from COLLS where COLLID is 1, i.e. ('36', 1). - See if ID column of ('36', 1) is NOT EQUAL to '24'. They are not equal so count the '24' row from DOCS and quit the scan of COLLS. - Get DOCS row ('25'). - Implicitly RESET the probe state, then get the "next" IN list value, which will be -2. + Get row from COLLS where COLLID is -2, i.e. ('26', -2). - See if ID column of ('26', -2) is NOT EQUAL to '25'. They are not equal so count the '25' row from DOCS and quit the scan of COLLS. - Get DOCS row ('27'). * Get next IN list value, 1. + Get row from COLLS where COLLID is 1, i.e. ('24', 1). - See if ID column of ('24', 1) is NOT EQUAL to '27'. They are not equal so count the '27' row from DOCS and quit the scan of COLLS. - Get DOCS row ('36'). - Implicitly RESET the probe state, then get the "next" IN list value, which will be -2. + Get row from COLLS where COLLID is -2, i.e. ('26', -2). - See if ID column of ('26', -2) is NOT EQUAL to '36'. They are not equal so count the '36' row from DOCS and quit the scan of COLLS. - Get DOCS row ('567'). * Get next IN list value, 1. + Get row from COLLS where COLLID is 1, i.e. ('24', 1). - See if ID column of ('24', 1) is NOT EQUAL to '567'. They are not equal so count the '567' row from DOCS and quit the scan of COLLS. - No more rows from DOCS, so done. We can see from this that we get a row from COLLS a total of 7 times. In the process we "filtered" one row out, i.e. the row ('24', 1), because that row has the same ID has a row in DOCS. Or put in terms of the scan statistics, we have 7 "rows seen" and one (1) row filtered. And the query returns a count total of 6 rows from DOCS. But now if we do the same exercise with the patch applied, we see the following: - Get DOCS row ('124'). - Get first IN list value, -2. + Get row from COLLS where COLLID is -2, i.e. ('26', -2). - See if ID column of ('26', -2) is NOT EQUAL to '124'. They are not equal so count the '124' row from DOCS and quit the scan of COLLS. - Get DOCS row ('24'). * RESET probe state, then get "next" IN list value, which is -2 (again). + Get row from COLLS where COLLID is -2, i.e. ('26', 1). - See if ID column of ('26', -2) is NOT EQUAL to '24'. They are not equal so count the '24' row from DOCS and quit the scan of COLLS. - Get DOCS row ('25'). * RESET probe state, then get "next" IN list value, which is -2 (again). + Get row from COLLS where COLLID is -2, i.e. ('26', 1). - See if ID column of ('26', -2) is NOT EQUAL to '25'. They are not equal so count the '25' row from DOCS and quit the scan of COLLS. - Get DOCS row ('27'). * RESET probe state, then get "next" IN list value, which is -2 (again). + Get row from COLLS where COLLID is -2, i.e. ('26', 1). - See if ID column of ('26', -2) is NOT EQUAL to '27'. They are not equal so count the '27' row from DOCS and quit the scan of COLLS. - Get DOCS row ('36'). * RESET probe state, then get "next" IN list value, which is -2 (again). + Get row from COLLS where COLLID is -2, i.e. ('26', 1). - See if ID column of ('26', -2) is NOT EQUAL to '36'. They are not equal so count the '36' row from DOCS and quit the scan of COLLS. - Get DOCS row ('567'). * RESET probe state, then get "next" IN list value, which is -2 (again). + Get row from COLLS where COLLID is -2, i.e. ('26', 1). - See if ID column of ('26', -2) is NOT EQUAL to '567'. They are not equal so count the '567' row from DOCS and quit the scan of COLLS. - No more rows from DOCS, so done. Now the total number of rows we "see" from COLLS is 6 instead of 7. This is because we never fetched the row ('24', 1) from the table. Instead, we only had to fetch the one row, ('26', -2), over and over. Then, since we never fetched row ('24', 1) from the table we didn't have to filter it out. So we end up with a total of 6 "rows seen" and 0 rows filtered. Meanwhile the query still returns a count total of 6 rows from DOCS. I didn't run through this exercise for the other cases, but I imagine (hope!) things would work out in a similar fashion...Does that seem like a reasonable explanation to you? If so, then I think it's okay to update the master file for subqueryFlattening.sql for this issue. Oh, and thanks for writing the standalone repro, as well! > 'IN' clause ignores valid results, incorrect qualifier handling suspected > ------------------------------------------------------------------------- > > Key: DERBY-3603 > URL: https://issues.apache.org/jira/browse/DERBY-3603 > Project: Derby > Issue Type: Bug > Components: SQL > Affects Versions: 10.3.2.1, 10.4.1.1 > Reporter: David Butterworth > Assignee: Bryan Pendleton > Attachments: d3603_v1.patch, d3603_with_test.diff, derbydb.jar, derbydb.tar.bz2, subqueryFlattening.diff > > > Derbys' 'IN' clause is returning different results depending on which side of a joined table > I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1. > This behaviour was also confirmed by Bryan Pendleton in this thread: > http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e > Using the test database attatched the following 2 queries produce the issue: > ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit, > spike.bookings booking > WHERE booking.child_id = 2 AND > admin_unit.admin_unit_id IN (1,21) AND > booking.booking_date_time_out >= 20080331000000 AND > booking.booking_date_time_in <= 20080406235900 AND > account.account_id = booking.account_id AND > admin_unit.admin_unit_id = account.admin_unit_id; > 1 > ----------- > 2 > 1 row selected > ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit, > spike.bookings booking > WHERE booking.child_id = 2 AND > account.admin_unit_id IN (1,21) AND > booking.booking_date_time_out >= 20080331000000 AND > booking.booking_date_time_in <= 20080406235900 AND > account.account_id = booking.account_id AND > admin_unit.admin_unit_id = account.admin_unit_id; > 1 > ----------- > 3 > 1 row selected > ij> > The only difference between the 2 statements is which side of a join the 'IN' clause is matched against. > Bryan performed some initial testing and stated the following: > --------------------- SNIP ------------------------ > Interestingly, although the actual results do NOT contain any values > for admin_unit_id = 21, if I change the query to: > admin_unit.admin_unit_id IN (1) > or > account.admin_unit_id IN (1) > then the problem disappears -- I get 3 rows for both queries. > I also ran query plans for both the queries (in the IN (1,21) case) > and have pasted the (simplified) query plans at the end of this message. > I notice that in the case where the query gives 2 rows, which is > when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id > index scan output in the query plan contains: > qualifiers: > Column[0][0] Id: 0 > Operator: = > Ordered nulls: false > Unknown return value: false > Negate comparison result: false > However, in the case where the query gives 3 rows, which is > when we specify account.admin_unit_id in (1,21), the admin_unit_id > index scan output in the query plan contains: > qualifiers: > None > I think it is the presence/absence of this qualifier on the query > scan which is causing the different results in the query, as in > the first case we see: > Number of rows qualified=2 > Number of rows visited=3 > but in the second case we see: > Number of rows qualified=3 > Number of rows visited=3 > I definitely don't have any explanation for why you are getting > this odd behavior; it certainly seems like a bug to me. > -------------END SNIP ----------------------- -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.