Return-Path: X-Original-To: apmail-db-derby-dev-archive@www.apache.org Delivered-To: apmail-db-derby-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 28CB6D438 for ; Thu, 23 May 2013 20:23:21 +0000 (UTC) Received: (qmail 51713 invoked by uid 500); 23 May 2013 20:23:21 -0000 Delivered-To: apmail-db-derby-dev-archive@db.apache.org Received: (qmail 51689 invoked by uid 500); 23 May 2013 20:23:20 -0000 Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: Delivered-To: mailing list derby-dev@db.apache.org Received: (qmail 51681 invoked by uid 99); 23 May 2013 20:23:20 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 23 May 2013 20:23:20 +0000 Date: Thu, 23 May 2013 20:23:20 +0000 (UTC) From: "Tony Brusseau (JIRA)" To: derby-dev@db.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (DERBY-6045) in list multi-probe by primary key not chosen on tables with >256 rows MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/DERBY-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13665612#comment-13665612 ] Tony Brusseau commented on DERBY-6045: -------------------------------------- I tried the SELECT version of the DELETE statement: SELECT * FROM kb.mt_gaf_top_level_term_counts WHERE (term = 1407374883554049) OR (pred = 1407374883554049) OR (mt = 1407374883554049) It looks like it is doing a table scan as well. Number of pages visited=67 Number of rows qualified=418 Number of rows visited=29498 Note: this was tried without the Mamta's patches for this bug so far. However, I am using table statics which would avoid this bug, so I'm pretty sure this is still a real issue. Here is the query plan: Thu May 23 15:17:22 CDT 2013 Thread[DRDAConnThread_3,5,main] (XID = 80859), (SESSIONID = 1), SELECT COUNT(*) FROM (SELECT * FROM kb.mt_gaf_top_level_term_counts WHERE (term = 1407374883554049) OR (pred = 1407374883554049) OR (mt = 1407374883554049)) C2668 ******* Project-Restrict ResultSet (4): Number of opens = 1 Rows seen = 1 Rows filtered = 0 restriction = false projection = true constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 restriction time (milliseconds) = 0 projection time (milliseconds) = 0 optimizer estimated row count: 1.00 optimizer estimated cost: 12365.71 Source result set: Scalar Aggregate ResultSet: Number of opens = 1 Rows input = 418 constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 optimizer estimated row count: 14751.50 optimizer estimated cost: 12365.71 Index Key Optimization = false Source result set: Project-Restrict ResultSet (3): Number of opens = 1 Rows seen = 418 Rows filtered = 0 restriction = false projection = true constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 restriction time (milliseconds) = 0 projection time (milliseconds) = 0 optimizer estimated row count: 14751.50 optimizer estimated cost: 12365.71 Source result set: Table Scan ResultSet for MT_GAF_TOP_LEVEL_TERM_COUNTS at read committed isolation level using share row locking chosen by the optimizer Number of opens = 1 Rows seen = 418 Rows filtered = 0 Fetch Size = 1 constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 next time in milliseconds/row = 0 scan information: Bit set of columns fetched=All Number of columns fetched=6 Number of pages visited=67 Number of rows qualified=418 Number of rows visited=29498 Scan type=heap start position: null stop position: null qualifiers: Column[1][0] Id: 3 Operator: = Ordered nulls: false Unknown return value: false Negate comparison result: false Column[1][1] Id: 2 Operator: = Ordered nulls: false Unknown return value: false Negate comparison result: false Column[1][2] Id: 1 Operator: = Ordered nulls: false Unknown return value: false Negate comparison result: false optimizer estimated row count: 14751.50 optimizer estimated cost: 12365.71 > in list multi-probe by primary key not chosen on tables with >256 rows > ---------------------------------------------------------------------- > > Key: DERBY-6045 > URL: https://issues.apache.org/jira/browse/DERBY-6045 > Project: Derby > Issue Type: Bug > Components: SQL > Affects Versions: 10.9.1.0, 10.10.1.1 > Environment: Linux Debian 6.0.5 > Reporter: Tony Brusseau > Assignee: Mamta A. Satoor > Fix For: 10.8.3.1, 10.9.2.2, 10.10.1.2, 10.11.0.0 > > Attachments: DERBY6045_patch1_diff.txt, DERBY6045_patch2_diff.txt, optimizerTraceOutputFor10_8.txt, optimizerTraceOutputForTrunk.txt, optimizerTraceOutputWithUpdateStatsFor10_8.txt, optimizerTraceOutputWithUpdateStatsForTrunk.txt, table.del > > > I have a table with a long integer primary key field and 11 million rows. I seem to be unable to load large chunks of rows via id in a reasonably efficient manner. > 1. If I do individual lookups via the primary key, then a fast indexed lookup occurs. However, if I do large numbers of such queries, then the time is overwhelmed by round-trip overhead which makes everything incredibly slow. > 2. If I use a single query with a disjunction of the primary keys of interest, then a table scan is performed (even if the clause only contains 1-3 items), which walks over 11 million rows...incredibly inefficient. > 3. If I use an IN clause, then a table scan is performed (even if the clause only contains 1-3 items), which walks over 11 million rows...incredibly inefficient. > I'm guessing that this might have something to do with the fact that I'm using large integers and really big numbers that don't start anywhere at or about 1 for my keys. Could this possibly be confusing the optimizer? > Here are the unlimited query plans for the 3 cases that I enumerated: > ********************************************************************************************* > [EL Fine]: 2013-01-17 11:09:53.384--ServerSession(582235416)--Connection(1430986883)--Thread(Thread["Initial Lisp Listener",5,SubL Thread Group])--SELECT TERM_ID, ARG0, ARG1, ARG2, ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM KB.FORMULA_TERM WHERE (TERM_ID = ?) > bind => [2251799814033500] > Thu Jan 17 11:09:53 CST 2013 Thread["Initial Lisp Listener",5,SubL Thread Group] (XID = 4711079), (SESSIONID = 3), SELECT TERM_ID, ARG0, ARG1, ARG2, ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM KB.FORMULA_TERM WHERE (TERM_ID = ?) ******* Project-Restrict ResultSet (3): > Number of opens = 1 > Rows seen = 1 > Rows filtered = 0 > restriction = false > projection = true > constructor time (milliseconds) = 0 > open time (milliseconds) = 0 > next time (milliseconds) = 0 > close time (milliseconds) = 0 > restriction time (milliseconds) = 0 > projection time (milliseconds) = 0 > optimizer estimated row count: 1.00 > optimizer estimated cost: 6.59 > Source result set: > Index Row to Base Row ResultSet for FORMULA_TERM: > Number of opens = 1 > Rows seen = 1 > Columns accessed from heap = {1, 2, 3, 4, 5, 6, 7, 8} > constructor time (milliseconds) = 0 > open time (milliseconds) = 0 > next time (milliseconds) = 0 > close time (milliseconds) = 0 > optimizer estimated row count: 1.00 > optimizer estimated cost: 6.59 > Index Scan ResultSet for FORMULA_TERM using constraint KB_FORMULA_TERM_TERM_ID_PK at read committed isolation level using share row locking chosen by the optimizer > Number of opens = 1 > Rows seen = 1 > Rows filtered = 0 > Fetch Size = 1 > constructor time (milliseconds) = 0 > open time (milliseconds) = 0 > next time (milliseconds) = 0 > close time (milliseconds) = 0 > next time in milliseconds/row = 0 > scan information: > Bit set of columns fetched=All > Number of columns fetched=2 > Number of deleted rows visited=0 > Number of pages visited=3 > Number of rows qualified=1 > Number of rows visited=1 > Scan type=btree > Tree height=-1 > start position: > >= on first 1 column(s). > Ordered null semantics on the following columns: > stop position: > > on first 1 column(s). > Ordered null semantics on the following columns: > qualifiers: > None > optimizer estimated row count: 1.00 > optimizer estimated cost: 6.59 > [EL Fine]: 2013-01-17 11:01:00.732--ServerSession(1237006689)--Connection(927179828)--Thread(Thread["Initial Lisp Listener",5,SubL Thread Group])--SELECT TERM_ID, ARG0, ARG1, ARG2, ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM KB.FORMULA_TERM WHERE (((TERM_ID = ?) OR (TERM_ID = ?)) OR (TERM_ID = ?)) > bind => [2251799814033500, 2251799814033501, 2251799814033499] > Thu Jan 17 11:01:10 CST 2013 Thread["Initial Lisp Listener",5,SubL Thread Group] (XID = 4711078), (SESSIONID = 3), SELECT TERM_ID, ARG0, ARG1, ARG2, ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM KB.FORMULA_TERM WHERE (((TERM_ID = ?) OR (TERM_ID = ?)) OR (TERM_ID = ?)) ******* Project-Restrict ResultSet (3): > Number of opens = 1 > Rows seen = 3 > Rows filtered = 0 > restriction = false > projection = true > constructor time (milliseconds) = 0 > open time (milliseconds) = 0 > next time (milliseconds) = 0 > close time (milliseconds) = 0 > restriction time (milliseconds) = 0 > projection time (milliseconds) = 0 > optimizer estimated row count: 1176730.30 > optimizer estimated cost: 5931065.54 > Source result set: > Project-Restrict ResultSet (2): > Number of opens = 1 > Rows seen = 11767298 > Rows filtered = 11767295 > restriction = true > projection = false > constructor time (milliseconds) = 0 > open time (milliseconds) = 0 > next time (milliseconds) = 0 > close time (milliseconds) = 0 > restriction time (milliseconds) = 0 > projection time (milliseconds) = 0 > optimizer estimated row count: 1176730.30 > optimizer estimated cost: 5931065.54 > Source result set: > Table Scan ResultSet for FORMULA_TERM at read committed isolation level using instantaneous share row locking chosen by the optimizer > Number of opens = 1 > Rows seen = 11767298 > Rows filtered = 0 > Fetch Size = 16 > constructor time (milliseconds) = 0 > open time (milliseconds) = 0 > next time (milliseconds) = 0 > close time (milliseconds) = 0 > next time in milliseconds/row = 0 > scan information: > Bit set of columns fetched=All > Number of columns fetched=9 > Number of pages visited=34358 > Number of rows qualified=11767298 > Number of rows visited=11767298 > Scan type=heap > start position: > null > stop position: > null > qualifiers: > None > optimizer estimated row count: 1176730.30 > optimizer estimated cost: 5931065.54 > [EL Fine]: 2013-01-17 11:27:00.627--ServerSession(1237006689)--Connection(1688096771)--Thread(Thread["Initial Lisp Listener",5,SubL Thread Group])--SELECT TERM_ID, ARG0, ARG1, ARG2, ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM KB.FORMULA_TERM WHERE (TERM_ID IN (?,?,?)) > bind => [2251799814033500, 2251799814033501, 2251799814033499] > Thu Jan 17 11:47:26 CST 2013 Thread["Initial Lisp Listener",5,SubL Thread Group] (XID = 4711080), (SESSIONID = 3), SELECT TERM_ID, ARG0, ARG1, ARG2, ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM KB.FORMULA_TERM WHERE (TERM_ID IN (?,?,?)) ******* Project-Restrict ResultSet (3): > Number of opens = 1 > Rows seen = 3 > Rows filtered = 0 > restriction = false > projection = true > constructor time (milliseconds) = 0 > open time (milliseconds) = 0 > next time (milliseconds) = 0 > close time (milliseconds) = 0 > restriction time (milliseconds) = 0 > projection time (milliseconds) = 0 > optimizer estimated row count: 1176730.30 > optimizer estimated cost: 5931065.54 > Source result set: > Project-Restrict ResultSet (2): > Number of opens = 1 > Rows seen = 11767298 > Rows filtered = 11767295 > restriction = true > projection = false > constructor time (milliseconds) = 0 > open time (milliseconds) = 0 > next time (milliseconds) = 0 > close time (milliseconds) = 0 > restriction time (milliseconds) = 0 > projection time (milliseconds) = 0 > optimizer estimated row count: 1176730.30 > optimizer estimated cost: 5931065.54 > Source result set: > Table Scan ResultSet for FORMULA_TERM at read committed isolation level using instantaneous share row locking chosen by the optimizer > Number of opens = 1 > Rows seen = 11767298 > Rows filtered = 0 > Fetch Size = 16 > constructor time (milliseconds) = 0 > open time (milliseconds) = 0 > next time (milliseconds) = 0 > close time (milliseconds) = 0 > next time in milliseconds/row = 0 > scan information: > Bit set of columns fetched=All > Number of columns fetched=9 > Number of pages visited=34358 > Number of rows qualified=11767298 > Number of rows visited=11767298 > Scan type=heap > start position: > null > stop position: > null > qualifiers: > None > optimizer estimated row count: 1176730.30 > optimizer estimated cost: 5931065.54 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira