Return-Path: X-Original-To: apmail-db-derby-dev-archive@www.apache.org Delivered-To: apmail-db-derby-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 5CA39107E4 for ; Mon, 6 May 2013 16:10:17 +0000 (UTC) Received: (qmail 4414 invoked by uid 500); 6 May 2013 16:10:17 -0000 Delivered-To: apmail-db-derby-dev-archive@db.apache.org Received: (qmail 4403 invoked by uid 500); 6 May 2013 16:10:17 -0000 Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: Delivered-To: mailing list derby-dev@db.apache.org Received: (qmail 4395 invoked by uid 99); 6 May 2013 16:10:17 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 06 May 2013 16:10:17 +0000 Date: Mon, 6 May 2013 16:10:17 +0000 (UTC) From: "Tony Brusseau (JIRA)" To: derby-dev@db.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (DERBY-6045) in list multi-probe by primary key not chosen on tables with >256 rows MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/DERBY-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649821#comment-13649821 ] Tony Brusseau commented on DERBY-6045: -------------------------------------- OK. I'll do some addtional testing and try to create a reproducible test case. It may be a couple of days till I can get to this though. > in list multi-probe by primary key not chosen on tables with >256 rows > ---------------------------------------------------------------------- > > Key: DERBY-6045 > URL: https://issues.apache.org/jira/browse/DERBY-6045 > Project: Derby > Issue Type: Bug > Components: SQL > Affects Versions: 10.9.1.0, 10.10.1.1 > Environment: Linux Debian 6.0.5 > Reporter: Tony Brusseau > Fix For: 10.11.0.0 > > Attachments: DERBY6045_patch1_diff.txt, DERBY6045_patch2_diff.txt, optimizerTraceOutputFor10_8.txt, optimizerTraceOutputForTrunk.txt, optimizerTraceOutputWithUpdateStatsFor10_8.txt, optimizerTraceOutputWithUpdateStatsForTrunk.txt > > > I have a table with a long integer primary key field and 11 million rows. I seem to be unable to load large chunks of rows via id in a reasonably efficient manner. > 1. If I do individual lookups via the primary key, then a fast indexed lookup occurs. However, if I do large numbers of such queries, then the time is overwhelmed by round-trip overhead which makes everything incredibly slow. > 2. If I use a single query with a disjunction of the primary keys of interest, then a table scan is performed (even if the clause only contains 1-3 items), which walks over 11 million rows...incredibly inefficient. > 3. If I use an IN clause, then a table scan is performed (even if the clause only contains 1-3 items), which walks over 11 million rows...incredibly inefficient. > I'm guessing that this might have something to do with the fact that I'm using large integers and really big numbers that don't start anywhere at or about 1 for my keys. Could this possibly be confusing the optimizer? > Here are the unlimited query plans for the 3 cases that I enumerated: > ********************************************************************************************* > [EL Fine]: 2013-01-17 11:09:53.384--ServerSession(582235416)--Connection(1430986883)--Thread(Thread["Initial Lisp Listener",5,SubL Thread Group])--SELECT TERM_ID, ARG0, ARG1, ARG2, ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM KB.FORMULA_TERM WHERE (TERM_ID = ?) > bind => [2251799814033500] > Thu Jan 17 11:09:53 CST 2013 Thread["Initial Lisp Listener",5,SubL Thread Group] (XID = 4711079), (SESSIONID = 3), SELECT TERM_ID, ARG0, ARG1, ARG2, ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM KB.FORMULA_TERM WHERE (TERM_ID = ?) ******* Project-Restrict ResultSet (3): > Number of opens = 1 > Rows seen = 1 > Rows filtered = 0 > restriction = false > projection = true > constructor time (milliseconds) = 0 > open time (milliseconds) = 0 > next time (milliseconds) = 0 > close time (milliseconds) = 0 > restriction time (milliseconds) = 0 > projection time (milliseconds) = 0 > optimizer estimated row count: 1.00 > optimizer estimated cost: 6.59 > Source result set: > Index Row to Base Row ResultSet for FORMULA_TERM: > Number of opens = 1 > Rows seen = 1 > Columns accessed from heap = {1, 2, 3, 4, 5, 6, 7, 8} > constructor time (milliseconds) = 0 > open time (milliseconds) = 0 > next time (milliseconds) = 0 > close time (milliseconds) = 0 > optimizer estimated row count: 1.00 > optimizer estimated cost: 6.59 > Index Scan ResultSet for FORMULA_TERM using constraint KB_FORMULA_TERM_TERM_ID_PK at read committed isolation level using share row locking chosen by the optimizer > Number of opens = 1 > Rows seen = 1 > Rows filtered = 0 > Fetch Size = 1 > constructor time (milliseconds) = 0 > open time (milliseconds) = 0 > next time (milliseconds) = 0 > close time (milliseconds) = 0 > next time in milliseconds/row = 0 > scan information: > Bit set of columns fetched=All > Number of columns fetched=2 > Number of deleted rows visited=0 > Number of pages visited=3 > Number of rows qualified=1 > Number of rows visited=1 > Scan type=btree > Tree height=-1 > start position: > >= on first 1 column(s). > Ordered null semantics on the following columns: > stop position: > > on first 1 column(s). > Ordered null semantics on the following columns: > qualifiers: > None > optimizer estimated row count: 1.00 > optimizer estimated cost: 6.59 > [EL Fine]: 2013-01-17 11:01:00.732--ServerSession(1237006689)--Connection(927179828)--Thread(Thread["Initial Lisp Listener",5,SubL Thread Group])--SELECT TERM_ID, ARG0, ARG1, ARG2, ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM KB.FORMULA_TERM WHERE (((TERM_ID = ?) OR (TERM_ID = ?)) OR (TERM_ID = ?)) > bind => [2251799814033500, 2251799814033501, 2251799814033499] > Thu Jan 17 11:01:10 CST 2013 Thread["Initial Lisp Listener",5,SubL Thread Group] (XID = 4711078), (SESSIONID = 3), SELECT TERM_ID, ARG0, ARG1, ARG2, ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM KB.FORMULA_TERM WHERE (((TERM_ID = ?) OR (TERM_ID = ?)) OR (TERM_ID = ?)) ******* Project-Restrict ResultSet (3): > Number of opens = 1 > Rows seen = 3 > Rows filtered = 0 > restriction = false > projection = true > constructor time (milliseconds) = 0 > open time (milliseconds) = 0 > next time (milliseconds) = 0 > close time (milliseconds) = 0 > restriction time (milliseconds) = 0 > projection time (milliseconds) = 0 > optimizer estimated row count: 1176730.30 > optimizer estimated cost: 5931065.54 > Source result set: > Project-Restrict ResultSet (2): > Number of opens = 1 > Rows seen = 11767298 > Rows filtered = 11767295 > restriction = true > projection = false > constructor time (milliseconds) = 0 > open time (milliseconds) = 0 > next time (milliseconds) = 0 > close time (milliseconds) = 0 > restriction time (milliseconds) = 0 > projection time (milliseconds) = 0 > optimizer estimated row count: 1176730.30 > optimizer estimated cost: 5931065.54 > Source result set: > Table Scan ResultSet for FORMULA_TERM at read committed isolation level using instantaneous share row locking chosen by the optimizer > Number of opens = 1 > Rows seen = 11767298 > Rows filtered = 0 > Fetch Size = 16 > constructor time (milliseconds) = 0 > open time (milliseconds) = 0 > next time (milliseconds) = 0 > close time (milliseconds) = 0 > next time in milliseconds/row = 0 > scan information: > Bit set of columns fetched=All > Number of columns fetched=9 > Number of pages visited=34358 > Number of rows qualified=11767298 > Number of rows visited=11767298 > Scan type=heap > start position: > null > stop position: > null > qualifiers: > None > optimizer estimated row count: 1176730.30 > optimizer estimated cost: 5931065.54 > [EL Fine]: 2013-01-17 11:27:00.627--ServerSession(1237006689)--Connection(1688096771)--Thread(Thread["Initial Lisp Listener",5,SubL Thread Group])--SELECT TERM_ID, ARG0, ARG1, ARG2, ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM KB.FORMULA_TERM WHERE (TERM_ID IN (?,?,?)) > bind => [2251799814033500, 2251799814033501, 2251799814033499] > Thu Jan 17 11:47:26 CST 2013 Thread["Initial Lisp Listener",5,SubL Thread Group] (XID = 4711080), (SESSIONID = 3), SELECT TERM_ID, ARG0, ARG1, ARG2, ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM KB.FORMULA_TERM WHERE (TERM_ID IN (?,?,?)) ******* Project-Restrict ResultSet (3): > Number of opens = 1 > Rows seen = 3 > Rows filtered = 0 > restriction = false > projection = true > constructor time (milliseconds) = 0 > open time (milliseconds) = 0 > next time (milliseconds) = 0 > close time (milliseconds) = 0 > restriction time (milliseconds) = 0 > projection time (milliseconds) = 0 > optimizer estimated row count: 1176730.30 > optimizer estimated cost: 5931065.54 > Source result set: > Project-Restrict ResultSet (2): > Number of opens = 1 > Rows seen = 11767298 > Rows filtered = 11767295 > restriction = true > projection = false > constructor time (milliseconds) = 0 > open time (milliseconds) = 0 > next time (milliseconds) = 0 > close time (milliseconds) = 0 > restriction time (milliseconds) = 0 > projection time (milliseconds) = 0 > optimizer estimated row count: 1176730.30 > optimizer estimated cost: 5931065.54 > Source result set: > Table Scan ResultSet for FORMULA_TERM at read committed isolation level using instantaneous share row locking chosen by the optimizer > Number of opens = 1 > Rows seen = 11767298 > Rows filtered = 0 > Fetch Size = 16 > constructor time (milliseconds) = 0 > open time (milliseconds) = 0 > next time (milliseconds) = 0 > close time (milliseconds) = 0 > next time in milliseconds/row = 0 > scan information: > Bit set of columns fetched=All > Number of columns fetched=9 > Number of pages visited=34358 > Number of rows qualified=11767298 > Number of rows visited=11767298 > Scan type=heap > start position: > null > stop position: > null > qualifiers: > None > optimizer estimated row count: 1176730.30 > optimizer estimated cost: 5931065.54 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira