Return-Path: Delivered-To: apmail-db-derby-user-archive@www.apache.org Received: (qmail 62077 invoked from network); 14 Nov 2005 16:13:09 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur.apache.org with SMTP; 14 Nov 2005 16:13:09 -0000 Received: (qmail 12737 invoked by uid 500); 14 Nov 2005 16:13:08 -0000 Delivered-To: apmail-db-derby-user-archive@db.apache.org Received: (qmail 12718 invoked by uid 500); 14 Nov 2005 16:13:08 -0000 Mailing-List: contact derby-user-help@db.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: List-Post: List-Id: Reply-To: "Derby Discussion" Delivered-To: mailing list derby-user@db.apache.org Received: (qmail 12707 invoked by uid 99); 14 Nov 2005 16:13:08 -0000 Received: from asf.osuosl.org (HELO asf.osuosl.org) (140.211.166.49) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 14 Nov 2005 08:13:08 -0800 X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (asf.osuosl.org: domain of kh@araxis.com designates 195.10.109.13 as permitted sender) Received: from [195.10.109.13] (HELO smtp.araxis.com) (195.10.109.13) by apache.org (qpsmtpd/0.29) with SMTP; Mon, 14 Nov 2005 08:12:57 -0800 Received: (qmail 88943 invoked from network); 14 Nov 2005 16:12:42 -0000 Received: from unknown (HELO ?127.0.0.1?) (10.8.0.134) by smtp.araxis.com with SMTP; 14 Nov 2005 16:12:42 -0000 Message-ID: <4378B778.6010001@araxis.com> Date: Mon, 14 Nov 2005 16:12:40 +0000 From: Kevin Hore User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Derby Discussion Subject: Re: Poor query optimizer choices is making Derby unusable for large tables References: <4374B1E7.2050506@araxis.com> <437519F0.9090107@sun.com> In-Reply-To: <437519F0.9090107@sun.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org X-Spam-Rating: minotaur.apache.org 1.6.2 0/1000/N Hi Rick, Thanks for your suggested re-write, but I'm really looking for a single query. Unfortunately, we have quite a number of queries affected by this and it would be significant work to re-written them all in this way. In addition, some are considerably more complex than the example I gave, and not so amenable to a relatively straightforward rewrite along the lines you suggest. Thanks again though, I appreciate the suggestion. Kind regards, Kevin Hore Rick Hillegas wrote: > Hi Kevin, > > You might also try using a temporary table to split your scan up into a > series of optimzable queries. Regards-Rick > > declare global temporary table session.accumulator > ( > ObjectId int NOT NULL, > WordLocation int NOT NULL > ) > not logged; > > insert into session.accumulator > SELECT ObjectId, WordLocation > FROM tblSearchDictionary > WHERE Word = 'CONTACT' > ; > insert into session.accumulator > SELECT ObjectId, WordLocation > FROM tblSearchDictionary > WHERE Word = 'ADD' > ; > > SELECT ObjectId, SUM(WordLocation) AS Score > FROM session.accumulator > GROUP BY ObjectId; > > > Mamta Satoor wrote: > >> Hi Kevin, >> >> I haven't investigated Derby-47 to know how to fix the problem but I >> do have an optimizer overrides patch waiting for review on the derby >> developer list which will let user specify their own optimizer >> properties to help the optimizer pick a specific plan. The JIRA entry >> for optimizer overrides is Derby-573. The patch is for Derby 10.2. >> >> These optimizer properties can be handy in a case like yours where >> until the optimizer is fixed, a user can tell the optimizer what plan >> to use. Just an FYI as I know it doesn't help with your particular >> case since there is no optimizer overrides support for 10.1. >> >> Mamta >> >> On 11/11/05, *Kevin Hore* > wrote: >> >> The Derby query optimizer (this is with 10.1.1.0 >> ) decides to perform an >> expensive table scan in certain circumstances, when it could make >> use of >> available indexes. The resulting poor performance is rendering Derby >> useless for our application. >> >> I believe that this behaviour might be related to DERBY-47 >> ( http://issues.apache.org/jira/browse/DERBY-47), which was opened in >> October 2004. However, this seems such a severe and fundamental >> problem >> that I'm surprised firstly that a significant portion of the Derby >> user >> base isn't affected by this and, secondly, that DERBY-47 hasn't been >> considered a critical defect to be fixed with the utmost urgency. >> >> I've described the problem in detail below, and I'd appreciate any >> assistance. Specifically: >> >> i) Does anyone have any plans to fix this problem? >> >> ii) In the meantime, are there any work-arounds? I'd appreciate any >> suggestions that would decrease the execution time of my second query >> below (the one with with two search terms). Likewise, any general >> strategies for avoiding this problem with IN clauses would be >> appreciated. >> >> >> ----PROBLEM DESCRIPTION---- >> >> Consider the table: >> >> CREATE TABLE tblSearchDictionary >> ( >> ObjectId int NOT NULL, >> ObjectType int NOT NULL, >> Word VARCHAR(64) NOT NULL, >> WordLocation int NOT NULL, >> CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation) >> ); >> >> This table has an index on each of the four columns, it also has the >> unique index across all four columns as defined above: >> >> CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary >> (ObjectId); >> CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary >> (ObjectType); >> CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word); >> CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary >> (WordLocation); >> >> The table contains about 260,000 rows. >> >> The following query selects all rows that match instances of >> string in >> the Word column. It sums the WordLocation column having grouped by >> the >> ObjectId column. >> >> SELECT ObjectId, SUM(WordLocation) AS Score >> FROM tblSearchDictionary >> WHERE Word = 'CONTACT' >> GROUP BY ObjectId; >> >> On my machine this will usually complete in an acceptable time of >> around >> 200ms. >> >> Now consider the following query which adds a second search term >> on the >> same column. >> >> SELECT ObjectId, SUM(WordLocation) AS Score >> FROM tblSearchDictionary >> WHERE Word = 'CONTACT' OR Word = 'ADD' >> GROUP BY ObjectId; >> >> This second query usually takes around 10000ms on my machine. My >> understanding from the Derby optimizer docs and DERBY-47 is that >> this is >> because Derby is re-writing the query along the following lines, and >> then choosing to do a table scan: >> >> SELECT ObjectId, SUM(WordLocation) AS Score >> FROM tblSearchDictionary >> WHERE >> Word IN ('CONTACT', 'ADD') >> AND Word >= 'ADD' >> AND Word <= 'CONTACT' >> GROUP BY ObjectId; >> >> The plan for the first query indicates that the >> tblSearchDictionaryWord >> index is used to perform an index scan. However, the plan for the >> second >> query indicates that the majority of the additional time is taken >> performing a table scan over the entire table, instead of making >> use of >> the indexes available. Our application uses IN quite frequently, >> so this >> optimizer behaviour would seem to present a significant problem. >> >> ---QUERY PLAN FOR FIRST QUERY---- >> >> Statement Name: >> null >> Statement Text: >> SELECT >> ObjectId, >> SUM(WordLocation) AS Score >> FROM tblSearchDictionary >> WHERE >> Word = 'CONTACT' >> GROUP BY >> ObjectId >> >> Parse Time: 0 >> Bind Time: 0 >> Optimize Time: 16 >> Generate Time: 0 >> Compile Time: 16 >> Execute Time: 0 >> Begin Compilation Timestamp : 2005-11-11 12:28:52.765 >> End Compilation Timestamp : 2005-11-11 12:28: 52.781 >> Begin Execution Timestamp : 2005-11-11 13:08:15.406 >> End Execution Timestamp : 2005-11-11 13:08:15.406 >> Statement Execution Plan Text: >> Project-Restrict ResultSet (5): >> Number of opens = 1 >> Rows seen = 93 >> Rows filtered = 0 >> restriction = false >> projection = true >> constructor time (milliseconds) = 0 >> open time (milliseconds) = 0 >> next time (milliseconds) = 0 >> close time (milliseconds) = 0 >> restriction time (milliseconds) = 0 >> projection time (milliseconds) = 0 >> optimizer estimated row count: 1.00 >> optimizer estimated cost: 226.00 >> >> Source result set: >> Grouped Aggregate ResultSet: >> Number of opens = 1 >> Rows input = 113 >> Has distinct aggregate = false >> In sorted order = false >> Sort information: >> Number of rows input=113 >> Number of rows output=93 >> Sort type=internal >> constructor time (milliseconds) = 0 >> open time (milliseconds) = 0 >> next time (milliseconds) = 0 >> close time (milliseconds) = 0 >> optimizer estimated row count: 1.00 >> optimizer estimated cost: 226.00 >> >> Source result set: >> Project-Restrict ResultSet (4): >> Number of opens = 1 >> Rows seen = 113 >> Rows filtered = 0 >> restriction = false >> projection = true >> constructor time (milliseconds) = 0 >> open time (milliseconds) = 0 >> next time (milliseconds) = 0 >> close time (milliseconds) = 0 >> restriction time (milliseconds) = 0 >> projection time (milliseconds) = 0 >> optimizer estimated row count: 118.00 >> optimizer estimated cost: 226.00 >> >> Source result set: >> Index Row to Base Row ResultSet for TBLSEARCHDICTIONARY: >> Number of opens = 1 >> Rows seen = 113 >> Columns accessed from heap = {0, 3} >> constructor time (milliseconds) = 0 >> open time (milliseconds) = 0 >> next time (milliseconds) = 0 >> close time (milliseconds) = 0 >> optimizer estimated row count: 118.00 >> optimizer estimated cost: 226.00 >> >> Index Scan ResultSet for TBLSEARCHDICTIONARY using >> index >> TBLSEARCHDICTIONARYWORD at read committed isolation level using share >> row locking chosen by the optimizer >> Number of opens = 1 >> Rows seen = 113 >> Rows filtered = 0 >> Fetch Size = 1 >> constructor time (milliseconds) = 0 >> open time (milliseconds) = 0 >> next time (milliseconds) = 0 >> close time (milliseconds) = 0 >> next time in milliseconds/row = 0 >> >> scan information: >> Bit set of columns fetched=All >> Number of columns fetched=2 >> Number of deleted rows visited=0 >> Number of pages visited=4 >> Number of rows qualified=113 >> Number of rows visited=114 >> Scan type=btree >> Tree height=3 >> start position: >> >= on first 1 column(s). >> Ordered null semantics on the following columns: >> 0 >> stop position: >> > on first 1 column(s). >> Ordered null semantics on the following columns: >> 0 >> qualifiers: >> None >> optimizer estimated row count: 118.00 >> optimizer estimated cost: 226.00 >> >> >> ---QUERY PLAN FOR SECOND QUERY---- >> >> Statement Name: >> null >> Statement Text: >> SELECT >> ObjectId, >> SUM(WordLocation) AS Score >> FROM tblSearchDictionary >> WHERE >> Word = 'CONTACT' OR Word = 'ADD' >> GROUP BY >> ObjectId >> >> Parse Time: 0 >> Bind Time: 0 >> Optimize Time: 0 >> Generate Time: 15 >> Compile Time: 15 >> Execute Time: 4250 >> Begin Compilation Timestamp : 2005-11-11 13:16:17.578 >> End Compilation Timestamp : 2005-11-11 13:16: 17.593 >> Begin Execution Timestamp : 2005-11-11 13:16:17.593 >> End Execution Timestamp : 2005-11-11 13:16:27.437 >> Statement Execution Plan Text: >> Project-Restrict ResultSet (5): >> Number of opens = 1 >> Rows seen = 100 >> Rows filtered = 0 >> restriction = false >> projection = true >> constructor time (milliseconds) = 0 >> open time (milliseconds) = 4250 >> next time (milliseconds) = 0 >> close time (milliseconds) = 0 >> restriction time (milliseconds) = 0 >> projection time (milliseconds) = 0 >> optimizer estimated row count: 1.00 >> optimizer estimated cost: 82959.49 >> >> Source result set: >> Grouped Aggregate ResultSet: >> Number of opens = 1 >> Rows input = 712 >> Has distinct aggregate = false >> In sorted order = false >> Sort information: >> Number of rows input=712 >> Number of rows output=593 >> Sort type=internal >> constructor time (milliseconds) = 0 >> open time (milliseconds) = 4250 >> next time (milliseconds) = 0 >> close time (milliseconds) = 0 >> optimizer estimated row count: 1.00 >> optimizer estimated cost: 82959.49 >> >> Source result set: >> Project-Restrict ResultSet (4): >> Number of opens = 1 >> Rows seen = 712 >> Rows filtered = 0 >> restriction = false >> projection = true >> constructor time (milliseconds) = 0 >> open time (milliseconds) = 0 >> next time (milliseconds) = 4219 >> close time (milliseconds) = 15 >> restriction time (milliseconds) = 0 >> projection time (milliseconds) = 0 >> optimizer estimated row count: 19200.45 >> optimizer estimated cost: 82959.49 >> >> Source result set: >> Project-Restrict ResultSet (3): >> Number of opens = 1 >> Rows seen = 40806 >> Rows filtered = 40094 >> restriction = true >> projection = false >> constructor time (milliseconds) = 0 >> open time (milliseconds) = 0 >> next time (milliseconds) = 4219 >> close time (milliseconds) = 15 >> restriction time (milliseconds) = 124 >> projection time (milliseconds) = 0 >> optimizer estimated row count: 19200.45 >> optimizer estimated cost: 82959.49 >> >> Source result set: >> Table Scan ResultSet for TBLSEARCHDICTIONARY at read >> committed >> isolation level using share row locking chosen by the optimizer >> Number of opens = 1 >> Rows seen = 40806 >> Rows filtered = 0 >> Fetch Size = 1 >> constructor time (milliseconds) = 0 >> open time (milliseconds) = 0 >> next time (milliseconds) = 4001 >> close time (milliseconds) = 15 >> next time in milliseconds/row = 0 >> >> scan information: >> Bit set of columns fetched={0, 2, 3} >> Number of columns fetched=3 >> Number of pages visited=2978 >> Number of rows qualified=40806 >> Number of rows visited=256001 >> Scan type=heap >> start position: >> null stop position: >> null qualifiers: >> Column[0][0] Id: 2 >> Operator: < >> Ordered nulls: false >> Unknown return value: true >> Negate comparison result: true >> Column[0][1] Id: 2 >> Operator: <= >> Ordered nulls: false >> Unknown return value: false >> Negate comparison result: false >> >> optimizer estimated row count: 19200.45 >> optimizer estimated cost: 82959.49 >> >> ---------- >> >> Thanks in advance for any help! >> >> Kind regards, >> >> >> Kevin Hore >> >> >