Mailing-List: contact derby-user-help@db.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Derby Discussion" <derby-user@db.apache.org>
Received-SPF: pass (asf.osuosl.org: domain of kh@araxis.com designates
 195.10.109.13 as permitted sender)
Message-ID: <4378B778.6010001@araxis.com>
Date: Mon, 14 Nov 2005 16:12:40 +0000
From: Kevin Hore <kh@araxis.com>
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
MIME-Version: 1.0
To: Derby Discussion <derby-user@db.apache.org>
Subject: Re: Poor query optimizer choices is making Derby unusable for large
 tables
References: <4374B1E7.2050506@araxis.com>
 <d9619e4a0511110833v5eec1382ia77f3852c94256f6@mail.gmail.com>
 <437519F0.9090107@sun.com>
In-Reply-To: <437519F0.9090107@sun.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi Rick,

Thanks for your suggested re-write, but I'm really looking for a single
query. Unfortunately, we have quite a number of queries affected by this
and it would be significant work to re-written them all in this way. In
addition, some are considerably more complex than the example I gave,
and not so amenable to a relatively straightforward rewrite along the
lines you suggest.

Thanks again though, I appreciate the suggestion.

Kind regards,

Kevin Hore

Rick Hillegas wrote:
> Hi Kevin,
> 
> You might also try using a temporary table to split your scan up into a 
> series of optimzable queries. Regards-Rick
> 
> declare global temporary table session.accumulator
> (
>   ObjectId int NOT NULL,
>   WordLocation int NOT NULL
> )
> not logged;
> 
> insert into session.accumulator
>    SELECT ObjectId, WordLocation
>    FROM tblSearchDictionary
>    WHERE Word = 'CONTACT'
> ;
> insert into session.accumulator
>    SELECT ObjectId, WordLocation
>    FROM tblSearchDictionary
>    WHERE Word = 'ADD'
> ;
> 
> SELECT ObjectId, SUM(WordLocation) AS Score
> FROM session.accumulator
> GROUP BY ObjectId;
> 
> 
> Mamta Satoor wrote:
> 
>> Hi Kevin,
>>  
>> I haven't investigated Derby-47 to know how to fix the problem but I 
>> do have an optimizer overrides patch waiting for review on the derby 
>> developer list which will let user specify their own optimizer 
>> properties to help the optimizer pick a specific plan. The JIRA entry 
>> for optimizer overrides is Derby-573. The patch is for Derby 10.2.
>>  
>> These optimizer properties can be handy in a case like yours where 
>> until the optimizer is fixed, a user can tell the optimizer what plan 
>> to use. Just an FYI as I know it doesn't help with your particular 
>> case since there is no optimizer overrides support for 10.1.
>>  
>> Mamta
>>  
>> On 11/11/05, *Kevin Hore* <kh@araxis.com <mailto:kh@araxis.com>> wrote:
>>
>>     The Derby query optimizer (this is with 10.1.1.0
>>     <http://10.1.1.0>) decides to perform an
>>     expensive table scan in certain circumstances, when it could make
>>     use of
>>     available indexes. The resulting poor performance is rendering Derby
>>     useless for our application.
>>
>>     I believe that this behaviour might be related to DERBY-47
>>     ( http://issues.apache.org/jira/browse/DERBY-47), which was opened in
>>     October 2004. However, this seems such a severe and fundamental
>>     problem
>>     that I'm surprised firstly that a significant portion of the Derby
>>     user
>>     base isn't affected by this and, secondly, that DERBY-47 hasn't been
>>     considered a critical defect to be fixed with the utmost urgency.
>>
>>     I've described the problem in detail below, and I'd appreciate any
>>     assistance. Specifically:
>>
>>     i) Does anyone have any plans to fix this problem?
>>
>>     ii) In the meantime, are there any work-arounds? I'd appreciate any
>>     suggestions that would decrease the execution time of my second query
>>     below (the one with with two search terms). Likewise, any general
>>     strategies for avoiding this problem with IN clauses would be
>>     appreciated.
>>
>>
>>     ----PROBLEM DESCRIPTION----
>>
>>     Consider the table:
>>
>>     CREATE TABLE tblSearchDictionary
>>     (
>>     ObjectId int NOT NULL,
>>     ObjectType int NOT NULL,
>>     Word VARCHAR(64) NOT NULL,
>>     WordLocation int NOT NULL,
>>     CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
>>     );
>>
>>     This table has an index on each of the four columns, it also has the
>>     unique index across all four columns as defined above:
>>
>>     CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary
>>     (ObjectId);
>>     CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
>>     (ObjectType);
>>     CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
>>     CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
>>     (WordLocation);
>>
>>     The table contains about 260,000 rows.
>>
>>     The following query selects all rows that match instances of 
>> string in
>>     the Word column. It sums the WordLocation column having grouped by
>>     the
>>     ObjectId column.
>>
>>     SELECT ObjectId, SUM(WordLocation) AS Score
>>     FROM tblSearchDictionary
>>     WHERE Word = 'CONTACT'
>>     GROUP BY ObjectId;
>>
>>     On my machine this will usually complete in an acceptable time of
>>     around
>>     200ms.
>>
>>     Now consider the following query which adds a second search term
>>     on the
>>     same column.
>>
>>     SELECT ObjectId, SUM(WordLocation) AS Score
>>     FROM tblSearchDictionary
>>     WHERE Word = 'CONTACT' OR Word = 'ADD'
>>     GROUP BY ObjectId;
>>
>>     This second query usually takes around 10000ms on my machine. My
>>     understanding from the Derby optimizer docs and DERBY-47 is that
>>     this is
>>     because Derby is re-writing the query along the following lines, and
>>     then choosing to do a table scan:
>>
>>     SELECT ObjectId, SUM(WordLocation) AS Score
>>     FROM tblSearchDictionary
>>     WHERE
>>       Word IN ('CONTACT', 'ADD')
>>       AND Word >= 'ADD'
>>       AND Word <= 'CONTACT'
>>     GROUP BY ObjectId;
>>
>>     The plan for the first query indicates that the
>>     tblSearchDictionaryWord
>>     index is used to perform an index scan. However, the plan for the
>>     second
>>     query indicates that the majority of the additional time is taken
>>     performing a table scan over the entire table, instead of making
>>     use of
>>     the indexes available. Our application uses IN quite frequently,
>>     so this
>>     optimizer behaviour would seem to present a significant problem.
>>
>>     ---QUERY PLAN FOR FIRST QUERY----
>>
>>     Statement Name:
>>         null
>>     Statement Text:
>>         SELECT
>>         ObjectId,
>>         SUM(WordLocation) AS Score
>>     FROM tblSearchDictionary
>>     WHERE
>>             Word = 'CONTACT'
>>     GROUP BY
>>         ObjectId
>>
>>     Parse Time: 0
>>     Bind Time: 0
>>     Optimize Time: 16
>>     Generate Time: 0
>>     Compile Time: 16
>>     Execute Time: 0
>>     Begin Compilation Timestamp : 2005-11-11 12:28:52.765
>>     End Compilation Timestamp : 2005-11-11 12:28: 52.781
>>     Begin Execution Timestamp : 2005-11-11 13:08:15.406
>>     End Execution Timestamp : 2005-11-11 13:08:15.406
>>     Statement Execution Plan Text:
>>     Project-Restrict ResultSet (5):
>>     Number of opens = 1
>>     Rows seen = 93
>>     Rows filtered = 0
>>     restriction = false
>>     projection = true
>>         constructor time (milliseconds) = 0
>>         open time (milliseconds) = 0
>>         next time (milliseconds) = 0
>>         close time (milliseconds) = 0
>>         restriction time (milliseconds) = 0
>>         projection time (milliseconds) = 0
>>         optimizer estimated row count:            1.00
>>         optimizer estimated cost:          226.00
>>
>>     Source result set:
>>         Grouped Aggregate ResultSet:
>>         Number of opens = 1
>>         Rows input = 113
>>         Has distinct aggregate = false
>>         In sorted order = false
>>         Sort information:
>>             Number of rows input=113
>>             Number of rows output=93
>>             Sort type=internal
>>             constructor time (milliseconds) = 0
>>             open time (milliseconds) = 0
>>             next time (milliseconds) = 0
>>             close time (milliseconds) = 0
>>             optimizer estimated row count:             1.00
>>             optimizer estimated cost:          226.00
>>
>>         Source result set:
>>             Project-Restrict ResultSet (4):
>>             Number of opens = 1
>>             Rows seen = 113
>>             Rows filtered = 0
>>             restriction = false
>>             projection = true
>>                 constructor time (milliseconds) = 0
>>                 open time (milliseconds) = 0
>>                 next time (milliseconds) = 0
>>                 close time (milliseconds) = 0
>>                 restriction time (milliseconds) = 0
>>                 projection time (milliseconds) = 0
>>                 optimizer estimated row count:          118.00
>>                 optimizer estimated cost:          226.00
>>
>>             Source result set:
>>                 Index Row to Base Row ResultSet for TBLSEARCHDICTIONARY:
>>                 Number of opens = 1
>>                 Rows seen = 113
>>                 Columns accessed from heap = {0, 3}
>>                     constructor time (milliseconds) = 0
>>                     open time (milliseconds) = 0
>>                     next time (milliseconds) = 0
>>                     close time (milliseconds) = 0
>>                     optimizer estimated row count:           118.00
>>                     optimizer estimated cost:          226.00
>>
>>                     Index Scan ResultSet for TBLSEARCHDICTIONARY using
>>     index
>>     TBLSEARCHDICTIONARYWORD at read committed isolation level using share
>>     row locking chosen by the optimizer
>>                     Number of opens = 1
>>                     Rows seen = 113
>>                     Rows filtered = 0
>>                     Fetch Size = 1
>>                         constructor time (milliseconds) = 0
>>                         open time (milliseconds) = 0
>>                         next time (milliseconds) = 0
>>                         close time (milliseconds) = 0
>>                         next time in milliseconds/row = 0
>>
>>                     scan information:
>>                         Bit set of columns fetched=All
>>                         Number of columns fetched=2
>>                         Number of deleted rows visited=0
>>                         Number of pages visited=4
>>                         Number of rows qualified=113
>>                         Number of rows visited=114
>>                         Scan type=btree
>>                         Tree height=3
>>                         start position:
>>         >= on first 1 column(s).
>>         Ordered null semantics on the following columns:
>>     0
>>                         stop position:
>>         > on first 1 column(s).
>>         Ordered null semantics on the following columns:
>>     0
>>                         qualifiers:
>>     None
>>                         optimizer estimated row count:          118.00
>>                         optimizer estimated cost:          226.00
>>
>>
>>     ---QUERY PLAN FOR SECOND QUERY----
>>
>>     Statement Name:
>>         null
>>     Statement Text:
>>         SELECT
>>         ObjectId,
>>         SUM(WordLocation) AS Score
>>     FROM tblSearchDictionary
>>     WHERE
>>             Word = 'CONTACT' OR  Word = 'ADD'
>>     GROUP BY
>>         ObjectId
>>
>>     Parse Time: 0
>>     Bind Time: 0
>>     Optimize Time: 0
>>     Generate Time: 15
>>     Compile Time: 15
>>     Execute Time: 4250
>>     Begin Compilation Timestamp : 2005-11-11 13:16:17.578
>>     End Compilation Timestamp : 2005-11-11 13:16: 17.593
>>     Begin Execution Timestamp : 2005-11-11 13:16:17.593
>>     End Execution Timestamp : 2005-11-11 13:16:27.437
>>     Statement Execution Plan Text:
>>     Project-Restrict ResultSet (5):
>>     Number of opens = 1
>>     Rows seen = 100
>>     Rows filtered = 0
>>     restriction = false
>>     projection = true
>>         constructor time (milliseconds) = 0
>>         open time (milliseconds) = 4250
>>         next time (milliseconds) = 0
>>         close time (milliseconds) = 0
>>         restriction time (milliseconds) = 0
>>         projection time (milliseconds) = 0
>>         optimizer estimated row count:            1.00
>>         optimizer estimated cost:        82959.49
>>
>>     Source result set:
>>         Grouped Aggregate ResultSet:
>>         Number of opens = 1
>>         Rows input = 712
>>         Has distinct aggregate = false
>>         In sorted order = false
>>         Sort information:
>>             Number of rows input=712
>>             Number of rows output=593
>>             Sort type=internal
>>             constructor time (milliseconds) = 0
>>             open time (milliseconds) = 4250
>>             next time (milliseconds) = 0
>>             close time (milliseconds) = 0
>>             optimizer estimated row count:             1.00
>>             optimizer estimated cost:        82959.49
>>
>>         Source result set:
>>             Project-Restrict ResultSet (4):
>>             Number of opens = 1
>>             Rows seen = 712
>>             Rows filtered = 0
>>             restriction = false
>>             projection = true
>>                 constructor time (milliseconds) = 0
>>                 open time (milliseconds) = 0
>>                 next time (milliseconds) = 4219
>>                 close time (milliseconds) = 15
>>                 restriction time (milliseconds) = 0
>>                 projection time (milliseconds) = 0
>>                 optimizer estimated row count:        19200.45
>>                 optimizer estimated cost:        82959.49
>>
>>             Source result set:
>>                 Project-Restrict ResultSet (3):
>>                 Number of opens = 1
>>                 Rows seen = 40806
>>                 Rows filtered = 40094
>>                 restriction = true
>>                 projection = false
>>                     constructor time (milliseconds) = 0
>>                     open time (milliseconds) = 0
>>                     next time (milliseconds) = 4219
>>                     close time (milliseconds) = 15
>>                     restriction time (milliseconds) = 124
>>                     projection time (milliseconds) = 0
>>                     optimizer estimated row count:        19200.45
>>                     optimizer estimated cost:        82959.49
>>
>>                 Source result set:
>>                     Table Scan ResultSet for TBLSEARCHDICTIONARY at read
>>     committed
>>     isolation level using share row locking chosen by the optimizer
>>                     Number of opens = 1
>>                     Rows seen = 40806
>>                     Rows filtered = 0
>>                     Fetch Size = 1
>>                         constructor time (milliseconds) = 0
>>                         open time (milliseconds) = 0
>>                         next time (milliseconds) = 4001
>>                         close time (milliseconds) = 15
>>                         next time in milliseconds/row = 0
>>
>>                     scan information:
>>                         Bit set of columns fetched={0, 2, 3}
>>                         Number of columns fetched=3
>>                         Number of pages visited=2978
>>                         Number of rows qualified=40806
>>                         Number of rows visited=256001
>>                         Scan type=heap
>>                         start position:
>>     null                    stop position:
>>     null                    qualifiers:
>>     Column[0][0] Id: 2
>>     Operator: <
>>     Ordered nulls: false
>>     Unknown return value: true
>>     Negate comparison result: true
>>     Column[0][1] Id: 2
>>     Operator: <=
>>     Ordered nulls: false
>>     Unknown return value: false
>>     Negate comparison result: false
>>
>>                         optimizer estimated row count:        19200.45
>>                         optimizer estimated cost:         82959.49
>>
>>     ----------
>>
>>     Thanks in advance for any help!
>>
>>     Kind regards,
>>
>>
>>     Kevin Hore
>>
>>
>