db-derby-user mailing list archives

From Kevin Hore <...@araxis.com>
Subject Re: Poor query optimizer choices are making Derby unusable for large tables
Date Mon, 14 Nov 2005 16:13:06 GMT
Hi Mamta,

Thank you for your suggestion, but this is for a production system and
so, as you suspected, we don't really want to use an alpha version of Derby.

Kind regards,

Kevin Hore

Mamta Satoor wrote:
> Hi Kevin,
>  I haven't investigated Derby-47 enough to know how to fix the problem, but I
> do have an optimizer overrides patch waiting for review on the derby developer
> list which will let users specify their own optimizer properties to help the
> optimizer pick a specific plan. The JIRA entry for optimizer overrides is
> Derby-573. The patch is for Derby 10.2.
>  These optimizer properties can be handy in a case like yours: until the
> optimizer is fixed, a user can tell the optimizer which plan to use. Just
> an FYI, as I know it doesn't help with your particular case, since there is no
> optimizer overrides support for 10.1.
>  Mamta
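[For readers finding this thread in the archive: the overrides Mamta describes use a special `--DERBY-PROPERTIES` comment after a table name in the FROM clause. The sketch below uses that syntax from the DERBY-573 work against Kevin's schema; it is illustrative only and, as noted above, not available in 10.1.]

```sql
-- Sketch of a 10.2-style optimizer override (not available in 10.1):
-- ask the optimizer to use the index on Word instead of a table scan.
-- The property comment must end its line.
SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary --DERBY-PROPERTIES index=TBLSEARCHDICTIONARYWORD
WHERE Word = 'CONTACT' OR Word = 'ADD'
GROUP BY ObjectId;
```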
>  On 11/11/05, Kevin Hore <kh@araxis.com> wrote:
> 
>>The Derby query optimizer (this is with 10.1.1.0) decides to perform an
>>expensive table scan in certain circumstances, when it could make use of
>>available indexes. The resulting poor performance is rendering Derby
>>useless for our application.
>>
>>I believe that this behaviour might be related to DERBY-47
>>(http://issues.apache.org/jira/browse/DERBY-47), which was opened in
>>October 2004. However, this seems such a severe and fundamental problem
>>that I'm surprised firstly that a significant portion of the Derby user
>>base isn't affected by this and, secondly, that DERBY-47 hasn't been
>>considered a critical defect to be fixed with the utmost urgency.
>>
>>I've described the problem in detail below, and I'd appreciate any
>>assistance. Specifically:
>>
>>i) Does anyone have any plans to fix this problem?
>>
>>ii) In the meantime, are there any work-arounds? I'd appreciate any
>>suggestions that would decrease the execution time of my second query
>>below (the one with two search terms). Likewise, any general
>>strategies for avoiding this problem with IN clauses would be appreciated.
>>
>>
>>----PROBLEM DESCRIPTION----
>>
>>Consider the table:
>>
>>CREATE TABLE tblSearchDictionary
>>(
>>ObjectId int NOT NULL,
>>ObjectType int NOT NULL,
>>Word VARCHAR(64) NOT NULL,
>>WordLocation int NOT NULL,
>>CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
>>);
>>
>>This table has an index on each of the four columns; it also has the
>>unique index across all four columns, as defined above:
>>
>>CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary
>>(ObjectId);
>>CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
>>(ObjectType);
>>CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
>>CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
>>(WordLocation);
>>
>>The table contains about 260,000 rows.
>>
>>The following query selects all rows whose Word column matches a given
>>string. It sums the WordLocation column, grouped by the ObjectId column.
>>
>>SELECT ObjectId, SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE Word = 'CONTACT'
>>GROUP BY ObjectId;
>>
>>On my machine this will usually complete in an acceptable time of around
>>200ms.
>>
>>Now consider the following query which adds a second search term on the
>>same column.
>>
>>SELECT ObjectId, SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE Word = 'CONTACT' OR Word = 'ADD'
>>GROUP BY ObjectId;
>>
>>This second query usually takes around 10000ms on my machine. My
>>understanding from the Derby optimizer docs and DERBY-47 is that this is
>>because Derby is re-writing the query along the following lines, and
>>then choosing to do a table scan:
>>
>>SELECT ObjectId, SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE
>>Word IN ('CONTACT', 'ADD')
>>AND Word >= 'ADD'
>>AND Word <= 'CONTACT'
>>GROUP BY ObjectId;
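[Archive note: one work-around worth trying — a sketch only, not verified against 10.1.1.0 — is to sidestep the IN/range rewrite by splitting the OR into a UNION ALL, so each branch has a single equality predicate that the index on Word can satisfy:]

```sql
-- Hypothetical rewrite: each branch carries one equality predicate,
-- giving the optimizer a straightforward scan of tblSearchDictionaryWord;
-- UNION ALL keeps duplicate rows so the SUM is unchanged.
SELECT ObjectId, SUM(WordLocation) AS Score
FROM (
    SELECT ObjectId, WordLocation
    FROM tblSearchDictionary WHERE Word = 'CONTACT'
    UNION ALL
    SELECT ObjectId, WordLocation
    FROM tblSearchDictionary WHERE Word = 'ADD'
) AS t
GROUP BY ObjectId;
```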
>>
>>The plan for the first query indicates that the tblSearchDictionaryWord
>>index is used to perform an index scan. However, the plan for the second
>>query indicates that the majority of the additional time is taken
>>performing a table scan over the entire table, instead of making use of
>>the indexes available. Our application uses IN quite frequently, so this
>>optimizer behaviour would seem to present a significant problem.
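[Archive note: plans like the two below are Derby runtime-statistics output. For anyone wanting to reproduce them, they can be captured from ij with Derby's standard system procedures — shown here as a sketch; check the reference manual for your release:]

```sql
-- Turn on runtime statistics for this connection:
CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);
-- Run the query of interest:
SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE Word = 'CONTACT' OR Word = 'ADD'
GROUP BY ObjectId;
-- Then fetch the statement execution plan text:
VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();
```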
>>
>>---QUERY PLAN FOR FIRST QUERY----
>>
>>Statement Name:
>>null
>>Statement Text:
>>SELECT
>>ObjectId,
>>SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE
>>Word = 'CONTACT'
>>GROUP BY
>>ObjectId
>>
>>Parse Time: 0
>>Bind Time: 0
>>Optimize Time: 16
>>Generate Time: 0
>>Compile Time: 16
>>Execute Time: 0
>>Begin Compilation Timestamp : 2005-11-11 12:28:52.765
>>End Compilation Timestamp : 2005-11-11 12:28:52.781
>>Begin Execution Timestamp : 2005-11-11 13:08:15.406
>>End Execution Timestamp : 2005-11-11 13:08:15.406
>>Statement Execution Plan Text:
>>Project-Restrict ResultSet (5):
>>Number of opens = 1
>>Rows seen = 93
>>Rows filtered = 0
>>restriction = false
>>projection = true
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 0
>>next time (milliseconds) = 0
>>close time (milliseconds) = 0
>>restriction time (milliseconds) = 0
>>projection time (milliseconds) = 0
>>optimizer estimated row count: 1.00
>>optimizer estimated cost: 226.00
>>
>>Source result set:
>>Grouped Aggregate ResultSet:
>>Number of opens = 1
>>Rows input = 113
>>Has distinct aggregate = false
>>In sorted order = false
>>Sort information:
>>Number of rows input=113
>>Number of rows output=93
>>Sort type=internal
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 0
>>next time (milliseconds) = 0
>>close time (milliseconds) = 0
>>optimizer estimated row count: 1.00
>>optimizer estimated cost: 226.00
>>
>>Source result set:
>>Project-Restrict ResultSet (4):
>>Number of opens = 1
>>Rows seen = 113
>>Rows filtered = 0
>>restriction = false
>>projection = true
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 0
>>next time (milliseconds) = 0
>>close time (milliseconds) = 0
>>restriction time (milliseconds) = 0
>>projection time (milliseconds) = 0
>>optimizer estimated row count: 118.00
>>optimizer estimated cost: 226.00
>>
>>Source result set:
>>Index Row to Base Row ResultSet for TBLSEARCHDICTIONARY:
>>Number of opens = 1
>>Rows seen = 113
>>Columns accessed from heap = {0, 3}
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 0
>>next time (milliseconds) = 0
>>close time (milliseconds) = 0
>>optimizer estimated row count: 118.00
>>optimizer estimated cost: 226.00
>>
>>Index Scan ResultSet for TBLSEARCHDICTIONARY using index
>>TBLSEARCHDICTIONARYWORD at read committed isolation level using share
>>row locking chosen by the optimizer
>>Number of opens = 1
>>Rows seen = 113
>>Rows filtered = 0
>>Fetch Size = 1
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 0
>>next time (milliseconds) = 0
>>close time (milliseconds) = 0
>>next time in milliseconds/row = 0
>>
>>scan information:
>>Bit set of columns fetched=All
>>Number of columns fetched=2
>>Number of deleted rows visited=0
>>Number of pages visited=4
>>Number of rows qualified=113
>>Number of rows visited=114
>>Scan type=btree
>>Tree height=3
>>start position:
>> >= on first 1 column(s).
>>Ordered null semantics on the following columns:
>>0
>>stop position:
>> > on first 1 column(s).
>>Ordered null semantics on the following columns:
>>0
>>qualifiers:
>>None
>>optimizer estimated row count: 118.00
>>optimizer estimated cost: 226.00
>>
>>
>>---QUERY PLAN FOR SECOND QUERY----
>>
>>Statement Name:
>>null
>>Statement Text:
>>SELECT
>>ObjectId,
>>SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE
>>Word = 'CONTACT' OR Word = 'ADD'
>>GROUP BY
>>ObjectId
>>
>>Parse Time: 0
>>Bind Time: 0
>>Optimize Time: 0
>>Generate Time: 15
>>Compile Time: 15
>>Execute Time: 4250
>>Begin Compilation Timestamp : 2005-11-11 13:16:17.578
>>End Compilation Timestamp : 2005-11-11 13:16:17.593
>>Begin Execution Timestamp : 2005-11-11 13:16:17.593
>>End Execution Timestamp : 2005-11-11 13:16:27.437
>>Statement Execution Plan Text:
>>Project-Restrict ResultSet (5):
>>Number of opens = 1
>>Rows seen = 100
>>Rows filtered = 0
>>restriction = false
>>projection = true
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 4250
>>next time (milliseconds) = 0
>>close time (milliseconds) = 0
>>restriction time (milliseconds) = 0
>>projection time (milliseconds) = 0
>>optimizer estimated row count: 1.00
>>optimizer estimated cost: 82959.49
>>
>>Source result set:
>>Grouped Aggregate ResultSet:
>>Number of opens = 1
>>Rows input = 712
>>Has distinct aggregate = false
>>In sorted order = false
>>Sort information:
>>Number of rows input=712
>>Number of rows output=593
>>Sort type=internal
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 4250
>>next time (milliseconds) = 0
>>close time (milliseconds) = 0
>>optimizer estimated row count: 1.00
>>optimizer estimated cost: 82959.49
>>
>>Source result set:
>>Project-Restrict ResultSet (4):
>>Number of opens = 1
>>Rows seen = 712
>>Rows filtered = 0
>>restriction = false
>>projection = true
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 0
>>next time (milliseconds) = 4219
>>close time (milliseconds) = 15
>>restriction time (milliseconds) = 0
>>projection time (milliseconds) = 0
>>optimizer estimated row count: 19200.45
>>optimizer estimated cost: 82959.49
>>
>>Source result set:
>>Project-Restrict ResultSet (3):
>>Number of opens = 1
>>Rows seen = 40806
>>Rows filtered = 40094
>>restriction = true
>>projection = false
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 0
>>next time (milliseconds) = 4219
>>close time (milliseconds) = 15
>>restriction time (milliseconds) = 124
>>projection time (milliseconds) = 0
>>optimizer estimated row count: 19200.45
>>optimizer estimated cost: 82959.49
>>
>>Source result set:
>>Table Scan ResultSet for TBLSEARCHDICTIONARY at read
>>committed
>>isolation level using share row locking chosen by the optimizer
>>Number of opens = 1
>>Rows seen = 40806
>>Rows filtered = 0
>>Fetch Size = 1
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 0
>>next time (milliseconds) = 4001
>>close time (milliseconds) = 15
>>next time in milliseconds/row = 0
>>
>>scan information:
>>Bit set of columns fetched={0, 2, 3}
>>Number of columns fetched=3
>>Number of pages visited=2978
>>Number of rows qualified=40806
>>Number of rows visited=256001
>>Scan type=heap
>>start position:
>>null
>>stop position:
>>null
>>qualifiers:
>>Column[0][0] Id: 2
>>Operator: <
>>Ordered nulls: false
>>Unknown return value: true
>>Negate comparison result: true
>>Column[0][1] Id: 2
>>Operator: <=
>>Ordered nulls: false
>>Unknown return value: false
>>Negate comparison result: false
>>
>>optimizer estimated row count: 19200.45
>>optimizer estimated cost: 82959.49
>>
>>----------
>>
>>Thanks in advance for any help!
>>
>>Kind regards,
>>
>>
>>Kevin Hore
>>
>>
> 
> 




