db-derby-user mailing list archives

From Mamta Satoor <msat...@gmail.com>
Subject Re: Poor query optimizer choices is making Derby unusable for large tables
Date Fri, 11 Nov 2005 16:33:54 GMT
Hi Kevin,
 I haven't investigated DERBY-47 closely enough to know how to fix the
problem, but I do have an optimizer overrides patch waiting for review on the
derby developer list which will let users specify their own optimizer
properties to help the optimizer pick a specific plan. The JIRA entry for
optimizer overrides is DERBY-573. The patch is for Derby 10.2.
 These optimizer properties can be handy in a case like yours: until the
optimizer is fixed, a user can tell the optimizer which plan to use. Just
an FYI, as I know it doesn't help with your particular case, since there is
no optimizer overrides support in 10.1.
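 For illustration, with the DERBY-573 patch applied, an override is written
as a specially formatted comment after the table name in the FROM clause.
A sketch for your query (this reflects the proposed patch, so the exact
property names and syntax may change before it is committed):

SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary --DERBY-PROPERTIES index=TBLSEARCHDICTIONARYWORD
WHERE Word = 'CONTACT' OR Word = 'ADD'
GROUP BY ObjectId;

 The comment must end the line, and it directs the optimizer to use the
named index instead of costing its own access path.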
 Mamta
 On 11/11/05, Kevin Hore <kh@araxis.com> wrote:
>
> The Derby query optimizer (this is with 10.1.1.0)
> decides to perform an
> expensive table scan in certain circumstances, when it could make use of
> available indexes. The resulting poor performance is rendering Derby
> useless for our application.
>
> I believe that this behaviour might be related to DERBY-47
> (http://issues.apache.org/jira/browse/DERBY-47), which was opened in
> October 2004. However, this seems such a severe and fundamental problem
> that I'm surprised, firstly, that more of the Derby user base doesn't
> seem to be hitting it and, secondly, that DERBY-47 hasn't been
> considered a critical defect to be fixed with the utmost urgency.
>
> I've described the problem in detail below, and I'd appreciate any
> assistance. Specifically:
>
> i) Does anyone have any plans to fix this problem?
>
> ii) In the meantime, are there any work-arounds? I'd appreciate any
> suggestions that would decrease the execution time of my second query
> below (the one with two search terms). Likewise, any general
> strategies for avoiding this problem with IN clauses would be appreciated.
>
>
> ----PROBLEM DESCRIPTION----
>
> Consider the table:
>
> CREATE TABLE tblSearchDictionary
> (
> ObjectId int NOT NULL,
> ObjectType int NOT NULL,
> Word VARCHAR(64) NOT NULL,
> WordLocation int NOT NULL,
> CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
> );
>
> This table has an index on each of the four columns; it also has the
> unique index across all four columns as defined above:
>
> CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary
> (ObjectId);
> CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
> (ObjectType);
> CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
> CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
> (WordLocation);
>
> The table contains about 260,000 rows.
>
> The following query selects all rows that match instances of string in
> the Word column. It sums the WordLocation column having grouped by the
> ObjectId column.
>
> SELECT ObjectId, SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE Word = 'CONTACT'
> GROUP BY ObjectId;
>
> On my machine this will usually complete in an acceptable time of around
> 200ms.
>
> Now consider the following query which adds a second search term on the
> same column.
>
> SELECT ObjectId, SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE Word = 'CONTACT' OR Word = 'ADD'
> GROUP BY ObjectId;
>
> This second query usually takes around 10000ms on my machine. My
> understanding from the Derby optimizer docs and DERBY-47 is that this is
> because Derby is re-writing the query along the following lines, and
> then choosing to do a table scan:
>
> SELECT ObjectId, SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE
> Word IN ('CONTACT', 'ADD')
> AND Word >= 'ADD'
> AND Word <= 'CONTACT'
> GROUP BY ObjectId;
>
> The plan for the first query indicates that the tblSearchDictionaryWord
> index is used to perform an index scan. However, the plan for the second
> query indicates that the majority of the additional time is taken
> performing a table scan over the entire table, instead of making use of
> the indexes available. Our application uses IN quite frequently, so this
> optimizer behaviour would seem to present a significant problem.
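>
> One rewrite I have considered is replacing the OR with a UNION ALL of
> single-term queries, so that each branch carries a single equality
> predicate that can use the tblSearchDictionaryWord index. A sketch (I
> have not confirmed that the optimizer actually picks the index for each
> branch, so treat that as an assumption):
>
> SELECT ObjectId, SUM(WordLocation) AS Score
> FROM (
>     SELECT ObjectId, WordLocation
>     FROM tblSearchDictionary WHERE Word = 'CONTACT'
>     UNION ALL
>     SELECT ObjectId, WordLocation
>     FROM tblSearchDictionary WHERE Word = 'ADD'
> ) AS t (ObjectId, WordLocation)
> GROUP BY ObjectId;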
>
> ---QUERY PLAN FOR FIRST QUERY----
>
> Statement Name:
> null
> Statement Text:
> SELECT
> ObjectId,
> SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE
> Word = 'CONTACT'
> GROUP BY
> ObjectId
>
> Parse Time: 0
> Bind Time: 0
> Optimize Time: 16
> Generate Time: 0
> Compile Time: 16
> Execute Time: 0
> Begin Compilation Timestamp : 2005-11-11 12:28:52.765
> End Compilation Timestamp : 2005-11-11 12:28:52.781
> Begin Execution Timestamp : 2005-11-11 13:08:15.406
> End Execution Timestamp : 2005-11-11 13:08:15.406
> Statement Execution Plan Text:
> Project-Restrict ResultSet (5):
> Number of opens = 1
> Rows seen = 93
> Rows filtered = 0
> restriction = false
> projection = true
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> restriction time (milliseconds) = 0
> projection time (milliseconds) = 0
> optimizer estimated row count: 1.00
> optimizer estimated cost: 226.00
>
> Source result set:
> Grouped Aggregate ResultSet:
> Number of opens = 1
> Rows input = 113
> Has distinct aggregate = false
> In sorted order = false
> Sort information:
> Number of rows input=113
> Number of rows output=93
> Sort type=internal
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> optimizer estimated row count: 1.00
> optimizer estimated cost: 226.00
>
> Source result set:
> Project-Restrict ResultSet (4):
> Number of opens = 1
> Rows seen = 113
> Rows filtered = 0
> restriction = false
> projection = true
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> restriction time (milliseconds) = 0
> projection time (milliseconds) = 0
> optimizer estimated row count: 118.00
> optimizer estimated cost: 226.00
>
> Source result set:
> Index Row to Base Row ResultSet for TBLSEARCHDICTIONARY:
> Number of opens = 1
> Rows seen = 113
> Columns accessed from heap = {0, 3}
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> optimizer estimated row count: 118.00
> optimizer estimated cost: 226.00
>
> Index Scan ResultSet for TBLSEARCHDICTIONARY using index
> TBLSEARCHDICTIONARYWORD at read committed isolation level using share
> row locking chosen by the optimizer
> Number of opens = 1
> Rows seen = 113
> Rows filtered = 0
> Fetch Size = 1
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> next time in milliseconds/row = 0
>
> scan information:
> Bit set of columns fetched=All
> Number of columns fetched=2
> Number of deleted rows visited=0
> Number of pages visited=4
> Number of rows qualified=113
> Number of rows visited=114
> Scan type=btree
> Tree height=3
> start position:
> >= on first 1 column(s).
> Ordered null semantics on the following columns:
> 0
> stop position:
> > on first 1 column(s).
> Ordered null semantics on the following columns:
> 0
> qualifiers:
> None
> optimizer estimated row count: 118.00
> optimizer estimated cost: 226.00
>
>
> ---QUERY PLAN FOR SECOND QUERY----
>
> Statement Name:
> null
> Statement Text:
> SELECT
> ObjectId,
> SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE
> Word = 'CONTACT' OR Word = 'ADD'
> GROUP BY
> ObjectId
>
> Parse Time: 0
> Bind Time: 0
> Optimize Time: 0
> Generate Time: 15
> Compile Time: 15
> Execute Time: 4250
> Begin Compilation Timestamp : 2005-11-11 13:16:17.578
> End Compilation Timestamp : 2005-11-11 13:16:17.593
> Begin Execution Timestamp : 2005-11-11 13:16:17.593
> End Execution Timestamp : 2005-11-11 13:16:27.437
> Statement Execution Plan Text:
> Project-Restrict ResultSet (5):
> Number of opens = 1
> Rows seen = 100
> Rows filtered = 0
> restriction = false
> projection = true
> constructor time (milliseconds) = 0
> open time (milliseconds) = 4250
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> restriction time (milliseconds) = 0
> projection time (milliseconds) = 0
> optimizer estimated row count: 1.00
> optimizer estimated cost: 82959.49
>
> Source result set:
> Grouped Aggregate ResultSet:
> Number of opens = 1
> Rows input = 712
> Has distinct aggregate = false
> In sorted order = false
> Sort information:
> Number of rows input=712
> Number of rows output=593
> Sort type=internal
> constructor time (milliseconds) = 0
> open time (milliseconds) = 4250
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> optimizer estimated row count: 1.00
> optimizer estimated cost: 82959.49
>
> Source result set:
> Project-Restrict ResultSet (4):
> Number of opens = 1
> Rows seen = 712
> Rows filtered = 0
> restriction = false
> projection = true
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 4219
> close time (milliseconds) = 15
> restriction time (milliseconds) = 0
> projection time (milliseconds) = 0
> optimizer estimated row count: 19200.45
> optimizer estimated cost: 82959.49
>
> Source result set:
> Project-Restrict ResultSet (3):
> Number of opens = 1
> Rows seen = 40806
> Rows filtered = 40094
> restriction = true
> projection = false
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 4219
> close time (milliseconds) = 15
> restriction time (milliseconds) = 124
> projection time (milliseconds) = 0
> optimizer estimated row count: 19200.45
> optimizer estimated cost: 82959.49
>
> Source result set:
> Table Scan ResultSet for TBLSEARCHDICTIONARY at read
> committed
> isolation level using share row locking chosen by the optimizer
> Number of opens = 1
> Rows seen = 40806
> Rows filtered = 0
> Fetch Size = 1
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 4001
> close time (milliseconds) = 15
> next time in milliseconds/row = 0
>
> scan information:
> Bit set of columns fetched={0, 2, 3}
> Number of columns fetched=3
> Number of pages visited=2978
> Number of rows qualified=40806
> Number of rows visited=256001
> Scan type=heap
> start position:
> null
> stop position:
> null
> qualifiers:
> Column[0][0] Id: 2
> Operator: <
> Ordered nulls: false
> Unknown return value: true
> Negate comparison result: true
> Column[0][1] Id: 2
> Operator: <=
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
>
> optimizer estimated row count: 19200.45
> optimizer estimated cost: 82959.49
>
> ----------
>
> Thanks in advance for any help!
>
> Kind regards,
>
>
> Kevin Hore
>
>
