db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Satheesh Bandaram <sathe...@Sourcery.Org>
Subject Re: Poor query optimizer choices is making Derby unusable for large tables
Date Fri, 11 Nov 2005 20:50:06 GMT
Hi Kevin,

Kevin Hore wrote:

> i) Does anyone have any plans to fix this problem?

Can you file an enhancement request for this? I think Derby could
improve it's handling of OR/IN clauses. Many databases don't optimize OR
clauses as much as possible, though some do better than others. It would
be great if Derby could internally process this as two different scans
(one for 'CONTACT' and another for 'ADD') and then combine the results.
Some databases can do this. However, the workaround suggested by Jeff L.
does this, though you have to rewrite the query.

Satheesh

> ii) In the meantime, are there any work-arounds? I’d appreciate any
> suggestions that would decrease the execution time of my second query
> below (the one with with two search terms). Likewise, any general
> strategies for avoiding this problem with IN clauses would be
> appreciated.
>
>
> ----PROBLEM DESCRIPTION----
>
> Consider the table:
>
> CREATE TABLE tblSearchDictionary
> (
> ObjectId int NOT NULL,
> ObjectType int NOT NULL,
> Word VARCHAR(64) NOT NULL,
> WordLocation int NOT NULL,
> CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
> );
>
> This table has an index on each of the four columns, it also has the
> unique index across all four columns as defined above:
>
> CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary
> (ObjectId);
> CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
> (ObjectType);
> CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
> CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
> (WordLocation);
>
> The table contains about 260,000 rows.
>
> The following query selects all rows that match instances of string in
> the Word column. It sums the WordLocation column having grouped by the
> ObjectId column.
>
> SELECT ObjectId, SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE Word = 'CONTACT'
> GROUP BY ObjectId;
>
> On my machine this will usually complete in an acceptable time of
> around 200ms.
>
> Now consider the following query which adds a second search term on
> the same column.
>
> SELECT ObjectId, SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE Word = 'CONTACT' OR Word = 'ADD'
> GROUP BY ObjectId;
>
> This second query usually takes around 10000ms on my machine. My
> understanding from the Derby optimizer docs and DERBY-47 is that this
> is because Derby is re-writing the query along the following lines,
> and then choosing to do a table scan:
>
> SELECT ObjectId, SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE
>   Word IN ('CONTACT', 'ADD')
>   AND Word >= 'ADD'
>   AND Word <= 'CONTACT'
> GROUP BY ObjectId;
>
> The plan for the first query indicates that the tblSearchDictionaryWord
> index is used to perform an index scan. However, the plan for the second
> query indicates that the majority of the additional time is taken
> performing a table scan over the entire table, instead of making use of
> the indexes available. Our application uses IN quite frequently, so
> this optimizer behaviour would seem to present a significant problem.
>
> ---QUERY PLAN FOR FIRST QUERY----
>
> Statement Name:
>     null
> Statement Text:
>     SELECT
>     ObjectId,
>     SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE
>         Word = 'CONTACT'
> GROUP BY
>     ObjectId
>
> Parse Time: 0
> Bind Time: 0
> Optimize Time: 16
> Generate Time: 0
> Compile Time: 16
> Execute Time: 0
> Begin Compilation Timestamp : 2005-11-11 12:28:52.765
> End Compilation Timestamp : 2005-11-11 12:28:52.781
> Begin Execution Timestamp : 2005-11-11 13:08:15.406
> End Execution Timestamp : 2005-11-11 13:08:15.406
> Statement Execution Plan Text:
> Project-Restrict ResultSet (5):
> Number of opens = 1
> Rows seen = 93
> Rows filtered = 0
> restriction = false
> projection = true
>     constructor time (milliseconds) = 0
>     open time (milliseconds) = 0
>     next time (milliseconds) = 0
>     close time (milliseconds) = 0
>     restriction time (milliseconds) = 0
>     projection time (milliseconds) = 0
>     optimizer estimated row count:            1.00
>     optimizer estimated cost:          226.00
>
> Source result set:
>     Grouped Aggregate ResultSet:
>     Number of opens = 1
>     Rows input = 113
>     Has distinct aggregate = false
>     In sorted order = false
>     Sort information:
>         Number of rows input=113
>         Number of rows output=93
>         Sort type=internal
>         constructor time (milliseconds) = 0
>         open time (milliseconds) = 0
>         next time (milliseconds) = 0
>         close time (milliseconds) = 0
>         optimizer estimated row count:            1.00
>         optimizer estimated cost:          226.00
>
>     Source result set:
>         Project-Restrict ResultSet (4):
>         Number of opens = 1
>         Rows seen = 113
>         Rows filtered = 0
>         restriction = false
>         projection = true
>             constructor time (milliseconds) = 0
>             open time (milliseconds) = 0
>             next time (milliseconds) = 0
>             close time (milliseconds) = 0
>             restriction time (milliseconds) = 0
>             projection time (milliseconds) = 0
>             optimizer estimated row count:          118.00
>             optimizer estimated cost:          226.00
>
>         Source result set:
>             Index Row to Base Row ResultSet for TBLSEARCHDICTIONARY:
>             Number of opens = 1
>             Rows seen = 113
>             Columns accessed from heap = {0, 3}
>                 constructor time (milliseconds) = 0
>                 open time (milliseconds) = 0
>                 next time (milliseconds) = 0
>                 close time (milliseconds) = 0
>                 optimizer estimated row count:          118.00
>                 optimizer estimated cost:          226.00
>
>                 Index Scan ResultSet for TBLSEARCHDICTIONARY using index
> TBLSEARCHDICTIONARYWORD at read committed isolation level using share
> row locking chosen by the optimizer
>                 Number of opens = 1
>                 Rows seen = 113
>                 Rows filtered = 0
>                 Fetch Size = 1
>                     constructor time (milliseconds) = 0
>                     open time (milliseconds) = 0
>                     next time (milliseconds) = 0
>                     close time (milliseconds) = 0
>                     next time in milliseconds/row = 0
>
>                 scan information:
>                     Bit set of columns fetched=All
>                     Number of columns fetched=2
>                     Number of deleted rows visited=0
>                     Number of pages visited=4
>                     Number of rows qualified=113
>                     Number of rows visited=114
>                     Scan type=btree
>                     Tree height=3
>                     start position:
>     >= on first 1 column(s).
>     Ordered null semantics on the following columns:
> 0
>                     stop position:
>     > on first 1 column(s).
>     Ordered null semantics on the following columns:
> 0
>                     qualifiers:
> None
>                     optimizer estimated row count:          118.00
>                     optimizer estimated cost:          226.00
>
>
> ---QUERY PLAN FOR SECOND QUERY----
>
> Statement Name:
>     null
> Statement Text:
>     SELECT
>     ObjectId,
>     SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE
>         Word = 'CONTACT' OR  Word = 'ADD'
> GROUP BY
>     ObjectId
>
> Parse Time: 0
> Bind Time: 0
> Optimize Time: 0
> Generate Time: 15
> Compile Time: 15
> Execute Time: 4250
> Begin Compilation Timestamp : 2005-11-11 13:16:17.578
> End Compilation Timestamp : 2005-11-11 13:16:17.593
> Begin Execution Timestamp : 2005-11-11 13:16:17.593
> End Execution Timestamp : 2005-11-11 13:16:27.437
> Statement Execution Plan Text:
> Project-Restrict ResultSet (5):
> Number of opens = 1
> Rows seen = 100
> Rows filtered = 0
> restriction = false
> projection = true
>     constructor time (milliseconds) = 0
>     open time (milliseconds) = 4250
>     next time (milliseconds) = 0
>     close time (milliseconds) = 0
>     restriction time (milliseconds) = 0
>     projection time (milliseconds) = 0
>     optimizer estimated row count:            1.00
>     optimizer estimated cost:        82959.49
>
> Source result set:
>     Grouped Aggregate ResultSet:
>     Number of opens = 1
>     Rows input = 712
>     Has distinct aggregate = false
>     In sorted order = false
>     Sort information:
>         Number of rows input=712
>         Number of rows output=593
>         Sort type=internal
>         constructor time (milliseconds) = 0
>         open time (milliseconds) = 4250
>         next time (milliseconds) = 0
>         close time (milliseconds) = 0
>         optimizer estimated row count:            1.00
>         optimizer estimated cost:        82959.49
>
>     Source result set:
>         Project-Restrict ResultSet (4):
>         Number of opens = 1
>         Rows seen = 712
>         Rows filtered = 0
>         restriction = false
>         projection = true
>             constructor time (milliseconds) = 0
>             open time (milliseconds) = 0
>             next time (milliseconds) = 4219
>             close time (milliseconds) = 15
>             restriction time (milliseconds) = 0
>             projection time (milliseconds) = 0
>             optimizer estimated row count:        19200.45
>             optimizer estimated cost:        82959.49
>
>         Source result set:
>             Project-Restrict ResultSet (3):
>             Number of opens = 1
>             Rows seen = 40806
>             Rows filtered = 40094
>             restriction = true
>             projection = false
>                 constructor time (milliseconds) = 0
>                 open time (milliseconds) = 0
>                 next time (milliseconds) = 4219
>                 close time (milliseconds) = 15
>                 restriction time (milliseconds) = 124
>                 projection time (milliseconds) = 0
>                 optimizer estimated row count:        19200.45
>                 optimizer estimated cost:        82959.49
>
>             Source result set:
>                 Table Scan ResultSet for TBLSEARCHDICTIONARY at read
> committed
> isolation level using share row locking chosen by the optimizer
>                 Number of opens = 1
>                 Rows seen = 40806
>                 Rows filtered = 0
>                 Fetch Size = 1
>                     constructor time (milliseconds) = 0
>                     open time (milliseconds) = 0
>                     next time (milliseconds) = 4001
>                     close time (milliseconds) = 15
>                     next time in milliseconds/row = 0
>
>                 scan information:
>                     Bit set of columns fetched={0, 2, 3}
>                     Number of columns fetched=3
>                     Number of pages visited=2978
>                     Number of rows qualified=40806
>                     Number of rows visited=256001
>                     Scan type=heap
>                     start position:
> null                    stop position:
> null                    qualifiers:
> Column[0][0] Id: 2
> Operator: <
> Ordered nulls: false
> Unknown return value: true
> Negate comparison result: true
> Column[0][1] Id: 2
> Operator: <=
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
>
>                     optimizer estimated row count:        19200.45
>                     optimizer estimated cost:        82959.49
>
> ----------
>
> Thanks in advance for any help!
>
> Kind regards,
>
>
> Kevin Hore
>
>
>


Mime
View raw message