db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kevin Hore ...@araxis.com>
Subject Poor query optimizer choices is making Derby unusable for large tables
Date Fri, 11 Nov 2005 14:59:51 GMT
The Derby query optimizer (this is with 10.1.1.0) decides to perform an 
expensive table scan in certain circumstances, when it could make use of 
available indexes. The resulting poor performance is rendering Derby 
useless for our application.

I believe that this behaviour might be related to DERBY-47 
(http://issues.apache.org/jira/browse/DERBY-47), which was opened in 
October 2004. However, this seems such a severe and fundamental problem 
that I'm surprised firstly that a significant portion of the Derby user 
base isn't affected by this and, secondly, that DERBY-47 hasn't been 
considered a critical defect to be fixed with the utmost urgency.

I've described the problem in detail below, and I'd appreciate any 
assistance. Specifically:

i) Does anyone have any plans to fix this problem?

ii) In the meantime, are there any work-arounds? I’d appreciate any 
suggestions that would decrease the execution time of my second query 
below (the one with with two search terms). Likewise, any general 
strategies for avoiding this problem with IN clauses would be appreciated.


----PROBLEM DESCRIPTION----

Consider the table:

CREATE TABLE tblSearchDictionary
(
ObjectId int NOT NULL,
ObjectType int NOT NULL,
Word VARCHAR(64) NOT NULL,
WordLocation int NOT NULL,
CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
);

This table has an index on each of the four columns, it also has the
unique index across all four columns as defined above:

CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary (ObjectId);
CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
(ObjectType);
CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
(WordLocation);

The table contains about 260,000 rows.

The following query selects all rows that match instances of string in
the Word column. It sums the WordLocation column having grouped by the
ObjectId column.

SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE Word = 'CONTACT'
GROUP BY ObjectId;

On my machine this will usually complete in an acceptable time of around 
200ms.

Now consider the following query which adds a second search term on the 
same column.

SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE Word = 'CONTACT' OR Word = 'ADD'
GROUP BY ObjectId;

This second query usually takes around 10000ms on my machine. My 
understanding from the Derby optimizer docs and DERBY-47 is that this is 
because Derby is re-writing the query along the following lines, and 
then choosing to do a table scan:

SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE
   Word IN ('CONTACT', 'ADD')
   AND Word >= 'ADD'
   AND Word <= 'CONTACT'
GROUP BY ObjectId;

The plan for the first query indicates that the tblSearchDictionaryWord
index is used to perform an index scan. However, the plan for the second
query indicates that the majority of the additional time is taken
performing a table scan over the entire table, instead of making use of
the indexes available. Our application uses IN quite frequently, so this 
optimizer behaviour would seem to present a significant problem.

---QUERY PLAN FOR FIRST QUERY----

Statement Name:
     null
Statement Text:
     SELECT
     ObjectId,
     SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE
         Word = 'CONTACT'
GROUP BY
     ObjectId

Parse Time: 0
Bind Time: 0
Optimize Time: 16
Generate Time: 0
Compile Time: 16
Execute Time: 0
Begin Compilation Timestamp : 2005-11-11 12:28:52.765
End Compilation Timestamp : 2005-11-11 12:28:52.781
Begin Execution Timestamp : 2005-11-11 13:08:15.406
End Execution Timestamp : 2005-11-11 13:08:15.406
Statement Execution Plan Text:
Project-Restrict ResultSet (5):
Number of opens = 1
Rows seen = 93
Rows filtered = 0
restriction = false
projection = true
     constructor time (milliseconds) = 0
     open time (milliseconds) = 0
     next time (milliseconds) = 0
     close time (milliseconds) = 0
     restriction time (milliseconds) = 0
     projection time (milliseconds) = 0
     optimizer estimated row count:            1.00
     optimizer estimated cost:          226.00

Source result set:
     Grouped Aggregate ResultSet:
     Number of opens = 1
     Rows input = 113
     Has distinct aggregate = false
     In sorted order = false
     Sort information:
         Number of rows input=113
         Number of rows output=93
         Sort type=internal
         constructor time (milliseconds) = 0
         open time (milliseconds) = 0
         next time (milliseconds) = 0
         close time (milliseconds) = 0
         optimizer estimated row count:            1.00
         optimizer estimated cost:          226.00

     Source result set:
         Project-Restrict ResultSet (4):
         Number of opens = 1
         Rows seen = 113
         Rows filtered = 0
         restriction = false
         projection = true
             constructor time (milliseconds) = 0
             open time (milliseconds) = 0
             next time (milliseconds) = 0
             close time (milliseconds) = 0
             restriction time (milliseconds) = 0
             projection time (milliseconds) = 0
             optimizer estimated row count:          118.00
             optimizer estimated cost:          226.00

         Source result set:
             Index Row to Base Row ResultSet for TBLSEARCHDICTIONARY:
             Number of opens = 1
             Rows seen = 113
             Columns accessed from heap = {0, 3}
                 constructor time (milliseconds) = 0
                 open time (milliseconds) = 0
                 next time (milliseconds) = 0
                 close time (milliseconds) = 0
                 optimizer estimated row count:          118.00
                 optimizer estimated cost:          226.00

                 Index Scan ResultSet for TBLSEARCHDICTIONARY using index
TBLSEARCHDICTIONARYWORD at read committed isolation level using share
row locking chosen by the optimizer
                 Number of opens = 1
                 Rows seen = 113
                 Rows filtered = 0
                 Fetch Size = 1
                     constructor time (milliseconds) = 0
                     open time (milliseconds) = 0
                     next time (milliseconds) = 0
                     close time (milliseconds) = 0
                     next time in milliseconds/row = 0

                 scan information:
                     Bit set of columns fetched=All
                     Number of columns fetched=2
                     Number of deleted rows visited=0
                     Number of pages visited=4
                     Number of rows qualified=113
                     Number of rows visited=114
                     Scan type=btree
                     Tree height=3
                     start position:
     >= on first 1 column(s).
     Ordered null semantics on the following columns:
0
                     stop position:
     > on first 1 column(s).
     Ordered null semantics on the following columns:
0
                     qualifiers:
None
                     optimizer estimated row count:          118.00
                     optimizer estimated cost:          226.00


---QUERY PLAN FOR SECOND QUERY----

Statement Name:
     null
Statement Text:
     SELECT
     ObjectId,
     SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE
         Word = 'CONTACT' OR  Word = 'ADD'
GROUP BY
     ObjectId

Parse Time: 0
Bind Time: 0
Optimize Time: 0
Generate Time: 15
Compile Time: 15
Execute Time: 4250
Begin Compilation Timestamp : 2005-11-11 13:16:17.578
End Compilation Timestamp : 2005-11-11 13:16:17.593
Begin Execution Timestamp : 2005-11-11 13:16:17.593
End Execution Timestamp : 2005-11-11 13:16:27.437
Statement Execution Plan Text:
Project-Restrict ResultSet (5):
Number of opens = 1
Rows seen = 100
Rows filtered = 0
restriction = false
projection = true
     constructor time (milliseconds) = 0
     open time (milliseconds) = 4250
     next time (milliseconds) = 0
     close time (milliseconds) = 0
     restriction time (milliseconds) = 0
     projection time (milliseconds) = 0
     optimizer estimated row count:            1.00
     optimizer estimated cost:        82959.49

Source result set:
     Grouped Aggregate ResultSet:
     Number of opens = 1
     Rows input = 712
     Has distinct aggregate = false
     In sorted order = false
     Sort information:
         Number of rows input=712
         Number of rows output=593
         Sort type=internal
         constructor time (milliseconds) = 0
         open time (milliseconds) = 4250
         next time (milliseconds) = 0
         close time (milliseconds) = 0
         optimizer estimated row count:            1.00
         optimizer estimated cost:        82959.49

     Source result set:
         Project-Restrict ResultSet (4):
         Number of opens = 1
         Rows seen = 712
         Rows filtered = 0
         restriction = false
         projection = true
             constructor time (milliseconds) = 0
             open time (milliseconds) = 0
             next time (milliseconds) = 4219
             close time (milliseconds) = 15
             restriction time (milliseconds) = 0
             projection time (milliseconds) = 0
             optimizer estimated row count:        19200.45
             optimizer estimated cost:        82959.49

         Source result set:
             Project-Restrict ResultSet (3):
             Number of opens = 1
             Rows seen = 40806
             Rows filtered = 40094
             restriction = true
             projection = false
                 constructor time (milliseconds) = 0
                 open time (milliseconds) = 0
                 next time (milliseconds) = 4219
                 close time (milliseconds) = 15
                 restriction time (milliseconds) = 124
                 projection time (milliseconds) = 0
                 optimizer estimated row count:        19200.45
                 optimizer estimated cost:        82959.49

             Source result set:
                 Table Scan ResultSet for TBLSEARCHDICTIONARY at read 
committed
isolation level using share row locking chosen by the optimizer
                 Number of opens = 1
                 Rows seen = 40806
                 Rows filtered = 0
                 Fetch Size = 1
                     constructor time (milliseconds) = 0
                     open time (milliseconds) = 0
                     next time (milliseconds) = 4001
                     close time (milliseconds) = 15
                     next time in milliseconds/row = 0

                 scan information:
                     Bit set of columns fetched={0, 2, 3}
                     Number of columns fetched=3
                     Number of pages visited=2978
                     Number of rows qualified=40806
                     Number of rows visited=256001
                     Scan type=heap
                     start position:
null                    stop position:
null                    qualifiers:
Column[0][0] Id: 2
Operator: <
Ordered nulls: false
Unknown return value: true
Negate comparison result: true
Column[0][1] Id: 2
Operator: <=
Ordered nulls: false
Unknown return value: false
Negate comparison result: false

                     optimizer estimated row count:        19200.45
                     optimizer estimated cost:        82959.49

----------

Thanks in advance for any help!

Kind regards,


Kevin Hore


Mime
View raw message