db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rick Hillegas <Richard.Hille...@Sun.COM>
Subject Re: Poor query optimizer choices is making Derby unusable for large tables
Date Fri, 11 Nov 2005 22:23:44 GMT
Hi Kevin,

You might also try using a temporary table to split your scan up into a 
series of optimzable queries. Regards-Rick

declare global temporary table session.accumulator
(
   ObjectId int NOT NULL,
   WordLocation int NOT NULL
)
not logged;

insert into session.accumulator
    SELECT ObjectId, WordLocation
    FROM tblSearchDictionary
    WHERE Word = 'CONTACT'
;
insert into session.accumulator
    SELECT ObjectId, WordLocation
    FROM tblSearchDictionary
    WHERE Word = 'ADD'
;

SELECT ObjectId, SUM(WordLocation) AS Score
FROM session.accumulator
GROUP BY ObjectId;


Mamta Satoor wrote:

> Hi Kevin,
>  
> I haven't investigated Derby-47 to know how to fix the problem but I 
> do have an optimizer overrides patch waiting for review on the derby 
> developer list which will let user specify their own optimizer 
> properties to help the optimizer pick a specific plan. The JIRA entry 
> for optimizer overrides is Derby-573. The patch is for Derby 10.2.
>  
> These optimizer properties can be handy in a case like yours where 
> until the optimizer is fixed, a user can tell the optimizer what plan 
> to use. Just an FYI as I know it doesn't help with your particular 
> case since there is no optimizer overrides support for 10.1.
>  
> Mamta
>  
> On 11/11/05, *Kevin Hore* <kh@araxis.com <mailto:kh@araxis.com>> wrote:
>
>     The Derby query optimizer (this is with 10.1.1.0
>     <http://10.1.1.0>) decides to perform an
>     expensive table scan in certain circumstances, when it could make
>     use of
>     available indexes. The resulting poor performance is rendering Derby
>     useless for our application.
>
>     I believe that this behaviour might be related to DERBY-47
>     ( http://issues.apache.org/jira/browse/DERBY-47), which was opened in
>     October 2004. However, this seems such a severe and fundamental
>     problem
>     that I'm surprised firstly that a significant portion of the Derby
>     user
>     base isn't affected by this and, secondly, that DERBY-47 hasn't been
>     considered a critical defect to be fixed with the utmost urgency.
>
>     I've described the problem in detail below, and I'd appreciate any
>     assistance. Specifically:
>
>     i) Does anyone have any plans to fix this problem?
>
>     ii) In the meantime, are there any work-arounds? I'd appreciate any
>     suggestions that would decrease the execution time of my second query
>     below (the one with with two search terms). Likewise, any general
>     strategies for avoiding this problem with IN clauses would be
>     appreciated.
>
>
>     ----PROBLEM DESCRIPTION----
>
>     Consider the table:
>
>     CREATE TABLE tblSearchDictionary
>     (
>     ObjectId int NOT NULL,
>     ObjectType int NOT NULL,
>     Word VARCHAR(64) NOT NULL,
>     WordLocation int NOT NULL,
>     CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
>     );
>
>     This table has an index on each of the four columns, it also has the
>     unique index across all four columns as defined above:
>
>     CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary
>     (ObjectId);
>     CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
>     (ObjectType);
>     CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
>     CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
>     (WordLocation);
>
>     The table contains about 260,000 rows.
>
>     The following query selects all rows that match instances of string in
>     the Word column. It sums the WordLocation column having grouped by
>     the
>     ObjectId column.
>
>     SELECT ObjectId, SUM(WordLocation) AS Score
>     FROM tblSearchDictionary
>     WHERE Word = 'CONTACT'
>     GROUP BY ObjectId;
>
>     On my machine this will usually complete in an acceptable time of
>     around
>     200ms.
>
>     Now consider the following query which adds a second search term
>     on the
>     same column.
>
>     SELECT ObjectId, SUM(WordLocation) AS Score
>     FROM tblSearchDictionary
>     WHERE Word = 'CONTACT' OR Word = 'ADD'
>     GROUP BY ObjectId;
>
>     This second query usually takes around 10000ms on my machine. My
>     understanding from the Derby optimizer docs and DERBY-47 is that
>     this is
>     because Derby is re-writing the query along the following lines, and
>     then choosing to do a table scan:
>
>     SELECT ObjectId, SUM(WordLocation) AS Score
>     FROM tblSearchDictionary
>     WHERE
>       Word IN ('CONTACT', 'ADD')
>       AND Word >= 'ADD'
>       AND Word <= 'CONTACT'
>     GROUP BY ObjectId;
>
>     The plan for the first query indicates that the
>     tblSearchDictionaryWord
>     index is used to perform an index scan. However, the plan for the
>     second
>     query indicates that the majority of the additional time is taken
>     performing a table scan over the entire table, instead of making
>     use of
>     the indexes available. Our application uses IN quite frequently,
>     so this
>     optimizer behaviour would seem to present a significant problem.
>
>     ---QUERY PLAN FOR FIRST QUERY----
>
>     Statement Name:
>         null
>     Statement Text:
>         SELECT
>         ObjectId,
>         SUM(WordLocation) AS Score
>     FROM tblSearchDictionary
>     WHERE
>             Word = 'CONTACT'
>     GROUP BY
>         ObjectId
>
>     Parse Time: 0
>     Bind Time: 0
>     Optimize Time: 16
>     Generate Time: 0
>     Compile Time: 16
>     Execute Time: 0
>     Begin Compilation Timestamp : 2005-11-11 12:28:52.765
>     End Compilation Timestamp : 2005-11-11 12:28: 52.781
>     Begin Execution Timestamp : 2005-11-11 13:08:15.406
>     End Execution Timestamp : 2005-11-11 13:08:15.406
>     Statement Execution Plan Text:
>     Project-Restrict ResultSet (5):
>     Number of opens = 1
>     Rows seen = 93
>     Rows filtered = 0
>     restriction = false
>     projection = true
>         constructor time (milliseconds) = 0
>         open time (milliseconds) = 0
>         next time (milliseconds) = 0
>         close time (milliseconds) = 0
>         restriction time (milliseconds) = 0
>         projection time (milliseconds) = 0
>         optimizer estimated row count:            1.00
>         optimizer estimated cost:          226.00
>
>     Source result set:
>         Grouped Aggregate ResultSet:
>         Number of opens = 1
>         Rows input = 113
>         Has distinct aggregate = false
>         In sorted order = false
>         Sort information:
>             Number of rows input=113
>             Number of rows output=93
>             Sort type=internal
>             constructor time (milliseconds) = 0
>             open time (milliseconds) = 0
>             next time (milliseconds) = 0
>             close time (milliseconds) = 0
>             optimizer estimated row count:             1.00
>             optimizer estimated cost:          226.00
>
>         Source result set:
>             Project-Restrict ResultSet (4):
>             Number of opens = 1
>             Rows seen = 113
>             Rows filtered = 0
>             restriction = false
>             projection = true
>                 constructor time (milliseconds) = 0
>                 open time (milliseconds) = 0
>                 next time (milliseconds) = 0
>                 close time (milliseconds) = 0
>                 restriction time (milliseconds) = 0
>                 projection time (milliseconds) = 0
>                 optimizer estimated row count:          118.00
>                 optimizer estimated cost:          226.00
>
>             Source result set:
>                 Index Row to Base Row ResultSet for TBLSEARCHDICTIONARY:
>                 Number of opens = 1
>                 Rows seen = 113
>                 Columns accessed from heap = {0, 3}
>                     constructor time (milliseconds) = 0
>                     open time (milliseconds) = 0
>                     next time (milliseconds) = 0
>                     close time (milliseconds) = 0
>                     optimizer estimated row count:           118.00
>                     optimizer estimated cost:          226.00
>
>                     Index Scan ResultSet for TBLSEARCHDICTIONARY using
>     index
>     TBLSEARCHDICTIONARYWORD at read committed isolation level using share
>     row locking chosen by the optimizer
>                     Number of opens = 1
>                     Rows seen = 113
>                     Rows filtered = 0
>                     Fetch Size = 1
>                         constructor time (milliseconds) = 0
>                         open time (milliseconds) = 0
>                         next time (milliseconds) = 0
>                         close time (milliseconds) = 0
>                         next time in milliseconds/row = 0
>
>                     scan information:
>                         Bit set of columns fetched=All
>                         Number of columns fetched=2
>                         Number of deleted rows visited=0
>                         Number of pages visited=4
>                         Number of rows qualified=113
>                         Number of rows visited=114
>                         Scan type=btree
>                         Tree height=3
>                         start position:
>         >= on first 1 column(s).
>         Ordered null semantics on the following columns:
>     0
>                         stop position:
>         > on first 1 column(s).
>         Ordered null semantics on the following columns:
>     0
>                         qualifiers:
>     None
>                         optimizer estimated row count:          118.00
>                         optimizer estimated cost:          226.00
>
>
>     ---QUERY PLAN FOR SECOND QUERY----
>
>     Statement Name:
>         null
>     Statement Text:
>         SELECT
>         ObjectId,
>         SUM(WordLocation) AS Score
>     FROM tblSearchDictionary
>     WHERE
>             Word = 'CONTACT' OR  Word = 'ADD'
>     GROUP BY
>         ObjectId
>
>     Parse Time: 0
>     Bind Time: 0
>     Optimize Time: 0
>     Generate Time: 15
>     Compile Time: 15
>     Execute Time: 4250
>     Begin Compilation Timestamp : 2005-11-11 13:16:17.578
>     End Compilation Timestamp : 2005-11-11 13:16: 17.593
>     Begin Execution Timestamp : 2005-11-11 13:16:17.593
>     End Execution Timestamp : 2005-11-11 13:16:27.437
>     Statement Execution Plan Text:
>     Project-Restrict ResultSet (5):
>     Number of opens = 1
>     Rows seen = 100
>     Rows filtered = 0
>     restriction = false
>     projection = true
>         constructor time (milliseconds) = 0
>         open time (milliseconds) = 4250
>         next time (milliseconds) = 0
>         close time (milliseconds) = 0
>         restriction time (milliseconds) = 0
>         projection time (milliseconds) = 0
>         optimizer estimated row count:            1.00
>         optimizer estimated cost:        82959.49
>
>     Source result set:
>         Grouped Aggregate ResultSet:
>         Number of opens = 1
>         Rows input = 712
>         Has distinct aggregate = false
>         In sorted order = false
>         Sort information:
>             Number of rows input=712
>             Number of rows output=593
>             Sort type=internal
>             constructor time (milliseconds) = 0
>             open time (milliseconds) = 4250
>             next time (milliseconds) = 0
>             close time (milliseconds) = 0
>             optimizer estimated row count:             1.00
>             optimizer estimated cost:        82959.49
>
>         Source result set:
>             Project-Restrict ResultSet (4):
>             Number of opens = 1
>             Rows seen = 712
>             Rows filtered = 0
>             restriction = false
>             projection = true
>                 constructor time (milliseconds) = 0
>                 open time (milliseconds) = 0
>                 next time (milliseconds) = 4219
>                 close time (milliseconds) = 15
>                 restriction time (milliseconds) = 0
>                 projection time (milliseconds) = 0
>                 optimizer estimated row count:        19200.45
>                 optimizer estimated cost:        82959.49
>
>             Source result set:
>                 Project-Restrict ResultSet (3):
>                 Number of opens = 1
>                 Rows seen = 40806
>                 Rows filtered = 40094
>                 restriction = true
>                 projection = false
>                     constructor time (milliseconds) = 0
>                     open time (milliseconds) = 0
>                     next time (milliseconds) = 4219
>                     close time (milliseconds) = 15
>                     restriction time (milliseconds) = 124
>                     projection time (milliseconds) = 0
>                     optimizer estimated row count:        19200.45
>                     optimizer estimated cost:        82959.49
>
>                 Source result set:
>                     Table Scan ResultSet for TBLSEARCHDICTIONARY at read
>     committed
>     isolation level using share row locking chosen by the optimizer
>                     Number of opens = 1
>                     Rows seen = 40806
>                     Rows filtered = 0
>                     Fetch Size = 1
>                         constructor time (milliseconds) = 0
>                         open time (milliseconds) = 0
>                         next time (milliseconds) = 4001
>                         close time (milliseconds) = 15
>                         next time in milliseconds/row = 0
>
>                     scan information:
>                         Bit set of columns fetched={0, 2, 3}
>                         Number of columns fetched=3
>                         Number of pages visited=2978
>                         Number of rows qualified=40806
>                         Number of rows visited=256001
>                         Scan type=heap
>                         start position:
>     null                    stop position:
>     null                    qualifiers:
>     Column[0][0] Id: 2
>     Operator: <
>     Ordered nulls: false
>     Unknown return value: true
>     Negate comparison result: true
>     Column[0][1] Id: 2
>     Operator: <=
>     Ordered nulls: false
>     Unknown return value: false
>     Negate comparison result: false
>
>                         optimizer estimated row count:        19200.45
>                         optimizer estimated cost:         82959.49
>
>     ----------
>
>     Thanks in advance for any help!
>
>     Kind regards,
>
>
>     Kevin Hore
>
>


Mime
View raw message