db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kevin Hore ...@araxis.com>
Subject Re: Poor query optimizer choices is making Derby unusable for large tables
Date Mon, 14 Nov 2005 16:12:53 GMT
Hi Satheesh

Satheesh Bandaram wrote:
> Hi Kevin,
> 
> Kevin Hore wrote:
> 
> 
>>i) Does anyone have any plans to fix this problem?
> 
> 
> Can you file an enhancement request for this? I think Derby could
> improve it's handling of OR/IN clauses. Many databases don't optimize OR
> clauses as much as possible, though some do better than others. It would
> be great if Derby could internally process this as two different scans
> (one for 'CONTACT' and another for 'ADD') and then combine the results.
> Some databases can do this. However, the workaround suggested by Jeff L.
> does this, though you have to rewrite the query.
I've commented on the re-write solution elsewhere. Regarding an
enhancement, I think DERBY-47 pretty much covers the problem, but I'll
add a comment to that describing my circumstances.

> 
> Satheesh
> 
> 
>>ii) In the meantime, are there any work-arounds? I’d appreciate any
>>suggestions that would decrease the execution time of my second query
>>below (the one with with two search terms). Likewise, any general
>>strategies for avoiding this problem with IN clauses would be
>>appreciated.
>>
>>
>>----PROBLEM DESCRIPTION----
>>
>>Consider the table:
>>
>>CREATE TABLE tblSearchDictionary
>>(
>>ObjectId int NOT NULL,
>>ObjectType int NOT NULL,
>>Word VARCHAR(64) NOT NULL,
>>WordLocation int NOT NULL,
>>CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
>>);
>>
>>This table has an index on each of the four columns, it also has the
>>unique index across all four columns as defined above:
>>
>>CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary
>>(ObjectId);
>>CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
>>(ObjectType);
>>CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
>>CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
>>(WordLocation);
>>
>>The table contains about 260,000 rows.
>>
>>The following query selects all rows that match instances of string in
>>the Word column. It sums the WordLocation column having grouped by the
>>ObjectId column.
>>
>>SELECT ObjectId, SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE Word = 'CONTACT'
>>GROUP BY ObjectId;
>>
>>On my machine this will usually complete in an acceptable time of
>>around 200ms.
>>
>>Now consider the following query which adds a second search term on
>>the same column.
>>
>>SELECT ObjectId, SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE Word = 'CONTACT' OR Word = 'ADD'
>>GROUP BY ObjectId;
>>
>>This second query usually takes around 10000ms on my machine. My
>>understanding from the Derby optimizer docs and DERBY-47 is that this
>>is because Derby is re-writing the query along the following lines,
>>and then choosing to do a table scan:
>>
>>SELECT ObjectId, SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE
>>  Word IN ('CONTACT', 'ADD')
>>  AND Word >= 'ADD'
>>  AND Word <= 'CONTACT'
>>GROUP BY ObjectId;
>>
>>The plan for the first query indicates that the tblSearchDictionaryWord
>>index is used to perform an index scan. However, the plan for the second
>>query indicates that the majority of the additional time is taken
>>performing a table scan over the entire table, instead of making use of
>>the indexes available. Our application uses IN quite frequently, so
>>this optimizer behaviour would seem to present a significant problem.
>>
>>---QUERY PLAN FOR FIRST QUERY----
>>
>>Statement Name:
>>    null
>>Statement Text:
>>    SELECT
>>    ObjectId,
>>    SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE
>>        Word = 'CONTACT'
>>GROUP BY
>>    ObjectId
>>
>>Parse Time: 0
>>Bind Time: 0
>>Optimize Time: 16
>>Generate Time: 0
>>Compile Time: 16
>>Execute Time: 0
>>Begin Compilation Timestamp : 2005-11-11 12:28:52.765
>>End Compilation Timestamp : 2005-11-11 12:28:52.781
>>Begin Execution Timestamp : 2005-11-11 13:08:15.406
>>End Execution Timestamp : 2005-11-11 13:08:15.406
>>Statement Execution Plan Text:
>>Project-Restrict ResultSet (5):
>>Number of opens = 1
>>Rows seen = 93
>>Rows filtered = 0
>>restriction = false
>>projection = true
>>    constructor time (milliseconds) = 0
>>    open time (milliseconds) = 0
>>    next time (milliseconds) = 0
>>    close time (milliseconds) = 0
>>    restriction time (milliseconds) = 0
>>    projection time (milliseconds) = 0
>>    optimizer estimated row count:            1.00
>>    optimizer estimated cost:          226.00
>>
>>Source result set:
>>    Grouped Aggregate ResultSet:
>>    Number of opens = 1
>>    Rows input = 113
>>    Has distinct aggregate = false
>>    In sorted order = false
>>    Sort information:
>>        Number of rows input=113
>>        Number of rows output=93
>>        Sort type=internal
>>        constructor time (milliseconds) = 0
>>        open time (milliseconds) = 0
>>        next time (milliseconds) = 0
>>        close time (milliseconds) = 0
>>        optimizer estimated row count:            1.00
>>        optimizer estimated cost:          226.00
>>
>>    Source result set:
>>        Project-Restrict ResultSet (4):
>>        Number of opens = 1
>>        Rows seen = 113
>>        Rows filtered = 0
>>        restriction = false
>>        projection = true
>>            constructor time (milliseconds) = 0
>>            open time (milliseconds) = 0
>>            next time (milliseconds) = 0
>>            close time (milliseconds) = 0
>>            restriction time (milliseconds) = 0
>>            projection time (milliseconds) = 0
>>            optimizer estimated row count:          118.00
>>            optimizer estimated cost:          226.00
>>
>>        Source result set:
>>            Index Row to Base Row ResultSet for TBLSEARCHDICTIONARY:
>>            Number of opens = 1
>>            Rows seen = 113
>>            Columns accessed from heap = {0, 3}
>>                constructor time (milliseconds) = 0
>>                open time (milliseconds) = 0
>>                next time (milliseconds) = 0
>>                close time (milliseconds) = 0
>>                optimizer estimated row count:          118.00
>>                optimizer estimated cost:          226.00
>>
>>                Index Scan ResultSet for TBLSEARCHDICTIONARY using index
>>TBLSEARCHDICTIONARYWORD at read committed isolation level using share
>>row locking chosen by the optimizer
>>                Number of opens = 1
>>                Rows seen = 113
>>                Rows filtered = 0
>>                Fetch Size = 1
>>                    constructor time (milliseconds) = 0
>>                    open time (milliseconds) = 0
>>                    next time (milliseconds) = 0
>>                    close time (milliseconds) = 0
>>                    next time in milliseconds/row = 0
>>
>>                scan information:
>>                    Bit set of columns fetched=All
>>                    Number of columns fetched=2
>>                    Number of deleted rows visited=0
>>                    Number of pages visited=4
>>                    Number of rows qualified=113
>>                    Number of rows visited=114
>>                    Scan type=btree
>>                    Tree height=3
>>                    start position:
>>    >= on first 1 column(s).
>>    Ordered null semantics on the following columns:
>>0
>>                    stop position:
>>    > on first 1 column(s).
>>    Ordered null semantics on the following columns:
>>0
>>                    qualifiers:
>>None
>>                    optimizer estimated row count:          118.00
>>                    optimizer estimated cost:          226.00
>>
>>
>>---QUERY PLAN FOR SECOND QUERY----
>>
>>Statement Name:
>>    null
>>Statement Text:
>>    SELECT
>>    ObjectId,
>>    SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE
>>        Word = 'CONTACT' OR  Word = 'ADD'
>>GROUP BY
>>    ObjectId
>>
>>Parse Time: 0
>>Bind Time: 0
>>Optimize Time: 0
>>Generate Time: 15
>>Compile Time: 15
>>Execute Time: 4250
>>Begin Compilation Timestamp : 2005-11-11 13:16:17.578
>>End Compilation Timestamp : 2005-11-11 13:16:17.593
>>Begin Execution Timestamp : 2005-11-11 13:16:17.593
>>End Execution Timestamp : 2005-11-11 13:16:27.437
>>Statement Execution Plan Text:
>>Project-Restrict ResultSet (5):
>>Number of opens = 1
>>Rows seen = 100
>>Rows filtered = 0
>>restriction = false
>>projection = true
>>    constructor time (milliseconds) = 0
>>    open time (milliseconds) = 4250
>>    next time (milliseconds) = 0
>>    close time (milliseconds) = 0
>>    restriction time (milliseconds) = 0
>>    projection time (milliseconds) = 0
>>    optimizer estimated row count:            1.00
>>    optimizer estimated cost:        82959.49
>>
>>Source result set:
>>    Grouped Aggregate ResultSet:
>>    Number of opens = 1
>>    Rows input = 712
>>    Has distinct aggregate = false
>>    In sorted order = false
>>    Sort information:
>>        Number of rows input=712
>>        Number of rows output=593
>>        Sort type=internal
>>        constructor time (milliseconds) = 0
>>        open time (milliseconds) = 4250
>>        next time (milliseconds) = 0
>>        close time (milliseconds) = 0
>>        optimizer estimated row count:            1.00
>>        optimizer estimated cost:        82959.49
>>
>>    Source result set:
>>        Project-Restrict ResultSet (4):
>>        Number of opens = 1
>>        Rows seen = 712
>>        Rows filtered = 0
>>        restriction = false
>>        projection = true
>>            constructor time (milliseconds) = 0
>>            open time (milliseconds) = 0
>>            next time (milliseconds) = 4219
>>            close time (milliseconds) = 15
>>            restriction time (milliseconds) = 0
>>            projection time (milliseconds) = 0
>>            optimizer estimated row count:        19200.45
>>            optimizer estimated cost:        82959.49
>>
>>        Source result set:
>>            Project-Restrict ResultSet (3):
>>            Number of opens = 1
>>            Rows seen = 40806
>>            Rows filtered = 40094
>>            restriction = true
>>            projection = false
>>                constructor time (milliseconds) = 0
>>                open time (milliseconds) = 0
>>                next time (milliseconds) = 4219
>>                close time (milliseconds) = 15
>>                restriction time (milliseconds) = 124
>>                projection time (milliseconds) = 0
>>                optimizer estimated row count:        19200.45
>>                optimizer estimated cost:        82959.49
>>
>>            Source result set:
>>                Table Scan ResultSet for TBLSEARCHDICTIONARY at read
>>committed
>>isolation level using share row locking chosen by the optimizer
>>                Number of opens = 1
>>                Rows seen = 40806
>>                Rows filtered = 0
>>                Fetch Size = 1
>>                    constructor time (milliseconds) = 0
>>                    open time (milliseconds) = 0
>>                    next time (milliseconds) = 4001
>>                    close time (milliseconds) = 15
>>                    next time in milliseconds/row = 0
>>
>>                scan information:
>>                    Bit set of columns fetched={0, 2, 3}
>>                    Number of columns fetched=3
>>                    Number of pages visited=2978
>>                    Number of rows qualified=40806
>>                    Number of rows visited=256001
>>                    Scan type=heap
>>                    start position:
>>null                    stop position:
>>null                    qualifiers:
>>Column[0][0] Id: 2
>>Operator: <
>>Ordered nulls: false
>>Unknown return value: true
>>Negate comparison result: true
>>Column[0][1] Id: 2
>>Operator: <=
>>Ordered nulls: false
>>Unknown return value: false
>>Negate comparison result: false
>>
>>                    optimizer estimated row count:        19200.45
>>                    optimizer estimated cost:        82959.49
>>
>>----------
>>
>>Thanks in advance for any help!
>>
>>Kind regards,
>>
>>
>>Kevin Hore
>>
>>
>>
> 
> 




Mime
View raw message