Hi Kevin,
 
I haven't investigated Derby-47 to know how to fix the problem but I do have an optimizer overrides patch waiting for review on the derby developer list which will let user specify their own optimizer properties to help the optimizer pick a specific plan. The JIRA entry for optimizer overrides is Derby-573. The patch is for Derby 10.2.
 
These optimizer properties can be handy in a case like yours where until the optimizer is fixed, a user can tell the optimizer what plan to use. Just an FYI as I know it doesn't help with your particular case since there is no optimizer overrides support for 10.1.
 
Mamta
 
On 11/11/05, Kevin Hore <kh@araxis.com> wrote:
The Derby query optimizer (this is with 10.1.1.0) decides to perform an
expensive table scan in certain circumstances, when it could make use of
available indexes. The resulting poor performance is rendering Derby
useless for our application.

I believe that this behaviour might be related to DERBY-47
( http://issues.apache.org/jira/browse/DERBY-47), which was opened in
October 2004. However, this seems such a severe and fundamental problem
that I'm surprised firstly that a significant portion of the Derby user
base isn't affected by this and, secondly, that DERBY-47 hasn't been
considered a critical defect to be fixed with the utmost urgency.

I've described the problem in detail below, and I'd appreciate any
assistance. Specifically:

i) Does anyone have any plans to fix this problem?

ii) In the meantime, are there any work-arounds? I'd appreciate any
suggestions that would decrease the execution time of my second query
below (the one with with two search terms). Likewise, any general
strategies for avoiding this problem with IN clauses would be appreciated.


----PROBLEM DESCRIPTION----

Consider the table:

CREATE TABLE tblSearchDictionary
(
ObjectId int NOT NULL,
ObjectType int NOT NULL,
Word VARCHAR(64) NOT NULL,
WordLocation int NOT NULL,
CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
);

This table has an index on each of the four columns, it also has the
unique index across all four columns as defined above:

CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary (ObjectId);
CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
(ObjectType);
CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
(WordLocation);

The table contains about 260,000 rows.

The following query selects all rows that match instances of string in
the Word column. It sums the WordLocation column having grouped by the
ObjectId column.

SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE Word = 'CONTACT'
GROUP BY ObjectId;

On my machine this will usually complete in an acceptable time of around
200ms.

Now consider the following query which adds a second search term on the
same column.

SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE Word = 'CONTACT' OR Word = 'ADD'
GROUP BY ObjectId;

This second query usually takes around 10000ms on my machine. My
understanding from the Derby optimizer docs and DERBY-47 is that this is
because Derby is re-writing the query along the following lines, and
then choosing to do a table scan:

SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE
  Word IN ('CONTACT', 'ADD')
  AND Word >= 'ADD'
  AND Word <= 'CONTACT'
GROUP BY ObjectId;

The plan for the first query indicates that the tblSearchDictionaryWord
index is used to perform an index scan. However, the plan for the second
query indicates that the majority of the additional time is taken
performing a table scan over the entire table, instead of making use of
the indexes available. Our application uses IN quite frequently, so this
optimizer behaviour would seem to present a significant problem.

---QUERY PLAN FOR FIRST QUERY----

Statement Name:
    null
Statement Text:
    SELECT
    ObjectId,
    SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE
        Word = 'CONTACT'
GROUP BY
    ObjectId

Parse Time: 0
Bind Time: 0
Optimize Time: 16
Generate Time: 0
Compile Time: 16
Execute Time: 0
Begin Compilation Timestamp : 2005-11-11 12:28:52.765
End Compilation Timestamp : 2005-11-11 12:28: 52.781
Begin Execution Timestamp : 2005-11-11 13:08:15.406
End Execution Timestamp : 2005-11-11 13:08:15.406
Statement Execution Plan Text:
Project-Restrict ResultSet (5):
Number of opens = 1
Rows seen = 93
Rows filtered = 0
restriction = false
projection = true
    constructor time (milliseconds) = 0
    open time (milliseconds) = 0
    next time (milliseconds) = 0
    close time (milliseconds) = 0
    restriction time (milliseconds) = 0
    projection time (milliseconds) = 0
    optimizer estimated row count:            1.00
    optimizer estimated cost:          226.00

Source result set:
    Grouped Aggregate ResultSet:
    Number of opens = 1
    Rows input = 113
    Has distinct aggregate = false
    In sorted order = false
    Sort information:
        Number of rows input=113
        Number of rows output=93
        Sort type=internal
        constructor time (milliseconds) = 0
        open time (milliseconds) = 0
        next time (milliseconds) = 0
        close time (milliseconds) = 0
        optimizer estimated row count:             1.00
        optimizer estimated cost:          226.00

    Source result set:
        Project-Restrict ResultSet (4):
        Number of opens = 1
        Rows seen = 113
        Rows filtered = 0
        restriction = false
        projection = true
            constructor time (milliseconds) = 0
            open time (milliseconds) = 0
            next time (milliseconds) = 0
            close time (milliseconds) = 0
            restriction time (milliseconds) = 0
            projection time (milliseconds) = 0
            optimizer estimated row count:          118.00
            optimizer estimated cost:          226.00

        Source result set:
            Index Row to Base Row ResultSet for TBLSEARCHDICTIONARY:
            Number of opens = 1
            Rows seen = 113
            Columns accessed from heap = {0, 3}
                constructor time (milliseconds) = 0
                open time (milliseconds) = 0
                next time (milliseconds) = 0
                close time (milliseconds) = 0
                optimizer estimated row count:           118.00
                optimizer estimated cost:          226.00

                Index Scan ResultSet for TBLSEARCHDICTIONARY using index
TBLSEARCHDICTIONARYWORD at read committed isolation level using share
row locking chosen by the optimizer
                Number of opens = 1
                Rows seen = 113
                Rows filtered = 0
                Fetch Size = 1
                    constructor time (milliseconds) = 0
                    open time (milliseconds) = 0
                    next time (milliseconds) = 0
                    close time (milliseconds) = 0
                    next time in milliseconds/row = 0

                scan information:
                    Bit set of columns fetched=All
                    Number of columns fetched=2
                    Number of deleted rows visited=0
                    Number of pages visited=4
                    Number of rows qualified=113
                    Number of rows visited=114
                    Scan type=btree
                    Tree height=3
                    start position:
    >= on first 1 column(s).
    Ordered null semantics on the following columns:
0
                    stop position:
    > on first 1 column(s).
    Ordered null semantics on the following columns:
0
                    qualifiers:
None
                    optimizer estimated row count:          118.00
                    optimizer estimated cost:          226.00


---QUERY PLAN FOR SECOND QUERY----

Statement Name:
    null
Statement Text:
    SELECT
    ObjectId,
    SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE
        Word = 'CONTACT' OR  Word = 'ADD'
GROUP BY
    ObjectId

Parse Time: 0
Bind Time: 0
Optimize Time: 0
Generate Time: 15
Compile Time: 15
Execute Time: 4250
Begin Compilation Timestamp : 2005-11-11 13:16:17.578
End Compilation Timestamp : 2005-11-11 13:16: 17.593
Begin Execution Timestamp : 2005-11-11 13:16:17.593
End Execution Timestamp : 2005-11-11 13:16:27.437
Statement Execution Plan Text:
Project-Restrict ResultSet (5):
Number of opens = 1
Rows seen = 100
Rows filtered = 0
restriction = false
projection = true
    constructor time (milliseconds) = 0
    open time (milliseconds) = 4250
    next time (milliseconds) = 0
    close time (milliseconds) = 0
    restriction time (milliseconds) = 0
    projection time (milliseconds) = 0
    optimizer estimated row count:            1.00
    optimizer estimated cost:        82959.49

Source result set:
    Grouped Aggregate ResultSet:
    Number of opens = 1
    Rows input = 712
    Has distinct aggregate = false
    In sorted order = false
    Sort information:
        Number of rows input=712
        Number of rows output=593
        Sort type=internal
        constructor time (milliseconds) = 0
        open time (milliseconds) = 4250
        next time (milliseconds) = 0
        close time (milliseconds) = 0
        optimizer estimated row count:             1.00
        optimizer estimated cost:        82959.49

    Source result set:
        Project-Restrict ResultSet (4):
        Number of opens = 1
        Rows seen = 712
        Rows filtered = 0
        restriction = false
        projection = true
            constructor time (milliseconds) = 0
            open time (milliseconds) = 0
            next time (milliseconds) = 4219
            close time (milliseconds) = 15
            restriction time (milliseconds) = 0
            projection time (milliseconds) = 0
            optimizer estimated row count:        19200.45
            optimizer estimated cost:        82959.49

        Source result set:
            Project-Restrict ResultSet (3):
            Number of opens = 1
            Rows seen = 40806
            Rows filtered = 40094
            restriction = true
            projection = false
                constructor time (milliseconds) = 0
                open time (milliseconds) = 0
                next time (milliseconds) = 4219
                close time (milliseconds) = 15
                restriction time (milliseconds) = 124
                projection time (milliseconds) = 0
                optimizer estimated row count:        19200.45
                optimizer estimated cost:        82959.49

            Source result set:
                Table Scan ResultSet for TBLSEARCHDICTIONARY at read
committed
isolation level using share row locking chosen by the optimizer
                Number of opens = 1
                Rows seen = 40806
                Rows filtered = 0
                Fetch Size = 1
                    constructor time (milliseconds) = 0
                    open time (milliseconds) = 0
                    next time (milliseconds) = 4001
                    close time (milliseconds) = 15
                    next time in milliseconds/row = 0

                scan information:
                    Bit set of columns fetched={0, 2, 3}
                    Number of columns fetched=3
                    Number of pages visited=2978
                    Number of rows qualified=40806
                    Number of rows visited=256001
                    Scan type=heap
                    start position:
null                    stop position:
null                    qualifiers:
Column[0][0] Id: 2
Operator: <
Ordered nulls: false
Unknown return value: true
Negate comparison result: true
Column[0][1] Id: 2
Operator: <=
Ordered nulls: false
Unknown return value: false
Negate comparison result: false

                    optimizer estimated row count:        19200.45
                    optimizer estimated cost:         82959.49

----------

Thanks in advance for any help!

Kind regards,


Kevin Hore