One could argue that in the first case (2>1), the optimizer is free to choose whatever index it wants because the query can provably return no rows.  However, in the second case (1=1), were the DateTimestamp condition must be evaluated, it would appear that the optimizer indeed selected the wrong index.

However, I don't think you can point to 2ms vs. 6000ms as proof of a bad query plan.  In the first case, the optimizer can prove (to itself) that the query can return no rows (because of the 2>1 condition along with all AND predicates) -- and therefore it probably doesn't do any index scan at all.  Therefore, 2ms.

In the second case, because the 1=1 condition is satisfied, Derby must *actually* do the index scan as per plan.  Therefore, 6000ms.  Whether using the DESC index actually provides significant improvement over ASC remains to be seen, as the comparison currently isn't apples-to-apples.

I would be interested in seeing the result of the DESC vs. the ASC index in the (1=1) scenario.  You can override the optimizer with a hint, like so:

SELECT * FROM table --DERBY-PROPERTIES index=TBLEVENTS_DATETIMESTAMP_DESC
WHERE
1=1 AND
1=1 AND
DateTimestamp >= ? AND
DateTimestamp <= ? AND
ORDER BY DateTimestamp DESC

Note, the --DERBY-PROPERTIES must come at the end of a literal line (i.e. there must be a linefeed after that cause) otherwise everything after that will be taken as a comment.

Let us know the result.  Either way it may be considered a bug in the optimizer, but depending on the number of rows returned, may not make a significant difference in performance.

-Brett

On Wed, Sep 23, 2009 at 11:18 PM, Gavin Matthews <gjm@araxis.com> wrote:

Hi,

We have a WHERE clause:

WHERE
2>1 AND
2>1 AND
DateTimestamp >= ? AND
DateTimestamp <= ? AND
ORDER BY DateTimestamp DESC

which reasonably takes 2ms to execute on a table that has 357254 rows, and
does so by selecting the most appropriate index:

Index Scan ResultSet for TBLEVENTS using index TBLEVENTS_DATETIMESTAMP_DESC
at serializable isolation level using share row locking chosen by the
optimizer

However, when we change the WHERE clause to:

WHERE
1=1 AND
1=1 AND
DateTimestamp >= ? AND
DateTimestamp <= ? AND
ORDER BY DateTimestamp DESC

the execution time increases to 6000ms, and appears to do so because the
query planner hasn't selected the most appropriate index:

Index Scan ResultSet for TBLEVENTS using index TBLEVENTS_DATETIMESTAMP_ASC
at serializable isolation level using share row locking chosen by the
optimizer

The indexes are specified as follows:

CREATE INDEX tblEventsDateTimestampASC ON tblEvents (DateTimestamp ASC);
CREATE INDEX tblEventsDateTimestampDESC ON tblEvents (DateTimestamp DESC);

Do people agree that this appears to be a bug with the query planner, as the
selection of the most appropriate index should not be influenced by static
conditions such as 1=1 or 2>1?

Regards,

Gavin

--
View this message in context: http://www.nabble.com/Badly-performing-WHERE-caluse-tp25531166p25531166.html
Sent from the Apache Derby Users mailing list archive at Nabble.com.