db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matt Doran <matt.do...@papercut.com>
Subject Pathologically poor query performance - advice required
Date Wed, 25 Jun 2008 05:35:55 GMT
Hi there,

We use Derby in our print monitoring product, PaperCut NG.  We've 
generally had a great experience with it (fast, reliable, etc), but I've 
encountered a very strange performance problem that I'm unable to 
explain and would love some advice from Derby gurus or devs.

We're using the latest Derby (10.4.1.3), and the queries were initially 
generated by Hibernate ORM.

When displaying print logs in our application we do a query that joins 5 
tables, and selects many fields, and sorts by date descending to show 
the latest logs first (see original-slow-query.sql).   We also apply a 
JDBC limit to only return the first 25 rows.  To handle pagination we 
first do a count(*) using the same join and where clause to determine 
how many rows, and then perform the actual query.   We have a sample  
database that shows the following performance:
    * Count query - takes in the order of 1-2 seconds
    * Select query - takes about 1 hour !!  (Even when returning only 4 
rows)


The key table in this query (tbl_printer_usage_log) has about 130,000 
rows.  The where clause applies a filter to the date column (which is 
indexed) such that only 4 columns are returned.   This should be a fast 
operation, but the optimizer is choosing a poor execution plan. 

I did further investigation to find the cause of the slowness ... that's 
where things got stranger.  I removed 2 of the join table and associated 
fields but the performance didn't change significantly.   So I then 
started removing the selected columns from the original. This produced 
some dramatic and surprising results.  I tried various combinations of 
fields selected, but couldn't establish any real patterns.  But I've 
attached 2 of these modified queries and associated query plans.
    * modified-slow-query.sql - removed all but 10 of the selected fields
    * modified-fast-query.sql - removed a single field from 
"modified-slow-query.sql"

The slow version of this modified query still takes about 1 hour to run 
(for query plan see derby-log-modified-slow.log).  The fast version runs 
in seconds and returns 4 results (for query plan see 
derby-log-modified-fast.log).

Both of these queries only return fields from 2 of the joined tables.  
The slow one selects 5 fields from each of these tables.  The fast one 
selects 5 fields from one table and 4 fields from the other.   Strange 
indeed!!

So something is going on to cause the optimizer to choose completely the 
wrong plan.  I wonder why it isn't eliminating 99.99% of the rows using 
the obvious indexed where clause.  Is this example hitting some 
edge-case in the optimizer making it choose the wrong path?

You can download a copy of this database from the link below (17MB 
zipped).  Also, to make sure things were in order and statistics were 
updated, etc I ran "SYSCS_UTIL.SYSCS_COMPRESS_TABLE" on all the tables.

    http://www.papercut.com/tmp/derby.zip


So does anybody understand what might be happening here?  How I could go 
about improving the performance?   Is this something in the derby 
optimizer that needs looking into?

I look forward to any insights you can provide!!

Regards,
Matt

Mime
View raw message