db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Segel" <mse...@segel.com>
Subject RE: FW: Advice on *very* badly performing query
Date Tue, 04 Dec 2007 01:00:09 GMT
Sorry to day is a really busy day.

 

I meant to say you should reflect on what you want to accomplish in your
query.





 

  _____  

From: Matt Doran [mailto:matt.doran@papercut.com] 
Sent: Monday, December 03, 2007 4:34 PM
To: derby-user@db.apache.org
Cc: msegel@segel.com
Subject: Re: FW: Advice on *very* badly performing query

 

Hi Michael,

Michael Segel wrote: 

The short simple answer. You get what you pay for.

 

The longer answer. Query optimization is a black art. Cloudscape was
designed as a lightweight no frills embeddable DB.

Now you Cloudscape morphed in to Derby and JavaDB. But you lose the input
from the folks at IBM who handle Query Optimization.

But I'm more than happy to work with the constraints of Derby, if only I
could understand them.   And that's the help I was looking for here.



 

I have to run to a customer site, but using one of your examples. you
noticed that the query performance changed when you had the field in the
select columns as well as the where clause, but you didn't when you had the
field just in the where clause. So keep it in the selected fields. You could
also try and change the order of the tables you're joining.

I didn't change the where clause, just changing the select fields causes the
dramatic query plan change.   I have a feeling it might be the fact that I'm
selecting an attribute from the 5th join table ... but I'd like a better
understanding of what's triggering the change so I can avoid it if possible.




And you may want to reflect your 5 table join.  Depending on the database
and its tuning. Joining more than 3-4 tables can have a drastic negative
impact on its performance. 

I'm not sure what you mean to "reflect your 5 table join"?   

The fundamental issue here is that in this poor performing case, derby is
not looking at the index on the very large table that would immediately
reduce the dataset.   For whatever reason the optimizer is making a the
worst possible case decision.



 And if speed really is important look at Informix (IDS 11) now offered by
IBM.

Unfortunately as an off the shelf Java application that runs on Windows, Mac
and Linux ... we really need a simple embedded DB that we can ship as the
default.   Unfortunately Derby's query optimizer let's it down badly
sometimes.



 

HTH

 

 

  _____  

From: Matt Doran [mailto:matt.doran@papercut.com] 
Sent: Sunday, December 02, 2007 11:06 PM
To: derby-user@db.apache.org
Subject: Advice on *very* badly performing query (with reproduction recipe)

 

Hi there,

We use Apache Derby in our commercial application, PaperCut NG
<http://www.papercut.com/> .  It's proven to be very reliable, however we
occasionally get reports of very bad performance in some areas.  We haven't
had the time to investigate them fully previously (usually upgrading to an
external DB like Postgres or SQL Server fixes the issue).  This time we had
a look in more detail with a recent report, and we've found some very
strange performance characteristics ... and would love some advice and
assistance.

We have a query that is doing inner joins to 5 tables.  It's quite a simple
query, but the core table has about 300,000 rows, and where limiting the
results based on a date in that table that is indexed.  Here's a summary of
my situation/findings:

*	Using the latest Derby release 10.3.1.4, with a Java 1.5 VM on
Windows.
*	We only have a single WHERE clause, which is on the indexed date
field is restricting the data such that no data is returned.  e.g. log_date
> (latest log date).  So derby should quickly detect there is litte/no data
to return.
*	Running the original query takes 22 minutes running 100% CPU.
*	Running a count(*) for the same query is quick (< 1 sec).
*	Removing the ORDER BY and changing the select list to just include a
single field from each table and it still takes 22 minutes.
*	Changing the select list to retrieve only a single field from 2 of
the table and it still takes 22 minutes (I have a log of the query and the
runtime stats for this attached "derby-slow.log").
*	Changing the select list to a single field from 1 of the tables
makes the query run fast - less than a second. (I have a log of the query
and the runtime stats for this attached "derby-fast.log").
*	Running the original query on the same dataset in PostgreSQL or SQL
Server is very fast (less than a second).  This is why we often recommend
customers upsize to Postgres or SQL Server.
*	Also the SQL is generated via Hibernate ORM, so we have some
limitations in how we can modify the SQL.


>From the query plan it seems that seems that it stops using the date index
on the "tbl_printer_usage_log" log table, and changes from Hash joins to
Nested Loop joins.  On a large table like this when providing a where clause
that on a field that is indexed .... we have to ensure that derby uses the
index.

If I increase the pageCacheSize to 100,000 pages, it reduces the time of the
query to about 2-3 minutes, but it's still very slow compared to when the
correct index is used.

Can anyone please help me understand the following:

*	Why does the query plan change dramatically, just by changing the
fields that are retrieved?
*	Why is derby avoiding the most obvious index?  The date field in the
300,000 row table (the date field is the only field in our where clause).
*	Is there anyway to avoid this behavior?

If we can understand what's causing this, we'll be able to make a much more
effective use of Derby.  At the moment, on customers with large datasets, we
currently just recommend they "upsize" to Postgres or SQL Server and the
problem goes away.  However, we'd much prefer to fix this and have our Derby
database behave better.

I'd be happy to provide the derby database that exhibits these problems if
someone would like to see what's going on.  The database is from a customer,
so I don't want to post it publicly, but if you send me an email off-list
I'd be happy to provide it.

Regards.




-- 
Matt Doran
PaperCut Software International Pty. Ltd.
Phone:   +61 (3) 9807 5767
E-mail:  matt.doran@papercut.com
Profile: http://www.papercut.com/about/#matt
Blog:    http://www.papercut.com/blog/

Mime
View raw message