db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Knut Anders Hatlen <Knut.Hat...@Sun.COM>
Subject Re: Performance Tuning Problem ?
Date Mon, 25 May 2009 09:52:19 GMT
ganest <ganest@ced.tuc.gr> writes:

> Hi all, 
>
> In an attempt to investigate the problem I mentioned in my previous post, I
> run additional tests and some interesting and weird results were appeared.
> May be someone who has good knowledge of Derby's internal operation can give
> some explanation. Any suggestion is welcomed. I describe what I have done:
>
> Fresh Start of ij to ensure that no cache is used:
>
> java -Dderby.language.maxMemoryPerTable=128
> -Dderby.storage.pageCacheSize=4000 -Xms256m -Xmx256m -jar
> $DERBY_HOME/lib/derbyrun.jar ij
>
> prepare ps1 as 'select count(*) from  ganest1.big inner join ganest1.bigref
> on ganest1.big.id=ganest1.bigref.bigid and ganest1.big.id > 4000000 and
> ganest1.big.id < 4100000';
> ELAPSED TIME = 334 milliseconds
> ij> execute ps1;
> 1          
> -----------
> 10128      
>
> 1 row selected
> ELAPSED TIME = 440 milliseconds
>
> From the query plan we can see that a nested loop join strategy is used and
> it
> opens 10128 index scans on the inner table. The situation looks similar to
> the one
> I described in my previous post, but the execution time is quite
> satisfactory!
> (milliseconds vs 20 seconds see below)

The accessed data has higher locality in this query than in the previous
query, and therefore the page cache is used more efficiently.

The previous query used the indexes on big.name and bigref.bigid, which
are ordered differently. So even if the index on big.name was scanned
sequentially, the index on bigref.bigid was accessed almost completely
randomly, and a large number of pages all over the table had to be
visited.

In the latest query, the indexes on big.id and bigref.bigid are
used. Those indexes are ordered the same way, and all the interesting
rows are located consecutively on a relatively small number of pages. So
even if the number of page accesses is higher, the number of different
pages accessed is probably much lower. And since the rows in both
indexes are accessed in the same order that they are stored, the
requested page is found in the page cache most of the time.

Another difference is that the previous query had to access the base
table (big) to get the id column, whereas the latest query finds all the
columns it needs in the indexes. You may want to try to add an extra
index to speed up the previous query:

  CREATE UNIQUE INDEX BIGNAMEID ON BIG(NAME, ID)

-- 
Knut Anders

Mime
View raw message