db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From <de...@segel.com>
Subject RE: Index perfomance
Date Thu, 26 Mar 2009 13:57:40 GMT
Tim,

Look, sorry if this is a bit of a rant.
I don't think that your question is a bad question. In fact it's a good
question. What concerns me is that the responses to your question missed the
obvious.

This isn't a bug in Derby. So opening a Jira would be a waste of time.
Since you just created the index after the table was loaded, the statistics
should be fine. Maybe I'm making an assumption that the data was static for
the sake of this test...

I realize that a lot of people who are using and investigating JavaDB/Derby
are not 'classically' trained DBAs and some view the database as nothing
more than a persistence of their java objects. 

My point is that you have to think about what you're asking the database to
do and how indexes work. When you have a 'tight' cluster of data around a
set of values, you're not going to get good performance out of a B-Tree
index. (What happens when your index contains non-unique values? Do you then
walk through the set of non-unique values sequentially? )

I don't believe that Derby offers a Bitmap Index, which may be a better fit.
Usually it's for columns which have a low number of distinct values. Tim's
column is a decimal column which could have a lot of different values,
however, in his specific instance, he has a lot of data tightly clustered in
a small set of values. 

For those not familiar with a Bitmap Index, here's a reference that might
help.
http://www.oracle.com/technology/pub/articles/sharma_indexes.html

I've added some comments below, and I again apologize for this mini-rant. 

HTH

-Mikey

> -----Original Message-----
> From: news [mailto:news@ger.gmane.org] On Behalf Of Tim Dudgeon
> Sent: Thursday, March 26, 2009 7:38 AM
> To: derby-user@db.apache.org
> Subject: Re: Index perfomance
> 
> Thanks for the comments. Here is some more info.
> I attach the DDL for the table concerned, the simple test program I use
> and the execution strategy with and without an index.
> 
> Some additional points:
> 
[SNIP] 
> 4. the index was added after the rows were added.
> 
This is a non issue. Actually when loading a large table, you would want to
drop the index load the rows and then re-add the index for better
performance. (Assuming that you're not talking about a cluster index...)

> 5. making the index also have the pk_column as the second indexed field
> makes it go like lightning! search runs in 2 secs, about 14x faster.
> 
No Duh!

I'm sorry, but you have to stop and think about what you were trying to do.
As I said in my earlier response, you had something like 20,000 rows with
the same indexed value. Think about what your index tree looked like.

By adding the pk_column as the second column of the index, you then had a
unique pair in your index.

Remember that your index is a B-tree index!
> 
> So in summary it seems like an index will be of no help to me in this
> situation, unless I make it an index of both columns.
> 






Mime
View raw message