db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Index perfomance
Date Thu, 26 Mar 2009 16:14:29 GMT

The technique of adding extra columns to indexes works well for derby if
it matching your application needs.  The docs usually refer to this as a
covering index and the optimizer is pretty good at looking for cases 
where it can use a covering index and avoid going to the base table. 
Hopefully this will help your application.

 From info posted, the query with the index is definitely doing an index
to base row lookup for every qualifying row. It looks like for this
data distribution and number of qualifying rows this a worst plan than
just doing a full table scan.
The optimizer should have picked the base table scan
rather than the index to base row given how the 2 performed. I think
this is another case showing the costs for the optimizer need to be 
updated to reflect current technology.  There has been a lot of work to
make scans go fast and that is not reflected in the current costing.

The optimizer estimated 203425 rows for the index qualification and got
271136 which seems not too bad (5% vs. 6.8%).  This info comes from the
with-index.txt query plan.  It assumes equal distribution of values 
across all the values so maybe this range was a
little hotter than others.  Since the row count estimate looks close I
lean toward the base costing as the problem.

It would be interesting to know how much of the index/baserow 
performance issue is that it keeps getting cache misses vs. the cpu 
overhead of just doing the base row look up for every row.  For this db
it would take a 50,000 page cache just to cache the base row plus 
whatever it takes to cache the index.

For this kind of distribution I have seen db's gather all the row 
pointers from the index and then do ordered probes into the base table.
This insures good caching for the lookups.  Derby does not have this
technique available.

Tim Dudgeon wrote:
> Thanks for the comments. Here is some more info.
> I attach the DDL for the table concerned, the simple test program I use 
> and the execution strategy with and without an index.
> Some additional points:
> 1. the query returning 7% of the table is certainly not an extreme case. 
> The exact query criteria are specified by the user in the UI, and can be 
> much worse than this case. I have no control over the natur eof the 
> query that the user specifies.
> 2. Yes, if the query is much more selective the index can be a help.
> 3. The biggest data file in seg0 is 1452572672 bytes in size (e.g. 1.4GB).
> 4. the index was added after the rows were added.
> 5. making the index also have the pk_column as the second indexed field 
> makes it go like lightning! search runs in 2 secs, about 14x faster.
> So in summary it seems like an index will be of no help to me in this 
> situation, unless I make it an index of both columns.
> Many thanks
> Tim
> Tim Dudgeon wrote:
>> I found an interesting preformace problem that I'd welcome some help 
>> in inderstanding.
>> I have a table of 4,000,000 rows that has a DOUBLE column containing 
>> floating point numbers.
>> I run a query like this:
>> select pk_column from the_table where the_double_column > 300 and 
>> the_double_column < 320
>> 271136 rows are returned.
>> I then go through the ResultSet and extract all the id_column values.
>> All of this is done using standard JDBC.
>> When I do this it takes 23 seconds, which I though was not 
>> unreasonable as a full table scan was involved and the table was 
>> pretty big.
>> But of course I thought an index would help, so I added an index to 
>> the the_double_column and repeated the process. It took 720 seconds, 
>> 31x slower! I thought this was strange, but thought it might be 
>> because I was using standard settings and the 4MB page cache was much 
>> too small to hold the index.
>> So I increased the page cache size (derby.storage.pageCacheSize 
>> property) to 10x the size (10,000) and repeated the process. There was 
>> only a very minor improvement in speed.
>> In all cases the memory usage, as reported by:
>> Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()
>> really no differnt, and the used memory was much less that the maximum 
>> available specified by the -Xmx setting.
>> Any ideas what to do?
>> Tim

View raw message