Thank you - as I suspected. Query #2 was improved only with the creation of an index on 'someString'
From: Mike Matrigali <email@example.com>
To: Derby Discussion <firstname.lastname@example.org>
Sent: Wednesday, September 9, 2009 3:59:11 PM
Subject: Re: Table of 13 rows still needs an index to make it performant?
T K wrote:
> I have a table with a blob and 13 rows in it; in the following, please notice the execution times:
> 1) Selecting the primary key:
> SELECT primkey from sometable
> Query 1 of 1 elapsed time (seconds) - Total: 0.032, SQL query: 0.016, Building output: 0.016
> 2) Select the primary key with WHERE clause:
> SELECT primkey from sometable WHERE someString = 'someValue'
> Query 1 of 1 elapsed time (seconds) - Total: 0.843, SQL query: 0.828, Building output: 0.015
> 3) Here's how many rows I have:
> select count(*) from sometable
> Does the sheer fact that the table has a blob, albeit not selected, affect query execution THAT much? Any suggestions to improve it?
Derby stores blobs in the same space as the other data, and without indexes the only way that derby can find the "next" row
is by reading
through all the pages after the current row. It stores indexes in
separate files where the structure of the index lets it do direct lookups in the base table. The Blobs are stored
as linked lists of pages. So performance is dependent on how the rows
get laid out in the table. The worst case would be 1 row on page one
followed by 2 gig of blob data, and then followed by another row.
When an index is used Derby can jump directly to the page containing
the beginning of the row. And when processing a single row it need not
read the blob if it is not part of the query.
In the case of query 1 it probably never even looked at the base table,
just used the index which given the number of rows is probably a one page cached index.
For query 2 it would be interesting to know if it is picking to use the index to do probes or not.
printing the query plan for both would
tell exactly what it is doing.