db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Indexing speed in presence of BLOB columns
Date Mon, 10 Jan 2011 19:11:38 GMT
Trejkaz wrote:
> Hi all.
> 
> I have been doing some performance testing of the time it takes to
> create an index on a table with BLOB data.  The schema:
> 
>     CREATE TABLE binary (
>         id INTEGER NOT NULL,
>         binary BLOB(1G) NOT NULL
>     )
> 
> Each time, 10,000 rows are inserted with data of a given size and then
> an index is created at the end:
> 
>     CREATE INDEX binary_id ON binary (id)
> 
> Times are an average of three runs after some warmup runs which are not counted.
> 
> 1 kB blobs:
>   Index: 567 ms
>   Insert: 78 ms
> 
> 10 kB blobs:
>   Insert: 3954 ms
>   Index: 515 ms
> 
> 100 kB blobs:
>   Insert: 56307 ms
>   Index: 20591 ms
> 
> 1 MB blobs:
>   Insert: 521904 ms
>   Index: 122527 ms
> 
> Surprisingly, the larger the data which is present in the BLOB column,
> the longer it takes to index.  Since the indexing process shouldn't
> need to read the BLOB data in order to index the row, I am surprised
> that it slows down when more data is present.  What is going on here
> exactly?  Is Derby physically copying the BLOB data to a new location
> and then deleting the original copy instead of performing some kind of
> cheap move operation?  Is there some way to avoid this (which won't
> require changing the schema)?
> 
> Daniel
> 
This is the expected behavior.  Derby base tables are very "basic" and 
thus the need for indexes.  In order to do a scan of the entire base 
table to build the index every page in the base table needs to be read
from disk.  Even though we don't actually "read" the blob data for 
creating the index the system still needs to bring in each page from 
disk to memory to see if it has any rows.  Basically the scan of the 
base table is to loop from page 1 to the last page and check if each 
page is a head page with rows and read the rows.  The blob data is 
stored in the same file as the main pages.

Mime
View raw message