db-derby-user mailing list archives

From The Wogster <wogste...@yahoo.ca>
Subject Re: Behaviour of SYSCS_COMPRESS_TABLE
Date Wed, 01 Jun 2005 15:28:39 GMT
Øystein Grøvlen wrote:
>>>>>>"TW" == The Wogster <wogsterca@yahoo.ca> writes:
>     TW> Øystein Grøvlen wrote:
>     >> Is this also true for B-tree indexes?  I would imagine that if you
>     >> have an index on a monotonically increasing key (e.g., a timestamp) and
>     >> where you regularly delete old records, there may be a lot of empty
>     >> B-tree pages that will never be possible to reuse.
>     >> 
>     TW> What happens in most databases is that the database has a fixed page
>     TW> size, say 8K; when an index page is full, it splits that page into 2
>     TW> half pages. When an index page is empty it's dropped from the index
>     TW> and added to the empty page pool. Many will merge almost empty
>     TW> neighbouring pages, but that doesn't matter for this discussion.
> I know this.  The reason I asked was because I have got the impression
> that in Derby the only way to drop empty index pages is to do
> compression.

Derby should work in much the same way as other databases; the 
techniques for building a database engine were established years ago. 
When it comes to indexes, there are three technologies:

ISAM: The index is loaded into an array in memory; entries are added 
and deleted by moving pointers around, and the index is dumped back to 
disk when the database is closed.  dBase worked this way at one time.

B-Tree: the traditional B-tree isn't used much anymore, because it's 
easy to end up with an unbalanced tree, which is very inefficient.

Balanced B-Tree: most databases use this one.  Logic in the indexing 
code keeps shifting the tree around so that it stays in balance.  It 
requires that the indexes be at least page based, so most page-based 
databases use it.  It is the most efficient for reads, and can be 
slower for adds and deletes because of the shifting around, but for 
most databases there are on the order of 100 reads for every add or 
delete.
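To make the page-split and empty-page-pool behaviour concrete, here is a toy sketch in Python.  It is illustrative only: the page capacity, the Page class, and the function names are my own assumptions for the example, not Derby's actual storage code.

```python
PAGE_CAPACITY = 4  # stand-in for an 8K page holding N index entries

class Page:
    def __init__(self, keys=None):
        self.keys = keys or []

def insert(pages, empty_pool, key):
    """Insert key into a sorted run of pages, splitting a full page in half."""
    # find the page this key belongs on
    for i, page in enumerate(pages):
        if not page.keys or key <= page.keys[-1] or i == len(pages) - 1:
            target, idx = page, i
            break
    target.keys.append(key)
    target.keys.sort()
    if len(target.keys) > PAGE_CAPACITY:
        # a full page is split into two half-full pages, reusing a
        # page from the empty pool if one is available
        mid = len(target.keys) // 2
        new_page = empty_pool.pop() if empty_pool else Page()
        new_page.keys = target.keys[mid:]
        target.keys = target.keys[:mid]
        pages.insert(idx + 1, new_page)

def delete(pages, empty_pool, key):
    """Delete key; a page that becomes empty goes back to the empty-page pool."""
    for i, page in enumerate(pages):
        if key in page.keys:
            page.keys.remove(key)
            if not page.keys and len(pages) > 1:
                empty_pool.append(pages.pop(i))
            return
```

A real B-tree also maintains interior pages and rebalances them; this sketch only shows the leaf-level split-in-half and release-to-pool behaviour discussed above.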

For dropping index pages, look at the source code and see what it does 
on a delete: if it releases the page to the empty page pool (more like 
a cache, actually), then it doesn't matter whether you compress or not.

One thing to remember is that shortening (where possible) or 
lengthening a file is an expensive operation; this is why most 
databases add a batch of pages to the empty page pool rather than 
adding a single page at a time.  They also reuse empty pages within the 
file, because reusing an existing page means changing a pointer, while 
growing or shrinking the file means kernel operations, which are much 
slower.
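The allocation strategy above can be sketched in a few lines of Python.  Again this is a minimal illustration under my own assumptions (the class name, batch size, and in-memory bookkeeping are invented for the example), not Derby's allocator:

```python
EXTEND_BATCH = 8  # pages added per file extension

class PageAllocator:
    def __init__(self):
        self.file_pages = 0      # current size of the file, in pages
        self.empty_pool = []     # page numbers available for reuse

    def allocate(self):
        # cheap path: flip a pointer to an already-allocated empty page
        if self.empty_pool:
            return self.empty_pool.pop()
        # expensive path: grow the file, but amortize the kernel call
        # by adding EXTEND_BATCH pages at once
        first_new = self.file_pages
        self.file_pages += EXTEND_BATCH
        self.empty_pool.extend(range(first_new + 1, self.file_pages))
        return first_new

    def free(self, page_no):
        # a delete releases the page back to the pool; the file itself
        # is never shortened here
        self.empty_pool.append(page_no)
```

Note that `free` never shrinks the file, which is exactly why an aged database can hold many empty pages that only an explicit compress reclaims.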

Compression can be good, though.  As a database ages, an index can have 
its first page on page 25, the next on page 234, the next on page 535, 
the next on page 43, and these pages can be scattered all over the disk 
volume as well, so you're moving the heads all over the place to find 
them.  Compressing the database, followed by a disk defrag, can make 
the database faster.

