db-derby-user mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Behaviour of SYSCS_COMPRESS_TABLE
Date Wed, 01 Jun 2005 17:33:08 GMT
Wogster makes a good point: SYSCS_COMPRESS_TABLE may help make the data
in tables more contiguous.
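
For anyone who wants to try it, here is a minimal JDBC sketch of calling the
procedure.  The database URL (jdbc:derby:testdb) and the schema/table names
('APP', 'ORDERS') are placeholders; substitute your own.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

public class CompressExample {
    public static void main(String[] args) throws Exception {
        // Placeholder database URL -- substitute your own database.
        Connection conn = DriverManager.getConnection("jdbc:derby:testdb");

        // SYSCS_UTIL.SYSCS_COMPRESS_TABLE(schema, table, sequential) rebuilds
        // the table and its indexes and returns unused space to the OS.  A
        // non-zero third argument requests sequential mode, which uses less
        // temporary space at the cost of speed.
        CallableStatement cs =
            conn.prepareCall("CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(?, ?, ?)");
        cs.setString(1, "APP");      // schema name (placeholder)
        cs.setString(2, "ORDERS");   // table name (placeholder)
        cs.setShort(3, (short) 1);   // sequential mode
        cs.execute();

        cs.close();
        conn.close();
    }
}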

The Wogster wrote:
> Øystein Grøvlen wrote:
> 
>>>>>>> "TW" == The Wogster <wogsterca@yahoo.ca> writes:
>>
>>     TW> Øystein Grøvlen wrote:
>>
>>     >> Is this also true for B-tree indexes?  I would imagine that if you
>>     >> have an index on a monotonically increasing key (e.g., a timestamp)
>>     >> and regularly delete old records, there may be a lot of empty
>>     >> B-tree pages that will never be possible to reuse.
>>     >>
>>
>>     TW> What happens in most databases is that the database has a fixed
>>     TW> page size, say 8K.  When an index page is full, it splits that
>>     TW> page into 2 half pages.  When an index page is empty, it's dropped
>>     TW> from the index and added to the empty page pool.  Many databases
>>     TW> will merge almost-empty neighbouring pages, but that doesn't
>>     TW> matter for this discussion.
>>
>> I know this.  I asked because I have the impression that in Derby the
>> only way to drop empty index pages is to run compression.
>>
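
To make the split-on-full / drop-when-empty behaviour above concrete, here is
a toy sketch.  It is not Derby's store code; ToyIndexPage and the capacity of
4 keys (standing in for an 8K page) are invented for illustration.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Fixed-capacity index page: splits into two half-full pages when it
// overflows, and is handed to an empty-page pool when it drains.
class ToyIndexPage {
    static final int CAPACITY = 4;
    final List<Long> keys = new ArrayList<>();

    // Insert a key in sorted position.  If the page overflows, move the
    // upper half into a new page and return it for the caller to link in.
    ToyIndexPage insert(long key) {
        int pos = 0;
        while (pos < keys.size() && keys.get(pos) < key) pos++;
        keys.add(pos, key);
        if (keys.size() <= CAPACITY) return null;
        ToyIndexPage right = new ToyIndexPage();
        int mid = keys.size() / 2;
        right.keys.addAll(keys.subList(mid, keys.size()));
        keys.subList(mid, keys.size()).clear();
        return right;
    }

    // Delete a key; report whether the page is now empty and can be dropped
    // from the index and returned to the empty-page pool.
    boolean delete(long key) {
        keys.remove(Long.valueOf(key));
        return keys.isEmpty();
    }
}

public class PageSplitDemo {
    public static void main(String[] args) {
        Deque<ToyIndexPage> emptyPagePool = new ArrayDeque<>();
        ToyIndexPage page = new ToyIndexPage();

        // Fill the page past capacity to trigger a split.
        for (long k = 1; k <= 5; k++) {
            ToyIndexPage split = page.insert(k);
            if (split != null) {
                System.out.println("page full, split into " + page.keys
                        + " and " + split.keys);
            }
        }

        // Drain the left page; once empty it goes to the pool for reuse.
        for (long k : new long[] {1, 2}) {
            if (page.delete(k)) {
                emptyPagePool.push(page);
                System.out.println("page empty, returned to pool (pool size "
                        + emptyPagePool.size() + ")");
            }
        }
    }
}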
> 
> Derby should work in a similar way to other databases; the techniques
> for building a database engine were established years ago.  When it
> comes to indexes there are three main approaches:
> 
> ISAM: The index is loaded into an array in memory; entries are added and
> deleted by moving pointers around, and the index is dumped back to disk
> when the database is closed.  dBase worked this way at one time.
> 
> B-Tree: The traditional B-tree isn't used much anymore, because it's easy
> to end up with an unbalanced tree, which is very inefficient.
> 
> Balanced B-Tree: Most databases use this one.  Logic in the indexing code
> keeps shifting the tree around so that it stays in balance.  It requires
> that the indexes be at least page based, so most page-based databases use
> it.  It is the most efficient for reads, and can be slower at adds and
> deletes because of the shifting around, but for most databases there are
> around 100 reads for every add or delete.
> 
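The self-balancing behaviour described above is easy to see with
java.util.TreeMap from the JDK, a red-black tree, which here stands in for
the page-based balanced B-trees a database uses.  Even with keys that only
ever increase (like the timestamp example earlier in the thread) and regular
deletion of old entries, the tree keeps itself balanced, so lookups stay
logarithmic.

import java.util.TreeMap;

public class BalancedTreeDemo {
    public static void main(String[] args) {
        // TreeMap is a self-balancing binary search tree.  Inserting keys in
        // strictly increasing order would degrade an unbalanced tree into a
        // linked list, but a balanced tree keeps rearranging itself.
        TreeMap<Long, String> index = new TreeMap<>();

        for (long ts = 0; ts < 1_000_000; ts++) {
            index.put(ts, "row-" + ts);
            // Regularly delete the oldest entries, keeping a sliding window
            // of the most recent 100,000 keys.
            if (index.size() > 100_000) {
                index.pollFirstEntry();
            }
        }

        // Lookups remain fast because the structure stays balanced.
        System.out.println("oldest key: " + index.firstKey());
        System.out.println("newest key: " + index.lastKey());
        System.out.println("size:       " + index.size());
        System.out.println("lookup:     " + index.get(950_000L));
    }
}
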
> For dropping index pages, look at the source code and see what it does
> on a delete.  If it releases the page to the empty page pool (more like a
> cache, actually), then it doesn't matter whether you compress or not.
> 
> One thing to remember is that shortening (if possible) or lengthening a
> file is an expensive operation.  This is why most databases add a batch
> of pages to the empty page pool rather than adding a single page at a
> time.  They also reuse empty pages within the file: reusing an existing
> page just means changing a pointer, while growing or shrinking the file
> means kernel operations, which are much slower.
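
Here is a minimal sketch of that batching-and-reuse strategy.  The names
(ToyPageAllocator, EXTENT_PAGES) are invented; this is not Derby's allocator,
just an illustration of preferring the free pool and growing the file an
extent at a time.

import java.util.ArrayDeque;
import java.util.Deque;

public class ToyPageAllocator {
    static final int PAGE_SIZE = 8 * 1024;   // fixed 8K pages
    static final int EXTENT_PAGES = 8;       // grow the file 8 pages at a time

    private final Deque<Long> freePages = new ArrayDeque<>();
    private long pagesInFile = 0;

    // Allocate a page number, reusing a freed page whenever one is available.
    public long allocatePage() {
        if (freePages.isEmpty()) {
            // The expensive "kernel" step: extend the file by a whole extent
            // so the next few allocations come straight from the pool.
            for (long p = pagesInFile; p < pagesInFile + EXTENT_PAGES; p++) {
                freePages.addLast(p);
            }
            pagesInFile += EXTENT_PAGES;
            System.out.println("extended file to " + pagesInFile + " pages ("
                    + pagesInFile * PAGE_SIZE + " bytes)");
        }
        return freePages.removeFirst();
    }

    // Return an emptied page to the pool; the file itself is never shrunk.
    public void freePage(long pageNumber) {
        freePages.addFirst(pageNumber);   // reuse freed pages first
    }

    public static void main(String[] args) {
        ToyPageAllocator alloc = new ToyPageAllocator();
        long a = alloc.allocatePage();   // triggers one extent-sized growth
        long b = alloc.allocatePage();   // served from the pool, no growth
        alloc.freePage(a);               // page a goes back to the pool
        long c = alloc.allocatePage();   // reuses the just-freed page a
        System.out.println("allocated pages: " + a + ", " + b + ", " + c);
    }
}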
> 
> Compression can be good, though.  As a database ages, an index can have
> its first page at page 25, the next at page 234, the next at page 535,
> the next at page 43, and these pages can be scattered all over the disk
> volume as well, so you're moving the heads all over the place to find
> them.  Compressing a database, followed by a disk defrag, can make the
> database faster.
> 
> W

