db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Question/observations on purged rows and insert rates
Date Fri, 06 Aug 2010 16:59:33 GMT
Bergquist, Brett wrote:
> I posted the following to the users list but saw no response, so I 
> thought I would post here as it may be more relevant to developers.
> After some testing, it seems that Derby is reusing the space of deleted 
> records before allocating more space from the file system.  If this is 
> the case then what use does the call:
>                 call 
> syscs_util.syscs_inplace_compress_table(SCHEMA,TABLE, 1, 0, 0);
> have?  Basically what does the PURGE_ROWS option do that is above and 
> beyond what is being done by Derby already to reuse the space of deleted 
> records?
If you have no clobs/blobs in your table, then Derby by default will only
reclaim space once it thinks it can get a whole page.  So it waits
until the LAST row on a page has been deleted before attempting to
reclaim the space and make a full page available for subsequent use.
The purge option of compress instead tries to reclaim the space of every
single deleted row in the table.
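For reference, the two system procedures being discussed look like this
(the schema/table names here are placeholders, not from your app):

```sql
-- In-place compress with PURGE_ROWS=1: reclaims the space of committed
-- deleted rows without rebuilding the table.
-- (args: schema, table, PURGE_ROWS, DEFRAGMENT_ROWS, TRUNCATE_END)
CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE('APP', 'MYTABLE', 1, 0, 0);

-- Full compress: rebuilds the table and its indexes into new
-- conglomerates and returns unused space to the operating system.
-- (third arg: SEQUENTIAL, 0 = non-sequential)
CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'MYTABLE', 0);
```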

 From what you have written it seems like most of your rows are likely
to be in order in the file and you tend to delete them in chunks that
are in order, so I would assume that the default space reclamation would
work well in your case.
> Also after testing we are seeing the following. With a database with no 
> deleted rows, my test application is inserting about 150 records/second 
> into a table.  I let this run for about 2 million records and the insert 
> rate is consistent.  Now I purge out 1.5 million records and run the 
> test again.   
Does the above sentence mean you sql delete the first 1.5 million records?
> The insert rate is now about 25 records/second.  Running
> the above compress with the PURGE_ROWS option and rerun the test and 
> still about 25 records/second.  Run full SYSCS_UTIL.SYSCS_COMPRESS_TABLE 
> and rerun the test and the insert rate is back to 150 records/second.
I am not sure what is going on.  The best thing you could do if you want
more help is to post a repro with as much detail as possible; a repro
answers a lot of questions about what you are doing rather than leaving
us to guess (where shutdowns happen, page size, blobs/clobs?, log on a
separate disk or not, where commits happen, whether inserts are single
threaded, whether deletes are single threaded, ...).  We don't need your
real data, but table and index DDL that match your app would be best,
along with a program that generates dummy data for inserts and
reproduces the issue.  Go ahead and log a performance JIRA issue
if you can reproduce this in a simple Java program.
> The reduced insert rate because of deleted records is very much a 
> problem.  We have a table that gets about 700K records inserted per day 
> and purges out 30 days old data at about 700K records per day.  This has 
> a great effect on our insert rate.  Why the big hit because of deleted 
> records and can anything other than a compress help?  This process has 
> no downtime so running a compress can only be done maybe once a month.

I will just note that space reclamation of deleted records does generate
"background" work.  And this background work can affect concurrent
insert performance.

If your application can handle the complications of using multiple
tables, I would suggest you go that way.  That is the approach I have
seen most applications like this use across a wide range of databases.
Inserts then are going to be best-case performance, and reclaiming the
space can be done "online" without affecting the data you might need
to access.  Dropping an entire table is always going to be less overhead
on the system than deleting the same number of rows from a table, as in
the latter case each row will be logged, and a number of CPU actions are
taken for manipulating the bytes on the pages for each delete.  In the
case of a drop we simply log the drop table and delete the file from
the file system.

You could use table functions or views to make it easier to query the 
"current" set of data.
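A rough sketch of what I mean, with made-up table and view names (your
schema would differ, and you could rotate daily instead of monthly):

```sql
-- One table per time period, all with the same shape
CREATE TABLE EVENTS_2010_07 (ID BIGINT, TS TIMESTAMP, PAYLOAD VARCHAR(256));
CREATE TABLE EVENTS_2010_08 (ID BIGINT, TS TIMESTAMP, PAYLOAD VARCHAR(256));

-- A view presenting the "current" window as a single table for queries
CREATE VIEW EVENTS_CURRENT AS
    SELECT * FROM EVENTS_2010_07
    UNION ALL
    SELECT * FROM EVENTS_2010_08;

-- Retiring a period is a recreate of the view plus a cheap DROP TABLE
-- (a logged metadata change and a file delete, no per-row work)
DROP VIEW EVENTS_CURRENT;
DROP TABLE EVENTS_2010_07;
CREATE VIEW EVENTS_CURRENT AS SELECT * FROM EVENTS_2010_08;
```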
> Thanks for any feedback.
> Brett
