db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bergquist, Brett" <Br...@canoga.com>
Subject RE: Question/observations on purged rows and insert rates
Date Fri, 06 Aug 2010 17:18:47 GMT
Thanks for the response Mike, next week I will put together a test program that I can use to
attach to a Jira issue on this.  

This info on the PURGE_ROWS option is go.  This should probably go into the manual as to why
one would want to use it.  Now I understand better.  

In my case, the table in question has mostly integer data types with no clobs/blobs.  Rows
are being added to the table in increasing time order and older rows (older than 30 days for
example) are being deleted.

Just for a little more background, for the test that I did, only this table is updated, database
and logging are being done on the same file system, multiple threads are inserting rows into
the table simultaneously.  After inserting a few million rows and observing the insert rate,
the insertion process was suspended, 1.5 million rows were removed by a single thread, and
the insertion process was restarted.  No shutdown of the database occurred.  Resuming the
insertion process saw the dramatic decrease in insertion speed.

A shutdown on the database was done, a compact was performed using the embedded driver to
access the database, and the above test was performed.  After the compact, the speed, returned
to normal.

I agree with your approach on using separate tables.  More research lead me to find that this
"partitioning" is supported native in some databases.  I prefer not to move to another database
as so far over the last 4 or 5 years, Derby has worked well for us in a "zero administration"
kind of way.  I would hate to give this up.  I probably will look into doing this "partitioning"
via procedure and convention.

Thanks for your help.


-----Original Message-----
From: Mike Matrigali [mailto:mikem_app@sbcglobal.net] 
Sent: Friday, August 06, 2010 1:00 PM
To: derby-dev@db.apache.org
Subject: Re: Question/observations on purged rows and insert rates

Bergquist, Brett wrote:
> I posted the following to the users list but saw no response, so I 
> thought I would post here as it may be more relevant to developers.
> After some testing, it seems that Derby is reusing the space of deleted 
> records before allocating more space from the file system.  If this is 
> the case then what use does the call:
>                 call 
> syscs_util.syscs_inplace_compress_table(SCHEMA,TABLE, 1, 0, 0);
> have?  Basically what does the PURGE_ROWS option do that is above and 
> beyond what is being done by Derby already to reuse the space of deleted 
> records?
If you have no clobs/blobs in your table then derby by default will only 
reclaim space once it thinks it can get a whole page.  So it waits
until the LAST row on a page has been deleted before attempting to
reclaim the space and make a full page available for subsequent use.
The purge option of compress instead trys to reclaim the space of every
single deleted row in the table.

 From what you have writing it seems like most of your rows are likely
to be in order in the file and you tend to delete them in chunks that
are in order, so I would assume that the default space reclamation would
work well in your case.
> Also after testing we are seeing the following. With a database with no 
> deleted rows, my test application is inserting about 150 records/second 
> into a table.  I let this run for about 2 million records and the insert 
> rate is consistent.  Now I purge out 1.5 million records and run the 
> test again.   
Does the above sentence mean you sql delete the first 1.5 million records?
The insert rate is now about 25 records/second.  Running
> the above compress with the PURGE_ROWS option and rerun the test and 
> still about 25 records/second.  Run full SYSCS_UTIL.SYSCS_COMPRESS_TABLE 
> and rerun the test and the insert rate is back to 150 records/second.
I am not sure what is going on.  The best thing you could do if you want
more help is to post a repro with as much detail as possible, this just
answers a lot a questions about what you are doing rather than having
to guess (where shutdowns happen, page size, blobs/clobs?, log on 
separated disk or not, where commits happen, inserts single threaded, 
deletes single threaded, ...).  We don't need your real data, but table 
and index ddl that match your app would be best - with a program that 
generated dummy data for inserts that reproduces the issue.  Go ahead 
and log performance JIRA issue
if you can reproduce this is a simple java program.
> The reduced insert rate because of deleted records is very much a 
> problem.  We have a table that gets about 700K records inserted per day 
> and purges out 30 days old data at about 700K records per day.  This has 
> a great effect on our insert rate.  Why the big hit because of deleted 
> records and can anything other than a compress help?  This process has 
> no downtime so running a compress can only be done maybe once a month.

I will just note that space reclamation of deleted records does generate
"background" work.  And this background work can affect concurrent
insert peformance.

If your application can handle the complications with using multiple 
tables, I would suggest you go that way.  That is the approach I have
seen most applications like this use across a wide range of databases.
Inserts then are going to be best case performace, and reclaiming the
space can be done "online" without affecting the data you might need
to access.  Dropping an entire table is always going to be less overhead
on the system then deleting the same number of rows from a table, as in
the later case each row will be logged, and an number of cpu actions are
taken for manipulating the bytes on the pages for each delete.  In the
case of a drop we simply log the drop table and delete the file from
the file system.

You could use table functions or views to make it easier to query the 
"current" set of data.
> Thanks for an feedback.
> Brett

View raw message