db-derby-user mailing list archives

From "Jim Newsham" <jnews...@referentia.com>
Subject RE: excessive disk space allocation
Date Tue, 21 Oct 2008 21:20:56 GMT


> -----Original Message-----
> From: Knut.Hatlen@Sun.COM [mailto:Knut.Hatlen@Sun.COM]
> Sent: Monday, October 20, 2008 9:27 PM
> To: Derby Discussion
> Subject: Re: excessive disk space allocation
> 
> Jim Newsham <jnewsham@referentia.com> writes:
> 
> > Hi,
> >
> > I'm doing some benchmarking of our application which stores data in derby.
> > The parts of the application which I am exercising only perform inserts, not
> > deletes.  The results suggest that derby disk space allocation is excessive,
> > particularly because compressing the tables reduces the size of the database
> > *substantially*.  For example, here are the results of several databases, both
> > before and after compression.
> >
> > Application running time:  original -> compressed
> >
> >  0.5 days:     178.2 MB ->  63.1 MB
> >  1   day:      559.3 MB ->  82.8 MB
> >  2   days:   1,879.1 MB -> 120.8 MB
> >  4   days:   5,154.4 MB -> 190.5 MB
> >  8   days:  11,443.7 MB -> 291.6 MB
> > 16   days:  23,706.7 MB -> 519.3 MB
> >
> > Plotting the data, I observe that both uncompressed and compressed sizes
> > appear to grow linearly, but the growth factor (slope of the linear equation)
> > is 53 times as large for the uncompressed database.  Needless to say, this is
> > huge.
> >
> > I expected that with only inserts and no deletes, there should be little or no
> > wasted space (and no need for table compression).  Is this assumption
> > incorrect?
> 
> Hi Jim,
> 
> You may have come across a known issue with multi-threaded inserts to
> the same table:
> 
> http://thread.gmane.org/gmane.comp.apache.db.derby.devel/36430
> https://issues.apache.org/jira/browse/DERBY-2337
> https://issues.apache.org/jira/browse/DERBY-2338

Thanks for those links.  I used the diagnostic dump program from the
mentioned discussion thread to see how much the individual tables in my
database are compacting.

The "multi-threaded inserts to the same table" theory doesn't quite jibe
with what I see.  In my case, I have multiple threads inserting into the
database, but most of the data goes into tables which are only inserted into
by a single thread for the duration of the application.

There are only two tables inserted into by more than one thread, and the
data they contain is relatively small (a few percent of the total).  For a
test database I'm looking at right now, these two tables compress to 50% and
90% of original size, respectively... not much of a reduction at all.

By contrast, I am seeing most of the other tables (which aren't inserted
into by more than one thread) compress to between 0.5% and 3.8% of original
size.  For example, I see one table go from 783 pages to 4 pages.
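
For anyone who wants to check the arithmetic, here is a minimal sketch
(Python, standard library only) that refits the sizes quoted earlier in this
thread and works out the page ratio for the one table mentioned above.  The
variable names, the helper function, and the choice of an ordinary
least-squares fit are assumptions made for illustration; the original
plotting method is not stated in the thread.

```python
# Sizes in MB, copied from the figures quoted earlier in this thread.
days = [0.5, 1, 2, 4, 8, 16]
original_mb = [178.2, 559.3, 1879.1, 5154.4, 11443.7, 23706.7]
compressed_mb = [63.1, 82.8, 120.8, 190.5, 291.6, 519.3]


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


original_slope = slope(days, original_mb)      # MB per day before compression
compressed_slope = slope(days, compressed_mb)  # MB per day after compression
ratio = original_slope / compressed_slope      # how much faster the raw size grows

# Page counts for the single table mentioned above (783 pages down to 4).
page_fraction = 4 / 783

print(f"original growth:   {original_slope:.1f} MB/day")
print(f"compressed growth: {compressed_slope:.1f} MB/day")
print(f"slope ratio:       {ratio:.1f}")
print(f"783 -> 4 pages:    {page_fraction:.2%} of original size")
```

With these inputs the slope ratio should come out close to the factor of 53
reported earlier, and the 783-to-4-page table lands at the low end of the
0.5% to 3.8% range.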

Jim



