db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kristian Waagan <Kristian.Waa...@Sun.COM>
Subject Re: excessive disk space allocation
Date Thu, 23 Oct 2008 15:26:19 GMT
On 23.10.08 02:26, Jim Newsham wrote:
> Hasn't been a lot of response to this thread.  I have a 23gb database
> holding only 500mb of data, all created with just inserts (no deletes).  For
> our app, this is a serious problem.

Hi Jim,

I would say this is a serious problem for any app...
Do you have a way to reproduce this behavior?
If so, it would be very helpful if you could attach the repro to a 
Jira[1] issue.


> 
> Someone suggested the problem is caused by multi-threaded inserts, but the
> tables which exhibit the problem were only inserted into by a single thread,
> each.  
> 
> Any suggestions? 
> 
> Is there a way to tell, before compacting, how much space would be saved by
> compacting a table?  With this information, at least I would be able to
> periodically compact just those tables which merit being compacted, as a
> workaround to the real problem.

I don't remember the details, but if you have a look at the source for 
"DerbyDiskSpaceDiag.java" attached to DERBY-2549[2] you should be able 
to figure out the SQL required. I'm sure it's in the manual somewhere too.
You can also just compile and run the program (preferably on a copy / 
non-production db) and see what it says. With 23 GB of data, I think it 
will chew for a while before you see any output.



-- 
Kristian

[1] https://issues.apache.org/jira/browse/DERBY
[2] https://issues.apache.org/jira/browse/DERBY-2549


> 
> Thanks,
> Jim
> 
>> -----Original Message-----
>> From: Jim Newsham [mailto:jnewsham@referentia.com]
>> Sent: Tuesday, October 21, 2008 11:21 AM
>> To: 'Derby Discussion'
>> Subject: RE: excessive disk space allocation
>>
>>
>>
>>> -----Original Message-----
>>> From: Knut.Hatlen@Sun.COM [mailto:Knut.Hatlen@Sun.COM]
>>> Sent: Monday, October 20, 2008 9:27 PM
>>> To: Derby Discussion
>>> Subject: Re: excessive disk space allocation
>>>
>>> Jim Newsham <jnewsham@referentia.com> writes:
>>>
>>>> Hi,
>>>>
>>>> I'm doing some benchmarking of our application which stores data in
>>> derby.
>>>> The parts of the application which I am exercising only perform
>> inserts,
>>> not
>>>> deletes.  The results suggest that derby disk space allocation is
>>> excessive,
>>>> particularly because compressing the tables reduces the size of the
>>> database *
>>>> substantially*.  For example, here are the results of several
>> databases,
>>> both
>>>> before and after compression.
>>>>
>>>> Application running time.  original -> compressed
>>>>
>>>> 0.5 days.  178.2mb -> 63.1mb
>>>>
>>>> 1 day.  559.3mb -> 82.8mb
>>>>
>>>> 2 days.  1,879.1mb -> 120.8mb
>>>>
>>>> 4 days.  5,154.4mb -> 190.5mb
>>>>
>>>> 8 days. 11,443.7mb -> 291.6mb
>>>>
>>>> 16 days.  23,706.7mb -> 519.3mb
>>>>
>>>> Plotting the data, I observe that both uncompressed and compressed
>> sizes
>>>> appear to grow linearly, but the growth factor (slope of the linear
>>> equation)
>>>> is 53 times as large for the uncompressed database.  Needless to say.
>>> this is
>>>> huge.
>>>>
>>>> I expected that with only inserts and no deletes, there should be
>> little
>>> or no
>>>> wasted space (and no need for table compression).  Is this assumption
>>>> incorrect?
>>> Hi Jim,
>>>
>>> You may have come across a known issue with multi-threaded inserts to
>>> the same table:
>>>
>>> http://thread.gmane.org/gmane.comp.apache.db.derby.devel/36430
>>> https://issues.apache.org/jira/browse/DERBY-2337
>>> https://issues.apache.org/jira/browse/DERBY-2338
>> Thanks for those links.  I used the diagnostic dump program from the
>> mentioned discussion thread to see how much the individual tables in my
>> database are compacting.
>>
>> The "multi-threaded inserts to the same table" theory doesn't quite jive
>> here.  In my case, I have multiple threads inserting into the database,
>> but
>> most of the data goes into tables which are only inserted into by a single
>> thread for the duration of the application.
>>
>> There are only two tables inserted into by more than one thread, and the
>> data they contain is relatively small (a few percent).  For a test
>> database
>> I'm looking at right now, these two tables compress to 50% and 90% of
>> original size, respectively... not much at all.
>>
>> By contrast, I am seeing most of the other tables (which aren't inserted
>> into by more than one thread) compress to between 0.5% and 3.8% of
>> original
>> size.  For example, I see one table go from 783 pages to 4 pages.
>>
>> Jim
>>
>>
>>
> 
> 
> 


Mime
View raw message