couchdb-user mailing list archives

From: Bob Clary <...@bclary.com>
Subject: Re: Compaction Strategies
Date: Tue, 08 Mar 2011 12:43:58 GMT
On 3/8/11 1:09 AM, Ian Hobson wrote:
> On 02/03/2011 19:33, Wayne Conrad wrote:
>> We run a compaction script that compacts every database every night.
>> Compaction of our biggest (0.6 TB) database took about 10 hours today.
>> Granted, the hardware has poor I/O bandwidth, but even if we improve
>> the hardware, a change in strategy could be good. Along with splitting
>> that database into more manageable pieces, I hope to write a
>> compaction script that only compacts a database sometimes (a la
>> PostgreSQL's autovacuum). To do that, I want some way to estimate
>> whether there's anything to gain from compacting any given database.
>>
>> I thought I could use the doc_del_count returned by GET
>> /<database-name> as a gauge of whether to compact or not, but in my
>> tests doc_del_count remained the same after compaction. Are there any
>> statistics, however imperfect, that could help my code guess when
>> compaction ought to be done?
>>
> Just a thought.
>
> After compacting, the database will have a given size on disk. Would it
> be possible to test, and compact if this grew by (say) 15%?
>
> It's not perfect - but it might be better than going by time alone.

Wayne,

You say that your database size is 0.6 TB. What is the change in size 
during the day? What is the change in size after the compaction? If your 
database is not increasing appreciably in size during the day and if the 
compacted database size is not appreciably smaller than the 
pre-compaction size, I don't think you are gaining much by compacting 
once per day. In fact, you are taking a significant performance hit if 
your compaction is running for 10 hours every day.

Perhaps a simple change in the compaction schedule, from once per day to 
once every N days, will help in the short term. Similar to Robert's 
suggestion, I keep track of the initial size of the database, as well as 
the initial sizes of each of the views, and compact them whenever they 
double in size; a rough sketch of that check follows.
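
For what it's worth, here is that check in Python. This assumes CouchDB 
1.x, where GET /<db> reports disk_size in bytes; the host, the baseline 
file, and the 2x threshold are just illustrative choices, not anything 
from Wayne's setup:

  import json, urllib2

  COUCH = "http://localhost:5984"
  THRESHOLD = 2.0               # compact once the file has doubled
  BASELINES = "baselines.json"  # post-compaction sizes, one per database

  def disk_size(db):
      # GET /<db> returns database stats, including disk_size in bytes
      info = json.load(urllib2.urlopen("%s/%s" % (COUCH, db)))
      return info["disk_size"]

  def compact(db):
      # POST /<db>/_compact starts compaction in the background;
      # CouchDB insists on a Content-Type of application/json here
      req = urllib2.Request("%s/%s/_compact" % (COUCH, db), "",
                            {"Content-Type": "application/json"})
      urllib2.urlopen(req)

  def maybe_compact(db):
      try:
          baselines = json.load(open(BASELINES))
      except IOError:
          baselines = {}
      size = disk_size(db)
      base = baselines.get(db)
      if base is None:
          baselines[db] = size      # first sighting: record a baseline
      elif size >= THRESHOLD * base:
          compact(db)
          # a real script would poll GET /<db> until compact_running is
          # false, then store the new size as the baseline
      json.dump(baselines, open(BASELINES, "w"))

Views can get the same treatment: GET /<db>/_design/<ddoc>/_info reports 
the index's disk_size, and POST /<db>/_compact/<ddoc> compacts it.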

Of course, you will need to tune the trigger point so you never reach a 
state where there is too little free disk space to complete the 
compaction: compaction writes a fresh copy of the database alongside the 
old file, so you need free space on the order of the compacted size.
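
A minimal guard for that, assuming a POSIX system and using the current 
on-disk file size as a worst-case estimate of what compaction will need:

  import os

  def enough_space(db_file, factor=1.1):
      # compaction writes a new copy next to the old file, so demand
      # free space for (at worst) a full copy, plus some headroom
      st = os.statvfs(os.path.dirname(db_file))
      return st.f_bavail * st.f_frsize >= factor * os.path.getsize(db_file)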

Until Bob Dionne's patch is released I think that is the best you can 
achieve.

Unless compaction performance is significantly improved, you also need 
to consider that once your database grows large enough, it will always 
be compacting: a run will take so long to complete that another will be 
needed as soon as it finishes.
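
Back of the envelope, from the numbers you gave: 0.6 TB in 10 hours is 
roughly 17 MB/s of effective compaction throughput, so a database only a 
few times larger would spend most of every day compacting.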

/bc
