couchdb-user mailing list archives

From Damien Katz <dam...@apache.org>
Subject Re: runaway compaction
Date Tue, 23 Dec 2008 01:29:03 GMT
It's a known issue that compaction may be unable to complete under
heavy write load. At some point we should probably implement a
mechanism to throttle writes if the compaction isn't making enough
progress during updates.

-Damien
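
One way to picture the throttling idea above is a rough sketch, not anything that exists in CouchDB: compare the number of docs two successive compaction passes had to copy, and ask writers to pause briefly when that backlog is not shrinking fast enough. The function name `write_delay` and the parameters `min_shrink` and `max_delay` are hypothetical, chosen only for illustration.

```python
def write_delay(prev_backlog, curr_backlog, min_shrink=0.5, max_delay=0.1):
    """Seconds each writer should pause, given two successive pass sizes.

    Hypothetical policy for illustration only; this is not CouchDB code.
    """
    if prev_backlog <= 0:
        return 0.0  # no earlier pass to compare against
    ratio = curr_backlog / prev_backlog
    if ratio <= min_shrink:
        return 0.0  # backlog is shrinking fast enough, do not throttle
    # Ramp linearly from 0 at min_shrink up to max_delay once the
    # backlog has stopped shrinking (ratio >= 1).
    return max_delay * min(1.0, (ratio - min_shrink) / (1.0 - min_shrink))
```

With the pass sizes from the quoted message below, `write_delay(27870955, 1473387)` returns 0.0 (the backlog shrank by far more than half), while `write_delay(617008, 450607)` returns roughly 0.046 seconds, because pass 4 was only about 27% smaller than pass 3.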


On Dec 22, 2008, at 7:32 PM, Adam Kocoloski wrote:

> Hi, I ran into an odd failure mode last week and I thought I'd ask  
> around here to see if anyone has seen something similar.  I have a  
> CouchDB server (recent trunk) on a large EC2 instance with a DB that  
> sees a constant update rate of ~50 Hz.  I triggered a compaction  
> when the DB had reached ~27M update sequences (80 GB in total).  The  
> first pass finished after 7h40m, but of course another 1.4M updates  
> had been written to the original DB.  So far, so good.
>
> Unfortunately, the subsequent iterations of copy_compact() ran much  
> slower than that original pass.  After a few passes, the compactor  
> rate was equal to the new write rate, so it effectively entered a  
> runaway mode.  The stats looked like
>
> Pass 1:  7h40m    27870955 docs   1010 Hz
> Pass 2:  3h44m     1473387 docs    110 Hz
> Pass 3:  2h58m      617008 docs     58 Hz
> Pass 4:  2h44m      450607 docs     46 Hz
> .....
> Pass 23: 4h08m      719541 docs     48 Hz
> Pass 24: 1h04m      436105 docs    113 Hz
> Pass 25: 21 seconds -- done.
>
> We stopped the new write load sometime after the end of Pass 23, and  
> the compaction finished soon after that.
>
> We turned the write load back on and have been compacting the DB  
> once/day ever since.  We haven't seen this runaway mode again.  I've  
> reviewed the compaction code a couple of times, but I can't figure  
> out what would cause such a dramatic slowdown.  Our system  
> monitoring wasn't able to turn up any red flags, either -- in  
> particular, all the latency/throughput/IOPS stats for the disk  
> hosting the database were pretty much constant throughout the  
> lifetime of the compaction.
>
> Best, Adam
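
The pass statistics quoted above can be cross-checked with a little arithmetic. The sketch below recomputes each pass's compactor rate from its doc count and duration, and estimates the incoming write rate under one assumption: that the docs copied in pass n+1 are the updates that arrived while pass n was running. Durations are only given to the minute, so the results are approximate. The names `PASSES`, `compactor_rates`, and `implied_write_rates` are illustrative.

```python
# (pass number, hours, minutes, docs copied), taken from the quoted stats
PASSES = [
    (1, 7, 40, 27870955),
    (2, 3, 44, 1473387),
    (3, 2, 58, 617008),
    (4, 2, 44, 450607),
]


def duration_s(hours, minutes):
    """Pass duration in seconds (the source rounds to whole minutes)."""
    return hours * 3600 + minutes * 60


def compactor_rates(passes):
    """Docs copied per second in each pass."""
    return [docs / duration_s(h, m) for _, h, m, docs in passes]


def implied_write_rates(passes):
    """Estimated write rate during pass n.

    Assumption: everything copied in pass n+1 was written during pass n.
    """
    return [
        nxt[3] / duration_s(cur[1], cur[2])
        for cur, nxt in zip(passes, passes[1:])
    ]


if __name__ == "__main__":
    for (n, _, _, _), rate in zip(PASSES, compactor_rates(PASSES)):
        print(f"Pass {n}: compactor {rate:.1f} Hz")
    for (n, _, _, _), rate in zip(PASSES, implied_write_rates(PASSES)):
        print(f"During pass {n}: writes ~{rate:.1f} Hz")
```

Under that assumption the implied write rate stays in the 40 to 55 Hz range, in line with the ~50 Hz quoted, while the compactor rate drops from about 1010 Hz in pass 1 to about 46 Hz in pass 4. Once the two rates are comparable, each pass leaves behind roughly as much work as it completed, which is the runaway behaviour described.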

