couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: _compact on 0.10.0 <> availability
Date Tue, 13 Apr 2010 18:01:17 GMT
On Apr 13, 2010, at 1:48 PM, till wrote:

> On Tue, Apr 13, 2010 at 7:28 PM, Adam Kocoloski <kocolosk@apache.org> wrote:
>> On Apr 13, 2010, at 12:39 PM, J Chris Anderson wrote:
>> 
>>> 
>>> On Apr 13, 2010, at 9:31 AM, till wrote:
>>> 
>>>> Hey devs,
>>>> 
>>>> I'm trying to compact a production database here (in the hope of recovering
>>>> some space), and made the following observations:
>>>> 
>>>> * the set is 212+ million docs
>>>> * currently 0.8 TB in size
>>>> * the instance (XL) has 2 cores, one is idle, the other maybe utilized at 10%
>>>> * memory - 2 of 15 GB taken, no spikes
>>>> * io - well it's EBS :(
>>>> 
>>>> When I started _compact, read operations slowed down (I'll give you 20
>>>> Mississippis for something that otherwise loads instantly).
>>>> Everything "eventually" worked, but it slowed down tremendously.
>>>> 
>>>> I restarted the CouchDB process and everything is back to "snap".
>>>> 
>>>> Does anyone have any insight on why that is the case?
>>> 
>>> I'm guessing this is an EBS / EC2 issue. You are probably saturating the IO
>>> pipeline. It's too bad there's not an easy way to 'nice' the compaction IO.
>>> 
>>> If you got unlucky and are on a particularly bad EBS / EC2 instance, you might
>>> do best to start up a new Couch in the same availability zone and replicate
>>> across to it. This will achieve more or less the same effect as compaction.
>>> 
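The replicate-instead-of-compact approach above amounts to a single POST against the new node's /_replicate endpoint. A minimal sketch of building that request — host names, port, and database name here are placeholders, not details from the thread:

```python
import json
import urllib.request

# Hedged sketch: ask the new node to pull-replicate from the old one.
# "new-couch", "old-couch", and "bigdb" are illustrative placeholders.
def replicate_request(new_host, source_url, target_db):
    """Build the POST that CouchDB's /_replicate endpoint expects."""
    body = json.dumps({"source": source_url, "target": target_db}).encode()
    return urllib.request.Request(
        f"http://{new_host}:5984/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = replicate_request("new-couch", "http://old-couch:5984/bigdb", "bigdb")
# urllib.request.urlopen(req)  # left commented out: needs a live CouchDB
```

(The target database has to exist on the new node first — a plain PUT creates it.)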
>>>> 
>>>> Till
>>> 
>> 
>> I'm surprised it's _that_ bad.  The compactor only submits one I/O to EBS at a
>> time, so I wouldn't expect other reads to be starved too much.  On the other
>> hand, I'll bet compacting a DB that large takes at least a month, especially
>> if you used random IDs.
>> 
>> Beyond that, when you compact you're messing with the page cache something
>> fierce.  At 212M docs you need every one of those 15 GB of RAM to keep the
>> btree nodes cached.  The compactor a) reads nodes that your client app may not
>> have been touching and b) writes to a new file, which the kernel starts to
>> cache too.  So it's a fairly brutal process from the perspective of the page
>> cache.
> 
> I was looking at my fancy htop when it started to slow down and
> neither RAM nor CPU was fully utilized. I mean, not even 50%. That's
> what surprises me.

That's not surprising to me.  CouchDB doesn't do much active caching, but it relies extensively
on the page cache.  Presumably "cat /proc/meminfo | grep ^Cached" shows a value near 14 GB.
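That check is easy to script; a small, Linux-only sketch that pulls the same Cached figure out of /proc/meminfo (the path parameter is only there so it can be pointed at a saved copy of the file):

```python
# Hedged sketch: the same check as "grep ^Cached /proc/meminfo", from Python.
# Linux-only; the path argument exists only for testing against a saved copy.
def cached_kib(path="/proc/meminfo"):
    """Return the page-cache size in kB, as reported on the Cached: line."""
    with open(path) as f:
        for line in f:
            if line.startswith("Cached:"):
                return int(line.split()[1])  # line looks like "Cached: 14321000 kB"
    raise ValueError("no Cached: line in " + path)
```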

>> Does anyone have a sense of how deep a btree with 212M entries will be?  That
>> is, how many pread calls are required to pull up a doc?
>> 
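A back-of-the-envelope answer to the depth question, assuming an idealized tree where every inner node holds a fixed number of children — CouchDB's real fanout varies with key and pointer sizes, so these numbers are only indicative:

```python
import math

# Rough estimate only: assumes a uniform fanout, which CouchDB's btree
# does not guarantee (node size depends on key and pointer sizes).
def btree_depth(n_entries, fanout):
    """Smallest depth such that fanout**depth >= n_entries."""
    return math.ceil(math.log(n_entries, fanout))

for fanout in (50, 100, 500):
    print(f"fanout {fanout}: ~{btree_depth(212_000_000, fanout)} levels")
```

Even at this scale a lookup is only a handful of preads — the pain is that each one is an EBS round-trip once the cache goes cold.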
>> Till, do you have iostat numbers from the compaction run?
> 
> root@box:~# iostat
> Linux 2.6.21.7-2.fc8xen (couchdb01.east1.aws.easybib.com) 	04/13/2010 	_x86_64_	(4 CPU)
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           0.91    0.00    0.08    8.43    1.38   89.21
> 
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sdb               0.00         0.00         0.00       4578       1008
> sdc               0.00         0.00         0.00        776          0
> sdd               0.00         0.00         0.00        776          0
> sde               0.00         0.00         0.00        776          0
> sda1              0.16         1.09         4.12    3552818   13408432
> sdg              13.63       133.43       106.81  433818674  347266448
> sdh              13.54        94.30       212.11  306595821  689630885
> sdi              13.38        94.23       212.93  306366410  692284040
> sdk               1.91        46.01        73.82  149575695  239999486
> md0              27.04       188.53       425.04  612960367 1381916061

Those are cumulative numbers since system boot.  You need to specify an interval
to get reports covering just the compaction window.  I also like the -x flag.  So
something like

iostat -x 4

will give you one report from system boot, then subsequent reports every 4
seconds, each covering just that interval.

Best,

Adam

