incubator-couchdb-user mailing list archives

From Andreas Kemkes <a5s...@yahoo.com>
Subject Re: How many filtered replications is too many?
Date Fri, 06 Jul 2012 22:19:41 GMT
Hi Mathias:

Yes, your analysis looks spot on.  There are advantages, though, to using the replication
feature - looking at the _changes feed, I'm not immediately clear on how I would achieve the
same behavior (e.g., for deleted documents).  That may be due to my lack of exposure to CouchDB
details - even the _changes feed was new to me.

Benoit's suggestion is along the same lines and very interesting, but it requires a newer
CouchDB installation than the one I currently have in place.

That said, I also looked into whether disk seek time is the cause, but the iowait numbers
from iostat suggest otherwise:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          90.36    0.00    3.71    0.10    5.19    0.64

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdf             49.05       799.60        31.57       8004        316

In top I see many couchjs processes running, each at a constant 6-9% CPU load.
Is that expected?

Tailing the log files for POSTs also makes me wonder about scheduling fairness among the replications.
 I see activity mostly for a small number of the target databases.  Do you know how this
is being handled?

Thanks,

Andreas



________________________________
 From: Mathias Leppich <mleppich@muhqu.de>
To: user@couchdb.apache.org; Andreas Kemkes <a5sk4s@yahoo.com> 
Sent: Friday, July 6, 2012 2:37 AM
Subject: Re: How many filtered replications is too many?
 
Hi Andreas,

If you want to split one large database into many smaller ones as a one-time task, it's
probably more efficient to write a script that reads the _changes feed of the large database
and then decides where to put each document. Compared to the 200 filtered replications, you
only need to read the _changes feed once instead of 200 times in parallel, which would
otherwise give very poor performance because of disk seek times…
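
Just to illustrate what I mean, here is a rough sketch in Python (standard library only;
the server URL, the database names and the routing rule are made-up placeholders, not your
actual setup):

# Rough sketch only: read the source database's _changes feed once and route
# each document to one of the smaller target databases. The URL, the database
# names and the routing rule are placeholders.
import json
import urllib.request

COUCH = "http://localhost:5984"
SOURCE = "bigdb"

def target_db(doc):
    # placeholder routing rule, e.g. split by a "type" field
    return "bigdb_%s" % doc.get("type", "misc")

def put_doc(db, doc):
    url = "%s/%s/%s" % (COUCH, db, doc["_id"])
    req = urllib.request.Request(url, data=json.dumps(doc).encode("utf-8"),
                                 headers={"Content-Type": "application/json"},
                                 method="PUT")
    urllib.request.urlopen(req).read()

with urllib.request.urlopen("%s/%s/_changes?include_docs=true" % (COUCH, SOURCE)) as resp:
    changes = json.load(resp)

for row in changes["results"]:
    if row["id"].startswith("_design/"):
        continue  # skip design documents
    if row.get("deleted"):
        # deletions appear as tombstones ("deleted": true); mirror them in the
        # target database here if you need to
        continue
    doc = dict(row["doc"])
    doc.pop("_rev", None)  # targets start out empty, so insert as fresh documents
    put_doc(target_db(doc), doc)

print("processed up to seq", changes["last_seq"])

A re-run over documents that were already copied would hit 409 conflicts with this naive
PUT, so a second pass would need to either skip existing documents or preserve the original
revisions the way the replicator does.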

Such a migration script is only a few lines of code. And the _changes feed also lets you catch
up after the initial split: you just need to log the last seq number you processed so you know
where you left off and can pick up from there.
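
The catch-up part is really just remembering that seq number and passing it back as
?since=... on the next run, roughly like this (again only a sketch; the checkpoint file
name is arbitrary):

# Sketch of the catch-up: persist the last processed seq in a local file and
# resume the _changes feed from there with ?since=... on the next run.
import json
import os
import urllib.request

CHECKPOINT = "bigdb.last_seq"  # arbitrary local file name

def read_checkpoint():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return f.read().strip()
    return "0"  # start from the beginning on the first run

def write_checkpoint(seq):
    with open(CHECKPOINT, "w") as f:
        f.write(str(seq))

since = read_checkpoint()
url = "http://localhost:5984/bigdb/_changes?include_docs=true&since=%s" % since
with urllib.request.urlopen(url) as resp:
    changes = json.load(resp)

# ...route changes["results"] exactly as in the previous sketch...

write_checkpoint(changes["last_seq"])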

- mathias

On Jul 6, 2012, at 3:37 , Andreas Kemkes wrote:

> I'm trying to split up a monolithic database into smaller ones using filtered continuous
> replications in couchdb 1.2.
> 
> I need about 200 of these replications (on a single server) and would like to parallelize
> as much as possible.  Yet, when I do, the cpu load gets very high and the system seems to
> be crawling, replication seems to be slow, and I'm seeing timeout and other errors.
> 
> How can I best determine what the bottleneck is?
> 
> Are there suggestions on how to configure couchdb to handle it better (I've increased
> max_dbs_open to 200)?
> 
> How do I best achieve good throughput?
> 
> This will be a one-time task, so any large measurement / monitoring effort is probably
> overkill.
> 
> Any suggestions are much appreciated (including suggestions for different approaches).
> 
> Thanks,
> 
> Andreas