incubator-couchdb-user mailing list archives

From: Stephen Bartell <snbart...@gmail.com>
Subject: Re: tinkering with limits while replicating
Date: Tue, 05 Feb 2013 18:56:03 GMT
Nathan, 

I dropped the pool size down to 500 and it's still the same story.  I also tried lowering the number of replicator processes to 1 per replicator.  Again, same thing.
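
For reference, here's roughly how I made those changes (a sketch via the 1.x _config API; the host and credentials are placeholders):

    # shrink the replicator's shared HTTP connection pool
    curl -X PUT http://admin:*****@localhost:5984/_config/replicator/http_connections -d '"500"'
    # one worker process per replication job
    curl -X PUT http://admin:*****@localhost:5984/_config/replicator/worker_processes -d '"1"'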

All the while, I kept an eye on how much memory beam.smp consumes during one replication "wave", and it never exceeds 2%.  So I'm reluctant to think that the OS is running out of memory.
It does seem like there's some sort of process contention, however.  The error code the replicators report while trying to POST is 503, which I assume means the server is unavailable.

Yes, I'm going to add filtering on top of this, and I think I'm going to need to write those filters in Erlang, although I'd like to try to avoid that first.

This is probably a dumb question, but do I need to restart CouchDB after changing these settings?


On Feb 5, 2013, at 10:22 AM, Nathan Vander Wilt <nate-lists@calftrail.com> wrote:

> Hi Stephen,
> 
> I've been doing some tests related to replication lately too (continuous+filtered in my case).
> I suspect the reason Futon hangs is that your whole VM is running out of RAM due to your
> very high os_process_limit. I went into a bit more detail in
> http://mail-archives.apache.org/mod_mbox/couchdb-dev/201302.mbox/%3c70278F4A-FD08-4818-89B7-EA1B0AF846F5@gmail.com%3e
> but this setting basically determines the size of the couchjs worker pool: you'd probably
> rather have a bit of contention for a reasonably sized pool (maybe ~100 per GB free, tops?)
> than start paging.
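> 
> For example, in local.ini (just a sketch; pick a number that fits your free RAM):
> 
>     [query_server_config]
>     ; size of the couchjs worker pool, roughly 100 per GB of free RAM, tops
>     os_process_limit = 400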
> 
> hth,
> -natevw
> 
> 
> 
> On Feb 4, 2013, at 5:15 PM, Stephen Bartell wrote:
> 
>> Hi all,
>> 
>> I'm hitting some limits while replicating; I'm hoping someone can advise.
>> I'm running this in a VM on my MacBook with the following allocated resources:
>> Ubuntu 11.04
>> 4 cores @ 2.3 GHz
>> 8 GB RAM
>> 
>> I'm doing a one-to-many replication (sketched below):
>> 1) I create one db named test.
>> 2) Then I create databases [test_0 .. test_99].
>> 3) I then set up replications from test -> [test_0 .. test_99].  100 replications total.
>> 4) Finally, I go to test, create a doc, and hit save.
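>> 
>> Roughly the setup, sketched with curl (shown as continuous replications via _replicate; adjust for your auth):
>> 
>>     # the source database
>>     curl -X PUT http://localhost:5984/test
>>     # 100 targets, each with a continuous replication from test
>>     for i in $(seq 0 99); do
>>       curl -X PUT http://localhost:5984/test_$i
>>       curl -X POST http://localhost:5984/_replicate \
>>            -H 'Content-Type: application/json' \
>>            -d "{\"source\":\"test\",\"target\":\"test_$i\",\"continuous\":true}"
>>     done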
>> 
>> When I hit save, Futon becomes completely unresponsive for around 10 seconds.  It eventually returns to normal behavior.
>> 
>> Tailing the CouchDB log, I find waves of the following errors:
>> [Tue, 05 Feb 2013 00:46:26 GMT] [info] [<0.6936.1>] Retrying POST request to http://admin:*****@localhost:5984/test_25/_revs_diff in 1.0 seconds due to error {code,503}
>> 
>> I see that the replicator is finding the server unresponsive.  The waves of these messages show that the replicator retries in 0.25 s, then 0.5 s, then 1 s, then 2 s.  This is expected.  Everything settles down after about 4 retries.
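>> 
>> (If the retry behavior ever needs tuning, I believe the count is configurable; a sketch, assuming the stock [replicator] section in local.ini:)
>> 
>>     [replicator]
>>     ; how many times a failed request is retried before the replication job gives up
>>     retries_per_request = 10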
>> 
>> So my first thought was resource limits.  I threw the book at it and set (see the local.ini sketch below):
>> 1) max_dbs_open: 500
>> 2) os_process_limit: 5000
>> 3) http_connections: 20000
>> 4) ulimit -Sn 4096 (the hard limit is 4096)
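>> 
>> In local.ini terms, that's roughly (the ulimit is set in the shell that launches CouchDB):
>> 
>>     [couchdb]
>>     max_dbs_open = 500
>> 
>>     [query_server_config]
>>     os_process_limit = 5000
>> 
>>     [replicator]
>>     http_connections = 20000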
>> 
>> I really don't know what's reasonable for these values relative to how many replications I'm setting up, so these values, except for max_dbs_open, are all stabs in the dark.
>> 
>> No change in performance.
>> 
>> So I'm at a loss now.  What can I do to get all this to work?  Or what am I doing wrong?  Note that this is only a test: I aim to quadruple the number of replications and have lots and lots of insertions on the so-called "test" database.  Actually, there will be several of these one-to-many databases.
>> 
>> I've heard of people getting systems with thousands of dbs and replicators running just fine, so I hope I'm simply not offering the right sacrifices to CouchDB yet.
>> 
>> Thanks for any insight,
>> 
>> sb
>> 
> 

