Subject: Re: tinkering with limits while replicating
From: Stephen Bartell
Date: Tue, 5 Feb 2013 10:56:03 -0800
To: user@couchdb.apache.org

Nathan,

I dropped the pool size down to 500 and it's still the same story. I
also tried lowering the number of replicator processes to 1 per
replicator. Again, same thing.

All the while, I keep an eye on how much memory beam.smp consumes
during one replication "wave", and it never exceeds 2%. So I'm
reluctant to think that the OS is running out of memory. It does seem
like there's some sort of process contention, however. The error code
that the replicators report while trying to POST is 503. I assume this
means the web server is unavailable.

Yes, I'm going to add filtering on top of this, and I think I'm going
to need to write the filters in Erlang, although I'd like to try to
avoid it first.

This is probably a dumb question: do I need to restart couch after
changing these settings?

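On the settings mentioned above: CouchDB 1.x exposes its configuration over HTTP under /_config, so values can be written without editing local.ini by hand. The following is a minimal sketch of building such requests, not a confirmed answer to the restart question. The section names (query_server_config, replicator) are assumed from the CouchDB 1.x default configuration, the server URL is an assumption, and whether already-running replications pick up a new value without a restart is not established in this thread.

```python
import json
import urllib.request

# Assumed for illustration: a local CouchDB 1.x server and the section
# names used by its default configuration. Verify both against your own
# local.ini before relying on them.
COUCH_URL = "http://localhost:5984"
SETTINGS = [
    ("query_server_config", "os_process_limit", "500"),
    ("replicator", "worker_processes", "1"),
]


def build_config_request(base_url, section, key, value):
    """Build (but do not send) a PUT to /_config/<section>/<key>."""
    url = f"{base_url}/_config/{section}/{key}"
    # The request body must be the new value encoded as a JSON string.
    body = json.dumps(value).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    for section, key, value in SETTINGS:
        req = build_config_request(COUCH_URL, section, key, value)
        # Only print the request; sending it needs admin credentials:
        #   urllib.request.urlopen(req)
        print(req.get_method(), req.full_url, req.data.decode())
```

The requests are only printed here, so the sketch can be run without a server to check the URLs and bodies before anything is changed.
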
On Feb 5, 2013, at 10:22 AM, Nathan Vander Wilt wrote:

> Hi Stephen,
>
> I've been doing some tests related to replication lately too
> (continuous+filtered in my case). I suspect the reason Futon hangs is
> because your whole VM is running out of RAM due to your very high
> os_process_limit. I went into a bit more detail in
> http://mail-archives.apache.org/mod_mbox/couchdb-dev/201302.mbox/%3c70278F4A-FD08-4818-89B7-EA1B0AF846F5@gmail.com%3e
> but this setting basically determines the size of the couchjs worker
> pool; you'd probably rather have a bit of contention for the pool at
> a reasonable size (maybe ~100 per GB free, tops?) than start paging.
>
> hth,
> -natevw
>
> On Feb 4, 2013, at 5:15 PM, Stephen Bartell wrote:
>
>> Hi all,
>>
>> I'm hitting some limits while replicating, and I'm hoping someone
>> could advise.
>> I'm running this in a VM on my MacBook with the following allocated
>> resources:
>> Ubuntu 11.04
>> 4 cores @ 2.3 GHz
>> 8 GB mem
>>
>> I'm doing a one-to-many replication.
>> 1) I create one db named test.
>> 2) Then create [test_0 .. test_99] databases.
>> 3) I then set up replications from test -> [test_0 .. test_99]. 100
>> replications total.
>> 4) I finally go to test and create a doc, hit save.
>>
>> When I hit save, Futon becomes completely unresponsive for around
>> 10 sec. It eventually returns to normal behavior.
>>
>> Tailing the couchdb log I find waves of the following errors:
>> [Tue, 05 Feb 2013 00:46:26 GMT] [info] [<0.6936.1>] Retrying POST request to http://admin:*****@localhost:5984/test_25/_revs_diff in 1.0 seconds due to error {code,503}
>>
>> I see that the replicator is finding the server to be unresponsive.
>> The waves of these messages show that the replicator retries in
>> 0.25 sec, then 0.5 sec, then 1 sec, then 2 sec. This is expected.
>> Everything settles down after about 4 retries.
>>
>> So my first thought is resource limits. I threw the book at it and
>> set:
>> 1) max_dbs_open: 500
>> 2) os_process_limit: 5000
>> 3) http_connections: 20000
>> 4) ulimit -Sn 4096 (the hard limit is 4096)
>>
>> I really don't know what's reasonable for these values relative to
>> how many replications I am setting up. So these values, save
>> max_dbs_open, are all stabs in the dark.
>>
>> No change in performance.
>>
>> So, I'm at a loss now. What can I do to get all this to work? Or
>> what am I doing wrong? And note that this is only a test. I aim to
>> quadruple the number of replications and have lots and lots of
>> insertions on the so-called "test" database. Actually, there will be
>> several of these one-to-many databases.
>>
>> I've heard people get systems up to thousands of dbs and replicators
>> running just fine. So I hope I'm just not offering the right
>> sacrifices up to couchdb yet.
>>
>> Thanks for any insight,
>>
>> sb
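
The one-to-many setup described in the original message (test -> [test_0 .. test_99]) can be sketched as a set of replication documents, one per target. This is only an illustration under stated assumptions: the thread does not say whether the replications were created through the _replicator database or through POST /_replicate, continuous replication is assumed because saving a single doc triggers the retry waves, and the "rep_..." document ids are hypothetical.

```python
import json

# Names taken from the message: one source db and 100 numbered targets.
SOURCE_DB = "test"
TARGET_COUNT = 100


def target_names(count):
    """Return ["test_0", ..., "test_<count-1>"]."""
    return [f"test_{i}" for i in range(count)]


def replication_docs(source, targets, continuous=True):
    """Build one replication document per target database.

    The "_id" scheme is a hypothetical choice for this sketch, and
    continuous=True is an assumption about the original setup.
    """
    return [
        {
            "_id": f"rep_{source}_to_{target}",
            "source": source,
            "target": target,
            "continuous": continuous,
        }
        for target in targets
    ]


if __name__ == "__main__":
    docs = replication_docs(SOURCE_DB, target_names(TARGET_COUNT))
    # After creating each database with PUT /<name>, documents like these
    # could be saved into the _replicator database.
    print(json.dumps({"docs": docs[:2]}, indent=2))
    print(len(docs), "replication documents")
```

Generating the documents separately from sending them makes it easy to vary the number of targets (for example, the planned quadrupling) while watching for the 503 retries in the log.
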