From: Stephen Bartell
To: user@couchdb.apache.org
Subject: Re: tinkering with limits while replicating
Date: Tue, 5 Feb 2013 11:05:36 -0800
Message-Id: <2FE6C5A5-2F59-49C3-B15D-F7AAB2FEC6BB@gmail.com>
References: <097C0DAA-FF04-4EC7-905C-22310DA33EC1@gmail.com> <6DC7904E-33AA-413B-813C-E236E08C3C71@gmail.com>

On Feb 5, 2013, at 11:02 AM, Robert Newson wrote:

> If you change settings via HTTP requests to _config, no, but if you
> just changed the .ini file on disk, yes. It's best to use PUT/GET to
> _config/section/key imo.

Thanks. I've been making these config changes via requests to _config.

>
> B.
>
> On 5 February 2013 18:56, Stephen Bartell wrote:
>> Nathan,
>>
>> I dropped the pool size down to 500 and still the same story. I also tried lowering the number of replicator processes to 1 per replicator. Again, same thing.
>>
>> All the while, I keep an eye on how much memory beam.smp consumes during one replication "wave", and it never exceeds 2%. So I'm reluctant to think that the OS is running out of memory. It does seem like there's some sort of process contention, however. The error code that the replicators are reporting while trying to POST is 503.
>> I assume that this is for the web server being unavailable.
>>
>> Yes, I'm going to add filtering on top of this, and I think I'm going to need to do those in Erlang, although I'd like to try first to avoid it.
>>
>> This is probably a dumb question: do I need to restart couch after changing these settings?
>>
>>
>> On Feb 5, 2013, at 10:22 AM, Nathan Vander Wilt wrote:
>>
>>> Hi Stephen,
>>>
>>> I've been doing some tests related to replication lately too (continuous+filtered in my case). I suspect the reason Futon hangs is because your whole VM is running out of RAM due to your very high os_process_limit. I went into a bit more detail in http://mail-archives.apache.org/mod_mbox/couchdb-dev/201302.mbox/%3c70278F4A-FD08-4818-89B7-EA1B0AF846F5@gmail.com%3e but this setting basically determines the size of the couchjs worker pool; you'd probably rather have a bit of contention for the pool at a reasonable size (maybe ~100 per GB free, tops?) than start paging.
>>>
>>> hth,
>>> -natevw
>>>
>>>
>>>
>>> On Feb 4, 2013, at 5:15 PM, Stephen Bartell wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm hitting some limits while replicating; I'm hoping someone could advise.
>>>> I'm running this in a VM on my MacBook with the following allocated resources:
>>>> Ubuntu 11.04
>>>> 4 cores @ 2.3 GHz
>>>> 8 GB memory
>>>>
>>>> I'm doing a one-to-many replication.
>>>> 1) I create one db named test.
>>>> 2) Then create [test_0 .. test_99] databases.
>>>> 3) I then set up replications from test -> [test_0 .. test_99]. 100 replications total.
>>>> 4) I finally go to test and create a doc, hit save.
>>>>
>>>> When I hit save, Futon becomes completely unresponsive for around 10 sec. It eventually returns to normal behavior.
>>>>
>>>> Tailing the couchdb log, I find waves of the following errors:
>>>> [Tue, 05 Feb 2013 00:46:26 GMT] [info] [<0.6936.1>] Retrying POST request to http://admin:*****@localhost:5984/test_25/_revs_diff in 1.0 seconds due to error {code,503}
>>>>
>>>> I see that the replicator is finding the server to be unresponsive. The waves of these messages show that the replicator retries in 0.25 sec, then 0.5 sec, then 1 sec, then 2 sec. This is expected. Everything settles down after about 4 retries.
>>>>
>>>> So my first thought is resource limits. I threw the book at it and set:
>>>> 1) max_dbs_open: 500
>>>> 2) os_process_limit: 5000
>>>> 3) http_connections: 20000
>>>> 4) ulimit -Sn 4096 (the hard limit is 4096)
>>>>
>>>> I really don't know what's reasonable for these values relative to how many replications I am setting up. So these values, save max_dbs_open, are all stabs in the dark.
>>>>
>>>> No change in performance.
>>>>
>>>> So, I'm at a loss now. What can I do to get all this to work? Or what am I doing wrong? And note that this is only a test. I aim to quadruple the number of replications and have lots and lots of insertions on the so-called "test" database. Actually, there will be several of these one-to-many databases.
>>>>
>>>> I've heard people get systems up to thousands of dbs and replicators running just fine. So I hope I'm just not offering the right sacrifices up to couchdb yet.
>>>>
>>>> Thanks for any insight,
>>>>
>>>> sb
>>>>
>>>
>>
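
For readers following the thread, here is a minimal sketch of the two kinds of request discussed above: a PUT to _config/section/key, and one replication per target for the test -> [test_0 .. test_99] setup. It only builds the requests and sends nothing. Several details are assumptions rather than anything stated in the thread: the section names (couchdb, query_server_config, replicator) follow the usual CouchDB 1.x .ini layout, the base URL has no credentials, the replications are marked continuous, the os_process_limit of 100 is just an illustrative value in the range Nathan suggests, and the helper names config_put and replicate_post are hypothetical.

```python
import json
import urllib.request

# Assumed local CouchDB 1.x node; add admin credentials if the server requires them.
BASE = "http://localhost:5984"


def config_put(section, key, value, base=BASE):
    """Build a PUT to _config/section/key.

    The request body is the new value encoded as a JSON string.
    """
    return urllib.request.Request(
        f"{base}/_config/{section}/{key}",
        data=json.dumps(str(value)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def replicate_post(source, target, base=BASE):
    """Build a POST to _replicate for one source -> target pair."""
    # "continuous" is an assumption; the thread does not say how the
    # replications were created.
    body = {"source": source, "target": target, "continuous": True}
    return urllib.request.Request(
        f"{base}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Settings mentioned in the thread. Section names are assumptions, and the
# values are illustrative rather than recommendations.
config_requests = [
    config_put("couchdb", "max_dbs_open", 500),
    config_put("query_server_config", "os_process_limit", 100),
    config_put("replicator", "worker_processes", 1),
]

# One-to-many setup from the original post: test -> test_0 .. test_99.
replication_requests = [replicate_post("test", f"test_{i}") for i in range(100)]
```

Each request object can be passed to urllib.request.urlopen to actually send it. Per Robert's reply, settings changed this way should not need a restart, unlike edits made directly to the .ini file on disk.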