From: Dirk-Willem van Gulik
Date: Fri, 30 Jan 2009 16:10:52 +0100
To: user@couchdb.apache.org
Subject: User, Erlang, Couchdb or kernel error ?
Message-ID: <4983187C.8090404@bbc.co.uk>

Folks,

When blasting a CouchDB install (0.9.0a, r738928) with lots of requests
(see script [1]; basically 2-8 writers and 8-32 readers), I see the
following behavior, regardless of the R:W ratio or anything else:

    Running version 0.9.0a-incubating on db test_6204
    #what   num  count  ok    ops/sec
    reader    1   1000    0%  4000 ops/sec
    reader    2   1000    0%  4000 ops/sec
    reader    3   1000    0%  3200 ops/sec
    reader    4   1000    0%  2667 ops/sec
    writer    1   1000  100%   640 ops/sec
    reader   10   1000    0%  1231 ops/sec
    reader    4   2000    0%  1600 ops/sec
    ... lots more...[4]
    Connection error: 500 Can't connect to localhost:5984 (connect: Cannot
    assign requested address) at
    /usr/lib/perl5/site_perl/5.8.8/CouchDB/Client/Doc.pm line 85

At this point every client gets a 'Cannot assign requested address' error,
and the server is unreachable for some 20-30 seconds [2] before it recovers
by itself. An 'lsof' shows that the socket is still in LISTEN. Nothing shows
up in the CouchDB log (at debug, info or error log level) [3].

The issue happens on Mac OS X (9.6.0) and Linux/CentOS (2.6.18-92.1.17.el5),
and I needed a dual-core (or similar) machine to have the requests hammer the
server fast enough to cause this. On a laptop (or when copious debugging or
'info' level logging output slows the IO down to < 800 ops/second) one never
hits this stage. It is easier to trigger with SAS disks than with SATA disks.

CPU load is < 20% during the test; disk IO is totally maxed out when either
1) the dataset exceeds the usual buffers or 2) you do any sync. Note that
this is a single instance on a single spindle shared with the OS in each
case. Traffic is up to a few Gbits. Nothing in /var/log/messages or dmesg.

Any hints as to whether this is a user error (me being stupid), a CouchDB
error, or whether I need to start diving into the kernel or Erlang [5]?

Note that the behaviour on Linux and Mac OS X is identical, and that various
versions of /trunk seem to exhibit this.

Any advice? Or shall I file a bug?

Thanks,

Dw.

1: http://people.apache.org/~dirkx/p.pl

2: With the command:

       perl ~/p.pl ; /usr/sbin/lsof | grep couchdb | grep TCP
       while ! curl http://localhost:5984/; do date; sleep 1; done

   one gets the output:

       .. all childs exiting..
       beam.smp 5534 couchdb ... TCP localhost.localdomain:5984 (LISTEN)
       curl: (7) Failed to connect to 127.0.0.1: Cannot assign requested address
       Fri Jan 30 14:34:13 GMT 2009
       ...
       Fri Jan 30 14:34:38 GMT 2009
       $

3: Tail end of the log:

       [info] [<0.3294.1>] 127.0.0.1 - - 'GET' /test_7986/06455 404
       [info] [<0.3255.1>] 127.0.0.1 - - 'PUT' /test_7986/2797 201
       [info] [<0.3270.1>] 127.0.0.1 - - 'PUT' /test_7986/31176 201
       [info] [<0.3285.1>] 127.0.0.1 - - 'PUT' /test_7986/1870 201
       [info] [<0.3298.1>] 127.0.0.1 - - 'PUT' /test_7986/0989 201
       .. server very silent...
       [debug] [<0.3304.1>] 'GET' / {1,1}

   The last line is the 'first curl get' from [2] getting through.

4: Ignore the 'ok' field - that is ok.

5: Linux:
       Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [smp:8]
       [async-threads:0] [hipe] [kernel-poll:false]
   Mac OS X:
       Erlang (BEAM) emulator version 5.6.3 [source] [async-threads:0]
       [kernel-poll:false]
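
For anyone who wants to reproduce the timing in [2] without the shell loop,
here is a rough Python sketch of the same idea: keep probing the server once
a second and report how long it took before a request got through. The
function names (server_is_up, measure_outage) and the max_wait cut-off are
made up for illustration and are not part of p.pl or CouchDB; only the URL
and the one-second interval come from the command in [2].

```python
import http.client
import time
import urllib.error
import urllib.request


def server_is_up(url, timeout=2.0):
    """Return True if an HTTP GET to url gets any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except (OSError, http.client.HTTPException):
        # Covers URLError, refused connections, timeouts and
        # 'Cannot assign requested address' on connect.
        return False


def measure_outage(probe, interval=1.0, max_wait=120.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True.

    Returns the number of seconds that passed before the first success,
    or None if max_wait (a hypothetical safety limit) was reached first.
    clock and sleep are injectable so the loop can be tested offline.
    """
    start = clock()
    while not probe():
        if clock() - start >= max_wait:
            return None
        sleep(interval)
    return clock() - start


# Example use, run right after the load script has exited:
#   outage = measure_outage(lambda: server_is_up("http://localhost:5984/"))
#   print("no answer within limit" if outage is None
#         else "first answer after %.0f seconds" % outage)
```

Like the curl loop, this only measures when the first request gets through
again; it says nothing about why the connects fail in the meantime.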