Taking a wild stab: are you running out of ephemeral ports on the client side?
The 20-30 second wait would then be the time your closed sockets spend in
the TIME_WAIT state before their ports are released.
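
To put a rough number on that guess, here is a back-of-the-envelope sketch. The
port ranges and TIME_WAIT durations are what I assume to be typical defaults
(Linux 2.6: ports 32768-61000, 60 seconds; Mac OS X: ports 49152-65535, 30
seconds), not values checked on your machines, and the function name is just
for illustration:

```python
# Rough estimate of how many NEW connections per second one client IP can
# sustain towards a single server address before it runs out of ephemeral
# ports. Port ranges and TIME_WAIT durations are assumed typical defaults,
# not values measured on the machines in this thread.

def max_new_connections_per_sec(first_port, last_port, time_wait_secs):
    """Usable ports divided by how long each closed socket keeps its port."""
    ports = last_port - first_port + 1
    return ports / time_wait_secs

# Assumed Linux 2.6 defaults: net.ipv4.ip_local_port_range = 32768 61000,
# sockets held in TIME_WAIT for 60 seconds.
linux_rate = max_new_connections_per_sec(32768, 61000, 60)

# Assumed Mac OS X defaults: ports 49152-65535, TIME_WAIT of 30 seconds.
mac_rate = max_new_connections_per_sec(49152, 65535, 30)

if __name__ == "__main__":
    print("Linux: about %d new connections/sec" % linux_rate)
    print("Mac OS X: about %d new connections/sec" % mac_rate)
```

Under those assumptions both platforms top out at a few hundred new
connections per second, which is the same order of magnitude as the
< 800 ops/second figure in your mail. If every request opens a fresh socket,
a sustained rate above that would eventually leave no local port free.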
You might check and see if your client lib is doing Keep-Alive or not.
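
As an illustration of why Keep-Alive matters here, the sketch below counts how
many distinct client ports a burst of requests consumes with and without a
persistent connection. It uses Python's standard library against a throwaway
local server rather than your Perl client or CouchDB, and the names
PortRecorder and count_client_ports are made up for the example:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PortRecorder(BaseHTTPRequestHandler):
    """Toy HTTP/1.1 server handler that remembers each client's source port."""
    protocol_version = "HTTP/1.1"  # allows persistent (keep-alive) connections
    seen_ports = set()

    def do_GET(self):
        PortRecorder.seen_ports.add(self.client_address[1])
        body = b"{}"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def count_client_ports(keep_alive, n_requests=20):
    """Send n_requests GETs and return how many client ports the server saw."""
    PortRecorder.seen_ports = set()
    server = HTTPServer(("127.0.0.1", 0), PortRecorder)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    conn = None
    try:
        for _ in range(n_requests):
            if conn is None or not keep_alive:
                # Without keep-alive every request opens a new socket,
                # which costs one ephemeral port on the client side.
                conn = http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()
            if not keep_alive:
                conn.close()
    finally:
        if conn is not None:
            conn.close()
        server.shutdown()
        server.server_close()
    return len(PortRecorder.seen_ports)


if __name__ == "__main__":
    print("ports used with keep-alive:   ", count_client_ports(True))
    print("ports used without keep-alive:", count_client_ports(False))
```

With a persistent connection the whole burst shares one client port; without
it each request takes a new one, and each of those then sits in TIME_WAIT
after the close.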
HTH,
Paul Davis
On Fri, Jan 30, 2009 at 10:10 AM, Dirk-Willem van Gulik
<Dirk-Willem.van.Gulik@bbc.co.uk> wrote:
> Folks,
>
> When blasting a CouchDB install (0.9.0a, r738928) with lots of requests
> (see script[1], basically 2-8 writers, 8-32 readers) I see (regardless of
> R:W ratio or anything) the following behavior:
>
> Running version 0.9.0a-incubating on db test_6204
> #what   num  count  ok    ops/sec
> reader    1   1000    0%  4000 ops/sec
> reader    2   1000    0%  4000 ops/sec
> reader    3   1000    0%  3200 ops/sec
> reader    4   1000    0%  2667 ops/sec
> writer    1   1000  100%   640 ops/sec
> reader   10   1000    0%  1231 ops/sec
> reader    4   2000    0%  1600 ops/sec
> ... lots more...[4]
>
> Connection error: 500 Can't connect to localhost:5984 (connect: Cannot
> assign requested address) at
> /usr/lib/perl5/site_perl/5.8.8/CouchDB/Client/Doc.pm line 85
>
> At this point every client gets a 'Cannot assign requested address'.
>
> And the server is then down for some 20-30 seconds [2] before resuming.
>
> An 'lsof' shows that the socket is still in LISTEN.
>
>
> The server will recover by itself after some 30 seconds. Nothing in the
> couchDB log (debug, info or error log level)[3].
>
> The issue happens on MacOSX (9.6.0) and Linux/CentOS (2.6.18-92.1.17.el5), and
> I needed a dual-core or faster machine for the request hammer to run fast
> enough to cause this. On a laptop (or when copious debugging or 'info' level
> logging output slows the IO down to < 800 ops/second) one never hits this
> stage. It is easier to trigger with SAS disks than with SATA disks.
>
> CPU load is < 20% during the test; disk I/O is totally maxed out when either
> 1) the dataset exceeds the usual buffers or 2) you do any sync. Note that this
> is a single instance on a single spindle shared with the OS in each case.
> Traffic is up to a few Gbits.
>
> Nothing in /var/log/messages or dmesg.
>
> Any hints as to whether this is a user error (me being stupid), a CouchDB
> error, or whether I need to start diving into the kernel or Erlang[5]?
>
> Note that the behaviour on Linux and MacOS-X is identical. Note that various
> versions of /trunk seem to exhibit this.
>
> Any advice ? Or shall I file a bug ?
>
> Thanks,
>
> Dw.
>
> 1: http://people.apache.org/~dirkx/p.pl
>
> 2: With the command:
>
> perl ~/p.pl ; /usr/sbin/lsof | grep couchdb | grep TCP
> while ! curl http://localhost:5984/; do date; sleep 1; done
> one gets the output:
> .. all childs exiting..
> beam.smp 5534 couchdb ...
> TCP localhost.localdomain:5984 (LISTEN)
> curl: (7) Failed to connect to 127.0.0.1:
> Cannot assign requested address
> Fri Jan 30 14:34:13 GMT 2009
> ...
> Fri Jan 30 14:34:38 GMT 2009
> $
>
> 3: tail end:
> [info] [<0.3294.1>] 127.0.0.1 - - 'GET' /test_7986/06455 404
> [info] [<0.3255.1>] 127.0.0.1 - - 'PUT' /test_7986/2797 201
> [info] [<0.3270.1>] 127.0.0.1 - - 'PUT' /test_7986/31176 201
> [info] [<0.3285.1>] 127.0.0.1 - - 'PUT' /test_7986/1870 201
> [info] [<0.3298.1>] 127.0.0.1 - - 'PUT' /test_7986/0989 201
> .. server very silent...
> [debug] [<0.3304.1>] 'GET' / {1,1}
> the 'first curl get' of above getting through.
>
> 4: Ignore the 'ok' field - that is ok.
>
> 5: Linux
> Erlang (BEAM) emulator version 5.6.3 [source] [64-bit]
> [smp:8] [async-threads:0] [hipe] [kernel-poll:false]
> MacOSX
> Erlang (BEAM) emulator version 5.6.3 [source]
> [async-threads:0] [kernel-poll:false]
>